
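A quick look at Codex's context compaction mechanism

Compaction can be triggered in three ways:
- Pre-turn: before the turn starts, ahead of run_sampling_request, an automatic check finds that the token limit has been hit.
- Mid-turn: while the turn is running, after run_sampling_request returns, an automatic check finds that the token limit has been hit.
- Manual: the user runs /compact, aka Op::Compact.

Pre-turn & Mid-turn
The trigger logic lives in the run_turn function: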
// codex-rs/core/src/session/turn.rs
pub(crate) async fn run_turn(...) -> Option<String> {
    // Pre-turn
    if let Err(err) = run_pre_sampling_compact(...).await { ... }
    // ...
    loop {
        match run_sampling_request(...)
            .await
        {
            // Mid-turn
            if token_limit_reached && needs_follow_up { ... }
        }
    }
}
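To make the two automatic trigger points concrete, here is a minimal sketch that replays a scripted turn and records where compaction would fire. `SamplingOutcome`, `Trigger`, and `compaction_triggers` are hypothetical names invented for this illustration; only the `token_limit_reached && needs_follow_up` condition comes from the excerpt above.

```rust
/// Hypothetical, simplified stand-in for what one sampling request reports.
struct SamplingOutcome {
    token_limit_reached: bool,
    needs_follow_up: bool,
}

#[derive(Debug, PartialEq)]
enum Trigger {
    PreTurn,
    MidTurn,
}

/// Walks a scripted turn and records where compaction would fire.
fn compaction_triggers(
    over_limit_before_turn: bool,
    outcomes: &[SamplingOutcome],
) -> Vec<Trigger> {
    let mut fired = Vec::new();
    // Pre-turn: checked once, before the first sampling request.
    if over_limit_before_turn {
        fired.push(Trigger::PreTurn);
    }
    for outcome in outcomes {
        // Mid-turn: only when the limit is hit AND the turn still has work to do.
        if outcome.token_limit_reached && outcome.needs_follow_up {
            fired.push(Trigger::MidTurn);
            continue; // compact, then issue the next sampling request
        }
        if !outcome.needs_follow_up {
            break; // the turn is finished, nothing left to compact for
        }
    }
    fired
}
```

Note that hitting the limit on the final request of a turn (no follow-up needed) does not fire a mid-turn compaction in this sketch, which mirrors the two-part condition in run_turn.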
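The difference between them:
Mid-turn compaction runs while needs_follow_up is true, meaning the turn continues afterwards, so the initial context has to be injected as part of compaction. Normally, compaction does not inject context, but in the mid-turn case it must: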
run_auto_compact(
    InitialContextInjection::BeforeLastUserMessage,
    CompactionPhase::MidTurn,
)
pub(crate) enum InitialContextInjection {
    BeforeLastUserMessage,
    DoNotInject,
}
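The rule described above (inject only for mid-turn compaction) can be sketched as a small mapping. This is an illustration, not the Codex implementation: the `Trigger` enum and `injection_for` function are hypothetical, and treating manual compaction as `DoNotInject` is an assumption based on the statement that compaction normally does not inject context.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum InitialContextInjection {
    BeforeLastUserMessage,
    DoNotInject,
}

/// Hypothetical: where a compaction was triggered from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Trigger {
    PreTurn,
    MidTurn,
    Manual,
}

/// Picks the injection mode for a trigger (assumed mapping, see lead-in).
fn injection_for(trigger: Trigger) -> InitialContextInjection {
    match trigger {
        // The turn keeps going after compaction, so the context must be re-inserted.
        Trigger::MidTurn => InitialContextInjection::BeforeLastUserMessage,
        // Assumed default: no context is injected during compaction.
        Trigger::PreTurn | Trigger::Manual => InitialContextInjection::DoNotInject,
    }
}
```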
The core function for building that context is:
// codex-rs/core/src/session/mod.rs
pub(crate) async fn build_initial_context(
    &self,
    turn_context: &TurnContext,
) -> Vec<ResponseItem>
// codex-rs/core/src/session/turn_context.rs
pub struct TurnContext {
    pub(crate) sub_id: String,
    pub(crate) trace_id: Option<String>,
    pub(crate) realtime_active: bool,
    pub config: Arc<Config>,
    pub(crate) auth_manager: Option<Arc<AuthManager>>,
    pub(crate) model_info: ModelInfo,
    // ...
}
Compaction comes in two variants, Local and Remote. We only look at the Local path here; its core functions are:
// codex-rs/core/src/session/turn.rs
async fn run_auto_compact(
    sess: &Arc<Session>,
    turn_context: &Arc<TurnContext>,
    client_session: &mut ModelClientSession,
    initial_context_injection: InitialContextInjection,
    reason: CompactionReason,
    phase: CompactionPhase,
) -> CodexResult<()>
// codex-rs/core/src/compact.rs
pub(crate) async fn run_inline_auto_compact_task(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    initial_context_injection: InitialContextInjection,
    reason: CompactionReason,
    phase: CompactionPhase,
) -> CodexResult<()>
> codex-rs/core/templates/compact/prompt.md
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.
Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue
Be concise, structured, and focused on helping the next LLM seamlessly continue the work.
// codex-rs/core/src/compact.rs
async fn run_compact_task_inner_impl(...) -> CodexResult<String> {
    let mut history = sess.clone_history().await;
    history.record_items(
        &[initial_input_for_turn.into()],
        turn_context.truncation_policy,
    );
    let mut client_session = sess.services.model_client.new_session();
    loop {
        let attempt_result = drain_to_completed(
            &sess,
            turn_context.as_ref(),
            &mut client_session,
            turn_metadata_header.as_deref(),
            &prompt,
        )
        .await;
        // ...
    }
    // ...
}
// Start a new session and append the compaction result to the end of the original session
async fn drain_to_completed(...) -> CodexResult<()> {
    let mut stream = client_session
        .stream(...)
        .await?;
    loop {
        // ...
        match event {
            Ok(ResponseEvent::OutputItemDone(item)) => {
                sess.record_into_history(std::slice::from_ref(&item), turn_context)
                    .await;
            }
            // ...
        }
    }
}
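The drain step can be modelled without any async machinery. The sketch below is a simplified, synchronous stand-in: the `ResponseEvent` variants other than `OutputItemDone`, the use of plain strings as history items, and the error returned when the stream ends early are all assumptions made for illustration.

```rust
/// Simplified event type; only `OutputItemDone` appears in the excerpt above,
/// the other variants are assumed for the sake of the example.
#[derive(Debug, Clone, PartialEq)]
enum ResponseEvent {
    OutputItemDone(String),
    OutputTextDelta(String),
    Completed,
}

/// Consumes events until completion, appending every finished output item
/// to `history` (a stand-in for the session history).
fn drain_to_completed(
    events: impl IntoIterator<Item = ResponseEvent>,
    history: &mut Vec<String>,
) -> Result<(), String> {
    for event in events {
        match event {
            // A finished item (here: the summary message) goes into the history.
            ResponseEvent::OutputItemDone(item) => history.push(item),
            // The response is complete, so draining stops successfully.
            ResponseEvent::Completed => return Ok(()),
            // Partial deltas are ignored; only finished items are recorded.
            ResponseEvent::OutputTextDelta(_) => {}
        }
    }
    // Assumed behaviour: a stream that ends without completing is an error.
    Err("stream closed before completion".to_string())
}
```

Because the summary lands at the end of the existing history, the next step can recover it by looking for the last assistant message.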
// codex-rs/core/src/compact.rs
async fn run_compact_task_inner_impl(...) -> CodexResult<String> {
    // ...
    let history_snapshot = sess.clone_history().await;
    let history_items = history_snapshot.raw_items();
    let summary_suffix = get_last_assistant_message_from_turn(history_items).unwrap_or_default();
    let summary_text = format!("{SUMMARY_PREFIX}\n{summary_suffix}");
    let user_messages = collect_user_messages(history_items);
    let mut new_history = build_compacted_history(Vec::new(), &user_messages, &summary_text);
    // mid-turn
    if matches!(
        initial_context_injection,
        InitialContextInjection::BeforeLastUserMessage
    ) {
        let initial_context = sess.build_initial_context(turn_context.as_ref()).await;
        new_history =
            insert_initial_context_before_last_real_user_or_summary(new_history, initial_context);
    }
    let reference_context_item = match initial_context_injection {
        InitialContextInjection::DoNotInject => None,
        InitialContextInjection::BeforeLastUserMessage => Some(turn_context.to_turn_context_item()),
    };
    let compacted_item = CompactedItem {
        message: summary_text.clone(),
        replacement_history: Some(new_history.clone()),
    };
    sess.replace_compacted_history(new_history, reference_context_item, compacted_item)
        .await;
    // ...
    Ok(summary_suffix)
}
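The injection step is easier to follow on a toy history. The sketch below is one plausible reading of insert_initial_context_before_last_real_user_or_summary, inferred from its name only: insert before the last real user message, fall back to inserting before the summary, and otherwise append. The `Item` enum and the `insert_context` function are hypothetical simplifications of `ResponseItem` handling.

```rust
/// Hypothetical, simplified history item.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    User(String),
    Summary(String),
    Context(String),
}

/// Inserts `context` before the last user message; if there is none,
/// before the last summary; if there is neither, at the end.
fn insert_context(mut history: Vec<Item>, context: Vec<Item>) -> Vec<Item> {
    let last_user = history.iter().rposition(|i| matches!(i, Item::User(_)));
    let last_summary = history.iter().rposition(|i| matches!(i, Item::Summary(_)));
    let at = last_user.or(last_summary).unwrap_or(history.len());
    // Split at the insertion point, then stitch: head + context + tail.
    let tail = history.split_off(at);
    history.extend(context);
    history.extend(tail);
    history
}
```

With three retained user messages and a summary, the context ends up between the second and third user messages, and the summary stays last.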
The original user messages are retained only as far as a token limit allows:
// codex-rs/core/src/compact.rs
fn build_compacted_history_with_limit(...) -> Vec<ResponseItem> {
    if max_tokens > 0 {
        let mut remaining = max_tokens;
        // Keep the most recent user messages, walking backwards until the budget runs out
        for message in user_messages.iter().rev() {
            if remaining == 0 {
                break;
            }
            let tokens = approx_token_count(message);
            if tokens <= remaining {
                selected_messages.push(message.clone());
                remaining = remaining.saturating_sub(tokens);
            } else {
                let truncated = truncate_text(message, TruncationPolicy::Tokens(remaining));
                selected_messages.push(truncated);
                break;
            }
        }
        selected_messages.reverse();
    }
    // ...
    history.push(ResponseItem::Message {
        id: None,
        role: "user".to_string(),
        content: vec![ContentItem::InputText { text: summary_text }],
        phase: None,
    });
}
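The selection loop can be run in isolation with stand-in helpers. In the sketch below, `approx_token_count` assumes roughly four bytes per token and `truncate_text` simply keeps a prefix; both are hypothetical simplifications, and the real helpers may count and truncate differently. `select_recent_messages` is an invented name for the loop itself.

```rust
/// Assumption: about four bytes per token, rounded up.
fn approx_token_count(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Hypothetical truncation: keep only the first `max_tokens * 4` characters.
fn truncate_text(text: &str, max_tokens: usize) -> String {
    text.chars().take(max_tokens * 4).collect()
}

/// Keeps the newest user messages that fit into `max_tokens`, truncating the
/// oldest message that only partially fits, and returns them oldest first.
fn select_recent_messages(user_messages: &[String], max_tokens: usize) -> Vec<String> {
    let mut selected = Vec::new();
    let mut remaining = max_tokens;
    // Walk from newest to oldest so recent messages win the budget.
    for message in user_messages.iter().rev() {
        if remaining == 0 {
            break;
        }
        let tokens = approx_token_count(message);
        if tokens <= remaining {
            selected.push(message.clone());
            remaining -= tokens;
        } else {
            // Partial fit: spend the rest of the budget on a truncated copy.
            selected.push(truncate_text(message, remaining));
            break;
        }
    }
    // Restore chronological order.
    selected.reverse();
    selected
}
```

With a budget of 4 tokens and messages costing 4, 2, and 1 tokens (oldest to newest), the two newest are kept whole and the oldest is cut down to the single remaining token.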
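After compaction, the context looks roughly like this:
# Role: User
Past user input 1
# Role: User
Past user input 2
# Turn Context
If needed, the Turn Context is inserted before the last past user message
# Role: User
Past user input 3
# Role: User
Context summary
A real example: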