Dive into Codex 1: Compact

May 23, 2026 18:46 | #nsfw #codex #sourcecode

A quick look at Codex's context compaction mechanism.

The 3 compact strategies

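  • Pre-turn: before the next turn starts (ahead of run_sampling_request), an automatic check finds that the token limit has been hit
  • Mid-turn: while a turn is running, right after run_sampling_request returns, an automatic check finds that the token limit has been hit
  • Manual compact: the user triggers /compact by hand, aka Op::Compact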

Automatic limit-hit checks

The pre-turn and mid-turn trigger logic lives in the run_turn function:

// codex-rs/core/src/session/turn.rs
pub(crate) async fn run_turn(...) -> Option<String> {
  // Pre-turn
  if let Err(err) = run_pre_sampling_compact(...).await { ... }
  
  // ...
  
  loop {
    match run_sampling_request(...).await {
      Ok(...) => {
        // Mid-turn
        if token_limit_reached && needs_follow_up { ... }
      }
      // ...
    }
  }
}

The difference between them:

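  • Mid-turn is checked right after a single LLM message call. At that point the agent loop has not finished yet (see needs_follow_up), so the compaction has to re-inject context
  • Pre-turn means the previous agent loop has already finished, and the check runs before the next loop starts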
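
To make the two check points concrete, here is a minimal simulation of one turn. It is a sketch, not the Codex implementation: SamplingStep, simulate_turn, compacted_size, and the `>=` comparison against the limit are all assumptions made for illustration. Only the token_limit_reached && needs_follow_up condition comes from the excerpt above.

```rust
/// Where in the turn an automatic compaction fired.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompactionPhase {
    PreTurn,
    MidTurn,
}

/// One simulated sampling request (hypothetical type): how many tokens it adds
/// to the context, and whether the agent loop still has to continue afterwards.
struct SamplingStep {
    tokens_added: usize,
    needs_follow_up: bool,
}

/// Runs one simulated turn and returns the compaction phases that fired, in order.
/// `compacted_size` is an assumed context size right after a compaction.
fn simulate_turn(
    mut context_tokens: usize,
    limit: usize,
    compacted_size: usize,
    steps: &[SamplingStep],
) -> Vec<CompactionPhase> {
    let mut fired = Vec::new();

    // Pre-turn: the previous agent loop is done, so check before any sampling.
    if context_tokens >= limit {
        fired.push(CompactionPhase::PreTurn);
        context_tokens = compacted_size;
    }

    for step in steps {
        context_tokens += step.tokens_added;
        let token_limit_reached = context_tokens >= limit;

        // Mid-turn: compact only if the loop is going to keep running.
        if token_limit_reached && step.needs_follow_up {
            fired.push(CompactionPhase::MidTurn);
            context_tokens = compacted_size;
        }
        if !step.needs_follow_up {
            break;
        }
    }
    fired
}
```

In this sketch, a limit hit on the final step of a turn (needs_follow_up is false) triggers nothing; the oversized context is left for the pre-turn check of the next turn.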

ContextInjection

Normally, compact does not inject context. The mid-turn case is the exception and does need injection:

run_auto_compact(
  // ...
  InitialContextInjection::BeforeLastUserMessage,
  // ...
  CompactionPhase::MidTurn,
)

pub(crate) enum InitialContextInjection {
    BeforeLastUserMessage,
    DoNotInject,
}
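
The rule above can be written as a small lookup. This is a sketch under assumptions: CompactTrigger and injection_for are hypothetical names, and treating pre-turn and manual compaction as DoNotInject follows the statement that compact normally does not inject context.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InitialContextInjection {
    BeforeLastUserMessage,
    DoNotInject,
}

/// Hypothetical enum naming the three ways a compaction can start.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompactTrigger {
    PreTurn,
    MidTurn,
    Manual,
}

/// Picks the injection mode for a trigger (illustrative helper, not a Codex function).
fn injection_for(trigger: CompactTrigger) -> InitialContextInjection {
    match trigger {
        // The agent loop is still running, so the turn context must be put back.
        CompactTrigger::MidTurn => InitialContextInjection::BeforeLastUserMessage,
        // No loop is in flight, so nothing is re-injected here.
        CompactTrigger::PreTurn | CompactTrigger::Manual => InitialContextInjection::DoNotInject,
    }
}
```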

What is in the turn context?

The core function is:

// codex-rs/core/src/session/mod.rs
pub(crate) async fn build_initial_context(
    &self,
    turn_context: &TurnContext,
) -> Vec<ResponseItem> 

// codex-rs/core/src/session/turn_context.rs
pub struct TurnContext {
    pub(crate) sub_id: String,
    pub(crate) trace_id: Option<String>,
    pub(crate) realtime_active: bool,
    pub config: Arc<Config>,
    pub(crate) auth_manager: Option<Arc<AuthManager>>,
    pub(crate) model_info: ModelInfo,
    // ...
}

How compact works

Compact comes in Local and Remote variants. We only look at the Local path here; the core functions are:

// codex-rs/core/src/session/turn.rs
async fn run_auto_compact(
    sess: &Arc<Session>,
    turn_context: &Arc<TurnContext>,
    client_session: &mut ModelClientSession,
    initial_context_injection: InitialContextInjection,
    reason: CompactionReason,
    phase: CompactionPhase,
) -> CodexResult<()> 

// codex-rs/core/src/compact.rs
pub(crate) async fn run_inline_auto_compact_task(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    initial_context_injection: InitialContextInjection,
    reason: CompactionReason,
    phase: CompactionPhase,
) -> CodexResult<()>

  1. Inject the compact prompt:
> codex-rs/core/templates/compact/prompt.md

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.
  2. Clone the entire conversation history, then start a new client session and have the LLM compress the context
// codex-rs/core/src/compact.rs
async fn run_compact_task_inner_impl(...) -> CodexResult<String> {
    let mut history = sess.clone_history().await;

    history.record_items(
        &[initial_input_for_turn.into()],
        turn_context.truncation_policy,
    );

    let mut client_session = sess.services.model_client.new_session();

    loop {
        let attempt_result = drain_to_completed(
            &sess,
            turn_context.as_ref(),
            &mut client_session,
            turn_metadata_header.as_deref(),
            &prompt,
        )
        .await;
        // ...
    }
}

// Stream from the new client session and append the compaction result to the end of the original session
async fn drain_to_completed(...) -> CodexResult<()> {
    let mut stream = client_session
        .stream(...)
        .await?;
    loop {
        match event {
            Ok(ResponseEvent::OutputItemDone(item)) => {
                sess.record_into_history(std::slice::from_ref(&item), turn_context)
                    .await;
            }
        }
    }
}
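
The drain pattern in step 2 can be sketched synchronously. Everything here is simplified and assumed for illustration: the event enum carries plain strings, the Completed variant and the error strings are hypothetical, and an iterator stands in for the async stream.

```rust
/// Simplified stand-in for the streamed events (assumption: the real enum
/// carries richer payloads and more variants).
#[derive(Debug, Clone, PartialEq, Eq)]
enum ResponseEvent {
    OutputItemDone(String),
    Completed,
}

/// Consumes events until completion, appending every finished item to `history`.
/// Returns an error if the stream fails or ends before a `Completed` event.
fn drain_to_completed<I>(events: I, history: &mut Vec<String>) -> Result<(), String>
where
    I: IntoIterator<Item = Result<ResponseEvent, String>>,
{
    for event in events {
        match event {
            // Each finished output item (here: the summary text) goes into history.
            Ok(ResponseEvent::OutputItemDone(item)) => history.push(item),
            // The model is done; the caller can now read the summary from history.
            Ok(ResponseEvent::Completed) => return Ok(()),
            Err(err) => return Err(err),
        }
    }
    Err("stream closed before completion".to_string())
}
```

Because the summary is recorded into history rather than returned, the caller in step 3 reads it back as the last assistant message.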
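  3. Replace and reassemble the original session. Assemble the freshly compacted summary, the user's recent messages, and (depending on the case) the turn context, then replace the original session's history with the result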
// codex-rs/core/src/compact.rs
async fn run_compact_task_inner_impl(...) -> CodexResult<String> {
    // ...
    let history_snapshot = sess.clone_history().await;
    let history_items = history_snapshot.raw_items();
    let summary_suffix = get_last_assistant_message_from_turn(history_items).unwrap_or_default();
    let summary_text = format!("{SUMMARY_PREFIX}\n{summary_suffix}");
    let user_messages = collect_user_messages(history_items);

    let mut new_history = build_compacted_history(Vec::new(), &user_messages, &summary_text);

    // mid-turn
    if matches!(
        initial_context_injection,
        InitialContextInjection::BeforeLastUserMessage
    ) {
        let initial_context = sess.build_initial_context(turn_context.as_ref()).await;
        new_history =
            insert_initial_context_before_last_real_user_or_summary(new_history, initial_context);
    }
    let reference_context_item = match initial_context_injection {
        InitialContextInjection::DoNotInject => None,
        InitialContextInjection::BeforeLastUserMessage => Some(turn_context.to_turn_context_item()),
    };
    let compacted_item = CompactedItem {
        message: summary_text.clone(),
        replacement_history: Some(new_history.clone()),
    };
    sess.replace_compacted_history(new_history, reference_context_item, compacted_item)
        .await;
    // ..
    Ok(summary_suffix)
}
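
The mid-turn insertion step can be sketched on a simplified history. The Item enum and insert_context_before_last_user are hypothetical. The fallback order (last user message, then the summary, then the end of the list) is a guess based on the name insert_initial_context_before_last_real_user_or_summary, not on its source.

```rust
/// Simplified history entry (assumption: the real type is ResponseItem).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Item {
    User(String),
    Context(String),
    Summary(String),
}

/// Inserts `context` before the last user message, falling back to the
/// summary position, and finally to the end of the history.
fn insert_context_before_last_user(mut history: Vec<Item>, context: Vec<Item>) -> Vec<Item> {
    let idx = history
        .iter()
        .rposition(|item| matches!(item, Item::User(_)))
        .or_else(|| history.iter().rposition(|item| matches!(item, Item::Summary(_))))
        .unwrap_or(history.len());
    // An empty range at `idx` makes splice a pure insertion.
    history.splice(idx..idx, context);
    history
}
```

With a history of two user messages followed by the summary, the context lands between the two user messages, so the model sees the turn context right before the most recent user input.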

The original user messages are kept only up to a token limit:

// codex-rs/core/src/compact.rs
fn build_compacted_history_with_limit(...) -> Vec<ResponseItem> {
    if max_tokens > 0 {
        let mut remaining = max_tokens;
        // Keep the most recent user messages, walking backwards until the budget runs out
        for message in user_messages.iter().rev() {
            if remaining == 0 {
                break;
            }
            let tokens = approx_token_count(message);
            if tokens <= remaining {
                selected_messages.push(message.clone());
                remaining = remaining.saturating_sub(tokens);
            } else {
                let truncated = truncate_text(message, TruncationPolicy::Tokens(remaining));
                selected_messages.push(truncated);
                break;
            }
        }
        selected_messages.reverse();
    }
    // ...
    history.push(ResponseItem::Message {
        id: None,
        role: "user".to_string(),
        content: vec![ContentItem::InputText { text: summary_text }],
        phase: None,
    });
    history
}
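
Here is a self-contained version of the selection loop that can be run on its own. The helpers are stand-ins, not the Codex ones: approx_token_count assumes roughly 4 bytes per token, and truncate_to_tokens simply keeps a prefix, which may differ from what truncate_text does. select_recent_messages is a hypothetical name.

```rust
/// Rough token estimate: about 4 bytes per token, rounded up (assumption).
fn approx_token_count(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Keeps roughly `max_tokens` worth of text, cutting at a char boundary.
fn truncate_to_tokens(text: &str, max_tokens: usize) -> String {
    let max_bytes = max_tokens * 4;
    if text.len() <= max_bytes {
        return text.to_string();
    }
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    text[..end].to_string()
}

/// Picks the newest messages that fit in `max_tokens`, truncating the oldest
/// one that only partially fits, and returns them in chronological order.
fn select_recent_messages(user_messages: &[String], max_tokens: usize) -> Vec<String> {
    let mut selected = Vec::new();
    let mut remaining = max_tokens;
    for message in user_messages.iter().rev() {
        if remaining == 0 {
            break;
        }
        let tokens = approx_token_count(message);
        if tokens <= remaining {
            selected.push(message.clone());
            remaining -= tokens;
        } else {
            selected.push(truncate_to_tokens(message, remaining));
            break;
        }
    }
    selected.reverse();
    selected
}
```

Walking the list in reverse means the budget is spent on the newest messages first; older ones are truncated or dropped.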

The final compacted context looks roughly like this:

# Role: User
Earlier user input 1

# Role: User
Earlier user input 2

# Turn Context
If needed, the turn context is inserted before the last earlier user message

# Role: User
Earlier user input 3

# Role: User
Context summary

A real example:


Update at 05/24 17:19

Copyright © 2026 hong97.ltd.