Dive into Codex 1 —— Compact

May 23, 2026 18:46 | #nsfw #codex #sourcecode

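A quick look at Codex's context compaction mechanism.

Three compaction triggers

Pre-turn: before the next turn starts, ahead of run_sampling_request, an automatic check finds that the token limit has been hit
Mid-turn: while a turn is in progress, after run_sampling_request returns, an automatic check finds that the token limit has been hit
Manual compact: the user triggers /compact manually, aka Op::Compact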

Automatic limit-hit checks

The pre-turn and mid-turn trigger logic lives in the run_turn function:

// codex-rs/core/src/session/turn.rs
pub(crate) async fn run_turn(...) -> Option<String> {
  // Pre-turn
  if let Err(err) = run_pre_sampling_compact(...).await { ... }
  
  // ...
  
  loop {
    match run_sampling_request(...)
    .await
    {
      // Mid-turn
      if token_limit_reached && needs_follow_up { ... }
    }
  }
}

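The difference between them:

Mid-turn is a check made right after a single LLM message call. At that point the agent loop as a whole has not finished (see needs_follow_up), so the turn context has to be injected during compaction.
Pre-turn means the previous agent loop has already finished; the check runs before a new loop begins.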
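The two automatic checks can be sketched as a small decision function. This is only an illustration of the conditions described above: `AutoCompactPoint` and `should_auto_compact` are hypothetical names, not Codex's own types, and the pre-turn condition is assumed to depend on the token limit alone.

```rust
/// Hypothetical labels for the two automatic check points (not Codex types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AutoCompactPoint {
    /// Checked before a new turn starts sampling.
    PreTurn,
    /// Checked after a sampling request inside a running turn.
    MidTurn,
}

/// Decide whether an automatic compaction should run at a check point.
/// Assumption: pre-turn only looks at the token limit, while mid-turn
/// also requires that the agent loop still has follow-up work to do.
fn should_auto_compact(
    point: AutoCompactPoint,
    token_limit_reached: bool,
    needs_follow_up: bool,
) -> bool {
    match point {
        AutoCompactPoint::PreTurn => token_limit_reached,
        AutoCompactPoint::MidTurn => token_limit_reached && needs_follow_up,
    }
}
```

If the loop is already done when the limit is hit mid-turn, nothing is compacted at that point in this sketch; the next pre-turn check would pick it up instead.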

ContextInjection

Normally, compaction does not inject context. The mid-turn case is the exception and does need the injection:

run_auto_compact(
  InitialContextInjection::BeforeLastUserMessage,
  CompactionPhase::MidTurn,
)

pub(crate) enum InitialContextInjection {
    BeforeLastUserMessage,
    DoNotInject,
}
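As a rough sketch of the rule just stated (only mid-turn compaction injects the initial context), the mapping from trigger to injection mode might look like the following. The `Trigger` enum and `injection_for` helper are hypothetical and exist only for illustration; the enum is redeclared here so the snippet stands alone.

```rust
/// Redeclared locally so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InitialContextInjection {
    BeforeLastUserMessage,
    DoNotInject,
}

/// Hypothetical enum naming the three triggers from this post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trigger {
    PreTurn,
    MidTurn,
    Manual,
}

/// Pick the injection mode for a trigger: only the mid-turn case
/// re-injects the initial context, every other case skips it.
fn injection_for(trigger: Trigger) -> InitialContextInjection {
    match trigger {
        Trigger::MidTurn => InitialContextInjection::BeforeLastUserMessage,
        Trigger::PreTurn | Trigger::Manual => InitialContextInjection::DoNotInject,
    }
}
```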

What is in the turn context?

The core function is:

// codex-rs/core/src/session/mod.rs
pub(crate) async fn build_initial_context(
    &self,
    turn_context: &TurnContext,
) -> Vec<ResponseItem> 

// codex-rs/core/src/session/turn_context.rs
pub struct TurnContext {
    pub(crate) sub_id: String,
    pub(crate) trace_id: Option<String>,
    pub(crate) realtime_active: bool,
    pub config: Arc<Config>,
    pub(crate) auth_manager: Option<Arc<AuthManager>>,
    pub(crate) model_info: ModelInfo,
    // ...
}

How compaction works

Compaction comes in Local and Remote variants. We only look at the Local path here; the core functions are:

// codex-rs/core/src/session/turn.rs
async fn run_auto_compact(
    sess: &Arc<Session>,
    turn_context: &Arc<TurnContext>,
    client_session: &mut ModelClientSession,
    initial_context_injection: InitialContextInjection,
    reason: CompactionReason,
    phase: CompactionPhase,
) -> CodexResult<()> 

// codex-rs/core/src/compact.rs
pub(crate) async fn run_inline_auto_compact_task(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    initial_context_injection: InitialContextInjection,
    reason: CompactionReason,
    phase: CompactionPhase,
) -> CodexResult<()> {

Inject the compact prompt:

> codex-rs/core/templates/compact/prompt.md

You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly continue the work.

Clone the entire conversation history so far, then start a new client session and have the LLM compress the context:

// codex-rs/core/src/compact.rs
async fn run_compact_task_inner_impl() -> CodexResult<String> {
    let mut history = sess.clone_history().await;

    history.record_items(
        &[initial_input_for_turn.into()],
        turn_context.truncation_policy,
    );

    let mut client_session = sess.services.model_client.new_session();

     loop {
        let attempt_result = drain_to_completed(
            &sess,
            turn_context.as_ref(),
            &mut client_session,
            turn_metadata_header.as_deref(),
            &prompt,
        )
        .await;
     }
}

// Stream the request on the new client session and append the compaction result to the end of the original session's history
async fn drain_to_completed(...) -> CodexResult<()> {
    let mut stream = client_session
        .stream(...)
        .await?;
    loop {
        match event {
            Ok(ResponseEvent::OutputItemDone(item)) => {
                sess.record_into_history(std::slice::from_ref(&item), turn_context)
                    .await;
            }
        }
    }
}

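Replace and reassemble the original session. The freshly compacted summary, the user's recent messages, and (depending on the case) the turn context are assembled into a new history, which then replaces the original session's history: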

// codex-rs/core/src/compact.rs
async fn run_compact_task_inner_impl(...) -> CodexResult<String> {
    // ...
    let history_snapshot = sess.clone_history().await;
    let history_items = history_snapshot.raw_items();
    let summary_suffix = get_last_assistant_message_from_turn(history_items).unwrap_or_default();
    let summary_text = format!("{SUMMARY_PREFIX}\n{summary_suffix}");
    let user_messages = collect_user_messages(history_items);

    let mut new_history = build_compacted_history(Vec::new(), &user_messages, &summary_text);

    // mid-turn
    if matches!(
        initial_context_injection,
        InitialContextInjection::BeforeLastUserMessage
    ) {
        let initial_context = sess.build_initial_context(turn_context.as_ref()).await;
        new_history =
            insert_initial_context_before_last_real_user_or_summary(new_history, initial_context);
    }
    let reference_context_item = match initial_context_injection {
        InitialContextInjection::DoNotInject => None,
        InitialContextInjection::BeforeLastUserMessage => Some(turn_context.to_turn_context_item()),
    };
    let compacted_item = CompactedItem {
        message: summary_text.clone(),
        replacement_history: Some(new_history.clone()),
    };
    sess.replace_compacted_history(new_history, reference_context_item, compacted_item)
        .await;
    // ..
    Ok(summary_suffix)
}
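The injection step can be illustrated with a simplified model of the history. This is a sketch under stated assumptions, not the Codex implementation: `Item` is a hypothetical stand-in for `ResponseItem`, and the fallback order (last real user message, then the summary, then the end of the history) is inferred from the name `insert_initial_context_before_last_real_user_or_summary`.

```rust
/// Hypothetical, simplified stand-in for Codex's `ResponseItem`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Item {
    /// A real user message kept from the old history.
    User(String),
    /// The compaction summary (also sent with the user role in Codex).
    Summary(String),
    /// An item produced by building the initial (turn) context.
    Context(String),
}

/// Insert `context` before the last real user message. If there is no
/// user message, insert before the summary; if neither exists, append.
fn insert_context_before_last_user_or_summary(
    mut history: Vec<Item>,
    context: Vec<Item>,
) -> Vec<Item> {
    let last_user = history.iter().rposition(|i| matches!(i, Item::User(_)));
    let last_summary = history.iter().rposition(|i| matches!(i, Item::Summary(_)));
    let at = last_user.or(last_summary).unwrap_or(history.len());
    // An empty range at `at` turns `splice` into a pure insertion.
    history.splice(at..at, context);
    history
}
```

With a history of two user messages followed by the summary, the context lands between the two user messages, so the model sees it right before the most recent real request.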

The original user messages are retained subject to a token limit:

// codex-rs/core/src/compact.rs
fn build_compacted_history_with_limit(...) -> Vec<ResponseItem> {
    if max_tokens > 0 {
        let mut remaining = max_tokens;
        // Keep the most recent user messages that still fit in the budget
        for message in user_messages.iter().rev() {
            if remaining == 0 {
                break;
            }
            let tokens = approx_token_count(message);
            if tokens <= remaining {
                selected_messages.push(message.clone());
                remaining = remaining.saturating_sub(tokens);
            } else {
                let truncated = truncate_text(message, TruncationPolicy::Tokens(remaining));
                selected_messages.push(truncated);
                break;
            }
        }
        selected_messages.reverse();
    }
    // ...
    history.push(ResponseItem::Message {
        id: None,
        role: "user".to_string(),
        content: vec![ContentItem::InputText { text: summary_text }],
        phase: None,
    });

}
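To make the budget loop concrete, here is a self-contained sketch of the same newest-first selection. The helpers are assumptions for illustration only: `approx_tokens` uses a rough 4-bytes-per-token estimate, and `truncate_to_tokens` simply keeps a prefix, which may differ from what Codex's `approx_token_count` and `truncate_text` actually do.

```rust
/// Assumed estimate: roughly 4 bytes per token, rounded up.
fn approx_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Assumed truncation: keep the longest prefix that fits in the budget,
/// cutting only on UTF-8 character boundaries.
fn truncate_to_tokens(text: &str, max_tokens: usize) -> String {
    let max_bytes = max_tokens * 4;
    let mut end = 0;
    for (idx, ch) in text.char_indices() {
        let next = idx + ch.len_utf8();
        if next > max_bytes {
            break;
        }
        end = next;
    }
    text[..end].to_string()
}

/// Walk the user messages from newest to oldest, keep whole messages
/// while they fit, truncate the first one that does not, then stop.
/// The result is returned in the original (oldest-first) order.
fn select_recent_messages(messages: &[String], max_tokens: usize) -> Vec<String> {
    let mut selected = Vec::new();
    let mut remaining = max_tokens;
    for message in messages.iter().rev() {
        if remaining == 0 {
            break;
        }
        let tokens = approx_tokens(message);
        if tokens <= remaining {
            selected.push(message.clone());
            remaining -= tokens;
        } else {
            selected.push(truncate_to_tokens(message, remaining));
            break;
        }
    }
    selected.reverse();
    selected
}
```

Iterating in reverse means the budget is spent on the most recent messages first, so older messages are the ones that get truncated or dropped.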

There is a default maximum size for the retained user messages.

After compaction, the final context looks roughly like this:

# Role: User
Past user input 1

# Role: User
Past user input 2

# Turn Context
If needed, the Turn Context is inserted before the last past user message

# Role: User
Past user input 3

# Role: User
Context summary

A real example:

User messages | Turn Context | Compacted Context

Update at 05/24 17:19
