Hacker News — AI on Front Page · 5 min read

Codex logging bug may write TBs to local SSDs

Mirrored from Hacker News — AI on Front Page for archival readability. Support the source by reading on the original site.

240 pts · 133 comments on Hacker News

openai / codex Public

Codex SQLite feedback logs can write ~640 TB/year and rapidly consume SSD endurance #28224

Status: Open
Labels: CLI, bug, performance
Author: @1996fanrui


Issue

Codex is continuously writing a large amount of data to the local SQLite feedback log database:

  • ~/.codex/logs_2.sqlite
  • ~/.codex/logs_2.sqlite-wal
  • ~/.codex/logs_2.sqlite-shm

On my machine, after about 21 days of uptime, about 37 TB has been written to the main SSD. Process-level and file-level checks show that the Codex SQLite logs are the main continuous writer.

That extrapolates to roughly 640 TB/year. On a 1 TB SSD, that is about 640 full-drive writes per year. Some consumer SSDs are rated around 600 TBW, so this could consume roughly a full drive's warranted write endurance in less than a year.
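
The extrapolation above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the observed 37 TB was written at a constant rate and that the 21-day window is representative. The function names are illustrative and not part of Codex.

```rust
/// Linearly extrapolate an observed write volume to a full year.
fn tb_per_year(tb_written: f64, days_observed: f64) -> f64 {
    tb_written / days_observed * 365.0
}

/// Days until a drive's rated endurance (TBW) is used up at a constant write rate.
fn days_to_reach_tbw(rated_tbw: f64, tb_per_day: f64) -> f64 {
    rated_tbw / tb_per_day
}

fn main() {
    let written_tb = 37.0; // whole-drive writes from the report
    let uptime_days = 21.0; // uptime from the report
    let rated_tbw = 600.0; // example consumer SSD rating from the report

    let per_day = written_tb / uptime_days;
    let per_year = tb_per_year(written_tb, uptime_days);
    let days_left = days_to_reach_tbw(rated_tbw, per_day);

    println!("{:.2} TB/day, {:.0} TB/year", per_day, per_year);
    println!("{:.0} days to reach {:.0} TBW", days_left, rated_tbw);
}
```

With the reported figures this works out to about 643 TB/year, and a 600 TBW rating would be reached after roughly 340 days.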

Evidence

Current retained rows in logs_2.sqlite:

metric                          value
retained rows                   681,774
estimated retained log content  1,035.6 MiB

Level distribution:

level  estimated MiB  byte %
TRACE          732.5   70.7%
INFO           266.5   25.7%
DEBUG           30.6    3.0%
WARN             5.9    0.6%

Largest target+level pairs:

target                                    level  estimated MiB
codex_api::endpoint::responses_websocket  TRACE          527.4
codex_otel.log_only                       INFO           141.2
codex_otel.trace_safe                     INFO           121.2
log                                       TRACE           97.4
codex_client::transport                   TRACE           60.1
codex_core::stream_events_utils           DEBUG           27.5
codex_api::sse::responses                 TRACE           19.1

The top sources are mostly global TRACE logs, mirrored telemetry logs, and raw websocket/SSE payload logging. TRACE alone is about 70.7% of retained bytes. codex_otel.log_only + codex_otel.trace_safe add another 25.3%. Filtering these categories should remove roughly 96% of retained log bytes in this sample without fully disabling feedback logs.
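
The 96% figure can be checked directly from the tables. A small sketch, assuming the two categories do not overlap (the codex_otel targets are logged at INFO, so they are not counted in the TRACE total). The helper name is illustrative.

```rust
/// Share of `part` in `total`, as a percentage.
fn percent(part: f64, total: f64) -> f64 {
    part / total * 100.0
}

fn main() {
    let total_mib = 1035.6; // estimated retained log content
    let trace_mib = 732.5; // all TRACE rows
    let otel_mib = 141.2 + 121.2; // codex_otel.log_only + codex_otel.trace_safe (INFO)

    let trace_pct = percent(trace_mib, total_mib);
    let otel_pct = percent(otel_mib, total_mib);

    println!("TRACE:            {:.1}%", trace_pct);
    println!("codex_otel INFO:  {:.1}%", otel_pct);
    println!("filterable total: {:.1}%", trace_pct + otel_pct);
}
```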

Sanitized examples from the most frequent TRACE source: target=log

These are high-frequency retained samples. Raw websocket/SSE payload bodies are intentionally not included because they may contain private conversation content.

128,764x TRACE log: inotify event: ... mask: OPEN, name: Some("ld.so.cache")
 37,982x TRACE log: inotify event: ... mask: OPEN, name: Some("locale.alias")
 23,843x TRACE log: inotify event: ... mask: OPEN, name: Some("passwd")
  3,639x TRACE log: <tokio-tungstenite checkout>/src/compat.rs:131 AllowStd.with_context
  3,505x TRACE log: <tokio-tungstenite checkout>/src/lib.rs:245 WebSocketStream.with_context
  3,362x TRACE log: <tokio-tungstenite checkout>/src/compat.rs:154 Read.read
  3,356x TRACE log: <tokio-tungstenite checkout>/src/compat.rs:157 Read.with_context read -> poll_read
  3,230x TRACE log: <tokio-tungstenite checkout>/src/lib.rs:294 Stream.poll_next
  3,227x TRACE log: <tokio-tungstenite checkout>/src/lib.rs:304 Stream.with_context poll_next -> read()
  3,213x TRACE log: inotify event: ... mask: OPEN, name: Some("nsswitch.conf")
  2,001x TRACE log: WouldBlock
  1,217x TRACE log: Masked: false
  1,169x TRACE log: Opcode: Data(Text)
  1,169x TRACE log: First: 11000001

Sanitized examples from frequent INFO sources

The dominant INFO sources are mostly repeated OpenTelemetry mirror events. IDs are redacted.

843x INFO codex_client::custom_ca:
  using system root certificates because no CA override environment variable was selected ...

334x INFO codex_otel.trace_safe:
  session_loop{thread_id=<redacted>}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id=<redacted> codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=<redacted> ...}

333x INFO codex_otel.log_only:
  session_loop{thread_id=<redacted>}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id=<redacted> codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id=<redacted> ...}

332x INFO codex_otel.log_only:
  session_loop{thread_id=<redacted>}:submission_dispatch{otel.name="op.dispatch.user_input_with_turn_context" submission.id=<redacted> codex.op="user_input_with_turn_context"}:turn{otel.name="session_task.turn" thread.id=<redacted> ...}

332x INFO codex_otel.trace_safe:
  session_loop{thread_id=<redacted>}:submission_dispatch{otel.name="op.dispatch.user_input_with_turn_context" submission.id=<redacted> codex.op="user_input_with_turn_context"}:turn{otel.name="session_task.turn" thread.id=<redacted> ...}

Write amplification

The retained DB size hides the real write volume. In a 15-second sample:

metric         before         after
retained rows  681,774        681,774
max row id     5,003,347,015  5,003,383,226

About 36,211 rows were inserted in 15 seconds, while retained row count stayed flat. This suggests continuous insert-and-prune write amplification: rows are inserted, indexed, written to WAL, then pruned.
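
The insert rate follows from the two max row id samples. This is a rough sketch that assumes row ids increase by one per insert, and that the 15-second window is typical. The function names are illustrative.

```rust
/// Rows inserted between two samples, assuming row ids grow by one per insert.
fn rows_inserted(max_id_before: u64, max_id_after: u64) -> u64 {
    max_id_after - max_id_before
}

/// Average insert rate over the sampling window.
fn rows_per_second(inserted: u64, window_secs: f64) -> f64 {
    inserted as f64 / window_secs
}

/// Seconds until the number of inserts equals the retained row count.
fn turnover_secs(retained_rows: u64, rate: f64) -> f64 {
    retained_rows as f64 / rate
}

fn main() {
    let inserted = rows_inserted(5_003_347_015, 5_003_383_226);
    let rate = rows_per_second(inserted, 15.0);

    println!("{} rows in 15 s = {:.0} rows/s", inserted, rate);
    println!("about {:.0} million rows/day", rate * 86_400.0 / 1e6);
    println!("retained set turns over every {:.0} s", turnover_secs(681_774, rate));
}
```

At roughly 2,400 rows per second, the number of inserts matches the entire retained row count in under five minutes, which is consistent with the flat retained count in the sample.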

Likely cause

The SQLite feedback log sink is installed with a global TRACE default:

Targets::new().with_default(Level::TRACE)

This persists all targets at TRACE level by default, including dependency/internal logs and large raw protocol payloads.

Proposed fix

Keep feedback logs enabled, but narrow what is persisted by default:

  1. Do not use global TRACE for the SQLite feedback log sink.
  2. Drop or raise thresholds for low-value dependency noise, especially target=log, hyper_util, tokio-tungstenite internals, inotify spam, and low-level OpenTelemetry SDK logs.
  3. Avoid persisting full raw websocket/SSE payloads by default. Store summaries instead: event kind, duration, success/error, token usage, and payload byte length.
  4. Avoid persisting mirrored codex_otel.log_only / codex_otel.trace_safe events unless they are explicitly useful for feedback debugging.
  5. Add a global logs DB size/write cap. Per-thread caps are not enough when many threads/processes exist.
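
Items 1, 2, and 4 amount to a per-target minimum level with a non-TRACE default. The sketch below is a std-only model of that idea, not the actual Codex code or the tracing_subscriber API. The TargetFilter type, the thresholds, and the rule list are all hypothetical. In the real sink, the equivalent would presumably be expressed with `with_default` and `with_target` on the existing `Targets` filter.

```rust
/// Log levels ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level { Trace, Debug, Info, Warn, Error }

/// Hypothetical per-target filter: `None` means "never persist this target".
struct TargetFilter {
    default_min: Level,
    rules: Vec<(&'static str, Option<Level>)>,
}

/// A rule matches the exact target or any `::`-separated child module.
fn matches_target(prefix: &str, target: &str) -> bool {
    match target.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with("::"),
        None => false,
    }
}

impl TargetFilter {
    /// Minimum level for a target; the longest matching rule wins.
    fn min_level_for(&self, target: &str) -> Option<Level> {
        let mut best: Option<(&str, Option<Level>)> = None;
        for &(prefix, min) in &self.rules {
            let more_specific = best.map_or(true, |(p, _)| prefix.len() > p.len());
            if matches_target(prefix, target) && more_specific {
                best = Some((prefix, min));
            }
        }
        best.map_or(Some(self.default_min), |(_, min)| min)
    }

    fn would_persist(&self, target: &str, level: Level) -> bool {
        match self.min_level_for(target) {
            Some(min) => level >= min,
            None => false,
        }
    }
}

/// Illustrative thresholds only; the right values are a project decision.
fn example_filter() -> TargetFilter {
    TargetFilter {
        default_min: Level::Info, // instead of a global TRACE default
        rules: vec![
            ("log", None),                   // dependency noise (inotify, tungstenite)
            ("codex_otel.log_only", None),   // mirrored telemetry
            ("codex_otel.trace_safe", None), // mirrored telemetry
            ("codex_core", Some(Level::Debug)),
        ],
    }
}

fn main() {
    let filter = example_filter();
    let samples = [
        ("codex_api::endpoint::responses_websocket", Level::Trace),
        ("log", Level::Trace),
        ("codex_otel.log_only", Level::Info),
        ("codex_core::stream_events_utils", Level::Debug),
        ("codex_client::transport", Level::Warn),
    ];
    for &(target, level) in samples.iter() {
        println!("{:?} {} -> persist: {}", level, target, filter.would_persist(target, level));
    }
}
```

With these example rules, the largest contributors in the tables above (TRACE websocket payloads, target=log, and the mirrored codex_otel INFO events) are dropped, while warnings and selected DEBUG output are still persisted.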

An optional escape hatch such as sqlite_logs_enabled = false would still be useful, but the main fix should be better default filtering.
