~/bin/vault is a thin SSH shim into curious@sbl0:~/.vault/vault.sh. Plaintext is never written to disk on sbl1; secrets stay in age-encrypted .env.age files on sbl0.
Audit
Fleet hygiene · 2026-05-16 · 94 items across 14 categories
Plaintext credentials on disk, conformance to the sbl0-only vault rule, and exfiltration risk from local logs and backups.
Vault is healthy
Architecture preserved as you remediate the rest. 5 systems verified.
kimi, kimi-cli, krishna-ask, openclaw-discord-gateway, and every user systemd unit call sbl-secret-env <ENV>=sbl0-<name>. No inline tokens in any service unit or wrapper.
New secrets are encrypted to age1y9qu5gsr8feuraym2682hr3vgqm59f85fslzlr3wppdtjr348eyq5v7nxy before they touch disk on sbl0. The plaintext only exists in RAM on sbl1 and inside the SSH tunnel.
The 'sbl1 is the commit tier' rule is observed at the vault. All live entries are sbl0-*, sbl3-*, sbl4-*, or sbl10-*. Five legacy unprefixed entries remain.
wrangler.jsonc files annotate CF_API_TOKEN as 'set via wrangler secret put' — no inline keys. Repo .gitignore excludes .env, .env.*, .dev.vars, .wrangler/.
Findings
78 matching · 94 total across 14 categories.
Shell history leak
1 item
~/.bash_history (lines 157-1541)
15+ commands echoed literal ghp_* and github_pat_* GitHub PATs (export GITHUB_TOKEN=…, gh auth login --with-token, scp …bashrc, etc.).
Fix steps
- Revoke each leaked PAT at github.com/settings/tokens (start with the most recent two).
- gh auth refresh -h github.com (writes a new oauth_token).
- history -c && shred -u ~/.bash_history && touch ~/.bash_history.
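Before and after the cleanup it can help to confirm which history lines still hold token-shaped strings. A minimal sketch, assuming GNU grep; `find_pat_lines` is a hypothetical helper, and the regexes only approximate the ghp_* and github_pat_* shapes. It prints line numbers only, so the tokens themselves never reach the terminal or a log.

```shell
# Print the line numbers of history entries that look like GitHub PATs.
# Only line numbers are printed; the matching text is discarded.
# The patterns are approximations, not the official token grammar.
find_pat_lines() {
    grep -nE 'ghp_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}' \
        "${1:-$HOME/.bash_history}" | cut -d: -f1
}
```

Usage: `find_pat_lines ~/.bash_history | wc -l` should report 0 once the history file has been shredded and recreated.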
Plaintext credential file
11 items
~/.config/gemini/api_key.txt
Plaintext AIza* Gemini API key (40 bytes).
Fix steps
- Revoke key at Google AI Studio.
- Create new key.
- sbl-secret put sbl0-google-gemini-api-key <new>.
- rm ~/.config/gemini/api_key.txt; have callers fetch via sbl-secret-env.
~/.config/gh/hosts.yml
GitHub oauth_token in plaintext (standard gh CLI location).
Fix steps
- gh auth refresh -h github.com -s repo,workflow (issues a new token, revokes old).
- Verify: gh auth status.
- Optionally: sbl-secret put sbl0-github-pat <token from gh auth token>.
~/.config/rclone/rclone.conf
gdrive OAuth { access_token, refresh_token, expiry } in plaintext.
Fix steps
- rclone config reconnect gdrive: (re-runs OAuth, writes new tokens).
- Old refresh_token can be revoked at myaccount.google.com/permissions.
~/.config/gcloud/application_default_credentials.json + legacy_credentials/*/adc.json
gcloud refresh_token + client_secret in plaintext for two identities.
Fix steps
- gcloud auth revoke --all.
- gcloud auth login (writes new credentials.db / ADC).
- gcloud auth application-default login if needed.
~/.kaggle/access_token
KGAT_* Kaggle API key in plaintext (38 bytes).
Fix steps
- kaggle.com → Account → Expire API Token, then Create New Token.
- Download replaces the file; alternatively store via sbl-secret put sbl0-kaggle-api-key.
~/.cloudflared/5774c5a1-3631-495b-afca-1daa84563fe7.json
Cloudflare tunnel TunnelSecret + AccountTag in plaintext.
Fix steps
- cloudflared tunnel rotate <tunnel-name> — zero-downtime rotation.
- Daemon picks up new <uuid>.json automatically.
~/.cloudflared/cert.pem
Argo Tunnel token (-----BEGIN ARGO TUNNEL TOKEN-----).
Fix steps
- Only needed if you manage tunnels regularly; can be regenerated with cloudflared tunnel login.
~/.claude/.credentials.json
Anthropic OAuth access token (sk-ant-oat01-*) in plaintext.
Fix steps
- Do this LAST — risks interrupting in-flight remediation.
- claude /logout then claude /login.
- The old access_token is short-lived; the refresh path is re-established.
~/.codex/auth.json
ChatGPT OAuth id_token (JWT) + tokens object in plaintext.
Fix steps
- codex logout && codex login.
~/.openclaw/auth-profiles.json (+ agents/krishna/agent/auth-profiles.json + backups/pre-update-*/)
Anthropic OAuth token (sk-ant-oat01-*) in plaintext for openclaw.
Fix steps
- systemctl --user stop openclaw-gateway krishna-proxy.
- openclaw auth login (writes new auth-profiles.json).
- systemctl --user start krishna-proxy openclaw-gateway.
- Verify: krishna-ask 'hello' replies normally.
~/.antigravity-server/.15487b30…token
Numeric session token in plaintext.
Fix steps
- Re-bootstrap antigravity to issue a new token.
SSH key hygiene
3 items
~/.ssh/id_ed25519
Unencrypted private key (no passphrase). Same key authenticates to sbl0, sbl1, sbl2, sbl3, sbl4, vast-2x5090, and vast.
Fix steps
- Generate ~/.ssh/id_ed25519.new with a passphrase.
- ssh-copy-id -i id_ed25519.new.pub to every fleet host while old key still works.
- Test new key end-to-end against each host before swapping.
- Atomic swap; ssh-add for gcr-ssh-agent.
- Remove old pubkey from each host (sed -i on authorized_keys).
- sbl-secret put sbl0-curious-ssh-ed25519-private < id_ed25519; shred old key.
Order matters: removing the old key before the new one is verified locks you out of sbl0, and with it the vault.
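The "test new key end-to-end" step can be scripted so that no host is skipped before the swap. A minimal sketch: `check_new_key` and the `SSH_BIN` override are hypothetical names introduced here. It assumes the new key is already loaded in the agent (BatchMode disables passphrase prompts) and uses IdentitiesOnly so a successful login cannot come from the old key.

```shell
# Try one specific key against each host and report the result per host.
# Returns non-zero if any host rejected the key, so the swap can be gated on it.
# SSH_BIN is a hypothetical override that allows a dry run with a stub.
check_new_key() {
    key="$1"
    shift
    failed=0
    for host in "$@"; do
        if "${SSH_BIN:-ssh}" -i "$key" -o IdentitiesOnly=yes -o BatchMode=yes \
               -o ConnectTimeout=10 "$host" true </dev/null; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: `check_new_key ~/.ssh/id_ed25519.new sbl0 sbl1 sbl2 sbl3 sbl4 && echo "safe to swap"`. Only remove the old public key from authorized_keys after every host reports ok.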
~/.ollama/id_ed25519
Unencrypted private key for ollama identity.
Fix steps
- If unused: shred and remove.
- If used: regenerate with passphrase.
~/.local/dev-tls/key.pem
Unencrypted OpenSSH-format private key (dev TLS).
Fix steps
- If mkcert-managed: re-issue.
- If hand-rolled: regenerate or remove if unused.
Active agent session logs
2 items
~/.claude/history.jsonl + projects/-home-curious/*.jsonl
Multiple Gemini AIza* keys and Anthropic sk-ant-oat01-* tokens appear verbatim inside Claude session transcripts (the agent streamed credential files into its context).
Fix steps
- Only after the underlying keys are rotated and revoked.
- Per-token sed scrub (use the helper script scrub-history).
- Or, rotate the relevant Claude project IDs and discard the older jsonl files.
~/.codex/sessions/2026/*/*/*.jsonl
Older Codex session rollouts (Feb-Apr 2026) contain hf_*, gsk_*, xai-*, sk-ant-*, AIza* fragments.
Fix steps
- Older rollouts are no longer needed for replay — safe to delete after rotation.
- rm -rf ~/.codex/sessions/2026/0[2-4] (after confirming nothing depends on them).
Stale credential dump on disk
2 items
~/workspace/antimony-labs-org-nondashboard-20260423-185551/backups/sbl2-migration/sbl2-backup-wipe-prep-20260404/
82 GB full disk dump of sbl2. Contains an unencrypted id_ed25519, .bashrc exporting ANTHROPIC_API_KEY=sk-ant-api03-…, an older .claude/.credentials.json, Codex shell snapshots, 20+ openclaw agent auth-profiles.json files, and Downloads/Chase exports with hf_* tokens.
Fix steps
- Pure dead data — safe to delete.
- Move to a quarantine dir first if you want to inspect: mv …/sbl2-backup-wipe-prep-20260404 /tmp/quarantine/.
- If you must keep, re-archive as a single age-encrypted blob on sbl0.
…/sbl2-migration/sbl2-backup-20260418-133603/ (70 MB) + …-133643/ (116 MB)
Sibling sbl2 backups; each contains .bash_history + .bashrc with sk-ant-api03-* tokens.
Fix steps
- Same as above — safe to delete.
Vault hygiene
3 items
sbl0 vault: github-token, discord-bot-token, cloudflare-global-api-key, contabo-deploy-ssh-key, ssh-ed25519-private
Five unprefixed entries violate the sbl0-* convention. Wrapper scripts already use `sbl0-…|legacy` fallback chains, so these are alive on purpose.
Fix steps
- Migrate each: sbl-secret get <legacy> | sbl-secret put sbl0-<scope>-<purpose>.
- Edit wrappers to drop the |legacy fallback once consumers verified.
- sbl-secret delete <legacy>.
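The copy step of the migration can be wrapped so the new entry is verified before anyone deletes the legacy one. A minimal sketch, assuming `sbl-secret get <name>` writes the value to stdout and `sbl-secret put <name>` reads it from stdin, as the pipeline in the first fix step implies; `migrate_secret` and `SBL_SECRET_BIN` are hypothetical names. Digests are compared so neither value is printed.

```shell
# Copy a legacy vault entry to its new prefixed name and verify the copy.
# The legacy entry is left in place; delete it only after wrappers are updated.
# SBL_SECRET_BIN is a hypothetical override for testing with a stub.
migrate_secret() {
    legacy="$1"
    new="$2"
    cli="${SBL_SECRET_BIN:-sbl-secret}"
    # Stop early if the legacy entry cannot be read.
    "$cli" get "$legacy" >/dev/null || return 1
    "$cli" get "$legacy" | "$cli" put "$new" || return 1
    # Compare digests rather than values so no secret reaches the terminal.
    old_sum=$("$cli" get "$legacy" | sha256sum)
    new_sum=$("$cli" get "$new" | sha256sum)
    if [ "$old_sum" = "$new_sum" ]; then
        echo "migrated $legacy -> $new (legacy entry kept)"
    else
        echo "MISMATCH $legacy -> $new" >&2
        return 1
    fi
}
```

Usage: `migrate_secret github-token sbl0-github-pat`, where the target name follows the sbl0-<scope>-<purpose> pattern.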
~/workspace/code/sbl/registry/secrets.tsv (5 rows with name starting `sbl1-`)
Names 5 sbl1-* secrets (krishna groq/xai, openclaw discord/gateway/ollama). None of these exist in the live vault, but the file documents the forbidden prefix.
Fix steps
- Delete the 5 sbl1-* rows from secrets.tsv.
~/bin/openclaw-discord-status
References sbl1-openclaw-discord-token + sbl1-openclaw-gateway-token. Both keys are absent from vault, so the script fails at runtime.
Fix steps
- Rewrite to use the sbl0-…|fallback pattern used by openclaw-discord-gateway, or delete.
Trust boundary
1 item
~/.ssh/config (vast-2x5090, vast)
The same SSH key that authenticates to sbl0 (vault host) is also used to log into rented vast.ai root-on-public-IP instances.
Fix steps
- Use a separate key (vast_id_ed25519) when renting vast.ai compute.
- Configure ssh-config with IdentityFile per Host.
Dashboard drift
1 item
~/workspace/code/sbl/src/data/platform.ts (line 491)
platform.ts asserts 'local OpenClaw/Kaggle/GitHub token files removed'. Audit shows ~/.kaggle/access_token, ~/.config/gh/hosts.yml, and ~/.openclaw/auth-profiles.json still exist.
Fix steps
- After the underlying files are removed or migrated, update the claim or replace it with a live-derived check.
Workspace layout
1 item
~/workspace/runs
~/workspace/runs is empty (no experiment outputs being captured).
Per CLAUDE.md the runs/ dir is for experiment outputs (safe to delete). Currently empty — either no experiments are producing outputs, or outputs are landing elsewhere (likely under code repos or ~/Downloads).
Fix steps
- When training/inference jobs run, point output dir to ~/workspace/runs/<run-id>/.
- Add to wrapper scripts that currently dump outputs into code dirs.
Oversized code repo
7 items
~/workspace/code/too.foo
~/workspace/code/too.foo is 18 GB (excluding node_modules).
Per CLAUDE.md convention, code repos must never contain >100 MB of data. 18 GB suggests image-heavy content (Precision Shorts hero+scene gen?) is committed in-repo instead of living under ~/workspace/data/ and being referenced via DATA_ROOT.
Fix steps
- du -shx ~/workspace/code/too.foo/* | sort -h | tail -10 to find the big subtree.
- Most likely culprits: apps/content/assets/, public/images/, generated/.
- Move generated images/datasets to ~/workspace/data/too-foo-content/.
- Add path to .gitignore; reference via env DATA_ROOT=$HOME/workspace/data.
- If already in git history: BFG or git-filter-repo to purge from history.
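The 100 MB rule can be checked across every repo in one pass instead of one `du` at a time. A minimal sketch, assuming GNU du; `list_oversized_repos` is a hypothetical helper. It measures apparent size, skips node_modules as the finding above does, and counts build output and .git, so a hit is a prompt to look closer rather than proof that data is committed.

```shell
# Print "<size_mb> <path>" for each directory under a code root whose
# apparent size exceeds a limit in MB (default 100), smallest first.
# node_modules directories are excluded from the measurement.
list_oversized_repos() {
    root="${1:-$HOME/workspace/code}"
    limit_mb="${2:-100}"
    for repo in "$root"/*/; do
        [ -d "$repo" ] || continue
        size_mb=$(du -sm --apparent-size --exclude=node_modules "$repo" | cut -f1)
        if [ "$size_mb" -gt "$limit_mb" ]; then
            echo "$size_mb ${repo%/}"
        fi
    done | sort -n
}
```

Usage: `list_oversized_repos ~/workspace/code 100` lists every repo over the limit; pass a higher limit such as 1000 to see only the urgent ones.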
~/workspace/code/vendor
~/workspace/code/vendor is 5.1 GB.
Almost certainly bundled third-party source, model files, or large binaries that belong in ~/workspace/data/ as immutable inputs (so they can be shared across repos rather than duplicated per checkout).
Fix steps
- Identify what's inside (likely git-submodule'd deps or vendored libs).
- Move large immutable assets to ~/workspace/data/vendor-<name>/.
- Keep only thin manifest files (lockfiles, vendor.txt) in the repo.
~/workspace/code/clip
~/workspace/code/clip is 4.6 GB.
Per project memory `clip` is the Rust screencast recorder. 4.6 GB almost certainly means compiled artifacts (the repo's target/ directory) and/or sample recordings.
Fix steps
- du -shx ~/workspace/code/clip/* | sort -h | tail -5.
- If target/ is huge: cargo clean (or set CARGO_TARGET_DIR=~/.cache/cargo).
- Move sample recordings to ~/workspace/data/clip-samples/.
~/workspace/code/hellorobot
~/workspace/code/hellorobot is 4.2 GB.
Likely git history + datasets/bag files from robot work. Bag files and recordings should not live in git.
Fix steps
- du -shx ~/workspace/code/hellorobot/* | sort -h | tail -5.
- Move .bag, .mcap, dataset dirs to ~/workspace/data/hellorobot-<purpose>/.
- Update CI/scripts to reference DATA_ROOT.
~/workspace/code/ecad_hello_robot
~/workspace/code/ecad_hello_robot is 1.6 GB.
EDA / PCB project — gerbers and 3D step files are large. Tolerable for now, but worth checking what's tracked vs generated.
Fix steps
- Audit what's checked in vs build output; .gitignore any generated dirs.
~/workspace/code/stretch_isaac_sim
~/workspace/code/stretch_isaac_sim is 505 MB.
Isaac Sim assets are large by nature. At roughly five times the 100 MB rule this is over the limit, but not egregious next to the multi-GB repos above. Worth confirming USD/textures live in data/ rather than the repo.
Fix steps
- Identify large USD/texture files; move to ~/workspace/data/stretch-isaac-assets/.
~/workspace/code/core
~/workspace/code/core is 634 MB.
Worth a quick audit; under 1 GB is not urgent.
Fix steps
- du -shx ~/workspace/code/core/* | sort -h | tail -5 to identify offenders.
Code outside workspace
1 item
~/.fzf
~/.fzf has its own .git checkout outside ~/workspace/code/.
The recommended fzf installer keeps a git checkout in $HOME. Not strictly a violation (it's a tool, not a project), but it is the only repo checkout that lives outside the workspace convention.
Fix steps
- Acceptable as-is. If you want strict adherence: install via apt/snap/brew instead of the git-checkout installer.
Worktree state
1 item
~/workspace/code/*-wt/
Zero active worktrees across all projects.
No `~/workspace/code/<project>-wt/<slug>/` dirs found and `git worktree list` in each repo shows only the primary checkout. This may be fine (clean state) or signal that the cc helper / operator console isn't being used to spawn agent worktrees the way the architecture intends.
Fix steps
- If intentional (you're between projects): no action.
- Else: confirm `cc` works (cc <project> opens fzf picker) and `+ new` in /worktrees creates the expected dir.
Project naming
1 item
~/workspace/code/
Multiple too.foo-related repos with inconsistent naming.
Observed: too.foo, amazon-too-foo, compliant.too.foo, gate-too-foo, kaggle-too-foo (CF Pages project name), content-too-foo, etc. Mix of dotted (compliant.too.foo) and hyphenated (gate-too-foo) — not strictly duplicate, but inconsistent.
Fix steps
- Pick one convention (hyphenated is friendlier for tools).
- Rename dotted dirs on disk + update CF Pages project names to match.
Restic health
2 items
rest:http://100.93.45.3:8000/
No record of recent `restic check` (full repo integrity).
`restic snapshots` succeeds but a full `restic check --read-data` is the only way to detect bit rot in the repo on sbl3. Best-practice is monthly. No timer for it currently.
Fix steps
- Run once: sbl1-backup check --read-data-subset 5% (sample-mode is fast).
- Add a monthly timer: ~/.config/systemd/user/restic-check.timer.
- Pipe failure to a notification path (e.g., the existing Discord status webhook).
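The monthly timer could be generated along these lines. A minimal sketch: `write_restic_check_units` is a hypothetical helper, and the `%h/bin/sbl1-backup` location is an assumption (adjust to wherever the wrapper actually lives). Note that systemd treats `%` as a specifier prefix, so the literal percent sign in `5%` is written as `%%` inside the unit file.

```shell
# Write a oneshot service plus a monthly timer for a sampled restic check.
# The ExecStart path is an assumption; %h expands to the user's home directory.
write_restic_check_units() {
    unit_dir="${1:-$HOME/.config/systemd/user}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/restic-check.service" <<'EOF'
[Unit]
Description=Sampled restic repository integrity check

[Service]
Type=oneshot
# A literal percent sign must be doubled in unit files.
ExecStart=%h/bin/sbl1-backup check --read-data-subset 5%%
EOF
    cat > "$unit_dir/restic-check.timer" <<'EOF'
[Unit]
Description=Monthly restic integrity check

[Timer]
OnCalendar=monthly
# Run at next boot if the machine was off when the timer was due.
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Usage: run `write_restic_check_units`, then `systemctl --user daemon-reload && systemctl --user enable --now restic-check.timer`. Failure notification is left out of this sketch.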
rest:http://100.93.45.3:8000/
Restic snapshot includes /etc/fleet-pusher, an unexpected source path.
Latest restic run included /etc/fleet-pusher as a backup source. Either that's intentional (fleet config) and should be documented, or it's stale config left over from a one-off command.
Fix steps
- Check sbl1-backup config (env vars or systemd unit ExecStart).
- Document the rationale, or remove the path if not intended.
sbl3 placement candidates
1 item
~/workspace/data
27 GB sitting in ~/workspace/data/, a candidate for an sbl3-hosted store.
Per fleet rule: 'sbl3 is the canonical database host for any new persistent store.' Existing local data stays where it is (SQLite over network is bad), but bulk datasets are good sbl3 candidates if they're shared across hosts/agents.
Fix steps
- du -shx ~/workspace/data/* | sort -h | tail -10.
- For each large subtree, decide: read-once dataset (keep local) vs shared reference data (move to sbl3 + reference via Tailscale).
- rclone or rsync over Tailscale to /srv/data/ on sbl3.
DB backup placement
1 item
sbl3:console
sbl3 Postgres (console DB): no nightly logical dump configured.
Per fleet rule: 'Nightly DB backups (logical dumps) target sbl3 even when the live DB lives elsewhere.' The console DB lives ON sbl3 — a dump to a DIFFERENT host (or to its own restic snapshot) is needed for off-host recovery.
Fix steps
- Add a daily pg_dump → ~/backups/postgres/console-YYYY-MM-DD.sql.gz on sbl3.
- Include in sbl3's restic snapshot (so it lands in the restic repo).
- Retention: keep 14 daily + 8 weekly + 12 monthly.
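The daily dump step might look like the following. A minimal sketch, assuming the database is literally named `console` and that pg_dump can connect with ambient credentials; `console_dump_path`, `dump_console_db`, and `PG_DUMP_BIN` are hypothetical names. With plain-format output, a nonzero `--compress` level makes pg_dump write gzip-compressed SQL, and dumping to a temporary name avoids leaving a truncated file under the final name.

```shell
# Build the dated dump path: <dir>/console-YYYY-MM-DD.sql.gz
console_dump_path() {
    echo "${1:-$HOME/backups/postgres}/console-${2:-$(date +%F)}.sql.gz"
}

# Dump the console database (name assumed) to a gzip-compressed SQL file.
# The dump is written to a .partial file and renamed only on success.
dump_console_db() {
    out=$(console_dump_path "$@")
    mkdir -p "$(dirname "$out")" || return 1
    if "${PG_DUMP_BIN:-pg_dump}" --compress=9 --file="$out.partial" console; then
        mv "$out.partial" "$out"
    else
        rm -f "$out.partial"
        return 1
    fi
}
```

If restic carries the retention policy, 14 daily + 8 weekly + 12 monthly corresponds to `restic forget --keep-daily 14 --keep-weekly 8 --keep-monthly 12`.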
Failed user units
1 item
~/.config/systemd/user/gmail-to-memory.service
gmail-to-memory.service is in `failed` state.
Per unit description: 'Refresh Gmail-to-memory index for Krishna semantic search.' Either the Gmail OAuth has expired, the index target moved, or the script has a bug. Krishna's email recall is degraded while this is down.
Fix steps
- systemctl --user status gmail-to-memory.service
- journalctl --user -u gmail-to-memory.service -n 50
- Likely: Gmail OAuth token expired → re-auth via the gmail tool.
- Or: vault key referenced by the script no longer exists.
Failed system units
1 item
/etc/systemd/system/fleet-updater.service → update-fleet-agent-from-release.sh
fleet-updater.service fails with curl 404 on the release-asset URL.
Confirmed 2026-05-22 (after gh auth refresh): unit still fails. Logs: 'curl: (22) The requested URL returned error: 404'. The script update-fleet-agent-from-release.sh is hitting a release-asset URL that no longer exists — release naming likely changed in the upstream fleet-agent repo. NOT a gh-auth issue.
Fix steps
- cat /usr/local/bin/update-fleet-agent-from-release.sh (or wherever it lives) — find the curl URL.
- gh release list --repo <fleet-agent-repo> to see current release naming.
- Update the asset URL pattern in the script.
- sudo systemctl start fleet-updater.service to verify.
Cron coverage
1 item
user crontab
User crontab has one entry (openclaw cost report at 11:05 daily).
Only ~/.openclaw/cost-report/cost-report.mjs is scheduled via cron. Everything else that runs on a schedule (backups) uses systemd timers, which is good. That one cron job sends to Discord, though; if the job is sensitive (writes tokens), consider migrating it to a systemd timer for consistency.
Fix steps
- No urgency. Optional: replace the cron entry with ~/.config/systemd/user/openclaw-cost-report.{service,timer}.
Reclaimable caches
2 items
~/.cache
~/.cache holds 8.3 GB.
User cache aggregate. Largest sub-trees likely include pip, mozilla, puppeteer, electron, npm logs. Safe to prune; tools rebuild on demand.
Fix steps
- du -shx ~/.cache/* | sort -h | tail -10 to identify culprits.
- rm -rf ~/.cache/pip/* (pip recreates).
- rm -rf ~/.cache/puppeteer ~/.cache/ms-playwright if not in active use.
- Consider adding a monthly cron to vacuum stale subdirs.
~/.npm
~/.npm holds 5.8 GB.
npm download/cache. Reclaimable any time; npm refetches as needed.
Fix steps
- npm cache clean --force (or rm -rf ~/.npm).
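- npm has no setting that ages out cache entries (cache-min is deprecated in favor of --prefer-offline); schedule a periodic npm cache verify or npm cache clean --force instead.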
Agent state growth
1 item
~/.openclaw
~/.openclaw holds 7.7 GB.
OpenClaw agent state, model snapshots, and pre-update backups. Audit security item openclaw-auth-profiles already flags agents/krishna/agent/auth-profiles.json + backups/pre-update-*/ for plaintext tokens.
Fix steps
- du -shx ~/.openclaw/* | sort -h to find the big subtree.
- Likely: ~/.openclaw/backups/pre-update-* — many are obsolete and contain old auth-profiles.json.
- Delete pre-update backups older than 2 weeks after confirming current state is healthy.
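A dry-run listing makes the two-week cutoff concrete before anything is deleted. A minimal sketch, assuming GNU find and that the backup directories are named pre-update-* directly under ~/.openclaw/backups; `list_stale_backups` is a hypothetical helper. It only prints candidates; deletion stays a separate, deliberate step.

```shell
# List pre-update backup directories last modified more than N days ago
# (default 14). find counts whole 24-hour periods, so +14 means 15 days or more.
# Nothing is deleted; the output is a candidate list for review.
list_stale_backups() {
    find "${1:-$HOME/.openclaw/backups}" -maxdepth 1 -type d \
        -name 'pre-update-*' -mtime +"${2:-14}" -print
}
```

Usage: review the output of `list_stale_backups`, confirm the current openclaw state is healthy, then remove the listed directories.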
Local state growth
1 item
~/.local
~/.local holds 7.5 GB.
~/.local/share + ~/.local/lib + ~/.local/state aggregate. Often holds pipx installs, model files (ollama before move), language servers, shell histories, and accumulated app state.
Fix steps
- du -shx ~/.local/*/* | sort -h | tail -10 to find concrete subtrees.
- Check ~/.local/share/ollama (model files belong under ~/workspace/data/).
- Check ~/.local/state — old log files can be pruned.
Config dir bloat
1 item
~/.config
~/.config holds 7.4 GB.
Larger than expected for a config dir — strong signal that one or more tools are violating XDG by storing cache/data there. Browser profiles (brave, chromium, firefox), VS Code, JetBrains state are common culprits.
Fix steps
- du -shx ~/.config/* | sort -h | tail -10.
- Identify offenders; move their large data to ~/.cache or ~/.local as appropriate.
- Some apps respect $XDG_DATA_HOME / $XDG_CACHE_HOME — set these if not already.
Untriaged downloads
1 item
~/Downloads
~/Downloads holds 2.2 GB.
Browser downloads folder. Per workspace convention, ad-hoc files should move into ~/workspace/scratch or ~/workspace/data; long-term storage should land on sbl3.
Fix steps
- Triage: ls -lt ~/Downloads | head -30 — anything older than 30 days?
- Move keepers to ~/workspace/data/<name>/; delete the rest.
Workspace convention
1 item
~/workspace/scratch
~/workspace/scratch is empty.
Per CLAUDE.md the convention is to keep throwaway work in ~/workspace/scratch. It is currently empty, while ~/Downloads (2.2 GB) holds untriaged files. Two readings are possible: the convention is not being used, or scratch was recently cleaned.
Fix steps
- No fix needed if scratch was just cleaned.
- Else: redirect ad-hoc work to ~/workspace/scratch instead of ~/Downloads or ~/tmp.
Filesystem fullness
1 item
/media/curious/40C25F80C25F78DC
/dev/sda3 (978G external) at 68% used (665G of 978G).
External mount, not the system drive. Not urgent (314G free), but if usage keeps growing it will eventually pass 85%. Consider what lives there and whether any of it belongs on sbl3 instead.
Fix steps
- du -shx /media/curious/40C25F80C25F78DC/* | sort -h | tail -10.
- Identify large stale subtrees; archive to sbl3 or delete.
Tailscale config
2 items
Tailscale SSH enabled but ACL doesn't allow anyone to access sbl1.
`tailscale status` health check: 'Tailscale SSH enabled, but access controls don't allow anyone to access this device. Ask your admin to update your tailnet's ACLs to allow access.' Tailscale SSH on sbl1 is effectively dead — OpenSSH still works, but the Tailscale SSH cert path is unusable.
Fix steps
- Either: disable Tailscale SSH on sbl1 (sudo tailscale set --ssh=false) since OpenSSH already serves the same purpose.
- Or: update tailnet ACL to allow `tag:fleet` → sbl1 ssh.
- Decide which auth path you want and remove the other to reduce surface.
Tailscale reports: can't reach configured DNS servers (MagicDNS may be flaky).
Per `tailscale status` health: 'Tailscale can't reach the configured DNS servers. Internet connectivity may be affected.' Matches the db.mjs comment that MagicDNS (hostname `sbl3`) is unreliable from sbl1 — system DNS resolves `sbl3` to the wrong IP. Workaround: explicit Tailscale IPs in code.
Fix steps
- Investigate /etc/resolv.conf, systemd-resolved status, and tailscale's --accept-dns setting.
- Either fix the DNS path or document the IP-only convention.
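- Pin HostName entries in ~/.ssh/config so SSH resolves hosts consistently; tools that rely on system DNS (psql, postgres.js, as already noted in the db.mjs comment) are not affected by ~/.ssh/config and need /etc/hosts entries or explicit Tailscale IPs instead.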
Fleet reachability
1 item
tailscale → sbl10 OFFLINE
sbl10 offline 32 days, but still referenced in audit data + CLAUDE.md.
User-memory says 'sbl10 does NOT exist' as a fleet member. Tailscale shows a registered node 100.66.19.14, offline 32 days. Audit data (security.ts, security-stages.ts) still lists sbl10 in the SSH key deploy targets — this is convention drift documented in the convention category. Network-side fix: deregister.
Fix steps
- Decide: is sbl10 retired? If yes, tailscale logout on the node and delete from the tailnet admin console.
- Then update the security finding ssh-id-ed25519 to drop sbl10 from the host list.
Fleet topology
3 items
Two sbl3 entries in Tailscale: sbl3-1 (online, 100.93.45.3) + sbl3 (offline 32d).
There is an active sbl3 node at 100.93.45.3 (DNS: sbl3-1) AND an offline sbl3 node at 100.88.237.49 (DNS: sbl3). The active one is what db.mjs uses (PGHOST=100.93.45.3). The offline sbl3 entry is stale — probably from a prior reinstall.
Fix steps
- tailscale → admin console → delete the stale `sbl3` node (100.88.237.49).
- After deletion, the active node should become the canonical `sbl3` DNS name.
- Update db.mjs PGHOST comment when this happens.
tailscale status → sbl5 (offline 30d) + sbl5-1 (offline 43d)
sbl5 (linux) + sbl5-1 (windows) appear in Tailscale but not in CLAUDE.md fleet table.
Fleet table in CLAUDE.md goes sbl0..sbl4. Two sbl5 nodes are registered to the tailnet (Linux + Windows). Both have been offline for 30 days or more. Either add them to the fleet table (and document their role) or deregister them.
Fix steps
- Decide if sbl5 is part of the fleet. If yes: add row to CLAUDE.md.
- If no: tailscale logout, delete from tailnet admin.
iPad (last seen 236d) + iPhone (47d) in tailnet — likely stale device entries.
Mobile devices last seen 236 and 47 days ago. Phones rotate Tailscale keys when re-installed, so old entries linger.
Fix steps
- Tailscale admin → expire/remove the iPad entry (236d gone is clearly stale).
- iPhone (47d) — keep if still using it, else also remove.
Pending packages
1 item
22 apt packages have updates available on sbl1.
unattended-upgrades is active but is configured to apply security updates only by default. Standard package updates accumulate until a manual `apt upgrade`.
Fix steps
- apt list --upgradable to review.
- sudo apt update && sudo apt upgrade — at a time of low activity.
- Optional: enable unattended-upgrades for the `updates` source too in /etc/apt/apt.conf.d/50unattended-upgrades.
Fleet coverage
1 item
Update status for sbl0/sbl2/sbl3 not yet inventoried; scan only covers sbl1.
First-pass scan ran locally. sbl0 (Pi), sbl2 (laptop), and sbl3 (Postgres host) each have their own apt schedule. A small wrapper is needed that SSHes into each host and reports back.
Fix steps
- Write a ~/bin/fleet-update-status that loops sbl0..sbl3 and reports apt list --upgradable | wc -l, reboot-required, unattended-upgrades status.
- Add an audit item per host as findings.
- Eventually expose this via the api-server as /fleet/updates.
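The wrapper described in the first fix step could start out as small as this. A minimal sketch: `fleet_update_status` and the `FLEET_SSH` override are hypothetical names, and the unattended-upgrades status check is left out. It counts package lines with `grep -c /` rather than `wc -l`, because `apt list` prints a "Listing..." header line that would otherwise inflate the count.

```shell
# Report pending apt upgrades and reboot-required state for each host.
# FLEET_SSH is a hypothetical override that allows a dry run with a stub.
fleet_update_status() {
    # Runs on the remote host: count upgradable packages (lines containing
    # a slash) and check for the reboot-required marker file.
    remote='n=$(apt list --upgradable 2>/dev/null | grep -c /); [ -f /var/run/reboot-required ] && r=yes || r=no; echo "$n $r"'
    printf '%-8s %-10s %s\n' host upgradable reboot
    for host in "$@"; do
        if result=$("${FLEET_SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 \
                "$host" "$remote" </dev/null); then
            printf '%-8s %-10s %s\n' "$host" "${result% *}" "${result#* }"
        else
            printf '%-8s %s\n' "$host" unreachable
        fi
    done
}
```

Usage: `fleet_update_status sbl0 sbl1 sbl2 sbl3`. An unreachable host is reported rather than aborting the loop, so one offline machine does not hide the others.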
Pinned tools
1 item
cloudflared client is 2026.3.0; current is 2026.5.0.
Reported by `cloudflared tunnel list`: 'Your version 2026.3.0 is outdated. We recommend upgrading it to 2026.5.0.' Not security-critical, but the newer release may include tunnel ingress bug fixes.
Fix steps
- sudo apt update && sudo apt install --only-upgrade cloudflared
- (or: curl-install from CF's GitHub releases if apt is behind).
- After upgrade, restart all cloudflared-*.service units.
Pages projects
2 items
13+ CF Pages projects under the too.foo umbrella: confirm all are still referenced.
Pages projects: sbl-console, too-foo-git, power-electronics-too-foo, kaggle-too-foo, content-too-foo, sensors-too-foo, vault-too-foo, munshi-too-foo, chladni-too-foo, helios-too-foo, atlas-too-foo, spice-too-foo, and more (the scan output was truncated). Many have <too.foo> subdomains. Some are likely active sub-apps; others may be dormant experiments.
Fix steps
- List all projects (the scan showed 13+; output was truncated).
- For each: check Pages dashboard for last successful deploy + traffic.
- Inactive projects: delete or archive. Each one carries a DNS record + cert.
sbl-console Pages project: auto-deploys via GitHub Actions (not CF native git).
Push to main → .github/workflows/deploy.yml runs `wrangler pages deploy out --project-name=sbl-console`. Native CF Pages git-connect is unused (would duplicate). `npm run deploy` remains as manual escape. Audit kept at low severity so failures (e.g. expired CLOUDFLARE_API_TOKEN) get noticed.
Fix steps
- Verify last GHA deploy run succeeded: gh run list --repo Shivam-Bhardwaj/sbl --workflow deploy.yml
- If failing: check that GH secrets CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are current.
Access policies
1 item
CF Access protects console.too.foo, console-api.too.foo, console-pty.too.foo: verify the policy is still scoped to a single email.
Per CLAUDE.md: 'All behind Cloudflare Access (single email policy: curious.antimony@gmail.com).' Worth periodically re-checking the policy list — adds/removes can creep in.
Fix steps
- CF dashboard → Zero Trust → Access → Applications.
- Confirm each app's policy is still {require_email = curious.antimony@gmail.com}.
Workers & R2
1 item
Workers and R2 buckets not yet inventoried; scan only covered Pages + Tunnels.
Initial scan used `wrangler pages project list` and `cloudflared tunnel list`. Cloudflare also hosts Workers and R2 buckets potentially. Inventory needed to confirm what surface area exists.
Fix steps
- sbl-secret-env CLOUDFLARE_API_TOKEN=sbl0-cloudflare-api-token -- npx wrangler r2 bucket list
- sbl-secret-env … -- npx wrangler deployments list (per known worker name).
- Or use CF API directly to enumerate all worker scripts in the account.
Public history hygiene
1 item
14 public repos under Shivam-Bhardwaj (mostly upstream forks): verify history is clean.
Of 14 public repos, 13 are upstream-OSS forks (pytorch, qiskit, rerun, mujoco, tokenizers, keploy, newton, ros2_documentation, stretch_ai, 3dgrut, RAP, neural-robot-dynamics, Standard-Notes). The 14th is `clip` (your Rust screencast recorder). Forks inherit upstream history; `clip` should get a one-time gitleaks pass.
Fix steps
- gitleaks detect -s ~/workspace/code/clip --no-git=false
- If any hits in history (likely none — early commits): BFG/git-filter-repo + force-push + rotate credential.
- For forks: low priority since the upstream history is the dominant signal; if you've never pushed local commits, nothing to leak.
External access
1 item
GitHub deploy keys + Actions secrets not yet inventoried.
When SSH key rotation happens (security stage F), any GitHub deploy keys that mirror the old key need re-deployment. Same for any GHA workflows that reference the leaked PATs from bash_history.
Fix steps
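- gh ssh-key list (account-level SSH keys only).
- gh repo deploy-key list --repo <owner>/<repo> per active repo (deploy keys are per-repository).
- gh api repos/<owner>/<repo>/actions/secrets per active repo (or per public-deployed repo).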
- Coordinate any deploy-key rotation with the SSH key swap (stage F).
Rust-strategy follow-through
1 item
rs/TOOLS/AUTOCRATE deletion pending; waits for crate.too.foo verification.
Phase E ported the AutoCrate Next.js math to apps/crate (Astro+TS). Per the too-foo-rust-strategy memory, the live Rust subdomain (autocrate.too.foo, served by rs/TOOLS/AUTOCRATE) gets removed AFTER the Astro version is verified live at crate.too.foo and DNS cuts over. CF Pages project crate-too-foo was auto-provisioned, but the user has not yet deployed or verified.
Fix steps
- 1. cd ~/workspace/code/too.foo && pnpm --filter @too-foo/crate dev --host 0.0.0.0
- 2. Verify the calculator works against the four AutoCrate scenario presets.
- 3. tools/deploy.sh crate (or push agent/<slug> → CF Pages preview → merge to main).
- 4. Confirm https://crate.too.foo/ is live + correct.
- 5. rm -rf ~/workspace/code/too.foo/rs/TOOLS/AUTOCRATE (per Rust strategy).
- 6. Mark this item resolved.
Failure visibility
1 item
No pager / notification path for failed Pages deployments.
CF Pages deploys can silently fail (build errors, env-var issues, wrangler.jsonc misconfig). Without a notification path, you only learn when visiting the site or reading the dashboard. The fleet has Discord wired up for openclaw-cost-report; the same channel could carry build alerts.
Fix steps
- Add a small wrangler-deploy wrapper that captures exit code + tails the deploy log and posts failures to the Discord webhook (sbl0 vault key).
- Or: use CF Pages webhooks → CF Worker → Discord post.
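The wrapper option from the first fix step can be sketched as follows. `run_with_alert`, `notify_discord`, `NOTIFY_CMD`, and `DISCORD_WEBHOOK_URL` are hypothetical names; the webhook URL is assumed to be injected from the vault. The notifier sends the message as a form-encoded `content` field and trims it below Discord's 2000-character message limit.

```shell
# Run a command, capture its output, and on failure pass the exit code and
# the tail of the log to a notifier. Returns the command's own exit code.
# NOTIFY_CMD is a hypothetical override for testing with a stub.
run_with_alert() {
    log=$(mktemp)
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        msg="deploy failed (exit $status): $*
$(tail -n 15 "$log" | cut -c1-200)"
        "${NOTIFY_CMD:-notify_discord}" "$msg"
    fi
    rm -f "$log"
    return "$status"
}

# Post plain text to a Discord webhook, truncated to 1900 bytes.
# DISCORD_WEBHOOK_URL is an assumed environment variable.
notify_discord() {
    curl -fsS --data-urlencode "content=$(printf '%s' "$1" | head -c 1900)" \
        "$DISCORD_WEBHOOK_URL" >/dev/null
}
```

Usage: `sbl-secret-env DISCORD_WEBHOOK_URL=<vault key> -- ...` around a script that calls `run_with_alert npx wrangler pages deploy out --project-name=sbl-console`; the vault key name is left open here because the source does not name it.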
Recovery
1 item
No documented rollback procedure for a bad deploy.
Direct-upload via `npm run deploy` overwrites the live deployment. CF Pages keeps prior deployments accessible via the dashboard, but there is no scripted rollback path. Worth writing down a one-line 'how to roll back' before you need it.
Fix steps
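- Document: `wrangler pages deployment list --project-name sbl-console` → pick the previous deployment → roll back to it from the Pages dashboard or the Pages API rollback endpoint. Confirm that the installed wrangler actually offers `wrangler pages deployment rollback <id>` before writing it into the procedure.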
- Add as a README section or to CLAUDE.md.
Auth freshness
1 item
~/.config/gemini/api_key.txt
~/.config/gemini/api_key.txt last modified 86 days ago.
Gemini API keys don't auto-expire, but rotating every ~90 days is good hygiene — especially for a long-lived plaintext key on disk. This is also tied to the security finding gemini-api-key-file (same path, plaintext).
Fix steps
- Rotate in Google AI Studio.
- Store new key as sbl0-google-gemini-api-key in vault.
- Remove the on-disk file (per security finding gemini-api-key-file).
- Update callers to use sbl-secret-env GEMINI_API_KEY=sbl0-google-gemini-api-key.
Agent disk usage
1 item
~/.openclaw
~/.openclaw consuming 7.7 GB, disproportionate for agent state.
Cross-referenced with disk-hygiene category. ~/.openclaw/backups is only 68K, so the bulk is elsewhere — probably models, indexes, or accumulated session caches. Worth investigating.
Fix steps
- du -shx ~/.openclaw/* | sort -h to find the big subtree.
- Identify what's there; many openclaw subprojects keep their own state.
- Decide what to prune; check for old gmail-to-memory index if it grew.
Agent degradation
1 item
Krishna's email recall is degraded: gmail-to-memory.service is failed.
Cross-referenced with services category. While that service stays failed, Krishna can't answer 'when did X email arrive' type queries against current inbox state.
Fix steps
- See services-gmail-to-memory-failed in the services category.
- Likely: Gmail OAuth refresh required.
Per-project CLAUDE.md
1 item · ~/workspace/code/*/CLAUDE.md · Several active projects under ~/workspace/code/ still have no CLAUDE.md.
Live as of 2026-05-23: sbl, too-foo, gate-too-foo, core, fleet, e51.org, and s3m2p all have a project-level CLAUDE.md. The rest (krishna, mcad, clip, etc.) don't. The global instructions say 'Read the relevant project's CLAUDE.md (project-specific facts)', but for projects without one there's nothing to read.
Fix steps
- Decide which remaining projects WARRANT a CLAUDE.md (active projects only).
- Seed each with the /init skill (claude can generate a starter from the codebase).
- Priority candidates: clip (Rust screencast recorder), hellorobot (work-related).
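To keep the list of uncovered projects current rather than hand-maintained, a sketch that prints every direct subdirectory lacking a CLAUDE.md. The function name is hypothetical; the default root is the path named in this finding.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list projects that have no CLAUDE.md.

# missing_claude_md [ROOT]: print the name of each direct subdirectory
# of ROOT that does not contain a CLAUDE.md file.
missing_claude_md() {
  local root=${1:-$HOME/workspace/code} dir
  for dir in "$root"/*/; do
    [[ -d $dir ]] || continue          # glob matched nothing
    [[ -f ${dir}CLAUDE.md ]] || basename "$dir"
  done
}
```

The output still needs the judgment call from the first step: not every directory it lists is an active project.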
Dotfile / bin drift
1 item · ~/bin/secret-store · ~/bin/secret-store is a broken symlink (orphaned).
Single broken symlink under $HOME. Likely a stale shim from an older vault naming scheme; sbl-secret has replaced it.
Fix steps
- ls -l ~/bin/secret-store — confirm the target.
- If unused: rm ~/bin/secret-store.
- If still expected by some script: grep -r secret-store ~/bin ~/.config to find callers and update.
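The probe that surfaces this kind of drift can be sketched with GNU find, whose `-xtype l` test matches symlinks whose target no longer resolves. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: find dangling symlinks directly inside a directory.

# broken_symlinks [DIR]: print symlinks in DIR whose target is missing.
broken_symlinks() {
  find "${1:-$HOME/bin}" -maxdepth 1 -xtype l -print
}
```

Dropping `-maxdepth 1` turns it into a recursive sweep, which is slower on a large $HOME.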
Fleet topology accuracy
2 items · ~/workspace/code/sbl/src/data/audit/security{,-stages}.ts · Audit data (security.ts + security-stages.ts) lists sbl10 as a key-deploy target, but sbl10 is not in the fleet.
User memory: 'sbl10 does NOT exist'. The ssh-id-ed25519 security finding lists sbl10 in its 'authenticates to' set; the security stage F SSH-key-swap step says 'ssh-copy-id new pubkey to sbl0, sbl1, sbl2, sbl3, sbl4, sbl10'. The Tailscale tailnet still has a registered (offline 32d) sbl10 node, but documentation should reflect actual fleet membership.
Fix steps
- Remove sbl10 from the consumer list in security.ts (ssh-id-ed25519).
- Remove sbl10 from the items list in security-stages.ts (stage F).
- Also: deregister the offline sbl10 node from Tailscale admin (see network category).
Tailscale has more registered nodes than CLAUDE.md's fleet table documents.
CLAUDE.md fleet table: sbl0..sbl4. Tailscale tailnet: sbl0..sbl4 PLUS sbl5 (linux, offline 30d), sbl5-1 (windows, offline 43d), sbl10 (linux, offline 32d), ipad (offline 236d), iphone (offline 47d), and a duplicate sbl3 entry. Either CLAUDE.md is incomplete, or stale tailnet nodes should be deregistered.
Fix steps
- Decide policy: tailnet is the source of truth OR CLAUDE.md is.
- Bring them into sync.
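Whichever side becomes the source of truth, the drift itself is a set difference. A sketch assuming two plain-text files with one hostname per line; the file arguments and function name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tailnet nodes that the fleet table does not document.

# undocumented_nodes DOCUMENTED_FILE TAILNET_FILE: print hostnames that
# appear in TAILNET_FILE but not in DOCUMENTED_FILE (duplicates collapsed).
undocumented_nodes() {
  LC_ALL=C comm -13 \
    <(LC_ALL=C sort -u -- "$1") \
    <(LC_ALL=C sort -u -- "$2")
}
```

The tailnet list could come from something like `tailscale status --json | jq -r '.Self.HostName, .Peer[].HostName'`, which assumes jq is installed. The documented list would be copied from the CLAUDE.md fleet table.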
Vault prefix rule
1 item · sbl0 vault · 8 vault entries do not match the sbl{0,3,4,10}-* convention.
Listing: cloudflare-global-api-key, contabo-deploy-ssh-key, discord-bot-token, github-token, ssh-ed25519-private (these 5 are tracked by security finding legacy-vault-entries), PLUS git-email, git-name, SBL0_CF_EDIT (probably a leaked env-var name stored as a key).
Fix steps
- First 5: follow security stage G procedure (migrate to sbl0-* + drop |legacy fallback).
- git-email, git-name: these are git config, not secrets — either move to ~/.gitconfig (not vault) or rename to sbl0-git-email / sbl0-git-name.
- SBL0_CF_EDIT: looks like the literal env var name got stored. Verify what value it holds, then either rename it correctly or delete.
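The prefix rule is a one-line filter once the entry names are available as text. This sketch reads names on stdin. How the names are listed is left open, since this audit does not show a listing subcommand for sbl-secret; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: vault entry names that break the prefix convention.

# nonconforming_vault_names: read entry names on stdin, print those that
# do not start with sbl0-, sbl3-, sbl4-, or sbl10-.
nonconforming_vault_names() {
  grep -Ev '^sbl(0|3|4|10)-' || true   # no matches is not an error here
}
```

Fed the current vault listing, it should print exactly the eight names above. An empty result means the convention holds.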
Dashboard claims
1 item · ~/workspace/code/sbl/src/data/platform.ts (line ~491) · src/data/platform.ts asserts local token files removed; re-probe confirms files still exist.
platform.ts secretInventoryTsv label: 'OpenClaw, Kimi, GitHub, and Kaggle wrappers use sbl-secret-env; local OpenClaw/Kaggle/GitHub token files removed.' Re-probe: ~/.kaggle/access_token (38 bytes), ~/.config/gh/hosts.yml (304 bytes), ~/.openclaw/auth-profiles.json (327 bytes) — all still present. Already tracked in security as stale-dashboard-claims. Belongs here too as a convention violation: dashboard data must reflect ground truth.
Fix steps
- Either: rotate + delete the files (per security findings) then leave the claim.
- Or: rewrite the claim to be live-derived (a small script that runs at build time checking those paths).
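The second option comes down to probing the paths at build time and failing, or rewording the label, when any still exist. A minimal sketch of that probe; the function name is hypothetical, and wiring it into the site build is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical build-time probe for files the dashboard claims are gone.

# present_files PATH...: print each PATH that still exists.
# Succeeds only when none of them exist.
present_files() {
  local p found=0
  for p in "$@"; do
    if [[ -e $p ]]; then
      printf '%s\n' "$p"
      found=1
    fi
  done
  (( found == 0 ))
}
```

For example, `present_files ~/.kaggle/access_token ~/.config/gh/hosts.yml ~/.openclaw/auth-profiles.json || echo "claim is stale"` prints all three paths today.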
Prevention rules
1 item · ~/.bashrc.d/ · No ~/.bashrc.d/security.sh. HISTIGNORE and trap rules from security stage H not yet installed.
~/.bashrc.d/ exists (single file: posemodel.env). Adding security.sh with HISTIGNORE and a prompt-command trap was deferred to security stage H. Logged here because it's a convention-level expectation (prevention rules should be in version-controlled drop-ins).
Fix steps
- After security keys are rotated (stage A–G done), create ~/.bashrc.d/security.sh.
- Contents per security-stages.ts stage H.
- Shivam-Bhardwaj/dotfiles repo exists (private, 220d stale) — see github-dotfiles-repo-stale; consider reviving it and committing ~/.bashrc.d/ there for cross-host parity.
Staged remediation (security)
Lowest blast-radius first. Each stage is reversible up to the moment a secret is revoked at the provider.
Free wins: no rotation needed
blast: none · Pure dead data and dead references. No live consumer reads any of these.
- Delete sbl2-backup-wipe-prep-20260404/ (82 GB).
- Delete the two smaller sbl2-migration backups.
- Delete the 5 `sbl1-*` rows from registry/secrets.tsv.
- Fix or delete ~/bin/openclaw-discord-status.
Low-risk API key rotations
blast: single tool · Verify the rotation methodology on credentials with one consumer before touching anything cross-cutting.
- Kaggle KGAT_* — only the dashboard wrapper reads it.
- Gemini AIza* — confirm consumers first; replace api_key.txt content via vault-fetched env.
- HuggingFace hf_* if any are live (most appearances were in old backups).
Medium-risk: Cloudflare + Google
blast: deploy scripts; rclone copy · Cloudflared tunnel rotation is online; gcloud/rclone re-auth is one-shot.
- sbl-secret put new sbl0-cloudflare-api-token, then revoke old in CF dashboard.
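- cloudflared tunnel rotate <name>, if the installed cloudflared provides that subcommand (check `cloudflared tunnel --help`); otherwise rotate the tunnel secret from the Cloudflare dashboard or API. Either way, confirm the daemon has picked up the new secret.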
- gcloud auth revoke --all && gcloud auth login.
- rclone config reconnect gdrive:.
GitHub PATs
blast: git push/pull until refresh completes · Most painful daily-use disruption; do it deliberately, not at end of day.
- Revoke leaked PATs visible in bash_history (newest first).
- gh auth refresh -h github.com.
- Verify: gh auth status; gh repo list -L 1.
Anthropic / Codex / OpenClaw OAuth
blast: Krishna offline for the openclaw step · Each manager owns its own file; re-auth swaps the token without disrupting other agents.
- claude /logout && claude /login (do this last in the session).
- codex logout && codex login.
- systemctl --user stop openclaw-gateway krishna-proxy → openclaw auth login → restart.
- Re-bootstrap antigravity.
SSH key replacement
blast: one wrong step = locked out of sbl0 · The vault depends on SSH to sbl0. Deploy new key first, test on every host, then swap.
- ssh-keygen new key with passphrase under ~/.ssh/id_ed25519.new.
- ssh-copy-id new pubkey to sbl0, sbl1, sbl2, sbl3, sbl4, vast hosts.
- Verify new key works against every host.
- Atomic swap locally; ssh-add to gcr-ssh-agent.
- Remove old pubkey from every host's authorized_keys.
- sbl-secret put sbl0-curious-ssh-ed25519-private and shred old key.
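The "verify against every host" step is the one that prevents a lockout, so it is worth scripting. This sketch forces ssh to offer only the new key. SSH_BIN and the function name are hypothetical (SSH_BIN exists only so the loop can be dry-run with a stub), and the host list is whatever the ssh-copy-id step covered.

```bash
#!/usr/bin/env bash
# Hypothetical pre-swap check: does the new key log in everywhere?

# verify_key KEYFILE HOST...: try a no-op login on each HOST using only
# KEYFILE. Prints one line per host and fails if any host refused the key.
verify_key() {
  local key=$1; shift
  local host failed=0
  for host in "$@"; do
    if "${SSH_BIN:-ssh}" -i "$key" \
         -o IdentitiesOnly=yes \
         -o PreferredAuthentications=publickey \
         -o ConnectTimeout=5 \
         "$host" true; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

A run would look like `verify_key ~/.ssh/id_ed25519.new sbl0 sbl1 sbl2 sbl3 sbl4`. One caveat: an IdentityFile entry in ~/.ssh/config that still names the old key can mask a failure, so `ssh -v` on one host is a reasonable cross-check before removing the old pubkey.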
Vault hygiene
blast: wrapper falls back to legacy until updated · Migrate the 5 unprefixed entries to sbl0-*. Wrappers already have fallback chains, so this is safe only if scripts are updated first.
- Migrate github-token → sbl0-github-pat; update repo_policy.py.
- discord-bot-token already has sbl0-openclaw-discord-token — drop the |fallback in openclaw-discord-gateway.
- Same pattern for cloudflare-global-api-key and contabo-deploy-ssh-key.
- After confirmation: sbl-secret delete <legacy> for each.
Prevention
blast: none · Stop the next leak before it happens.
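- Add HISTIGNORE='*ghp_*:*github_pat_*:*sk-ant-*:*AIza*:*hf_*:*gsk_*:*xai-*:*KGAT_*' to .bashrc.
- Add a DEBUG trap (with `shopt -s extdebug`, so a non-zero return skips the command) that refuses `export *_TOKEN=…` and `gh auth login --with-token <literal>`. PROMPT_COMMAND alone runs after the command and can only warn.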
- Audit every project .gitignore for .env, .dev.vars, credentials.json.
- Optional: install gitleaks or pre-commit secret-detection.
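A sketch of what ~/.bashrc.d/security.sh could contain. The actual stage H contents live in security-stages.ts and are not reproduced in this audit, so the function names and both regexes here are assumptions. Because the DEBUG trap sees each simple command separately, this version refuses any command carrying a GitHub-token-shaped literal instead of parsing the `gh auth login --with-token` pipeline.

```bash
# Hypothetical ~/.bashrc.d/security.sh

# Layer 1: keep commands containing token-shaped strings out of history.
HISTIGNORE='*ghp_*:*github_pat_*:*sk-ant-*:*AIza*:*hf_*:*gsk_*:*xai-*:*KGAT_*'

# has_inline_secret CMD: succeed when CMD exports a literal into *_TOKEN
# or contains a GitHub-token-shaped literal anywhere.
has_inline_secret() {
  local cmd=$1
  # export FOO_TOKEN=<literal>; a value starting with $ (variable or
  # command substitution, optionally quoted) is allowed.
  local export_re="^[[:space:]]*export[[:space:]]+[A-Za-z_][A-Za-z0-9_]*_TOKEN=[\"']?[^\$\"'[:space:]]"
  local literal_re='(ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{22,})'
  [[ $cmd =~ $export_re || $cmd =~ $literal_re ]]
}

# Layer 2: DEBUG trap handler. With extdebug set, a non-zero return
# makes bash skip the command that was about to run.
_refuse_inline_secret() {
  if has_inline_secret "$1"; then
    echo 'refused: literal token on the command line; fetch it from the vault' >&2
    return 1
  fi
  return 0
}

# Install only in interactive shells so scripts are unaffected.
if [[ $- == *i* ]]; then
  shopt -s extdebug
  trap '_refuse_inline_secret "$BASH_COMMAND"' DEBUG
fi
```

This replaces any DEBUG trap already installed (bash-preexec, for instance), so chain the handlers if one is in use. The two layers are independent: HISTIGNORE stops the line from being recorded, and the trap stops it from running.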
Operating principle
The audit is the to-do list, not the work. Revoke at provider, re-auth via the manager, verify, delete prior copies. The vault is the backup of the rotated value, not the live source.