Audit

Fleet hygiene · 2026-05-16 · 94 items across 14 categories

The DNA of how we work,
tracked in one place.

Plaintext credentials on disk, conformance to the sbl0-only vault rule, and exfiltration risk from local logs and backups.

open: 75 · in-prog: 0 · done: 18 · skip: 1

§ 01

Findings

75 matching · 94 total · 0 todo

Shell history leak

1 item

critical · sbl1 · ~/.bash_history (lines 157-1541)

15+ commands echoed literal ghp_* and github_pat_* GitHub PATs (export GITHUB_TOKEN=…, gh auth login --with-token, scp …bashrc, etc.).

Manager: bash (no manager — pure history)
Consumers: none — historical record only

note

Fix steps

Revoke each leaked PAT at github.com/settings/tokens (start with the most recent two).
gh auth refresh -h github.com (writes a new oauth_token).
history -c && shred -u ~/.bash_history && touch ~/.bash_history.
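
Before and after the cleanup it helps to know how many history lines still carry a GitHub token prefix, without printing the tokens themselves. A minimal sketch, assuming the two prefixes named in the finding (ghp_ and github_pat_) are the only ones of interest; count_pat_lines is a hypothetical helper name, not an existing tool.

```shell
# count_pat_lines FILE
# Prints how many lines in FILE contain a GitHub PAT prefix (ghp_ or
# github_pat_) followed by token characters. Only the count is printed,
# never the matching text. Hypothetical helper for illustration.
count_pat_lines() {
    # grep -c exits 1 when the count is 0; "|| true" keeps the status clean.
    grep -cE '(ghp_|github_pat_)[A-Za-z0-9_]+' "$1" || true
}
```

Usage: count_pat_lines ~/.bash_history before the shred, and again on the fresh file to confirm it reports 0.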

Plaintext credential file

11 items

critical · sbl1 · ~/.config/gemini/api_key.txt

Plaintext AIza* Gemini API key (40 bytes).

Manager: manual / Gemini CLI
Consumers: any tool reading GEMINI_API_KEY env or this file

note

Fix steps

Revoke key at Google AI Studio.
Create new key.
sbl-secret put sbl0-google-gemini-api-key <new>.
rm ~/.config/gemini/api_key.txt; have callers fetch via sbl-secret-env.

critical · sbl1 · ~/.config/gh/hosts.yml

GitHub oauth_token in plaintext (standard gh CLI location).

Manager: gh auth login
Consumers: git push/pull via gh auth git-credential (configured in ~/.gitconfig); every gh CLI call

note

Fix steps

gh auth refresh -h github.com -s repo,workflow (issues a new token, revokes old).
Verify: gh auth status.
Optionally: sbl-secret put sbl0-github-pat <token from gh auth token>.

critical · sbl1 · ~/.config/rclone/rclone.conf

gdrive OAuth { access_token, refresh_token, expiry } in plaintext.

Manager: rclone config
Consumers: manual rclone copy/mount calls

note

Fix steps

rclone config reconnect gdrive: (re-runs OAuth, writes new tokens).
Old refresh_token can be revoked at myaccount.google.com/permissions.

critical · sbl1 · ~/.config/gcloud/application_default_credentials.json + legacy_credentials/*/adc.json

gcloud refresh_token + client_secret in plaintext for two identities.

Manager: gcloud auth login
Consumers: gcloud CLI, any GCP SDK that loads ADC

note

Fix steps

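gcloud auth revoke --all, then gcloud auth application-default revoke (ADC is revoked separately from user credentials).
gcloud auth login (writes new credentials.db).
gcloud auth application-default login if needed (rewrites application_default_credentials.json).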

critical · sbl1 · ~/.kaggle/access_token

KGAT_* Kaggle API key in plaintext (38 bytes).

Manager: Kaggle CLI / Kaggle website download
Consumers: ~/bin/run-kaggle-dashboard.sh; kaggle CLI

note

Fix steps

kaggle.com → Account → Expire API Token, then Create New Token.
Download replaces the file; alternatively store via sbl-secret put sbl0-kaggle-api-key.

critical · sbl1 · ~/.cloudflared/5774c5a1-3631-495b-afca-1daa84563fe7.json

Cloudflare tunnel TunnelSecret + AccountTag in plaintext.

Manager: cloudflared tunnel create
Consumers: the live cloudflared tunnel routing rahul.dhruvgaba.com → ssh://localhost:22

note

Fix steps

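cloudflared tunnel rotate <tunnel-name> for zero-downtime rotation. Confirm with cloudflared tunnel --help that your version provides a rotate subcommand; if it does not, the fallback is to recreate the tunnel and re-route DNS, which is not zero-downtime.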
Daemon picks up new <uuid>.json automatically.

critical · sbl1 · ~/.cloudflared/cert.pem

Argo Tunnel token (-----BEGIN ARGO TUNNEL TOKEN-----).

Manager: cloudflared tunnel login
Consumers: cloudflared tunnel create/delete commands

note

Fix steps

Only needed if you manage tunnels regularly; can be regenerated with cloudflared tunnel login.

critical · sbl1 · ~/.claude/.credentials.json

Anthropic OAuth access token (sk-ant-oat01-*) in plaintext.

Manager: claude /login
Consumers: this CLI session; every future Claude Code launch

note

Fix steps

Do this LAST — risks interrupting in-flight remediation.
claude /logout then claude /login.
The old access_token is short-lived; the refresh path is re-established.

critical · sbl1 · ~/.codex/auth.json

ChatGPT OAuth id_token (JWT) + tokens object in plaintext.

Manager: codex login
Consumers: codex CLI; codex-screenshot-bridge.service; codex-project-screenshot-sync.service

note

Fix steps

codex logout && codex login.

critical · sbl1 · ~/.openclaw/auth-profiles.json (+ agents/krishna/agent/auth-profiles.json + backups/pre-update-*/)

Anthropic OAuth token (sk-ant-oat01-*) in plaintext for openclaw.

Manager: openclaw auth
Consumers: openclaw-gateway.service; krishna-proxy.service; krishna-ask wrapper

note

Fix steps

systemctl --user stop openclaw-gateway krishna-proxy.
openclaw auth login (writes new auth-profiles.json).
systemctl --user start krishna-proxy openclaw-gateway.
Verify: krishna-ask 'hello' replies normally.

critical · sbl1 · ~/.antigravity-server/.15487b30…token

Numeric session token in plaintext.

Manager: antigravity bootstrap
Consumers: antigravity runtime

note

Fix steps

Re-bootstrap antigravity to issue a new token.

SSH key hygiene

3 items

critical · sbl1 · ~/.ssh/id_ed25519

Unencrypted private key (no passphrase). Same key authenticates to sbl0, sbl1, sbl2, sbl3, sbl4, vast-2x5090, and vast.

Manager: ssh-keygen / manual
Consumers: vault SSH calls (sbl-secret get/put → curious@sbl0); every fleet shell + git remote (git@github.com:…); sbl-vault-store

note

Fix steps

Generate ~/.ssh/id_ed25519.new with a passphrase.
ssh-copy-id -i id_ed25519.new.pub to every fleet host while old key still works.
Test new key end-to-end against each host before swapping.
Atomic swap; ssh-add for gcr-ssh-agent.
Remove old pubkey from each host (sed -i on authorized_keys).
sbl-secret put sbl0-curious-ssh-ed25519-private < id_ed25519; shred old key.

Doing these steps in reverse order locks you out of sbl0 and loses vault access.
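
The "test new key end-to-end against each host before swapping" step can be sketched as a loop that attempts a no-op login with only the new key. This is an illustration under assumptions: check_key_on_hosts and the SSH_CMD override are hypothetical names, and because BatchMode disables passphrase prompts, the new key is assumed to be loaded into the agent (ssh-add) first.

```shell
# SSH_CMD can be overridden to dry-run the loop with a stub (assumption).
SSH_CMD="${SSH_CMD:-ssh}"

# check_key_on_hosts KEYFILE HOST...
# Attempts a no-op login to each HOST offering only KEYFILE. Prints one
# status line per host and returns non-zero if any host rejected the key.
check_key_on_hosts() {
    local key host failed
    key="$1"
    shift
    failed=0
    for host in "$@"; do
        # IdentitiesOnly=yes stops ssh from silently succeeding with the old key.
        if "$SSH_CMD" -i "$key" -o IdentitiesOnly=yes -o BatchMode=yes \
               -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: check_key_on_hosts ~/.ssh/id_ed25519.new sbl0 sbl1 sbl2 sbl3 sbl4, and only proceed to the swap once every line reads ok.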

critical · sbl1 · ~/.ollama/id_ed25519

Unencrypted private key for ollama identity.

Manager: ollama install (legacy)
Consumers: likely unused — verify with grep -r '.ollama/id_ed25519' ~

note

Fix steps

If unused: shred and remove.
If used: regenerate with passphrase.

critical · sbl1 · ~/.local/dev-tls/key.pem

Unencrypted OpenSSH-format private key (dev TLS).

Manager: manual / mkcert?
Consumers: verify with grep -r 'dev-tls' ~

note

Fix steps

If mkcert-managed: re-issue.
If hand-rolled: regenerate or remove if unused.

Active agent session logs

2 items

high · sbl1 · ~/.claude/history.jsonl + projects/-home-curious/*.jsonl

Multiple Gemini AIza* keys and Anthropic sk-ant-oat01-* tokens appear verbatim inside Claude session transcripts (the agent streamed credential files into its context).

Manager: claude harness (append-only)
Consumers: none for auth — these are transcripts

note

Fix steps

Only after the underlying keys are rotated and revoked.
Per-token sed scrub (use the helper script scrub-history).
Or, rotate the relevant Claude project IDs and discard the older jsonl files.
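
A per-token scrub could look roughly like the filter below. It is a sketch only, not the scrub-history helper itself (whose contents are not shown here); the function name redact_tokens and the length bounds in the patterns are assumptions, not the providers' official token formats.

```shell
# redact_tokens: filter stdin to stdout, replacing anything that looks like
# a Gemini key (AIza...) or an Anthropic OAuth token (sk-ant-oat01-...) with
# a marker. Length bounds are loose guesses chosen for illustration.
redact_tokens() {
    sed -E \
        -e 's/AIza[0-9A-Za-z_-]{20,}/[REDACTED-GEMINI]/g' \
        -e 's/sk-ant-oat01-[0-9A-Za-z_-]{10,}/[REDACTED-ANTHROPIC]/g'
}
```

Usage: redact_tokens < history.jsonl > history.scrubbed.jsonl, then compare line counts before replacing the original. Writing to a new file rather than editing in place leaves a way back if a pattern turns out to be too greedy.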

high · sbl1 · ~/.codex/sessions/2026/*/*/*.jsonl

Older Codex session rollouts (Feb-Apr 2026) contain hf_*, gsk_*, xai-*, sk-ant-*, AIza* fragments.

Manager: codex harness
Consumers: none for auth

note

Fix steps

Older rollouts are no longer needed for replay — safe to delete after rotation.
rm -rf ~/.codex/sessions/2026/0[2-4] (after confirming nothing depends on them).

Stale credential dump on disk

2 items

high · sbl1 · ~/workspace/antimony-labs-org-nondashboard-20260423-185551/backups/sbl2-migration/sbl2-backup-wipe-prep-20260404/

82 GB full disk dump of sbl2. Contains an unencrypted id_ed25519, .bashrc exporting ANTHROPIC_API_KEY=sk-ant-api03-…, an older .claude/.credentials.json, Codex shell snapshots, 20+ openclaw agent auth-profiles.json files, and Downloads/Chase exports with hf_* tokens.

Manager: (snapshot — no live owner)
Consumers: none

note

Fix steps

Pure dead data — safe to delete.
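Move to a quarantine dir first if you want to inspect: mv …/sbl2-backup-wipe-prep-20260404 ~/quarantine/ (a mode-0700 directory on the same filesystem; /tmp is shared and may be a RAM-backed tmpfs too small for 82 GB).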
If you must keep, re-archive as a single age-encrypted blob on sbl0.

high · sbl1 · …/sbl2-migration/sbl2-backup-20260418-133603/ (70 MB) + …-133643/ (116 MB)

Sibling sbl2 backups; each contains .bash_history + .bashrc with sk-ant-api03-* tokens.

Manager: (snapshot)
Consumers: none

note

Fix steps

Same as above — safe to delete.

Vault hygiene

3 items

medium · sbl0 · sbl0 vault: github-token, discord-bot-token, cloudflare-global-api-key, contabo-deploy-ssh-key, ssh-ed25519-private

Five unprefixed entries violate the sbl0-* convention. Wrapper scripts already use `sbl0-…|legacy` fallback chains, so these are alive on purpose.

Manager: sbl-secret put / sbl-secret delete
Consumers: openclaw-discord-gateway (discord-bot-token fallback); repo_policy.py (github-token); any future deploy script (contabo-deploy-ssh-key, ssh-ed25519-private)

note

Fix steps

Migrate each: sbl-secret get <legacy> | sbl-secret put sbl0-<scope>-<purpose>.
Edit wrappers to drop the |legacy fallback once consumers verified.
sbl-secret delete <legacy>.

medium · sbl1 · ~/workspace/code/sbl/registry/secrets.tsv (5 rows with name starting `sbl1-`)

Names 5 sbl1-* secrets (krishna groq/xai, openclaw discord/gateway/ollama). None of these exist in the live vault, but the file documents the forbidden prefix.

Manager: manual TSV edit
Consumers: dashboard data — informational only

note

Fix steps

Delete the 5 sbl1-* rows from secrets.tsv.

medium · sbl1 · ~/bin/openclaw-discord-status

References sbl1-openclaw-discord-token + sbl1-openclaw-gateway-token. Both keys are absent from vault, so the script fails at runtime.

Manager: manual script
Consumers: interactive use only

note

Fix steps

Rewrite to use the sbl0-…|fallback pattern used by openclaw-discord-gateway, or delete.

Trust boundary

1 item

low · sbl1 · ~/.ssh/config (vast-2x5090, vast)

The same SSH key that authenticates to sbl0 (vault host) is also used to log into rented vast.ai root-on-public-IP instances.

Manager: ssh-keygen / manual
Consumers: see ssh-id-ed25519

note

Fix steps

Use a separate key (vast_id_ed25519) when renting vast.ai compute.
Configure ssh-config with IdentityFile per Host.
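
The per-Host IdentityFile setup can be sketched as a small generator for the ssh_config stanza. The function name ssh_host_stanza is hypothetical; the key name vast_id_ed25519 comes from the fix step above.

```shell
# ssh_host_stanza HOST KEYFILE
# Prints an ssh_config stanza that pins HOST to a single key.
# IdentitiesOnly yes keeps ssh from also offering other agent keys
# (such as the fleet key) to that host.
ssh_host_stanza() {
    printf 'Host %s\n    IdentityFile %s\n    IdentitiesOnly yes\n' "$1" "$2"
}
```

Usage: ssh_host_stanza vast '~/.ssh/vast_id_ed25519' >> ~/.ssh/config, and the same for vast-2x5090. The tilde is quoted on purpose so that ssh, not the shell, expands it.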

Dashboard drift

1 item

low · sbl1 · ~/workspace/code/sbl/src/data/platform.ts (line 491)

platform.ts asserts 'local OpenClaw/Kaggle/GitHub token files removed'. Audit shows ~/.kaggle/access_token, ~/.config/gh/hosts.yml, and ~/.openclaw/auth-profiles.json still exist.

Manager: dashboard data
Consumers: dashboard home view

note

Fix steps

After the underlying files are removed or migrated, update the claim or replace it with a live-derived check.

Workspace layout

1 item

low · sbl1 · ~/workspace/runs

~/workspace/runs is empty (no experiment outputs being captured).

Per CLAUDE.md the runs/ dir is for experiment outputs (safe to delete). Currently empty — either no experiments are producing outputs, or outputs are landing elsewhere (likely under code repos or ~/Downloads).

note

Fix steps

When training/inference jobs run, point output dir to ~/workspace/runs/<run-id>/.
Add to wrapper scripts that currently dump outputs into code dirs.

Oversized code repo

7 items

high · sbl1 · ~/workspace/code/too.foo

~/workspace/code/too.foo is 18 GB (excluding node_modules).

Per CLAUDE.md convention, code repos must never contain >100 MB of data. 18 GB suggests image-heavy content (Precision Shorts hero+scene gen?) is committed in-repo instead of living under ~/workspace/data/ and being referenced via DATA_ROOT.

note

Fix steps

du -shx ~/workspace/code/too.foo/* | sort -h | tail -10 to find the big subtree.
Most likely culprits: apps/content/assets/, public/images/, generated/.
Move generated images/datasets to ~/workspace/data/too-foo-content/.
Add path to .gitignore; reference via env DATA_ROOT=$HOME/workspace/data.
If already in git history: BFG or git-filter-repo to purge from history.
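
Where du points at a subtree, the individual offenders can be listed with find. A minimal sketch, assuming the 100 MB rule is read as 100 MiB; large_files is a hypothetical helper name, and the threshold is a parameter so the same function works for smaller checks.

```shell
# large_files DIR [MIN_KIB]
# Lists regular files under DIR larger than MIN_KIB kibibytes
# (default 102400, i.e. 100 MiB), skipping .git and node_modules.
large_files() {
    find "$1" \( -name .git -o -name node_modules \) -prune \
        -o -type f -size "+${2:-102400}k" -print
}
```

Usage: large_files ~/workspace/code/too.foo lists candidates to move under ~/workspace/data/. Files that show up here and are already committed are the ones that would need the history purge mentioned above.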

medium · sbl1 · ~/workspace/code/vendor

~/workspace/code/vendor is 5.1 GB.

Almost certainly bundled third-party source, model files, or large binaries that belong in ~/workspace/data/ as immutable inputs (so they can be shared across repos rather than duplicated per checkout).

note

Fix steps

Identify what's inside (likely git-submodule'd deps or vendored libs).
Move large immutable assets to ~/workspace/data/vendor-<name>/.
Keep only thin manifest files (lockfiles, vendor.txt) in the repo.

medium · sbl1 · ~/workspace/code/clip

~/workspace/code/clip is 4.6 GB.

Per project memory, `clip` is the Rust screencast recorder. The 4.6 GB is almost certainly compiled artifacts (the repo's target/ directory) and/or sample recordings.

note

Fix steps

du -shx ~/workspace/code/clip/* | sort -h | tail -5.
If target/ is huge: cargo clean (or set CARGO_TARGET_DIR=~/.cache/cargo).
Move sample recordings to ~/workspace/data/clip-samples/.

medium · sbl1 · ~/workspace/code/hellorobot

~/workspace/code/hellorobot is 4.2 GB.

Likely git history + datasets/bag files from robot work. Bag files and recordings should not live in git.

note

Fix steps

du -shx ~/workspace/code/hellorobot/* | sort -h | tail -5.
Move .bag, .mcap, dataset dirs to ~/workspace/data/hellorobot-<purpose>/.
Update CI/scripts to reference DATA_ROOT.

low · sbl1 · ~/workspace/code/ecad_hello_robot

~/workspace/code/ecad_hello_robot is 1.6 GB.

EDA / PCB project — gerbers and 3D step files are large. Tolerable for now, but worth checking what's tracked vs generated.

note

Fix steps

Audit what's checked in vs build output; .gitignore any generated dirs.

low · sbl1 · ~/workspace/code/stretch_isaac_sim

~/workspace/code/stretch_isaac_sim is 505 MB.

Isaac Sim assets are large by nature. At 505 MB the repo is about five times over the 100 MB rule, but not egregious. Worth confirming USD/textures live in data/ rather than the repo.

note

Fix steps

Identify large USD/texture files; move to ~/workspace/data/stretch-isaac-assets/.

low · sbl1 · ~/workspace/code/core

~/workspace/code/core is 634 MB.

Worth a quick audit; under 1 GB is not urgent.

note

Fix steps

du -shx ~/workspace/code/core/* | sort -h | tail -5 to identify offenders.

Code outside workspace

1 item

low · sbl1 · ~/.fzf

~/.fzf has its own .git checkout outside ~/workspace/code/.

fzf installation by the recommended installer keeps a git checkout in $HOME. Not strictly a violation (it's a tool, not a project), but the only repo checkout that lives outside the workspace convention.

note

Fix steps

Acceptable as-is. If you want strict adherence: install via apt/snap/brew instead of the git-checkout installer.

Worktree state

1 item

low · sbl1 · ~/workspace/code/*-wt/

Zero active worktrees across all projects.

No `~/workspace/code/<project>-wt/<slug>/` dirs found and `git worktree list` in each repo shows only the primary checkout. This may be fine (clean state) or signal that the cc helper / operator console isn't being used to spawn agent worktrees the way the architecture intends.

note

Fix steps

If intentional (you're between projects): no action.
Else: confirm `cc` works (cc <project> opens fzf picker) and `+ new` in /worktrees creates the expected dir.

Project naming

1 item

low · sbl1 · ~/workspace/code/

Multiple too.foo-related repos with inconsistent naming.

Observed: too.foo, amazon-too-foo, compliant.too.foo, gate-too-foo, kaggle-too-foo (CF Pages project name), content-too-foo, etc. Mix of dotted (compliant.too.foo) and hyphenated (gate-too-foo) — not strictly duplicate, but inconsistent.

note

Fix steps

Pick one convention (hyphenated is friendlier for tools).
Rename dotted dirs on disk + update CF Pages project names to match.

Restic health

2 items

medium · sbl1 · rest:http://100.93.45.3:8000/

No record of recent `restic check` (full repo integrity).

`restic snapshots` succeeds but a full `restic check --read-data` is the only way to detect bit rot in the repo on sbl3. Best-practice is monthly. No timer for it currently.

note

Fix steps

Run once: sbl1-backup check --read-data-subset 5% (sample-mode is fast).
Add a monthly timer: ~/.config/systemd/user/restic-check.timer.
Pipe failure to a notification path (e.g., the existing Discord status webhook).
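
The monthly timer could be created along these lines. This is a sketch under assumptions: write_restic_check_units is a hypothetical helper, and the ExecStart path (%h/bin/sbl1-backup) is a guess at where the wrapper lives. Note that systemd treats % as a specifier prefix, so the 5% from the command above has to be written as 5%% inside the unit.

```shell
# write_restic_check_units DIR
# Writes restic-check.service and restic-check.timer into DIR
# (normally ~/.config/systemd/user). The ExecStart path is an assumption.
write_restic_check_units() {
    mkdir -p "$1" || return 1
    cat > "$1/restic-check.service" <<'EOF'
[Unit]
Description=Monthly restic repository integrity check

[Service]
Type=oneshot
# %h expands to the user's home directory; %% is a literal percent sign.
ExecStart=%h/bin/sbl1-backup check --read-data-subset 5%%
EOF
    cat > "$1/restic-check.timer" <<'EOF'
[Unit]
Description=Run restic-check.service once a month

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Usage: write_restic_check_units ~/.config/systemd/user, then systemctl --user daemon-reload and systemctl --user enable --now restic-check.timer. Persistent=true makes a run that was missed while the machine was off fire at the next start.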

low · sbl1 · rest:http://100.93.45.3:8000/

Restic snapshot includes /etc/fleet-pusher — unexpected source path.

Latest restic run included /etc/fleet-pusher as a backup source. Either that's intentional (fleet config) and should be documented, or it's stale config left over from a one-off command.

note

Fix steps

Check sbl1-backup config (env vars or systemd unit ExecStart).
Document the rationale, or remove the path if not intended.

sbl3 placement candidates

1 item

medium · sbl1 · ~/workspace/data

27 GB sitting in ~/workspace/data/ — candidate for sbl3-hosted store.

Per fleet rule: 'sbl3 is the canonical database host for any new persistent store.' Existing local data stays where it is (SQLite over network is bad), but bulk datasets are good sbl3 candidates if they're shared across hosts/agents.

note

Fix steps

du -shx ~/workspace/data/* | sort -h | tail -10.
For each large subtree, decide: read-once dataset (keep local) vs shared reference data (move to sbl3 + reference via Tailscale).
rclone or rsync over Tailscale to /srv/data/ on sbl3.

DB backup placement

1 item

high · sbl3 · sbl3:console

sbl3 Postgres (console DB) — no nightly logical dump configured.

Per fleet rule: 'Nightly DB backups (logical dumps) target sbl3 even when the live DB lives elsewhere.' The console DB lives ON sbl3 — a dump to a DIFFERENT host (or to its own restic snapshot) is needed for off-host recovery.

note

Fix steps

Add a daily pg_dump → ~/backups/postgres/console-YYYY-MM-DD.sql.gz on sbl3.
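Include the dump directory in sbl3's restic snapshot so it lands in the restic repo. The repo at rest:http://100.93.45.3:8000/ is itself served from sbl3, so also copy the dumps (or the repo) to a second host to get off-host recovery.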
Retention: keep 14 daily + 8 weekly + 12 monthly.
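
The daily dump step might be sketched as below. Assumptions are labeled: nightly_dump and the PG_DUMP override are hypothetical names, the database name console is taken from the finding, and connection settings are expected to come from the usual PG* environment variables.

```shell
# PG_DUMP can be overridden to exercise the function with a stub (assumption).
PG_DUMP="${PG_DUMP:-pg_dump}"

# nightly_dump DIR
# Writes DIR/console-YYYY-MM-DD.sql.gz and prints the path on success.
nightly_dump() {
    local out
    mkdir -p "$1" || return 1
    out="$1/console-$(date +%Y-%m-%d).sql.gz"
    # For plain-format dumps, --compress gzips the whole output file, which
    # avoids a shell pipeline that could hide a pg_dump failure.
    "$PG_DUMP" --format=plain --compress=6 --file="$out" console || return 1
    echo "$out"
}
```

Usage: nightly_dump ~/backups/postgres from a systemd timer on sbl3. If retention is enforced on the restic side, the policy above corresponds to restic forget --keep-daily 14 --keep-weekly 8 --keep-monthly 12.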

Cron coverage

1 item

low · sbl4 · user crontab

User crontab has one entry (openclaw cost report at 11:05 daily).

Only ~/.openclaw/cost-report/cost-report.mjs is scheduled via cron. Everything else that runs on a schedule (backups) uses systemd timers, which is good. That one cron job posts to Discord; if it handles sensitive data (for example, writes tokens), consider migrating it to a systemd timer for consistency.

note

Fix steps

No urgency. Optional: replace the cron entry with ~/.config/systemd/user/openclaw-cost-report.{service,timer}.

Reclaimable caches

2 items

medium · sbl1 · ~/.cache

~/.cache holds 8.3 GB.

User cache aggregate. Largest sub-trees likely include pip, mozilla, puppeteer, electron, npm logs. Safe to prune; tools rebuild on demand.

Manager: various tools

note

Fix steps

du -shx ~/.cache/* | sort -h | tail -10 to identify culprits.
rm -rf ~/.cache/pip/* (pip recreates).
rm -rf ~/.cache/puppeteer ~/.cache/ms-playwright if not in active use.
Consider adding a monthly cron to vacuum stale subdirs.

medium · sbl1 · ~/.npm

~/.npm holds 5.8 GB.

npm download/cache. Reclaimable any time; npm refetches as needed.

Manager: npm

note

Fix steps

npm cache clean --force (or rm -rf ~/.npm).
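npm has no setting that ages out cache entries (cache-min is deprecated and never evicted anything); run npm cache verify periodically to garbage-collect unneeded data.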

Agent state growth

1 item

high · sbl1 · ~/.openclaw

~/.openclaw holds 7.7 GB.

OpenClaw agent state, model snapshots, and pre-update backups. Audit security item openclaw-auth-profiles already flags agents/krishna/agent/auth-profiles.json + backups/pre-update-*/ for plaintext tokens.

Manager: openclaw runtime

note

Fix steps

du -shx ~/.openclaw/* | sort -h to find the big subtree.
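Candidate: ~/.openclaw/backups/pre-update-*, where many backups are obsolete and contain old auth-profiles.json. The Agent disk usage finding measured ~/.openclaw/backups at only 68K, so confirm with du before assuming this is the bulk.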
Delete pre-update backups older than 2 weeks after confirming current state is healthy.
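
Before deleting anything, the stale pre-update backups can be listed for review. A minimal sketch; stale_backups is a hypothetical helper name and it only lists, it never removes.

```shell
# stale_backups DIR [DAYS]
# Lists pre-update-* entries directly under DIR whose modification time is
# more than DAYS days old (default 14). Nothing is deleted.
stale_backups() {
    find "$1" -mindepth 1 -maxdepth 1 -name 'pre-update-*' \
        -mtime "+${2:-14}" -print
}
```

Usage: stale_backups ~/.openclaw/backups, review the output, and remove the listed directories by hand once the current state is confirmed healthy.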

Local state growth

1 item

medium · sbl1 · ~/.local

~/.local holds 7.5 GB.

~/.local/share + ~/.local/lib + ~/.local/state aggregate. Often holds pipx installs, model files (ollama before move), language servers, shell histories, and accumulated app state.

Manager: XDG-conformant tools

note

Fix steps

du -shx ~/.local/*/* | sort -h | tail -10 to find concrete subtrees.
Check ~/.local/share/ollama (model files belong under ~/workspace/data/).
Check ~/.local/state — old log files can be pruned.

Config dir bloat

1 item

low · sbl1 · ~/.config

~/.config holds 7.4 GB.

Larger than expected for a config dir — strong signal that one or more tools are violating XDG by storing cache/data there. Browser profiles (brave, chromium, firefox), VS Code, JetBrains state are common culprits.

Manager: XDG-non-conformant tools

note

Fix steps

du -shx ~/.config/* | sort -h | tail -10.
Identify offenders; move their large data to ~/.cache or ~/.local as appropriate.
Some apps respect $XDG_DATA_HOME / $XDG_CACHE_HOME — set these if not already.

Untriaged downloads

1 item

low · sbl1 · ~/Downloads

~/Downloads holds 2.2 GB.

Browser downloads folder. Per workspace convention, ad-hoc files should move into ~/workspace/scratch or ~/workspace/data; long-term storage should land on sbl3.

Manager: browser

note

Fix steps

Triage: ls -lt ~/Downloads | head -30 — anything older than 30 days?
Move keepers to ~/workspace/data/<name>/; delete the rest.

Workspace convention

1 item

low · sbl1 · ~/workspace/scratch

~/workspace/scratch is empty.

Per CLAUDE.md the convention is to keep throwaway work in ~/workspace/scratch. It is currently empty, while ~/Downloads (2.2 GB) holds untriaged files. Either the convention is not being used or scratch was recently cleaned.

Manager: user habit

note

Fix steps

No fix needed if scratch was just cleaned.
Else: redirect ad-hoc work to ~/workspace/scratch instead of ~/Downloads or ~/tmp.

Filesystem fullness

1 item

low · sbl1 · /media/curious/40C25F80C25F78DC

/dev/sda3 (978G external) at 68% used (665G of 978G).

External mount, not the system drive. Not urgent (314G free), but continued growth will eventually push it past 85%. Consider what lives there and whether any of it belongs on sbl3 instead.

Manager: manual

note

Fix steps

du -shx /media/curious/40C25F80C25F78DC/* | sort -h | tail -10.
Identify large stale subtrees; archive to sbl3 or delete.

Tailscale config

2 items

high · sbl1

Tailscale SSH enabled but ACL doesn't allow anyone to access sbl1.

`tailscale status` health check: 'Tailscale SSH enabled, but access controls don't allow anyone to access this device. Ask your admin to update your tailnet's ACLs to allow access.' Tailscale SSH on sbl1 is effectively dead — OpenSSH still works, but the Tailscale SSH cert path is unusable.

note

Fix steps

Either: disable Tailscale SSH on sbl1 (sudo tailscale set --ssh=false) since OpenSSH already serves the same purpose.
Or: update tailnet ACL to allow `tag:fleet` → sbl1 ssh.
Decide which auth path you want and remove the other to reduce surface.

medium · sbl1

Tailscale reports: can't reach configured DNS servers (MagicDNS may be flaky).

Per `tailscale status` health: 'Tailscale can't reach the configured DNS servers. Internet connectivity may be affected.' Matches the db.mjs comment that MagicDNS (hostname `sbl3`) is unreliable from sbl1 — system DNS resolves `sbl3` to the wrong IP. Workaround: explicit Tailscale IPs in code.

note

Fix steps

Investigate /etc/resolv.conf, systemd-resolved status, and tailscale's --accept-dns setting.
Either fix the DNS path or document the IP-only convention.
Note that ~/.ssh/config only affects ssh; tools that rely on system DNS (psql, postgres.js) need an /etc/hosts entry or an explicit Tailscale IP, as the db.mjs comment already describes.

Fleet reachability

1 item

medium · sbl1 · tailscale → sbl10 OFFLINE

sbl10 offline 32 days — but referenced in audit data + CLAUDE.md.

User-memory says 'sbl10 does NOT exist' as a fleet member. Tailscale shows a registered node 100.66.19.14, offline 32 days. Audit data (security.ts, security-stages.ts) still lists sbl10 in the SSH key deploy targets — this is convention drift documented in the convention category. Network-side fix: deregister.

note

Fix steps

Decide: is sbl10 retired? If yes, tailscale logout on the node and delete from the tailnet admin console.
Then update the security finding ssh-id-ed25519 to drop sbl10 from the host list.

Fleet topology

3 items

medium · sbl3

Two sbl3 entries in Tailscale: sbl3-1 (online, 100.93.45.3) + sbl3 (offline 32d).

There is an active sbl3 node at 100.93.45.3 (DNS: sbl3-1) AND an offline sbl3 node at 100.88.237.49 (DNS: sbl3). The active one is what db.mjs uses (PGHOST=100.93.45.3). The offline sbl3 entry is stale — probably from a prior reinstall.

note

Fix steps

tailscale → admin console → delete the stale `sbl3` node (100.88.237.49).
After deletion, the active node should become the canonical `sbl3` DNS name.
Update db.mjs PGHOST comment when this happens.

low · sbl1 · tailscale status → sbl5 (offline 30d) + sbl5-1 (offline 43d)

sbl5 (linux) + sbl5-1 (windows) appear in Tailscale but not in CLAUDE.md fleet table.

Fleet table in CLAUDE.md goes sbl0..sbl4. Two sbl5 nodes registered to the tailnet (Linux + Windows). Both offline >30 days. Either: add to the fleet table (and document role) or deregister.

note

Fix steps

Decide if sbl5 is part of the fleet. If yes: add row to CLAUDE.md.
If no: tailscale logout, delete from tailnet admin.

low · sbl1

iPad (last seen 236d) + iPhone (47d) in tailnet — likely stale device entries.

The iPad was last seen 236 days ago and the iPhone 47 days ago. Phones rotate Tailscale keys when re-installed, so old entries linger.

note

Fix steps

Tailscale admin → expire/remove the iPad entry (236d gone is clearly stale).
iPhone (47d) — keep if still using it, else also remove.

Pending packages

1 item

medium · sbl1

22 apt packages have updates available on sbl1.

unattended-upgrades is active but is configured to apply security updates only by default. Standard package updates accumulate until a manual `apt upgrade`.

note

Fix steps

apt list --upgradable to review.
sudo apt update && sudo apt upgrade — at a time of low activity.
Optional: enable unattended-upgrades for the `updates` source too in /etc/apt/apt.conf.d/50unattended-upgrades.

Fleet coverage

1 item

medium · fleet

Update status for sbl0/sbl2/sbl3 not yet inventoried — scan only covers sbl1.

The first-pass scan ran locally on sbl1. sbl0 (Pi), sbl2 (laptop), and sbl3 (Postgres host) each have their own apt schedule. A small wrapper is needed that SSHes into each host and reports back.

note

Fix steps

Write a ~/bin/fleet-update-status that loops sbl0..sbl3 and reports apt list --upgradable | wc -l, reboot-required, unattended-upgrades status.
Add an audit item per host as findings.
Eventually expose this via the api-server as /fleet/updates.
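
The wrapper described in the first fix step might look roughly like this. It is a sketch, not the eventual ~/bin/fleet-update-status: the function name, the SSH_CMD override, and the output format are assumptions, and the remote probe assumes Debian-style hosts with apt and systemctl.

```shell
# SSH_CMD can be overridden to dry-run the loop with a stub (assumption).
SSH_CMD="${SSH_CMD:-ssh}"

# Remote probe: count upgradable packages (skipping apt's "Listing..."
# header), check for a pending reboot, and read unattended-upgrades state.
FLEET_PROBE='n=$(apt list --upgradable 2>/dev/null | tail -n +2 | wc -l);
r=no; [ -f /var/run/reboot-required ] && r=yes;
u=$(systemctl is-active unattended-upgrades 2>/dev/null);
echo "upgradable=$n reboot_required=$r unattended_upgrades=$u"'

# fleet_update_status HOST...
# Prints one line per host; unreachable hosts are reported, not fatal.
fleet_update_status() {
    local host result
    for host in "$@"; do
        if result=$("$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 \
                        "$host" "$FLEET_PROBE" </dev/null); then
            echo "$host $result"
        else
            echo "$host unreachable"
        fi
    done
}
```

Usage: fleet_update_status sbl0 sbl1 sbl2 sbl3. One line per host keeps the output easy to turn into per-host audit items later.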

Pinned tools

1 item

low · sbl1

cloudflared client is 2026.3.0 — current is 2026.5.0.

Reported by `cloudflared tunnel list`: 'Your version 2026.3.0 is outdated. We recommend upgrading it to 2026.5.0.' Not security-critical, but the newer release may include tunnel ingress bug fixes.

note

Fix steps

sudo apt update && sudo apt install --only-upgrade cloudflared
(or: curl-install from CF's GitHub releases if apt is behind).
After upgrade, restart all cloudflared-*.service units.

Pages projects

2 items

low · fleet

13+ CF Pages projects under the too.foo umbrella — confirm all still referenced.

Pages projects: sbl-console, too-foo-git, power-electronics-too-foo, kaggle-too-foo, content-too-foo, sensors-too-foo, vault-too-foo, munshi-too-foo, chladni-too-foo, helios-too-foo, atlas-too-foo, spice-too-foo, and others (the list was truncated). Many have <too.foo> subdomains. Some are likely active sub-apps; others may be dormant experiments.

note

Fix steps

List all projects (the scan showed 13+; output was truncated).
For each: check Pages dashboard for last successful deploy + traffic.
Inactive projects: delete or archive. Each one carries a DNS record + cert.

low · sbl4

sbl-console Pages project: auto-deploys via GitHub Actions (not CF native git).

Push to main → .github/workflows/deploy.yml runs `wrangler pages deploy out --project-name=sbl-console`. Native CF Pages git-connect is unused (would duplicate). `npm run deploy` remains as manual escape. Audit kept at low severity so failures (e.g. expired CLOUDFLARE_API_TOKEN) get noticed.

note

Fix steps

Verify last GHA deploy run succeeded: gh run list --repo Shivam-Bhardwaj/sbl --workflow deploy.yml
If failing: check that GH secrets CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are current.

Access policies

1 item

low · fleet

CF Access protects console.too.foo, console-api.too.foo, console-pty.too.foo — verify policy still scoped to single email.

Per CLAUDE.md: 'All behind Cloudflare Access (single email policy: curious.antimony@gmail.com).' Worth periodically re-checking the policy list — adds/removes can creep in.

note

Fix steps

CF dashboard → Zero Trust → Access → Applications.
Confirm each app's policy is still {require_email = curious.antimony@gmail.com}.

Workers & R2

1 item

low · fleet

Workers and R2 buckets not yet inventoried — scan only covered Pages + Tunnels.

The initial scan used `wrangler pages project list` and `cloudflared tunnel list`. The Cloudflare account may also host Workers and R2 buckets. An inventory is needed to confirm what surface area exists.

note

Fix steps

sbl-secret-env CLOUDFLARE_API_TOKEN=sbl0-cloudflare-api-token -- npx wrangler r2 bucket list
sbl-secret-env … -- npx wrangler deployments list (per known worker name).
Or use CF API directly to enumerate all worker scripts in the account.

Public history hygiene

1 item

low · sbl1

14 public repos under Shivam-Bhardwaj (mostly upstream forks) — verify history clean.

Of 14 public repos, 13 are upstream-OSS forks (pytorch, qiskit, rerun, mujoco, tokenizers, keploy, newton, ros2_documentation, stretch_ai, 3dgrut, RAP, neural-robot-dynamics, Standard-Notes). The 14th is `clip` (your Rust screencast recorder). Forks inherit upstream history; `clip` should get a one-time gitleaks pass.

note

Fix steps

gitleaks detect -s ~/workspace/code/clip --no-git=false
If any hits in history (likely none — early commits): BFG/git-filter-repo + force-push + rotate credential.
For forks: low priority since the upstream history is the dominant signal; if you've never pushed local commits, nothing to leak.

External access

1 item

low · sbl1

GitHub deploy keys + Actions secrets not yet inventoried.

When SSH key rotation happens (security stage F), any GitHub deploy keys that mirror the old key need re-deployment. Same for any GHA workflows that reference the leaked PATs from bash_history.

note

Fix steps

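gh ssh-key list (account-level SSH keys) and gh repo deploy-key list --repo <owner>/<repo> (per-repo deploy keys).
gh api repos/{owner}/{repo}/actions/secrets per active repo (or per public-deployed repo); gh fills {owner} and {repo} from the current repository.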
Coordinate any deploy-key rotation with the SSH key swap (stage F).

Rust-strategy follow-through

1 item

low · sbl1

rs/TOOLS/AUTOCRATE deletion pending — waits for crate.too.foo verification.

Phase E ported the AutoCrate Next.js math to apps/crate (Astro+TS). Per the too-foo-rust-strategy memory, the live Rust subdomain (autocrate.too.foo, served by rs/TOOLS/AUTOCRATE) gets removed AFTER the Astro version is verified live at crate.too.foo and DNS cuts over. CF Pages project crate-too-foo was auto-provisioned, but the user has not yet deployed or verified.

note

Fix steps

1. cd ~/workspace/code/too.foo && pnpm --filter @too-foo/crate dev --host 0.0.0.0
2. Verify the calculator works against the four AutoCrate scenario presets.
3. tools/deploy.sh crate (or push agent/<slug> → CF Pages preview → merge to main).
4. Confirm https://crate.too.foo/ is live + correct.
5. rm -rf ~/workspace/code/too.foo/rs/TOOLS/AUTOCRATE (per Rust strategy).
6. Mark this item resolved.

Failure visibility

1 item

medium · sbl1

No pager / notification path for failed Pages deployments.

CF Pages deploys can silently fail (build errors, env-var issues, wrangler.jsonc misconfig). Without a notification path, you only learn when visiting the site or reading the dashboard. The fleet has Discord wired up for openclaw-cost-report; the same channel could carry build alerts.

note

Fix steps

Add a small wrangler-deploy wrapper that captures exit code + tails the deploy log and posts failures to the Discord webhook (sbl0 vault key).
Or: use CF Pages webhooks → CF Worker → Discord post.
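
The wrapper option can be sketched as a function that runs any deploy command, keeps its log, and alerts on failure. Assumptions: run_with_alert, notify_discord, and the NOTIFY_CMD override are hypothetical names; DISCORD_WEBHOOK_URL is expected to be supplied from the vault; the JSON body relies on Discord webhooks accepting a content field.

```shell
# notify_discord MESSAGE
# Posts MESSAGE to the webhook in $DISCORD_WEBHOOK_URL. MESSAGE must not
# contain double quotes or backslashes, since it is spliced into JSON as-is.
notify_discord() {
    curl -fsS -H 'Content-Type: application/json' \
        -d "{\"content\": \"$1\"}" "$DISCORD_WEBHOOK_URL" >/dev/null
}

# NOTIFY_CMD can be overridden to exercise the wrapper without a webhook.
NOTIFY_CMD="${NOTIFY_CMD:-notify_discord}"

# run_with_alert COMMAND [ARGS...]
# Runs the command with combined output captured in a temp log. On failure,
# prints the last 20 log lines to stderr and sends a one-line alert.
# Returns the command's exit status.
run_with_alert() {
    local log status
    log=$(mktemp) || return 1
    status=0
    "$@" >"$log" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        tail -n 20 "$log" >&2
        # A failed notification must not mask the deploy's own exit status.
        "$NOTIFY_CMD" "deploy failed (exit $status): $1" || true
    fi
    rm -f "$log"
    return "$status"
}
```

Usage: run_with_alert npx wrangler pages deploy out --project-name=sbl-console. Only the exit code and command name go to Discord; the log tail stays on the local terminal so build output that might contain secrets is not posted to a channel.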

Recovery

1 item

low · sbl1

No documented rollback procedure for a bad deploy.

Direct-upload via `npm run deploy` overwrites the live deployment. CF Pages keeps prior deployments accessible via the dashboard, but no scripted rollback path. Worth one-lining 'how to roll back' before you need it.

note

Fix steps

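Document: `wrangler pages deployment list --project-name sbl-console` → pick previous → roll back to it. Check `wrangler pages deployment --help` first: if your wrangler version has no `rollback` subcommand, use the rollback action on that deployment in the CF Pages dashboard.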
Add as a README section or to CLAUDE.md.
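
The rollback can also be scripted against the Cloudflare REST API, which avoids depending on a particular wrangler version. A sketch under assumptions: the endpoint path follows the API v4 Pages "rollback deployment" route and should be checked against the current API reference. `CF_ACCOUNT_ID` and `CF_API_TOKEN` are assumed variable names, and both function names are hypothetical.

```shell
# Build the Pages rollback URL for one deployment (path is an assumption; verify in the API docs).
pages_rollback_url() {
  local account=$1 project=$2 deployment=$3
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/pages/projects/%s/deployments/%s/rollback' \
    "$account" "$project" "$deployment"
}

# Roll the project back to a previous deployment id taken from the deployment list.
pages_rollback() {
  local project=$1 deployment=$2
  curl -fsS -X POST \
    -H "Authorization: Bearer ${CF_API_TOKEN:?set CF_API_TOKEN}" \
    "$(pages_rollback_url "${CF_ACCOUNT_ID:?set CF_ACCOUNT_ID}" "$project" "$deployment")"
}
```

Possible usage: `pages_rollback sbl-console <id>`, with `<id>` copied from the deployment list.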

Auth freshness

1 item

medium · sbl4 · ~/.config/gemini/api_key.txt

~/.config/gemini/api_key.txt last modified 86 days ago.

Gemini API keys don't auto-expire, but rotating every ~90 days is good hygiene — especially for a long-lived plaintext key on disk. This is also tied to the security finding gemini-api-key-file (same path, plaintext).

Consumers: tools fetching GEMINI_API_KEY env; scripts that read this file directly

Fix steps

Rotate in Google AI Studio.
Store new key as sbl0-google-gemini-api-key in vault.
Remove the on-disk file (per security finding gemini-api-key-file).
Update callers to use sbl-secret-env GEMINI_API_KEY=sbl0-google-gemini-api-key.
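
The "last modified N days ago" probe behind this finding can be reproduced with a few lines of shell. A minimal sketch, assuming GNU `stat` and treating file modification time as a proxy for key age (a key that was created earlier and merely copied would look younger). Both function names are hypothetical.

```shell
# Print the age of a file in whole days, based on its modification time (GNU stat).
file_age_days() {
  local mtime now
  mtime=$(stat -c %Y -- "$1") || return 1
  now=$(date +%s)
  echo $(( (now - mtime) / 86400 ))
}

# Report whether a credential file is at or past the rotation window (default 90 days).
check_rotation() {
  local file=$1 max_days=${2:-90} age
  age=$(file_age_days "$file") || return 2
  if [ "$age" -ge "$max_days" ]; then
    echo "ROTATE: $file is $age days old (limit $max_days)"
    return 1
  fi
  echo "ok: $file is $age days old"
}
```

Possible usage: `check_rotation ~/.config/gemini/api_key.txt 90`.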

Agent disk usage

1 item

high · sbl4 · ~/.openclaw

~/.openclaw consuming 7.7 GB — disproportionate for agent state.

Cross-referenced with disk-hygiene category. ~/.openclaw/backups is only 68K, so the bulk is elsewhere — probably models, indexes, or accumulated session caches. Worth investigating.

Fix steps

du -shx ~/.openclaw/* | sort -h to find the big subtree.
Identify what's there; many openclaw subprojects keep their own state.
Decide what to prune; check for old gmail-to-memory index if it grew.
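
One caveat with the `~/.openclaw/*` glob is that it skips dot-entries, which is where caches often live. A small sketch that includes them, assuming GNU `find`, `xargs`, and `du`. The function name `biggest_children` is hypothetical.

```shell
# List the N largest direct children of a directory (size in KiB), including dot-entries.
biggest_children() {
  local dir=$1 n=${2:-10}
  find "$dir" -mindepth 1 -maxdepth 1 -print0 \
    | xargs -0 -r du -sxk -- \
    | sort -rn \
    | head -n "$n"
}
```

Possible usage: `biggest_children ~/.openclaw 5`, then repeat on the largest child to walk down the tree.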

Per-project CLAUDE.md

1 item

medium · sbl1 · ~/workspace/code/*/CLAUDE.md

Several active projects under ~/workspace/code/ still have no CLAUDE.md.

Live as of 2026-05-23: sbl, too-foo, gate-too-foo, core, fleet, e51.org, s3m2p all have project-level CLAUDE.md. The rest (krishna, mcad, clip, etc.) don't. The global instructions say 'Read the relevant project's CLAUDE.md (project-specific facts)', but for projects without one there is nothing to read.

Fix steps

Decide which remaining projects WARRANT a CLAUDE.md (active projects only).
Seed each with the /init skill (claude can generate a starter from the codebase).
Priority candidates: clip (Rust screencast recorder), hellorobot (work-related).
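
The list of projects without a CLAUDE.md can be regenerated rather than maintained by hand. A minimal sketch that only looks at direct subdirectories of the code root. The function name is hypothetical.

```shell
# Print each direct subdirectory of a code root that has no CLAUDE.md.
projects_missing_claude_md() {
  local root=$1 dir
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue            # the glob matched nothing
    [ -f "${dir}CLAUDE.md" ] || basename "$dir"
  done
}
```

Possible usage: `projects_missing_claude_md ~/workspace/code`.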

Dotfile / bin drift

1 item

low · sbl1 · ~/bin/secret-store

~/bin/secret-store is a broken symlink (orphaned).

Single broken symlink under $HOME. Likely a stale shim from an older vault naming scheme; sbl-secret has replaced it.

Fix steps

ls -l ~/bin/secret-store — confirm the target.
If unused: rm ~/bin/secret-store.
If still expected by some script: grep -r secret-store ~/bin ~/.config to find callers and update.
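
To re-run the probe that found this symlink, GNU `find` can list dangling links directly. A short sketch; the function name `broken_symlinks` is hypothetical and the default depth of 1 is an arbitrary choice.

```shell
# List symlinks under a directory whose target no longer exists (GNU find).
# Second argument: how many directory levels to descend (default 1).
broken_symlinks() {
  find "$1" -maxdepth "${2:-1}" -xtype l -print
}
```

Possible usage: `broken_symlinks ~/bin`.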

Fleet topology accuracy

2 items

medium · sbl1 · ~/workspace/code/sbl/src/data/audit/security{,-stages}.ts

Audit data (security.ts + security-stages.ts) lists sbl10 as a key-deploy target — sbl10 is not in the fleet.

User memory: 'sbl10 does NOT exist'. The ssh-id-ed25519 security finding lists sbl10 in its 'authenticates to' set; the security stage F SSH-key-swap step says 'ssh-copy-id new pubkey to sbl0, sbl1, sbl2, sbl3, sbl4, sbl10'. The Tailscale tailnet still has a registered (offline 32d) sbl10 node, but documentation should reflect actual fleet membership.

Fix steps

Remove sbl10 from the consumer list in security.ts (ssh-id-ed25519).
Remove sbl10 from the items list in security-stages.ts (stage F).
Also: deregister the offline sbl10 node from Tailscale admin (see network category).

medium · fleet

Tailscale has more registered nodes than CLAUDE.md's fleet table documents.

CLAUDE.md fleet table: sbl0..sbl4. Tailscale tailnet: sbl0..sbl4 PLUS sbl5 (linux, offline 30d), sbl5-1 (windows, offline 43d), sbl10 (linux, offline 32d), ipad (offline 236d), iphone (offline 47d), and a duplicate sbl3 entry. Either CLAUDE.md is incomplete, or stale tailnet nodes should be deregistered.

Fix steps

Decide policy: tailnet is the source of truth OR CLAUDE.md is.
Bring them into sync.
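
Whichever side becomes the source of truth, the drift itself is a set difference. A sketch, assuming each side has been exported to a file with one hostname per line (how those files are produced is left open; the tailnet side could come from `tailscale status`). The function name is hypothetical.

```shell
# Print names present in the tailnet list but absent from the documented fleet list.
# $1: documented hosts, $2: tailnet hosts; both files hold one hostname per line.
undocumented_nodes() {
  comm -13 <(sort -u "$1") <(sort -u "$2")
}
```

With the lists from this finding, the output would be sbl5, sbl5-1, sbl10, ipad, and iphone. The duplicate sbl3 entry collapses under `sort -u`, so it has to be checked separately.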

Vault prefix rule

1 item

high · sbl0 · sbl0 vault

8 vault entries do not match the sbl{0,3,4,10}-* convention.

Listing: cloudflare-global-api-key, contabo-deploy-ssh-key, discord-bot-token, github-token, ssh-ed25519-private (these 5 are tracked by security finding legacy-vault-entries), PLUS git-email, git-name, SBL0_CF_EDIT (probably a leaked env-var name stored as a key).

Fix steps

First 5: follow security stage G procedure (migrate to sbl0-* + drop |legacy fallback).
git-email, git-name: these are git config, not secrets — either move to ~/.gitconfig (not vault) or rename to sbl0-git-email / sbl0-git-name.
SBL0_CF_EDIT: looks like the literal env var name got stored. Verify what value it holds, then either rename it correctly or delete.
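
The convention check can be expressed as a single filter over the vault listing. A minimal sketch that reads entry names on stdin. The function name is hypothetical, and how the listing is produced is not assumed here.

```shell
# Read entry names on stdin; print those that do not start with an allowed host prefix.
nonconforming_entries() {
  # grep exits 1 when every name conforms, which is not an error here.
  grep -Ev '^sbl(0|3|4|10)-.+' || true
}
```

Fed the current listing, this should print the eight names above and nothing else.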

Dashboard claims

1 item

medium · sbl1 · ~/workspace/code/sbl/src/data/platform.ts (line ~491)

src/data/platform.ts asserts local token files removed — re-probe confirms files still exist.

platform.ts secretInventoryTsv label: 'OpenClaw, Kimi, GitHub, and Kaggle wrappers use sbl-secret-env; local OpenClaw/Kaggle/GitHub token files removed.' Re-probe: ~/.kaggle/access_token (38 bytes), ~/.config/gh/hosts.yml (304 bytes), ~/.openclaw/auth-profiles.json (327 bytes) — all still present. Already tracked in security as stale-dashboard-claims. Belongs here too as a convention violation: dashboard data must reflect ground truth.

Fix steps

Either: rotate + delete the files (per security findings) then leave the claim.
Or: rewrite the claim to be live-derived (a small script that runs at build time checking those paths).
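
For the live-derived option, the build-time check could be as small as the following. This is a sketch only: the three paths are the ones named in the re-probe, the function name is hypothetical, and wiring the result into platform.ts is left out.

```shell
# Print "path<TAB>size in bytes" for each listed token file that still exists and is non-empty.
# Optional argument: home directory to probe (defaults to $HOME).
probe_token_files() {
  local home=${1:-$HOME} rel
  for rel in .kaggle/access_token .config/gh/hosts.yml .openclaw/auth-profiles.json; do
    if [ -s "$home/$rel" ]; then
      printf '%s\t%s\n' "~/$rel" "$(wc -c <"$home/$rel" | tr -d ' ')"
    fi
  done
}
```

Empty output means the "removed" claim holds. Any output means the dashboard text is stale and the build could fail or switch the label.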

Prevention rules

1 item

low · sbl1 · ~/.bashrc.d/

No ~/.bashrc.d/security.sh — HISTIGNORE and trap rules from security stage H not yet installed.

~/.bashrc.d/ exists (single file: posemodel.env). Adding security.sh with HISTIGNORE and a prompt-command trap was deferred to security stage H. Logged here because it's a convention-level expectation (prevention rules should be in version-controlled drop-ins).

Fix steps

After security keys are rotated (stages A–G done), create ~/.bashrc.d/security.sh.
Contents per security-stages.ts stage H.
Shivam-Bhardwaj/dotfiles repo exists (private, 220d stale) — see github-dotfiles-repo-stale; consider reviving it and committing ~/.bashrc.d/ there for cross-host parity.

§ 02

Vault is healthy

Architecture to preserve as you remediate the rest. 5 systems verified.

Vault is server-side on sbl0

~/bin/vault is a thin SSH shim into curious@sbl0:~/.vault/vault.sh. Plaintext never lives on sbl1; it stays in age-encrypted .env.age files on sbl0.

All wrapper scripts pull at exec time

kimi, kimi-cli, krishna-ask, openclaw-discord-gateway, and every user systemd unit call sbl-secret-env <ENV>=sbl0-<name>. No inline tokens in any service unit or wrapper.

sbl-vault-store uses age + SSH stream

New secrets are encrypted to age1y9qu5gsr8feuraym2682hr3vgqm59f85fslzlr3wppdtjr348eyq5v7nxy before they touch disk on sbl0. The plaintext only exists in RAM on sbl1 and inside the SSH tunnel.

Vault listing has zero sbl1-* entries

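The 'sbl1 is the commit tier' rule is observed at the vault. All prefixed entries are sbl0-*, sbl3-*, sbl4-*, or sbl10-*. Five legacy unprefixed entries remain (security finding legacy-vault-entries); the Vault prefix rule finding counts three more (git-email, git-name, SBL0_CF_EDIT), eight in total.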

too.foo workers correctly defer to wrangler secret put

wrangler.jsonc files annotate CF_API_TOKEN as 'set via wrangler secret put' — no inline keys. Repo .gitignore excludes .env, .env.*, .dev.vars, .wrangler/.
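
The same .gitignore expectation can be checked for other repos. A rough sketch that looks for the four patterns as exact lines. It does not evaluate globs or negations the way git does, so a repo that ignores these files through a broader rule would be reported as missing them. The function name is hypothetical.

```shell
# Print the expected ignore patterns that are missing from a repo's .gitignore.
# Matches whole lines only; a missing .gitignore reports every pattern.
missing_ignore_rules() {
  local repo=$1 pattern
  for pattern in .env '.env.*' .dev.vars .wrangler/; do
    grep -qxF -- "$pattern" "$repo/.gitignore" 2>/dev/null || echo "$pattern"
  done
}
```

Possible usage: `missing_ignore_rules ~/workspace/code/too.foo`, where empty output means all four patterns are present.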

§ 03

Staged remediation (security)

Lowest blast-radius first. Each stage is reversible up to the moment a secret is revoked at the provider.

A
Free wins — no rotation needed
blast: none
Pure dead data and dead references. No live consumer reads any of these.
- Delete sbl2-backup-wipe-prep-20260404/ (82 GB).
- Delete the two smaller sbl2-migration backups.
- Delete the 5 `sbl1-*` rows from registry/secrets.tsv.
- Fix or delete ~/bin/openclaw-discord-status.
B
Low-risk API key rotations
blast: single tool
Verify the rotation methodology on credentials with one consumer before touching anything cross-cutting.
- Kaggle KGAT_* — only the dashboard wrapper reads it.
- Gemini AIza* — confirm consumers first; replace api_key.txt content via vault-fetched env.
- HuggingFace hf_* if any are live (most appearances were in old backups).
C
Medium-risk: Cloudflare + Google
blast: deploy scripts; rclone copy
Cloudflared tunnel rotation is online; gcloud/rclone re-auth is one-shot.
- sbl-secret put new sbl0-cloudflare-api-token, then revoke old in CF dashboard.
- cloudflared tunnel rotate <name> — daemon picks up new secret automatically.
- gcloud auth revoke --all && gcloud auth login.
- rclone config reconnect gdrive:.
D
GitHub PATs
blast: git push/pull until refresh completes
Most painful daily-use disruption; do it deliberately, not at end of day.
- Revoke leaked PATs visible in bash_history (newest first).
- gh auth refresh -h github.com.
- Verify: gh auth status; gh repo list -L 1.
E
Anthropic / Codex / OpenClaw OAuth
blast: Krishna offline for the openclaw step
Each manager owns its own file; re-auth swaps the token without disrupting other agents.
- claude /logout && claude /login (do this last in the session).
- codex logout && codex login.
- systemctl --user stop openclaw-gateway krishna-proxy → openclaw auth login → restart.
- Re-bootstrap antigravity.
F
SSH key replacement
blast: one wrong step = locked out of sbl0
The vault depends on SSH to sbl0. Deploy new key first, test on every host, then swap.
- ssh-keygen new key with passphrase under ~/.ssh/id_ed25519.new.
- ssh-copy-id new pubkey to sbl0, sbl1, sbl2, sbl3, sbl4, vast hosts.
- Verify new key works against every host.
- Atomic swap locally; ssh-add to gcr-ssh-agent.
- Remove old pubkey from every host's authorized_keys.
- sbl-secret put sbl0-curious-ssh-ed25519-private and shred old key.
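
The "verify new key works against every host" step is the one that prevents a lockout, so it is worth scripting. A sketch, assuming the new key has already been loaded with `ssh-add` (BatchMode cannot prompt for the passphrase). The function name `verify_new_key` is hypothetical.

```shell
# Try the new key against each host; report per host and return non-zero if any host rejects it.
verify_new_key() {
  local key=$1 host failed=0
  shift
  for host in "$@"; do
    # -n: no stdin; BatchMode: never prompt; IdentitiesOnly: offer only this key.
    if ssh -n -i "$key" -o BatchMode=yes -o IdentitiesOnly=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

Possible usage: `verify_new_key ~/.ssh/id_ed25519.new sbl0 sbl1 sbl2 sbl3 sbl4 && echo "safe to swap"`. Do not proceed to the swap while any host prints FAIL.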
G
Vault hygiene
blast: wrapper falls back to legacy until updated
Migrate the 5 unprefixed entries to sbl0-*. Wrappers already have fallback chains, so this is safe iff scripts are updated first.
- Migrate github-token → sbl0-github-pat; update repo_policy.py.
- discord-bot-token already has sbl0-openclaw-discord-token — drop the |fallback in openclaw-discord-gateway.
- Same pattern for cloudflare-global-api-key and contabo-deploy-ssh-key.
- After confirmation: sbl-secret delete <legacy> for each.
H
Prevention
blast: none
Stop the next leak before it happens.
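- Add HISTIGNORE='*ghp_*:*github_pat_*:*sk-ant-*:*AIza*:*hf_*:*gsk_*:*xai-*:*KGAT_*' to .bashrc (via the ~/.bashrc.d/security.sh drop-in); the leaked history contained both ghp_* and github_pat_* tokens.
- Add a DEBUG trap (with shopt -s extdebug, so a non-zero return skips the command) that refuses `export *_TOKEN=…` and `gh auth login --with-token <literal>`. PROMPT_COMMAND runs only after a command has finished, so it can warn but not block.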
- Audit every project .gitignore for .env, .dev.vars, credentials.json.
- Optional: install gitleaks or pre-commit secret-detection.
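
A possible shape for the stage H drop-in, as a sketch rather than the contents recorded in security-stages.ts. The two function names are hypothetical, the export pattern is deliberately narrow (it allows values that start with `$`, such as a variable or command substitution), and only the GitHub prefixes are matched anywhere on the line. The trap is installed for interactive shells only, so scripts that source the file are unaffected.

```shell
# Sketch of ~/.bashrc.d/security.sh

# Return 0 when a command line appears to carry a literal secret.
looks_like_literal_secret() {
  local cmd=$1
  # export FOO_TOKEN=<value> where the value is not a $variable or $(command).
  local re='^export[[:space:]]+[A-Za-z_][A-Za-z0-9_]*_TOKEN="?[^$"[:space:]]'
  [[ $cmd =~ $re ]] && return 0
  # Any command that mentions a GitHub PAT prefix.
  [[ $cmd == *ghp_* || $cmd == *github_pat_* ]] && return 0
  return 1
}

# DEBUG-trap handler: with extdebug set, a non-zero return makes bash skip the command.
refuse_literal_secret() {
  if looks_like_literal_secret "$1"; then
    echo "refused: literal secret on the command line; fetch it from the vault instead" >&2
    return 1
  fi
}

if [[ $- == *i* ]]; then
  # Keep matching command lines out of the history file.
  HISTIGNORE='*ghp_*:*github_pat_*:*sk-ant-*:*AIza*:*hf_*:*gsk_*:*xai-*:*KGAT_*'
  shopt -s extdebug
  trap 'refuse_literal_secret "$BASH_COMMAND"' DEBUG
fi
```

Note that extdebug changes a few other shell behaviours and the DEBUG trap fires before every simple command, so this is worth trying in a scratch shell before committing it to the dotfiles repo.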

Operating principle

The audit is the to-do list, not the work. Revoke at provider, re-auth via the manager, verify, delete prior copies. The vault is the backup of the rotated value, not the live source.