Decommission a host
When an endpoint is retired — a laptop returned to IT, a server pulled out
of the rack, a VM destroyed — Mimir doesn’t know about it automatically.
The host keeps appearing in your fleet list as offline, contributing to
“hosts that have not reported in 24h” alerts and skewing posture metrics.
Decommissioning is the explicit operator action that says “this endpoint
is gone for good”: it revokes the agent’s certificate so it can’t reconnect,
and pins the host’s status to offline in the UI.
What it does
Decommission performs three coordinated operations in one click:
- Pins status to offline. The hosts row is updated to status = 'offline' immediately, regardless of the actual last_seen timestamp.
- Revokes the agent certificate. The agent_enrollment row gets revoked = TRUE and revoked_at = NOW(). If the agent process happens to still be running and tries to reconnect, gRPC mTLS rejects it.
- Notifies all server replicas. A Postgres pg_notify fires on the agent-revocation channel so every replica refreshes its in-memory revocation cache. Without the notify, a replica could trust the cached “valid” entry for up to 60 seconds, long enough for a decommissioned agent to slip in one last reconnect.
History is preserved: every query result, every alert, every audit event attributed to the host stays in the database. Decommissioning is not a delete; it’s a “this endpoint is retired” annotation plus a cert revoke.
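The three operations can be pictured as two row updates in one transaction followed by a broadcast. The following is a minimal sketch only, not Mimir's implementation: it uses SQLite in place of Postgres so it runs anywhere, the `decommission` function and the `notify` callback are hypothetical stand-ins (the callback plays the role of `pg_notify`), and the table layouts are assumed beyond the column names mentioned above.

```python
import sqlite3
from datetime import datetime, timezone


def decommission(conn, host_id, notify):
    """Pin status to offline, revoke the enrollment, then notify replicas."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: both updates commit, or neither does
        conn.execute(
            "UPDATE hosts SET status = 'offline' WHERE id = ?", (host_id,)
        )
        conn.execute(
            "UPDATE agent_enrollment SET revoked = 1, revoked_at = ? "
            "WHERE host_id = ?",
            (now, host_id),
        )
    # Hypothetical stand-in for pg_notify on the agent-revocation channel.
    # It runs only after the commit, so listeners never see uncommitted state.
    notify(host_id)


# Assumed schema, reduced to the columns this page mentions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (id TEXT PRIMARY KEY, status TEXT, last_seen TEXT);
    CREATE TABLE agent_enrollment (
        host_id TEXT, revoked INTEGER DEFAULT 0, revoked_at TEXT
    );
    INSERT INTO hosts VALUES ('h1', 'online', '2024-01-01T00:00:00Z');
    INSERT INTO agent_enrollment (host_id) VALUES ('h1');
    """
)

notified = []
decommission(conn, "h1", notified.append)
```

Note that nothing is deleted: the host row and the enrollment row both survive, which is what keeps the history queryable.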
Why use it
Without decommission, retired hosts pile up in the offline bucket and the fleet metric (“N of M hosts online”) drifts further from reality every quarter. Operators usually decommission as part of:
- Asset offboarding. A laptop is reclaimed, wiped, and re-issued. The original cert is invalidated; the new enrollment gets a fresh row.
- Forensic isolation. A host is suspected of compromise and is being pulled off the network. Revoking the cert ensures the attacker can’t use the legitimate agent channel to exfiltrate data through Mimir.
- Quota cleanup. Some commercial Mimir tiers gate features on active host count. Decommissioning trims the count without losing history.
How to use it
- Open the Hosts page from the left nav.
- Find the host. Use the search box (hostname, IP, OS) or scroll the table.
- Click the red Decommission button at the right of the row. A confirmation modal appears showing the hostname and warning that the certificate will be revoked.
- Click Decommission in the modal to confirm. The row disappears from the table immediately (client-side prune) and a toast confirms “decommissioned”.
If you decommissioned the wrong host or the endpoint is coming back online (e.g. a temporarily-recalled laptop being re-issued), an admin can re-enable it from the CLI or by calling the re-enable endpoint directly:

    mimir-cli host re-enable <host-id>
    # or, directly:
    curl -X POST -H "Authorization: Bearer <admin-token>" \
      https://mimir.example.com/api/v1/hosts/<host-id>/re-enable

Re-enabling clears the revocation. The agent has to be re-enrolled separately: the original cert stays revoked, and the new enrollment mints a fresh one.
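For scripted use, the same call can be made from Python's standard library. This is a sketch under assumptions: `build_reenable_request` is a hypothetical helper, the host ID and token are placeholders, and only the method, path, and bearer header are taken from the curl example above. The request is built but not sent, so the snippet runs offline.

```python
import urllib.parse
import urllib.request


def build_reenable_request(base_url, host_id, admin_token):
    """Build (but do not send) the POST that re-enables a host."""
    # Quote the ID so an unexpected character cannot alter the path.
    safe_id = urllib.parse.quote(host_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/hosts/{safe_id}/re-enable"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}"},
    )


# Placeholder values for illustration only.
req = build_reenable_request(
    "https://mimir.example.com", "host-123", "example-token"
)

# To actually send it:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status)
```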
Who can do it
Decommission and re-enable are admin-only. They’re wrapped in
withAdminAuth, which rejects any session that doesn’t carry the
admin role. Non-admin operators see the Decommission button on the
Hosts page — clicking it returns a 403 from the API and a “permission
denied” toast.
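The wrapper pattern described here can be illustrated with a small Python analogue. This is not the source of withAdminAuth: the `with_admin_auth` decorator, the dictionary-shaped session with a `roles` list, and the tuple-style responses are all assumptions made for the sketch. Only the behavior (non-admin sessions get a 403) comes from the text above.

```python
from functools import wraps


def with_admin_auth(handler):
    """Reject any session that does not carry the admin role."""

    @wraps(handler)
    def wrapped(session, *args, **kwargs):
        if "admin" not in session.get("roles", ()):
            # The handler never runs for non-admin sessions.
            return 403, {"error": "permission denied"}
        return handler(session, *args, **kwargs)

    return wrapped


@with_admin_auth
def decommission_handler(session, host_id):
    # Hypothetical handler body; the real work is omitted.
    return 200, {"host_id": host_id, "status": "offline"}


admin_result = decommission_handler({"roles": ["admin"]}, "h1")
operator_result = decommission_handler({"roles": ["operator"]}, "h1")
```

Because the check lives on the server side of the API, hiding or showing the button in the UI has no effect on who can actually decommission a host.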
Troubleshooting
The decommission API call returned 200 but the agent is still reconnecting. Two possibilities:
- Another server replica has a stale revocation cache. The pg_notify should have refreshed it; if you see this consistently, check that the Postgres LISTEN worker is alive in your replica logs.
- The agent has multiple enrollments (rare, only happens during re-enrollment). Decommissioning the host row revokes the enrollment tied to that host_id; an older orphaned enrollment can survive. The mimir-cli enrollment list <host-id> command reveals every cert issued to the same host; decommissioning revokes only the linked one.
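To see why a dead LISTEN worker produces the first symptom, it helps to model the replica-side cache. The sketch below is an assumption-laden toy, not Mimir's cache: `RevocationCache`, its `lookup` callback (standing in for a database read), and `on_notify` (standing in for the LISTEN worker's handler) are hypothetical names. Only the 60-second trust window and the notify-driven refresh come from this page.

```python
import time


class RevocationCache:
    """Per-replica cache of revocation lookups with a bounded lifetime."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup      # reads the authoritative state
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # host_id -> (is_revoked, fetched_at)

    def is_revoked(self, host_id):
        entry = self._entries.get(host_id)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]        # served from cache, possibly stale
        revoked = self._lookup(host_id)
        self._entries[host_id] = (revoked, self._clock())
        return revoked

    def on_notify(self, host_id):
        # Called when a revocation notification arrives: drop the entry
        # so the next check goes back to the database.
        self._entries.pop(host_id, None)


revoked_hosts = set()
now = [0.0]  # fake clock so the demo is deterministic
cache = RevocationCache(lambda h: h in revoked_hosts, clock=lambda: now[0])

before = cache.is_revoked("h1")  # caches "not revoked"
revoked_hosts.add("h1")          # decommission lands via another replica
stale = cache.is_revoked("h1")   # still answered from the cached entry
cache.on_notify("h1")            # notification reaches this replica
fresh = cache.is_revoked("h1")   # re-read from the authoritative state
```

In this model, a replica that never receives the notification keeps returning the stale answer until the entry ages out, which matches the symptom of an agent that reconnects briefly after a successful decommission.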
I want to permanently delete the host record (GDPR / employee
offboarding). Decommission keeps the row for audit purposes. A
hard delete is an operator-level escalation that lives outside the UI;
see the operator runbook for the SQL pattern (typically DELETE FROM hosts WHERE id = $1 followed by cascading deletes on the linked
tables, all wrapped in a transaction and audit-logged).
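The shape of such a hard delete can be sketched as follows. The operator runbook remains the authority; this is only an illustration under assumptions. It uses SQLite so it runs without a Postgres instance, `hard_delete_host`, `LINKED_TABLES`, and the `audit_log` table are hypothetical, and the sketch removes linked rows before the host row so it would also work where foreign keys are enforced immediately.

```python
import sqlite3

# Hypothetical list of tables keyed by host_id; extend it per the runbook.
LINKED_TABLES = ("agent_enrollment",)


def hard_delete_host(conn, host_id, actor):
    """Delete a host and its linked rows in one audited transaction."""
    with conn:  # commits on success, rolls back if anything raises
        for table in LINKED_TABLES:
            conn.execute(f"DELETE FROM {table} WHERE host_id = ?", (host_id,))
        deleted = conn.execute(
            "DELETE FROM hosts WHERE id = ?", (host_id,)
        ).rowcount
        if deleted != 1:
            raise LookupError(f"host {host_id!r} not found")
        conn.execute(
            "INSERT INTO audit_log (actor, action, target) "
            "VALUES (?, 'host.hard_delete', ?)",
            (actor, host_id),
        )


# Assumed minimal schema for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE agent_enrollment (host_id TEXT, revoked INTEGER);
    CREATE TABLE audit_log (actor TEXT, action TEXT, target TEXT);
    INSERT INTO hosts VALUES ('h1', 'offline');
    INSERT INTO agent_enrollment VALUES ('h1', 1);
    """
)

hard_delete_host(conn, "h1", "admin@example.com")
```

Raising inside the transaction when the host is missing means a mistyped ID leaves no partial deletes and no misleading audit entry.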
Why is decommission immediate, not soft-deleted-with-undo? The cert revocation has to be instant — a compromised endpoint should not be able to slip in one more query while a UI undo timer counts down. The re-enable endpoint is the explicit reversal path; it just isn’t a single-click affordance because reversing a cert revocation should be a deliberate act.