Decommission a host

When an endpoint is retired — a laptop returned to IT, a server pulled out of the rack, a VM destroyed — Mimir doesn’t know about it automatically. The host keeps appearing in your fleet list as offline, contributing to “hosts that have not reported in 24h” alerts and skewing posture metrics. Decommissioning is the explicit operator action that says “this endpoint is gone for good”: it revokes the agent’s certificate so it can’t reconnect, and pins the host’s status to offline in the UI.

What it does

Decommission performs three coordinated operations in one click:

Pins status to offline. The hosts row is updated to status = 'offline' immediately, regardless of the actual last_seen timestamp.
Revokes the agent certificate. The agent_enrollment row gets revoked = TRUE and revoked_at = NOW(). If the agent process happens to still be running and tries to reconnect, gRPC mTLS rejects it.
Notifies all server replicas. A Postgres pg_notify fires on the agent-revocation channel so every replica refreshes its in-memory revocation cache. Without the notify, a replica could trust the cached “valid” entry for up to 60 seconds — long enough for a decommissioned agent to slip in one last reconnect.
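The three operations above can be sketched as a single database routine. This is an illustration under stated assumptions, not Mimir's actual implementation: the table, column, and channel names come from the descriptions above, while the function name, the use of the host id as the notification payload, and the psycopg-style %s placeholders are assumptions.

```python
# Channel name as described above; the payload format is an assumption.
REVOCATION_CHANNEL = "agent-revocation"


def decommission_host(cur, host_id: str) -> None:
    """Run the three decommission steps on a DB-API cursor.

    The caller is expected to commit (or roll back) the surrounding
    transaction, so the steps succeed or fail together.
    """
    # 1. Pin the host to offline, regardless of last_seen.
    cur.execute("UPDATE hosts SET status = 'offline' WHERE id = %s", (host_id,))
    # 2. Revoke the enrollment linked to this host.
    cur.execute(
        "UPDATE agent_enrollment SET revoked = TRUE, revoked_at = NOW() "
        "WHERE host_id = %s",
        (host_id,),
    )
    # 3. Tell every replica to refresh its revocation cache.
    cur.execute("SELECT pg_notify(%s, %s)", (REVOCATION_CHANNEL, host_id))
```

Running all three statements in one transaction has a useful side effect: Postgres delivers a notification only when the sending transaction commits, so replicas are never told about a revocation that was rolled back.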

History is preserved: every query result, every alert, and every audit event attributed to the host stays in the database. Decommissioning is not a delete; it’s a “this endpoint is retired” annotation plus a cert revocation.

Why use it

Without decommission, retired hosts pile up in the offline bucket and the fleet metric (“N of M hosts online”) drifts further from reality every quarter. Operators usually decommission as part of:

Asset offboarding. A laptop is reclaimed, wiped, and re-issued. The original cert is invalidated; the new enrollment gets a fresh row.
Forensic isolation. A host is suspected of compromise and is being pulled off the network. Revoking the cert ensures the attacker can’t use the legitimate agent channel to exfiltrate data through Mimir.
Quota cleanup. Some commercial Mimir tiers gate features on active host count. Decommissioning trims the count without losing history.

How to use it

Open the Hosts page from the left nav.
Find the host. Use the search box (hostname, IP, OS) or scroll the table.
Click the red Decommission button at the right of the row. A confirmation modal appears showing the hostname and warning that the certificate will be revoked.
Click Decommission in the modal to confirm. The row disappears from the table immediately (client-side prune) and a toast confirms “<hostname> decommissioned”.

If you decommissioned the wrong host, or the endpoint is coming back into service (e.g. a temporarily recalled laptop being re-issued), an admin can reverse the decommission through the re-enable endpoint:

mimir-cli host re-enable <host-id>
# or, directly:
curl -X POST -H "Authorization: Bearer <admin-token>" \
  https://mimir.example.com/api/v1/hosts/<host-id>/re-enable

Re-enabling clears the revocation, but it does not bring the original certificate back. The agent still has to be re-enrolled separately, and the new enrollment mints a fresh cert.
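For scripted workflows, the same call can be built in Python with only the standard library. This is a minimal sketch: the URL path and bearer-token header mirror the curl example, while the function name and its parameters are hypothetical.

```python
import urllib.parse
import urllib.request


def build_re_enable_request(
    base_url: str, host_id: str, admin_token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that re-enables a host."""
    # Quote the id so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(host_id, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/hosts/{safe_id}/re-enable"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}"},
    )
```

Passing the result to urllib.request.urlopen sends it. A non-admin token should surface as a urllib.error.HTTPError with status 403, since urlopen raises on 4xx responses.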

Who can do it

Decommission and re-enable are admin-only. They’re wrapped in withAdminAuth, which rejects any session that doesn’t carry the admin role. Non-admin operators still see the Decommission button on the Hosts page, but clicking it returns a 403 from the API and shows a “permission denied” toast.
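The behaviour of such a guard can be illustrated with a small Python decorator. This is not the real withAdminAuth, whose source is not shown here; the Session shape, the (status, body) return convention, and the handler below are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass(frozen=True)
class Session:
    """Hypothetical session: a user name plus the roles it carries."""
    user: str
    roles: frozenset = frozenset()


def with_admin_auth(handler):
    """Wrap a handler so it only runs for sessions with the admin role."""
    @wraps(handler)
    def guarded(session, *args, **kwargs):
        if "admin" not in session.roles:
            # Rejected before the handler (and any cert revocation) runs.
            return 403, {"error": "permission denied"}
        return handler(session, *args, **kwargs)
    return guarded


@with_admin_auth
def decommission(session, host_id):
    # Placeholder body; the real handler would do the database work.
    return 200, {"host_id": host_id, "status": "offline"}
```

Because the check lives on the server side of the API, leaving the button visible to non-admins is only a UI matter: the request is refused before any state changes.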

Troubleshooting

The decommission API call returned 200 but the agent is still reconnecting. Two possibilities:

Another server replica has a stale revocation cache. The pg_notify should have refreshed it; if you see this consistently, check that the Postgres LISTEN worker is alive in your replica logs.
The agent has multiple enrollments (rare; this only happens during re-enrollment). Decommissioning revokes only the enrollment tied to the host row’s host_id, so an older orphaned enrollment can survive. Run mimir-cli enrollment list <host-id> to see every cert issued to the same host.
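To see why a dead LISTEN worker produces the first symptom, consider a simplified model of a per-replica revocation cache. This is a sketch under assumptions: the class and method names are hypothetical, the 60-second lifetime comes from the description of the notify step, and dropping the entry on notification is one plausible way to "refresh" it.

```python
import time


class RevocationCache:
    """Per-replica cache of revocation verdicts with a fixed lifetime."""

    def __init__(self, lookup, ttl=60.0, clock=time.monotonic):
        self._lookup = lookup    # authoritative check, e.g. a DB query
        self._ttl = ttl          # seconds a cached verdict is trusted
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # host_id -> (revoked, cached_at)

    def is_revoked(self, host_id):
        now = self._clock()
        hit = self._entries.get(host_id)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]        # may be stale until the TTL runs out
        revoked = self._lookup(host_id)
        self._entries[host_id] = (revoked, now)
        return revoked

    def on_notify(self, host_id):
        """Called by the LISTEN worker for each revocation notification."""
        # Drop the entry so the next check re-reads the database.
        self._entries.pop(host_id, None)
```

If on_notify is never called on a replica, a cached "valid" verdict keeps being served until the TTL expires, which matches a decommissioned agent that briefly reconnects through that one replica.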

I want to permanently delete the host record (GDPR / employee offboarding). Decommission keeps the row for audit purposes. A hard delete is an operator-level escalation that lives outside the UI; see the operator runbook for the SQL pattern (typically DELETE FROM hosts WHERE id = $1 followed by cascading deletes on the linked tables, all wrapped in a transaction and audit-logged).

Why is decommission immediate, not soft-deleted-with-undo? The cert revocation has to be instant — a compromised endpoint should not be able to slip in one more query while a UI undo timer counts down. The re-enable endpoint is the explicit reversal path; it just isn’t a single-click affordance because reversing a cert revocation should be a deliberate act.