Your workers show as running. Your queue keeps growing. Nobody knows why. Kanari catches the silent failures generic monitoring misses — and tells you what to do about them.
The silent failures
Your error tracker fires when a task throws. It stays silent when the worker dies mid-task, when the scheduler stops, when the queue quietly backs up. Those are the ones that page you at 3am.
A worker keeps its process up but stops consuming after a broker reconnect. Your dashboard shows it green. The queue tells the real story.
WORKER_OFFLINE
200 tasks pending, but are they 2 seconds old or 20 minutes old? Depth alone lies. Kanari tracks how long the oldest task has really waited.
QUEUE_SLA_BREACH
A worker crash mid-execution with late-acks off means the task is gone: no retry, no error, no trace. You find out from the missing data.
CRITICAL config
One task running far past its expected runtime holds a worker slot hostage and starves everything behind it. Throughput collapses with no clear cause.
STUCK_TASK
Tasks pile up faster than workers drain them. By the time you notice, you're thousands behind with no idea when you'll catch up.
QUEUE_BACKLOG
Workers look idle while queues grow. The real cause: prefetch hoarding tasks that can't run yet, blocking faster ones behind them.
HIGH_SATURATION
How it works
Every finding comes with the probable cause, the command to confirm it, and a safe fix. Not "queue depth is 847" — but why it happened and what to do next.
One command, no config to start. Run it against any Celery + Redis setup in about 30 seconds.
When an anomaly trips, Kanari turns raw signals into a plain-language explanation and a concrete next step.
Goes beyond depth to measure how long tasks actually wait, so SLA breaches surface before users feel them.
Catches risky settings like task_acks_late=False, weak eviction policies, and single points of failure.
Clean exit codes and JSON output. Drop it into a pipeline as a pre- or post-deploy gate.
Continuous monitoring that messages you the moment something breaks — and stays quiet when it doesn't.
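To make the config check and the pipeline gate concrete, here is a minimal sketch of the idea, not Kanari's actual rules or output format. The setting names (task_acks_late, worker_prefetch_multiplier, Redis maxmemory-policy) are real Celery and Redis options; the function names, severities, and JSON shape are assumptions made for illustration.

```python
import json

# Hypothetical rule set. The setting names are real Celery/Redis options,
# but the severities and messages are illustrative assumptions.
def audit_config(celery_conf, redis_maxmemory_policy="noeviction"):
    """Return a list of findings for risky worker and broker settings."""
    findings = []
    # Celery defaults task_acks_late to False: a task is acknowledged
    # before it runs, so a crash mid-execution loses it.
    if not celery_conf.get("task_acks_late", False):
        findings.append({
            "severity": "critical",
            "setting": "task_acks_late",
            "detail": "tasks are acked before execution; a worker crash drops them",
        })
    # Celery defaults worker_prefetch_multiplier to 4: each process may
    # reserve several tasks that then wait behind a slow one.
    if celery_conf.get("worker_prefetch_multiplier", 4) > 1:
        findings.append({
            "severity": "warning",
            "setting": "worker_prefetch_multiplier",
            "detail": "prefetched tasks can sit behind a long-running task",
        })
    # Any eviction policy other than noeviction lets Redis discard keys,
    # including queued messages, under memory pressure.
    if redis_maxmemory_policy != "noeviction":
        findings.append({
            "severity": "critical",
            "setting": "maxmemory-policy",
            "detail": "Redis may evict queued tasks when memory fills up",
        })
    return findings


def exit_code(findings):
    """Return 1 if any finding is critical, else 0, for use as a pipeline gate."""
    return 1 if any(f["severity"] == "critical" for f in findings) else 0


# Example: a setup left on Celery defaults, with an evicting Redis policy.
example_findings = audit_config(
    {"task_acks_late": False, "worker_prefetch_multiplier": 4},
    redis_maxmemory_policy="allkeys-lru",
)
report = json.dumps(example_findings, indent=2)
# In a pipeline step you would print(report) and then call
# sys.exit(exit_code(example_findings)) to fail the deploy on critical findings.
```

A non-zero exit code is what lets a deploy step stop on its own, while the JSON report stays readable by whatever tooling runs next.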
Privacy-first by design
The agent runs where your workers run. It reads health signals, not contents. Task arguments, payloads, and results are never accessed — not sampled, not hashed, never touched. What reaches Kanari is sanitized metrics, and nothing else.
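One way to enforce "health signals, not contents" is an allowlist: only named metric fields ever leave the host, so arguments and results are dropped by construction rather than filtered after the fact. The sketch below illustrates that approach under assumed field names; it is not the agent's actual code.

```python
# Hypothetical allowlist of metric fields. These names are assumptions
# for illustration, not the agent's real schema.
ALLOWED_FIELDS = frozenset(
    {"task_name", "queue", "state", "runtime_seconds", "wait_seconds"}
)


def sanitize_event(event):
    """Keep only allowlisted health fields; everything else is discarded."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


# Example: a raw task event that carries payload data alongside health data.
raw_event = {
    "task_name": "reports.build",
    "queue": "default",
    "state": "SUCCESS",
    "runtime_seconds": 4.2,
    "args": ["customer-123"],
    "kwargs": {"email": "user@example.com"},
    "result": {"rows": 1024},
}
safe_event = sanitize_event(raw_event)
```

An allowlist fails closed: a field added to the event later stays private until someone deliberately adds it to the list.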
args and kwargs
Alerts
Kanari fits your on-call setup instead of asking you to watch one more dashboard. Start with Slack and email; wire in your full escalation stack as you grow.
Slack & email come with Pro. PagerDuty, Opsgenie, webhooks and on-call escalation come with Team.
Your stack
Kanari understands Celery's queue model, worker pool, and acknowledgment semantics natively — that depth is the whole point. The same engine is expanding to the rest of the async world.
Full detection, config analysis, and diagnosis. The deepest support we offer.
Celery on RabbitMQ, plus native AMQP queue health.
Queue depth, age, and dead-letter visibility for SQS-backed workers.
Consumer lag and partition health for streaming workloads.
Plans
The agent is open source and free forever. The paid tiers add what needs a service running for you around the clock.
audit & live watch
Early access
We're opening Kanari to a small first group. Tell us where you are with Celery and we'll reach out as spots free up.