
Posted on Feb 2, 18:30

Staff Software Engineer, Product

San Francisco, United States of America
On-site

Engineers can't keep up with their production environments: dozens of services, dashboards everywhere, alerts firing constantly. The information to diagnose most issues already exists, but finding it takes longer than fixing the problem. And the more AI-generated code ships, the more services get deployed by people who won't be around to debug them.

Cleric connects to your existing observability stack, autonomously investigates production incidents, and tells engineers what's wrong. We're well funded with years of runway, and our small team of AI and infrastructure veterans in SF is growing quickly. Stack: Python, Go, LLMs, Kubernetes.

Some of the problems we work on:

  • There's no test suite in production. When the AI says "the root cause is X," how do you verify that? You can't A/B test diagnoses. Ground truth labels don't exist. We build evaluation systems that track resolution outcomes over weeks and correlate fixes with diagnoses to build statistical confidence.
  • When something breaks, everything looks broken. Database latency spikes, five services throw errors, CPU goes up, logs explode. When an agent sees 47 anomalies at once, it needs to figure out which one is the root cause and which are symptoms, across systems with feedback loops, hidden dependencies, and non-obvious temporal relationships.
  • A single investigation might need six hours of metrics across 50 services, 10GB of logs, 10,000 distributed traces, the last 30 deployments, and the relevant runbooks. LLMs have finite context windows. What's relevant isn't known until you investigate. Getting retrieval wrong means wrong conclusions or exploding costs.
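To make the second problem concrete, here is a toy sketch of one way to separate root causes from symptoms: given a service dependency graph, an anomalous service whose own dependencies are all healthy is a likelier root cause than one sitting downstream of another anomaly. This is an illustrative heuristic only, not Cleric's actual approach; the `deps` graph and service names are hypothetical.

```python
def root_cause_candidates(deps, anomalous):
    """Triage heuristic: among anomalous services, keep those whose own
    dependencies are all healthy. Symptoms tend to propagate downstream
    of a fault, so the "deepest" anomalous node is the likelier cause.

    deps: dict mapping each service to the services it calls.
    anomalous: collection of services currently showing anomalies.
    """
    anomalous = set(anomalous)
    return sorted(
        s for s in anomalous
        if not any(d in anomalous for d in deps.get(s, ()))
    )

# Hypothetical incident: checkout and api are throwing errors, both sit
# on db, and db's latency is spiking -- db is the candidate root cause.
deps = {
    "checkout": ["api"],
    "api": ["db"],
    "db": [],
}
print(root_cause_candidates(deps, {"checkout", "api", "db"}))  # ['db']
```

In production this signal would be one of many: real systems have feedback loops and hidden dependencies, so graph position alone cannot settle causality, but it prunes the candidate set before more expensive analysis.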