LithosAI — Agents That Learn in Prod. Any Model. No Lock-in.
Agents that learn in prod.
With Motus, agents get measurably better over time. Without it, they degrade.
Building agents has never been easier. But the real problems begin the moment they hit production at scale. Capability degrades without clear signal on what's failing or why. Costs scale faster than results when every step hits a frontier model. Latency stacks with every step the agent takes. None of it improves on its own.
Meet Motus. It helps your agents learn in prod. Motus extracts signal from every production trace: task outcomes, latency, cost. It improves the agent harness based on what's working and what's not. It orchestrates across any model, open or closed. It tailors context memory to your workload and cuts latency by parallelizing execution. When the next frontier model drops, swap it in. Your learned optimizations carry over. No lock-in. No manual tuning. Just agents that keep getting better.
Deploy straight from any coding tool you already use: Claude Code, Codex, Cursor. Run motus serve to self-host on your own infrastructure, or motus deploy for the cloud. No Dockerfiles, no Kubernetes configs, no infrastructure code. Motus agent serving is open source. You choose how it runs.
Ship your agents today. The tokens are on us during early preview. Continuous learning is live for a limited cohort of early partners. Join us on Slack and let's build together.
Install plugin
curl -fsSL https://www.lithosai.com/motus/install.sh | sh
Build your agent, then serve or deploy
motus deploy
Motus in action.
Interactive Agents
Terminal-Bench 2.0
Terminal-Bench 2.0 evaluates agent interactions with live terminal environments in a sandbox, executing commands, managing files, and recovering from errors in real time. Getting the most out of these tasks requires jointly optimizing both the agent harness and model orchestration.
Motus learns from agent signals to continuously optimize the agent harness and orchestrate across models. Starting from Opus 4.6 at 64% accuracy, Motus first optimizes the agent harness to reach 77.5%, then lifts accuracy to 80.1% through model orchestration, at 2.4x lower cost than Opus 4.6 alone.
Terminal-Bench 2.0 — bottom-right is better (higher accuracy, lower cost)
Harness: Terminus 2
Software Engineering
SWE-bench Verified
SWE-bench Verified tests end-to-end software engineering: writing patches, fixing bugs, and resolving real GitHub issues. No single model consistently wins across all task types, and static model choices leave accuracy and cost on the table.
Motus orchestrates models into a single system that outperforms any one alone. Opus 4.6 reaches 75.8% and GPT-5.3-Codex 72.6%. Motus pushes accuracy to 79%, surpassing both frontier models, at 2.3x lower cost than Opus alone.
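One common way to combine models like this is a cost-aware cascade: send each task to a cheaper model and escalate only when it is unlikely to succeed. The sketch below illustrates that idea only; the model names, difficulty scores, and `route` function are hypothetical and not the Motus API.

```python
# Illustrative model cascade, NOT the Motus implementation: route each task to
# a cheap model first, escalating to a frontier model only for hard tasks.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_task: float  # hypothetical relative cost
    capability: float     # stand-in for a learned quality estimate in [0, 1]

def route(task_difficulty: float, cheap: Model, strong: Model) -> Model:
    # Keep easy tasks on the cheap model; escalate when the task's difficulty
    # exceeds the cheap model's estimated capability.
    return cheap if cheap.capability >= task_difficulty else strong

cheap = Model("open-model", cost_per_task=1.0, capability=0.6)
strong = Model("frontier-model", cost_per_task=10.0, capability=0.9)

tasks = [0.3, 0.5, 0.8, 0.4]  # hypothetical per-task difficulty scores
chosen = [route(d, cheap, strong) for d in tasks]
total_cost = sum(m.cost_per_task for m in chosen)
print([m.name for m in chosen], total_cost)  # one escalation; cost 13 vs 40 all-frontier
```

The cost win comes from the routing policy, not from any single model: only the one genuinely hard task pays frontier-model prices.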
SWE-bench Verified — bottom-right is better (higher accuracy, lower cost)
Harness: mini-swe-agent-v2
Long Context Memory
LoCoMo
Long-running agents need context memory, but every application has different needs. A coding assistant, a customer support agent, and a research workflow each demand different strategies for what to remember and what to discard. There is no one-size-fits-all solution.
Motus tailors your agent's context memory strategy to your specific workload. On LoCoMo, a long-term conversational memory benchmark, Motus reaches 81% accuracy, a 56% improvement over compaction and 45% over RAG.
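To make the contrast between the two baseline strategies concrete, here is a minimal sketch of compaction versus retrieval over a conversation history. Everything here is illustrative: the helper names and the toy word-overlap scorer are assumptions, not the Motus memory system.

```python
# Two toy context-memory strategies; names and scoring are illustrative only.

def compact(history, keep_last=2):
    # Compaction: collapse older turns into a one-line summary, keep recent turns.
    summary = f"[summary of {len(history) - keep_last} earlier turns]"
    return [summary] + history[-keep_last:]

def retrieve(history, query, k=2):
    # Retrieval (RAG-style): score each turn by word overlap with the query,
    # keep the top-k regardless of how old they are.
    def overlap(turn):
        return len(set(turn.split()) & set(query.split()))
    return sorted(history, key=overlap, reverse=True)[:k]

history = [
    "user prefers invoices in EUR",
    "discussed Q3 roadmap",
    "user account id is 4417",
    "small talk about the weather",
]
print(compact(history))
print(retrieve(history, "invoices currency"))
```

Note how the two strategies fail differently: compaction drops the early EUR preference into a lossy summary, while retrieval keeps it but discards recency. Which trade-off is right depends on the workload, which is the point of tailoring the strategy.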
LoCoMo accuracy — higher is better
Judge: GPT-5.4 mini
Agent Latency
Financial Workflow
Agent latency compounds across multi-step workflows. Sequential tool calls, redundant context, and unoptimized execution ordering turn seconds into minutes. For long-horizon agents, these inefficiencies add up fast.
Motus detects parallelizable steps and reorders execution to cut end-to-end latency. On a deep financial agent benchmark, Motus reduces latency by up to 52%.
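The core idea, running steps with no data dependency concurrently instead of one after another, can be shown with a small asyncio sketch. The tool names and timings below are hypothetical stand-ins, not part of any real workflow or the Motus API.

```python
import asyncio
import time

# Hypothetical tool calls for a financial workflow; asyncio.sleep stands in
# for network latency.
async def fetch_prices(ticker):
    await asyncio.sleep(0.1)
    return {"ticker": ticker, "price": 100.0}

async def fetch_filings(ticker):
    await asyncio.sleep(0.1)
    return {"ticker": ticker, "filings": ["10-K"]}

async def sequential(ticker):
    # Naive agent: each step waits for the previous one, so latency adds up.
    return [await fetch_prices(ticker), await fetch_filings(ticker)]

async def parallel(ticker):
    # The two steps share no data dependency, so they can run concurrently.
    return list(await asyncio.gather(fetch_prices(ticker), fetch_filings(ticker)))

start = time.perf_counter()
asyncio.run(sequential("ACME"))
t_seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(parallel("ACME"))
t_par = time.perf_counter() - start

print(f"sequential: {t_seq:.2f}s  parallel: {t_par:.2f}s")
```

With two independent 0.1s calls, the parallel version finishes in roughly half the time; across a long-horizon workflow with many such steps, reordering for concurrency is where the end-to-end savings come from.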
End-to-end latency for a financial workflow — lower is better
About us.
LithosAI was founded by Dimitrios Skarlatos and Zhihao Jia, professors at Carnegie Mellon University, whose award-winning and impactful research on systems and machine learning sits at the company's core. Our team brings together CMU and Stanford researchers and engineers who have shipped production infrastructure at AWS, Google, Meta, and NVIDIA. Join us!