What Makes Vertical AI Compound
We are entering a phase where every enterprise wants to put AI to work, not in general, but in context. Not to summarize or autocomplete, but to reason through regulatory change, assess operational risk, and guide investment planning. That is the promise of vertical AI: systems built not simply with domain data, but for domain reasoning.
The typical vertical AI startup begins with a familiar idea: gather proprietary data, fine-tune a model, and let scale drive improvement. More customers bring more data. More data strengthens the model. A stronger model produces a more capable product. The loop should compound.
In practice, the system often flattens early. The model ceases to improve in meaningful ways. Outputs begin to converge, shaped less by learning than by templates that already reflect its limits. Each new deployment adds surface area, but not depth. The system expands its reach without increasing its capacity for reasoning. Over time, what was intended to be a learning engine settles into a static interface. The product stops improving, customers notice, and eventually churn.
The reason is that vertical AI systems do not improve because they collect data; they improve because they are designed to learn from use. That requires feedback, structure, and trust. Not as separate components, but as interdependent systems that shape how the product evolves.
The strongest products in this space will be those whose decisions are trusted to carry weight, and whose performance improves with every interaction. That improvement comes through two loops: a learning loop, where agents refine their reasoning through structured feedback, and a trust loop, where usefulness earns permission to take on increasingly important tasks. One loop compounds technical capability. The other compounds institutional acceptance.
Not every problem requires this. Domains such as logistics routing, risk scoring, and dynamic pricing reward data access and statistical learning. Others may be won through distribution or compliance. But in domains where ambiguity matters, where judgment is at the center of the task, these loops will define whether the system compounds or simply accretes.
Much of what follows is based on trying to understand the implications of The Bitter Lesson and The Era of Experience. It is also part of my continuing series ‘this investor generated superior risk-adjusted returns with one simple trick’, looking for that one paper that answers it all.
The Learning Loop
In the most serious domains, such as law, medicine, or finance, there is a baseline level of data required for a model to function. One needs access to precedent, records, and the structural logic of the field. Below that threshold, the system cannot perform. But beyond it, simply adding more data does not reliably lead to improved reasoning.
For a model to improve, it needs more than exposure. It needs a structured environment in which its actions carry consequence, where it can take steps, observe results, and adjust accordingly. This is what we mean by a world model.
This is not a schema, or a database, or a graph. It is a structured representation of the domain: its entities, relationships, causal structures, and potential failure modes. It is the terrain through which an agent learns to navigate. It resembles what Palantir describes as an ontology, but applied beyond a customer’s data, ideally to a whole industry or market.
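As a rough illustration, here is a minimal sketch of what such a representation might hold in code. The names are assumptions: Entity, Relation, FailureMode, and the neighbors helper are hypothetical placeholders, and a real domain would need far richer causal structure than this.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A domain object the agent can reason about, e.g. a contract or a counterparty."""
    id: str
    kind: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Relation:
    """A typed, directional link between entities, e.g. 'governed_by' or 'exposes_to'."""
    source: str
    target: str
    kind: str

@dataclass
class FailureMode:
    """A known way things go wrong in the domain, and the condition that triggers it."""
    name: str
    trigger: str      # human-readable condition; a predicate in a real system
    severity: float   # 0.0 (minor) to 1.0 (catastrophic)

@dataclass
class WorldModel:
    """Entities, relations, and failure modes together form the terrain the agent explores."""
    entities: dict[str, Entity] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)
    failure_modes: list[FailureMode] = field(default_factory=list)

    def neighbors(self, entity_id: str) -> list[str]:
        """Entities reachable in one hop -- the local context for a reasoning step."""
        return [r.target for r in self.relations if r.source == entity_id]
```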
The goal is not to reproduce the expert’s workflow. It is to solve the underlying problem. The agent does not learn by imitation. It learns by doing. It explores the world model, tests its reasoning, encounters friction, and refines its policy accordingly.
That is what creates the learning loop:
A better world model allows for smarter actions
Smarter actions generate clearer feedback
Feedback reshapes the world model
The system improves
What compounds is not the amount of data. It is the structure that allows the model to learn from use.
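To make the shape of that loop concrete, here is a hedged sketch in Python. The propose_action, simulate, update, and refine methods are assumed interfaces standing in for whatever policy, environment, and update rule a real system would use; the point is the cycle, not the specifics.

```python
def learning_loop(agent, world_model, tasks, max_steps=100):
    """One pass of the learning loop: act, observe, update -- both the policy and the terrain.

    `agent` is assumed to expose propose_action() and update();
    `world_model` is assumed to expose simulate() and refine().
    These are illustrative interfaces, not a prescribed API.
    """
    for task in tasks:
        for _ in range(max_steps):
            # A better world model allows for smarter actions.
            action = agent.propose_action(task, world_model)

            # Smarter actions generate clearer feedback.
            outcome, feedback = world_model.simulate(task, action)  # outcome assumed to be a dict

            # Feedback reshapes both the policy and the world model.
            agent.update(task, action, outcome, feedback)
            world_model.refine(feedback)

            if outcome.get("done"):
                break
    return agent, world_model
```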
The Trust Loop
Even when a system is capable of learning, it rarely begins with permission to act. In high-stakes domains, decision-making is gated and trust must be earned.
That process does not begin with autonomy. It begins with constrained responsibility: tasks that require reasoning, sit close to judgment, but remain insulated from operational risk. This is what makes deep research such a common starting point. Almost every major lab has led with some version of it, not because research is easy or marginal, but because it is a broadly applicable tool that allows the system to demonstrate value without requiring trust in execution. It allows the model to prove its ability to reason long before it is asked to decide.
The same pattern holds across verticals. The challenge is not to automate the core decision on day one. The challenge is to identify a narrow, high-context task, close to the center of the judgment loop but shielded from consequence, and use that as a foothold. If the system performs well, it earns the right to move upward: from synthesis to recommendation, from drafting to action.
This is how the trust loop begins to form. Performance earns confidence. Confidence opens the door to higher-value tasks. Those tasks, in turn, generate better feedback, which sharpens the model further. Over time, what begins as a research assistant becomes a decision-making partner.
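One way to picture the trust loop in operation is as an explicit autonomy gate: every output at the current level is reviewed, and promotion from research to recommendation to action is earned only after sustained acceptance. The levels, thresholds, and review counts in this sketch are hypothetical.

```python
# Illustrative autonomy gate: promotion from research to recommendation to action
# is earned through reviewed performance, never assumed. All thresholds are hypothetical.

AUTONOMY_LEVELS = ["research", "recommend", "act"]

class TrustGate:
    def __init__(self, approval_threshold=0.95, min_reviews=50):
        self.level = 0                        # start at "research"
        self.approval_threshold = approval_threshold
        self.min_reviews = min_reviews
        self.reviews = []                     # 1.0 = output accepted as-is, 0.0 = rejected

    def record_review(self, accepted: bool):
        """Every output at the current level is reviewed by a human until trust is earned."""
        self.reviews.append(1.0 if accepted else 0.0)

    def maybe_promote(self) -> str:
        """Promote only after enough reviews at a high enough acceptance rate."""
        if len(self.reviews) >= self.min_reviews:
            acceptance = sum(self.reviews) / len(self.reviews)
            if acceptance >= self.approval_threshold and self.level < len(AUTONOMY_LEVELS) - 1:
                self.level += 1
                self.reviews = []             # trust at the new level starts from zero
        return AUTONOMY_LEVELS[self.level]
```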
Why SaaS Metrics Do Not Apply
Some problems will continue to be solved by traditional ML workflows, especially those that live inside closed loops with abundant data and clear outcomes. These domains reward prediction and pattern recognition. They do not require structured reasoning, nor do they depend on trust.
But the highest-value domains operate differently: they are ambiguous; they rely on institutional context; and they demand interpretation, not just recall.
SaaS products were built for another kind of problem. Their defensibility rests not on how much better they get with use, but on how deeply they are embedded in process. They succeed by capturing workflows, structuring data, and becoming operationally central. Integration becomes inertia. And inertia becomes retention.
Vertical AI systems are different. They do not aim to become indispensable by owning the workflow. They aim to become indispensable by exercising good judgment. Over time, the system is trusted with more decisions, first at the edge, then at the core. And as that trust accumulates, the product becomes harder to remove. Not because it holds the data. Because it holds more of the customer’s reasoning.
What matters most is delegation. The measure of success is not how much of the system is used, but how much of the outcome it shapes.
This shift is visible in the architecture:
[ Data Layer ]
- Centralized infrastructure (Snowflake, Databricks)
↓
[ World Model / Ontology ]
- Structured domain representation built to support learning
↓
[ Agent Layer ]
- Adaptive systems trained through interaction, not imitation
↓
[ Application Interface ]
- Where judgment is surfaced and trust is earned
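Read as code, each layer exposes a narrow interface to the one above it. The sketch below uses Python protocols and the same assumed method names as the earlier sketches; none of these signatures come from any particular product, they are simply one plausible way the layers could compose.

```python
from typing import Any, Protocol

class DataLayer(Protocol):
    """Centralized infrastructure: raw records, documents, and events."""
    def query(self, spec: dict) -> list[dict]: ...

class WorldModel(Protocol):
    """Structured domain representation built on the data layer to support learning."""
    def simulate(self, task: Any, action: Any) -> tuple[Any, Any]: ...
    def refine(self, feedback: Any) -> None: ...

class Agent(Protocol):
    """Adaptive system trained through interaction with the world model, not imitation."""
    def propose_action(self, task: Any, world_model: WorldModel) -> Any: ...
    def update(self, task: Any, action: Any, outcome: Any, feedback: Any) -> None: ...

class ApplicationInterface(Protocol):
    """Where judgment is surfaced to the user and trust is (or is not) earned."""
    def present(self, recommendation: Any) -> bool: ...  # True if accepted as-is
```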
The strongest vertical AI systems will not be the ones that manage the most workflows. They will be the ones whose outputs carry the most weight because they have earned the right to reason on the customer’s behalf.
A Playbook
If learning and trust are the conditions under which vertical AI compounds, then building in this space requires a different orientation, one that treats product, data, and customer relationships as part of the same architecture.
Identify modelable problems - Focus on areas with structure, variation, and consequence. Look for reasoning loops that sit upstream of important decisions, even if they do not look like traditional workflows.
Build a world model - Start from what is minimally necessary to structure the domain. Prioritize clarity over completeness. The model should support exploration, not just indexing.
Deploy at the boundary of action - Enter through reasoning tasks that matter but do not carry risk. Let the model demonstrate capability in low-friction contexts.
Climb the decision stack - Use performance to earn permission. Use permission to unlock feedback. Let each level improve the system’s ability to reason through the next.
Track what matters - Do not measure integration. Measure influence:
(a) How much judgment has been delegated
(b) How visible improvement is to the user
(c) How central the system has become to decision-making
What matters is not how often the system is used. It is how much of the outcome the system shapes.
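As a crude illustration of measuring influence rather than integration, the sketch below weights each decision by its importance and computes the share of that weight the system actually shaped. The field names and weighting scheme are assumptions, not a standard metric.

```python
def delegation_share(decisions):
    """Importance-weighted share of outcomes the system shaped.

    `decisions` is a list of dicts like:
        {"importance": 3.0, "system_output_adopted": True}
    where `importance` is a domain-specific weight and `system_output_adopted`
    means the recommendation was used without material change.
    Both fields are illustrative assumptions.
    """
    total = sum(d["importance"] for d in decisions)
    if total == 0:
        return 0.0
    shaped = sum(d["importance"] for d in decisions if d["system_output_adopted"])
    return shaped / total


# Example: two routine decisions adopted, one critical decision overridden.
decisions = [
    {"importance": 1.0, "system_output_adopted": True},
    {"importance": 1.0, "system_output_adopted": True},
    {"importance": 5.0, "system_output_adopted": False},
]
print(delegation_share(decisions))  # 2.0 / 7.0 ≈ 0.29 -- usage is high, influence is not
```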
The Two Loops
Vertical AI will compound because it learns. That learning depends on two interlocking systems.
The first is the learning loop. It relies on structure. It turns action into feedback, and feedback into progress. The second is the trust loop. It relies on performance. It turns usefulness into permission, and permission into responsibility.
Together, they determine whether the system remains a tool or becomes a decision-maker.
The strongest vertical AI companies will not be those with the most integrations.
They will be those whose outputs carry weight and whose performance improves with every use.