§ 01 / Now / Case Study / Building · Live

LLMNarrative.

A platform that evaluates how language models reference your brand, product, and industry context across multiple LLMs — surfacing strengths and risks in generative model outputs, weekly.

Status
Live · day 47
Role
Architect & sole engineer
Stack
Python · Next.js · Postgres
Started
Q1 · 2026
§ Problem

Brands are invisible to the systems answering their customers.

When a buyer asks ChatGPT "what's the best CRM for a 50-person SaaS," the answer shapes the next purchase. Today, brands have no visibility into how often they show up, in what context, or against which competitors — across GPT, Claude, Gemini, Llama, and Mistral.

SEO solved this for search. No one has solved it for generative AI. That's the gap.

§ System

Five-stage pipeline, run on a weekly cadence.

Each tenant configures a brand profile and target query set. The pipeline fans queries across LLM providers, normalizes outputs, classifies mention sentiment, scores share-of-voice against competitors, and writes deltas to a tenant warehouse for dashboards and weekly digests.

fig.A · pipeline architecture
01 · BRAND PROFILE: tenant config · query bank · competitor set
02 · FAN-OUT: 5 LLM providers · async batch · retries · cost cap per tenant
03 · NORMALIZE: JSON schema · entity extraction · citation parsing
04 · SCORE: sentiment classifier · share-of-voice · accuracy verifier
05 · SURFACE: dashboards · weekly digest · webhook · API
DATA PLANE: Postgres · run history · audit log · per-tenant warehouse · 90-day retention
5 · LLM providers
~40k · prompts / tenant / week
7d · eval cadence
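The fan-out and normalize stages above can be sketched as a small async driver. This is a minimal illustration, not the production code: `query_provider`, `normalize`, and `run_week` are hypothetical names, and the real pipeline adds retries, batching, and per-tenant cost caps.

```python
import asyncio
from dataclasses import dataclass

PROVIDERS = ["gpt", "claude", "gemini", "llama", "mistral"]

@dataclass
class BrandProfile:
    tenant_id: str
    brand: str
    queries: list[str]
    competitors: list[str]

async def query_provider(provider: str, prompt: str) -> dict:
    # Stage 02 stand-in: a real build calls the provider API here,
    # with retries and a per-tenant cost cap.
    return {"provider": provider, "prompt": prompt, "text": f"[{provider}] answer"}

def normalize(raw: dict) -> dict:
    # Stage 03 stand-in: coerce each provider's output into one JSON
    # schema before entity extraction and citation parsing.
    return {"provider": raw["provider"], "prompt": raw["prompt"],
            "mentions": [], "citations": []}

async def run_week(profile: BrandProfile) -> list[dict]:
    # Stage 02: fan every query out across all providers concurrently.
    raw = await asyncio.gather(*(query_provider(p, q)
                                 for p in PROVIDERS
                                 for q in profile.queries))
    # Stage 03: normalize before scoring and surfacing.
    return [normalize(r) for r in raw]
```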
§ Decisions

Three calls that shaped the build.

[ 01 ]
Decision

Cache LLM outputs aggressively.

Considered

Live querying every week for freshness vs. content-hash caching with weekly invalidation.

Picked · why

Caching. Cuts cost ~70%; weekly cadence makes invalidation deterministic and auditable.
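The deterministic-invalidation property falls out of the key design: hash the full request together with the ISO week, and a new week produces a new key. A minimal sketch, with illustrative names (`cache_key`, `cached_call`) and an in-memory dict standing in for the real store:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(provider: str, model: str, prompt: str, week: str) -> str:
    # Hash the full request plus the ISO week: a new week yields a new
    # key, so weekly invalidation is deterministic and auditable.
    payload = json.dumps(
        {"provider": provider, "model": model, "prompt": prompt, "week": week},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_call(provider, model, prompt, week, call):
    key = cache_key(provider, model, prompt, week)
    if key not in _cache:
        _cache[key] = call(provider, model, prompt)  # cache miss: pay once
    return _cache[key]
```

Within one week, repeated runs of the same prompt hit the cache; the week rollover is the only invalidation event, so the audit log can name exactly which cached response backed which score.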

[ 02 ]
Decision

Score with a smaller, fine-tuned classifier.

Considered

GPT-4o-as-judge vs. a fine-tuned distilled model on labeled mentions.

Picked · why

Distilled classifier. 8× cheaper, deterministic, and removes the "judge sees self" bias.
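Once the classifier has labeled which entities each answer mentions, the share-of-voice half of stage 04 reduces to counting. A minimal sketch under that assumption (the function name and signature are illustrative):

```python
from collections import Counter

def share_of_voice(mentions: list[str], brand: str, competitors: set[str]) -> float:
    # Fraction of tracked mentions (brand plus competitor set) that
    # belong to the brand; 0.0 when nothing tracked was mentioned.
    counts = Counter(m for m in mentions if m == brand or m in competitors)
    total = sum(counts.values())
    return counts[brand] / total if total else 0.0
```

Because both the distilled classifier and this count are deterministic, the same week's cached outputs always score identically, which keeps week-over-week deltas meaningful.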

[ 03 ]
Decision

Per-tenant warehouse, not shared OLAP.

Considered

Single ClickHouse cluster with row-level tenant ID vs. per-tenant Postgres schemas.

Picked · why

Per-tenant. Simpler isolation contract; enterprise buyers ask "where is my data" and the answer is one schema.
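The one-schema answer can be made concrete with a small provisioning helper. A hedged sketch: `tenant_schema` and `provision_ddl` are hypothetical names, and the `run_history` columns here are illustrative, not the real table definition.

```python
import re

def tenant_schema(tenant_id: str) -> str:
    # One Postgres schema per tenant; fold the id to a safe identifier.
    safe = re.sub(r"[^a-z0-9_]", "_", tenant_id.lower())
    return f"tenant_{safe}"

def provision_ddl(tenant_id: str) -> list[str]:
    # DDL run once at tenant onboarding; every later query is pinned
    # to this schema, so "where is my data" has a one-schema answer.
    schema = tenant_schema(tenant_id)
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        f"CREATE TABLE IF NOT EXISTS {schema}.run_history ("
        "run_id uuid PRIMARY KEY, ran_at timestamptz NOT NULL, "
        "payload jsonb NOT NULL)",
    ]
```

Isolation by schema also makes the 90-day retention policy a per-tenant operation: dropping or archiving one schema touches exactly one customer's data.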

§ Outcome

Shipping into design partners now.

Three brands are running weekly evaluations. The first surprise: brands with strong SEO are sometimes weakest in LLM mention rate. The engines aren't crawling their press; they're inferring from documentation and Reddit. That mismatch is the wedge.

Next: open the platform, ship the API, and add longitudinal sentiment trends.

Want a deeper walkthrough — or to build something like this?

→ Get in touch