A plan-execute-reflect loop over the public web. It researches a company across Tavily, HackerNews, and the live site; merges the findings into a Pydantic-validated CompanyProfile; extracts a pain point, a hiring signal, and a recent news hook; and writes a personalized outreach email that Claude Opus scores on personalization, accuracy, and CTA clarity. Batched across 8 workers, it processes 100 YC companies in 10 minutes.
Sales tools that send an LLM a single prompt such as "Write me an email to Acme Corp" produce templates with the name swapped in. The difference between 1.4 and 4.5 on a 5-point personalization scale is whether the recipient replies.
Without a reflection step, the pipeline stops at a single Tavily search and its 8 generic hits. The model synthesizes whatever is at the top, usually a press release from two years ago. The email reads "I saw your Series B" when the company just closed its Series D.
Worse: the model never asks itself "is this enough?" So it never goes back for the hiring page, the recent HN thread, the careers post.
Planner writes a research plan listing what to look for (funding, hiring, recent product, technical posture).
Executor runs tool calls in parallel: Tavily for news, HN for technical signal, website fetch for primary sources.
Reflector reads the harvest and asks: "Do I know enough? Are there contradictions? Is the recent news from this quarter or 2023?" If gaps remain, it sends refined queries back to the Executor. Max 3 loops keeps cost bounded.
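The three roles above can be sketched in plain Python. This is a minimal illustration, not the project's code: `fake_tool` is a hypothetical stub standing in for the Tavily, HN, and website calls, the "no hits for a topic" gap rule and the "latest" query refinement are assumptions, and only the 3-loop cap and the four research topics come from the description.

```python
import asyncio

MAX_LOOPS = 3  # loop cap from the description above
TOPICS = ("funding", "hiring", "recent product", "technical posture")
SOURCES = ("tavily", "hackernews", "website")  # labels only; no real API calls


def plan(company: str) -> dict:
    """Planner: one initial query per research topic."""
    return {topic: f"{company} {topic}" for topic in TOPICS}


async def fake_tool(source: str, query: str) -> list:
    """Hypothetical stub tool: the first-pass hiring query finds nothing."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if "hiring" in query and "latest" not in query:
        return []
    return [f"[{source}] {query}"]


async def execute(queries: dict) -> dict:
    """Executor: fan every (topic, source) pair out concurrently."""
    jobs = [(topic, source, query)
            for topic, query in queries.items()
            for source in SOURCES]
    results = await asyncio.gather(
        *(fake_tool(source, query) for _, source, query in jobs)
    )
    findings = {topic: [] for topic in queries}
    for (topic, _, _), hits in zip(jobs, results):
        findings[topic].extend(hits)
    return findings


def reflect(company: str, findings: dict) -> dict:
    """Reflector (assumed rule): refine the query for any topic with no hits."""
    return {topic: f"{company} {topic} latest"
            for topic, hits in findings.items() if not hits}


async def research(company: str):
    """Run plan -> execute -> reflect until no gaps remain or the cap is hit."""
    findings = {}
    queries = plan(company)
    loops = 0
    while queries and loops < MAX_LOOPS:
        loops += 1
        for topic, hits in (await execute(queries)).items():
            findings.setdefault(topic, []).extend(hits)
        queries = reflect(company, findings)
    return findings, loops


if __name__ == "__main__":
    findings, loops = asyncio.run(research("Acme"))
    print(loops, {topic: len(hits) for topic, hits in findings.items()})
```

In this stub run the first pass leaves the hiring topic empty, so the Reflector sends one refined query back and the loop ends after the second pass, well under the cap.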
Net result: 4.4 accuracy and 4.6 personalization at $0.04 per email, beating the single-search baseline by 1.9 points on personalization.
The loop body is a LangGraph cycle. The Reflector decides whether to refine or proceed. The profile and the email are Pydantic models; invalid outputs are retried, not silently coerced.
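The "retried, not silently coerced" behavior might look roughly like the sketch below. To stay dependency-free it swaps Pydantic for a plain dataclass with a hand-written strict check, so `InvalidOutput` stands in for a Pydantic validation error. The `Email` fields, the `generate` callable, and the limit of 3 attempts are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


class InvalidOutput(ValueError):
    """Stand-in for a Pydantic validation error in this stdlib-only sketch."""


@dataclass(frozen=True)
class Email:
    # Hypothetical fields; the real model's schema is not shown in the text.
    subject: str
    body: str


def parse_email(raw: str) -> Email:
    """Strict parse: reject bad output instead of coercing it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidOutput("expected a JSON object")
    for name in ("subject", "body"):
        value = data.get(name)
        # A number such as 42 is rejected here, never turned into "42".
        if not isinstance(value, str) or not value.strip():
            raise InvalidOutput(f"field '{name}' must be a non-empty string")
    return Email(subject=data["subject"], body=data["body"])


def generate_validated(
    generate: Callable[[Optional[str]], str],
    max_attempts: int = 3,  # assumed limit, not taken from the text
) -> Email:
    """Call `generate` until its output validates, passing the last error back."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        raw = generate(feedback)
        try:
            return parse_email(raw)
        except InvalidOutput as exc:
            feedback = str(exc)  # lets the next attempt see what went wrong
    raise InvalidOutput(f"no valid output after {max_attempts} attempts: {feedback}")


if __name__ == "__main__":
    # Fake model: the first reply has a numeric subject, the second is valid.
    replies = iter([
        '{"subject": 42, "body": "Hi"}',
        '{"subject": "Quick question", "body": "Hi there"}',
    ])
    print(generate_validated(lambda feedback: next(replies)))
```

Passing the validation error back into the next attempt is one common design choice; a real pipeline would likely include it in the retry prompt.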
End-to-end: env config, dependency install, YC dataset load, 8-worker parallel batch through plan-execute-reflect with up to 3 loops per company, LLM-as-judge scoring with Claude Opus, persistence to Postgres, traces pushed to LangSmith, Next.js gallery launch.
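The 8-worker batch step could be sketched with a thread pool, as below. This assumes each company is an independent, I/O-bound job. `process_company` is a hypothetical placeholder for one full plan-execute-reflect run plus scoring, and recording failures per company rather than aborting the batch is a design assumption, not something the description states.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 8  # worker count from the description above


def process_company(name: str) -> dict:
    """Hypothetical placeholder for research, email writing, and scoring."""
    return {"company": name, "status": "ok"}


def run_batch(companies, worker=process_company, max_workers=MAX_WORKERS) -> dict:
    """Run `worker` over all companies; one failure does not stop the batch."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, name): name for name in companies}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the error and keep going
                results[name] = {"company": name, "status": "error", "error": str(exc)}
    return results


if __name__ == "__main__":
    print(run_batch(["Anthropic", "Linear"]))
```

Results are keyed by company name because `as_completed` yields futures in completion order, not submission order.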
Watch Tavily / HN / website queries fire on the left, the CompanyProfile fill in field by field in the middle, and the personalized outreach stream into the right pane. Then the LLM judge scores it. Anthropic is processed first, Linear second.