P02 — Customer Support Triage · Juan David Suárez Sánchez

01 · The problem

One model is either too dumb or too expensive.

Zero-shot Claude over every ticket costs $0.012 each and tops out at 0.78 macro-F1. A classifier alone is fast and cheap but can't draft a response. The right answer is both, wired together with a confidence-gated router.

Why single-model fails

You either pay too much or you can't escalate well.

Zero-shot Claude scores about 0.78 macro-F1 on intent, drafts fine responses, costs $0.012/ticket, and has a 1.8s p95 latency. At 1k tickets/hour around the clock, that is roughly $8.6K/month (1,000 × 24 × 30 × $0.012).

Classifier-only hits 0.94 macro-F1 in 85ms for $0.0004/ticket, but it can't actually answer the customer. It just labels.
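
Macro-F1 is not the same thing as "share of tickets labeled correctly": it averages per-intent F1 with equal weight, so a rare intent counts as much as a common one. The sketch below is a minimal pure-Python version of that calculation, using made-up toy labels rather than project data.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    f1s = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1s.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical toy labels for illustration only.
y_true = ["refund", "refund", "refund", "refund", "delivery", "bug"]
y_pred = ["refund", "refund", "refund", "refund", "delivery", "delivery"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case five of six tickets are labeled correctly, yet macro-F1 is much lower because the single "bug" ticket is missed entirely. That gap is why the numbers on this page are reported as macro-F1 over all 27 intents.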

Neither knows when to stop and hand off to a human. The roughly 6% of tickets the classifier still gets wrong cause the worst incidents.

The hybrid wins on every axis

Classifier scopes the problem. Claude solves it. Confidence routes it.

DistilBERT + LoRA classifies intent in 85ms. Claude reads the top-5 similar resolved tickets from Qdrant and drafts a response grounded in those past resolutions.
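
The retrieve-then-draft step can be sketched without any services running. The block below is an in-memory stand-in for the Qdrant lookup, not the project's retrieval module: it ranks resolved tickets by cosine similarity and builds a drafting prompt from the top matches. The field names, the prompt wording, the intent label, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(query_vec, resolved, k=5):
    """Return the k resolved tickets most similar to the query, best first."""
    ranked = sorted(resolved, key=lambda t: cosine(query_vec, t["vector"]), reverse=True)
    return ranked[:k]

def build_prompt(ticket_text, intent, neighbours):
    """Assemble a drafting prompt grounded in past resolutions (hypothetical wording)."""
    context = "\n".join(f"- [{t['id']}] {t['resolution']}" for t in neighbours)
    return (
        f"Intent: {intent}\n"
        f"Customer ticket: {ticket_text}\n"
        f"Past resolutions:\n{context}\n"
        "Draft a reply that stays consistent with the past resolutions."
    )

# Toy vectors; real embeddings would come from a sentence encoder.
resolved = [
    {"id": "T-1", "vector": [1.0, 0.0, 0.0], "resolution": "Refund issued to the original card."},
    {"id": "T-2", "vector": [0.9, 0.1, 0.0], "resolution": "Refund re-sent after a bank rejection."},
    {"id": "T-3", "vector": [0.0, 1.0, 0.0], "resolution": "Tracking link re-sent by email."},
]

neighbours = top_k_similar([1.0, 0.0, 0.0], resolved, k=2)
prompt = build_prompt("Where is my refund?", "get_refund", neighbours)
print(prompt)
```

In the real pipeline the ranking would be delegated to the vector store and k would be 5; the point here is only the shape of the data handed to the drafter.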

A confidence estimator combines intent confidence, sentiment, similarity to past resolutions, and priority into a single score: above 0.85 auto-resolve · 0.6–0.85 suggest to a human · below 0.6 escalate.
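
A minimal sketch of the confidence gate follows. Only the three routing bands (above 0.85, 0.6 to 0.85, below 0.6) come from this page; the weighted-sum formula, the weights, and the signal scaling are assumptions chosen to make the example concrete, not the project's actual estimator.

```python
from dataclasses import dataclass

AUTO_THRESHOLD = 0.85     # above this: auto-resolve
SUGGEST_THRESHOLD = 0.60  # from here up to AUTO_THRESHOLD: suggest to a human

# Hypothetical weights; the real estimator may combine signals differently.
WEIGHTS = {"intent": 0.5, "similarity": 0.3, "sentiment": 0.1, "priority": 0.1}

@dataclass
class Signals:
    intent_confidence: float  # classifier probability for the top intent, 0..1
    similarity: float         # best similarity to a past resolution, 0..1
    sentiment: float          # 0 = very negative, 1 = very positive
    priority: float           # 0 = low, 1 = urgent

def confidence_score(s: Signals) -> float:
    """Weighted sum of the four signals, clamped to [0, 1]."""
    score = (
        WEIGHTS["intent"] * s.intent_confidence
        + WEIGHTS["similarity"] * s.similarity
        + WEIGHTS["sentiment"] * s.sentiment
        + WEIGHTS["priority"] * (1.0 - s.priority)  # urgent tickets lower confidence
    )
    return max(0.0, min(1.0, score))

def route(score: float) -> str:
    """Map a confidence score to one of the three routings."""
    if score > AUTO_THRESHOLD:
        return "auto"
    if score >= SUGGEST_THRESHOLD:
        return "suggest"
    return "escalate"

delivery_question = Signals(0.97, 0.92, 0.80, 0.20)
lawsuit_threat = Signals(0.55, 0.40, 0.10, 0.90)
print(route(confidence_score(delivery_question)))
print(route(confidence_score(lawsuit_threat)))
```

Keeping the thresholds as named constants makes the gate easy to tune: raising AUTO_THRESHOLD trades auto-resolutions for fewer unreviewed mistakes.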

Net: same macro-F1 as the classifier alone (0.94), 50% lower cost and 33% faster than zero-shot Claude, and 19 extra auto-resolutions per hundred tickets.

03 · Demo 1 of 2 · Train & deploy

From git clone to a published HF Hub model.

Press play. The demo walks through env setup, the Docker stack, a LoRA fine-tune of DistilBERT (3 epochs · r=16 · 0.66% trainable params), evaluation on 27 intents with a confusion matrix, pushing the adapter and model card to HuggingFace Hub, ingesting historical resolutions, running the head-to-head benchmark (zero-shot vs classifier-only vs hybrid), and launching the Kanban UI.

Demo 01

Train · publish · benchmark

9 steps · 75s · uv + PEFT + HF Hub + LangSmith


04 · Demo 2 of 2 · System in motion

Five tickets, three routings, one supervisor escalation.

Watch tickets flow through the Kanban: a delivery question and a refund auto-resolve; a website bug is suggested to a human; a health-related lawsuit threat escalates; a positive discount request auto-resolves with a code. The detail panel below shows the classifier output, the similar tickets retrieved, the drafted response, and the routing decision for the currently active ticket.

Demo 02

Kanban — 5 tickets through the full pipeline

auto-resolve × 3 · suggest × 1 · escalate × 1


05 · Stack

Pinned dependencies, public weights.

Same stack philosophy as P01 — versions are explicit, alternatives are wired and toggleable.

Stack — pinned

Training

transformers 4.46.0 · peft 0.13.0 · accelerate 1.0.1 · torch 2.5.0 · bitsandbytes 0.44.1

Base models

distilbert-base-uncased · roberta-base (fallback) · all-mpnet-base-v2

Reasoning

Claude Sonnet 4.5 · LangGraph 0.2.45 · Pydantic v2

Storage & serving

Qdrant 1.12 · Postgres 16 · HuggingFace Hub · Next.js 14 · shadcn/ui

Publication checklist

HF model card

Required: intended use, training details (LoRA r/alpha/lr/epochs), data card with bias notes, eval table with macro-F1 by intent, license, citation BibTeX.

Reproducibility

Training script committed with seed=42, exact dependency lockfile, expected loss curve documented. A stranger can pull the dataset and re-run.

Limitations

Bitext is English-only and skewed toward e-commerce. Spanish-language tickets fall back to zero-shot Claude until v2 covers Spanish customer-support data.

License

MIT for code; CC-BY-4.0 for the adapter; Bitext upstream license preserved in attribution.

06 · Roadmap to v1.0.0

Ten checkpoints. Each one is a separate PR.

From 02-support-triage/README.md on main.

01 ✓ Load Bitext (27K queries across 27 intents), 80/10/10 stratified split
02 ✓ LoRA fine-tune DistilBERT, evaluate macro-F1 on test split
03 ✓ HuggingFace Hub push script + full MODEL_CARD.md ready in models/intent-classifier-lora/ (token-gated publish)
04 ✓ Qdrant retrieval module (src/retrieval/similar_tickets.py) + ingest CLI for resolved-tickets corpus
05 ✓ LangGraph agent: classifier → sentiment → similar tickets → drafter → confidence scorer
06 ✓ End-to-end eval runner in src/eval/runner.py over 200-sample test split
07 ✓ Comparative report: zero-shot Claude vs LoRA classifier-only vs hybrid pipeline (docs/pipeline_comparison.md)
08 ✓ Animated kanban demo (pending → drafted → suggested → resolved → escalated) in /projects/02-support-triage.html
09 ✓ LangSmith trace hooks wired in orchestrator; sample runs documented in docs/trace_gallery.md
10 ✓ Confusion matrix rendered to reports/confusion_matrix.png and embedded in MODEL_CARD.md
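
Checkpoint 01's 80/10/10 stratified split can be sketched in plain Python. This is an illustrative version under stated assumptions, not the committed training script: the function name, the per-intent rounding, and the toy data are hypothetical, while the 80/10/10 fractions and seed=42 come from this page.

```python
import random
from collections import Counter, defaultdict

def stratified_split(examples, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) pairs so each label keeps roughly the same proportions."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a stable iteration order
        rows = by_label[label]
        rng.shuffle(rows)
        n_train = round(len(rows) * fractions[0])
        n_val = round(len(rows) * fractions[1])
        train += rows[:n_train]
        val += rows[n_train:n_train + n_val]
        test += rows[n_train + n_val:]  # remainder goes to test
    return train, val, test

# Toy data: 3 hypothetical intents with 10 queries each.
examples = [
    (f"query {i} about {intent}", intent)
    for intent in ("cancel_order", "get_refund", "track_order")
    for i in range(10)
]

train, val, test = stratified_split(examples)
print(len(train), len(val), len(test))
print(Counter(label for _, label in train))
```

Splitting per intent rather than over the whole pool keeps rare intents represented in the validation and test sets, which matters when the headline metric is macro-F1.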

Cheap classifier where it suffices. Claude reasoning where it matters. One pipeline, three routings.