Juan David Suárez Sandoval — AI Engineer Portfolio

01 ◯ Planned

Conversational E-commerce Assistant

Hybrid retrieval · Reranking · Multi-turn cart agent

Customers search a 50K-product catalog in natural language, manage carts, request refunds. The system decides when to escalate to a human.

Use case: Rappi, Mercado Libre, Walmart, Instacart, Amazon

QdrantpgvectorChromaBM25 + RRFCohere RerankClaude Sonnet 4.5LangGraphRAGASStreamlit

Instacart Market Basket (Kaggle, 3.4M orders)

→ Live demo → Repo on GitHub → Full spec

02 ◯ Planned

Customer Support Triage Agent

DistilBERT fine-tuned with LoRA · Similar-ticket retrieval · Confidence-gated auto-resolve

Tickets arrive via email/Slack/chat. The system classifies intent and priority, retrieves similar resolved tickets, drafts a solution, and decides: auto-resolve, suggest, or escalate.

Use case: Intercom, Zendesk, Freshdesk, HubSpot

DistilBERT + PEFTHuggingFace HubQdrantClaude Sonnet 4.5LangGraphNext.js + shadcnLangSmith

Bitext (27K intents) + Twitter Customer Support (3M)

→ Live demo → Repo on GitHub → Full spec

03 ◯ Planned

B2B Sales Intelligence Agent

Planner-executor-reflector agent loop · Web search · Personalized outreach

Receives a list of target companies, researches each on public web and news, builds a structured profile, generates personalized cold outreach. Measures lift over template-only and single-pass baselines.

Use case: Apollo.io, Clay.com, Outreach.io

Claude Sonnet 4.5TavilyHackerNews APIselectolaxPydantic v2LangGraphNext.js

YC Companies (~5K with metadata)

→ Live demo → Repo on GitHub → Full spec

04 ◯ Planned

Document Intelligence Pipeline

Layout analysis · OCR · Claude Vision fallback · Per-field confidence

Extracts structured data from complex PDFs: contracts, financial reports, medical forms, scanned forms with tables and multi-column layouts. Auto-approves high-confidence; routes low-confidence to human review.

Use case: Hyperscience, Rossum, Klarity

unstructured 0.16Tesseract / PaddleOCRClaude VisionCamelotTable TransformerPydantic v2Next.js + PDF viewer

FUNSD (199 forms) + DocVQA (12.7K docs) + PubLayNet (360K pages)

→ Live demo → Repo on GitHub → Full spec

05 ◯ Planned

Computer Use Agent

Anthropic Computer Use API · Virtualized Ubuntu VM · Action-verification loop

Operates a virtualized desktop by reading screenshots and emitting clicks/keystrokes. Automates back-office workflows in legacy systems that don't expose APIs.

Use case: RPA for banking/insurance, legacy-system extraction

Claude Sonnet 4.5 (computer_use tool)Ubuntu 22.04 + XvfbxdotoolVNCLangGraphNext.js + VNC viewer

Custom eval (20 tasks) + OSWorld + WebArena benchmarks

→ Live demo → Repo on GitHub → Full spec

06 ◯ Planned

Code Review Agent

tree-sitter AST · Multi-aspect parallel analyzers · GitHub Action

Reviews pull requests inline: detects bugs, flags security patterns, identifies missing tests, suggests performance improvements. Filters by severity to avoid drowning the developer.

Use case: Cursor, Codium, Sourcegraph Cody, Codacy

Claude Sonnet 4.5tree-sitterruff + mypysemgrepPyGithubLangGraphNext.js + diff viewer

SWE-bench Lite (300 issues) + CodeReviewer (642K diff/review pairs)

→ Live demo → Repo on GitHub → Full spec

07 ◯ Planned

AI Safety & Red Teaming Framework

OWASP LLM Top 10 coverage · Adversarial attack suite · Guardrails layer

Evaluates other LLM-based systems for vulnerabilities: prompt injection, jailbreaks, PII leakage, hallucinations. Implements guardrails and produces security audit reports.

Use case: Robust Intelligence, Lakera, Protect AI, HiddenLayer

guardrails-ai / NeMoPresidio (PII)garak (NVIDIA)giskardprotectai/debertaNext.js dashboard

HarmBench (400) + JailbreakBench (100) + ToxicChat (10K)

→ Live demo → Repo on GitHub → Full spec

08 ◯ Planned

GraphRAG over SEC EDGAR

Knowledge graph from 10-K filings · Cypher traversal · Hybrid graph + vector retrieval

Answers complex multi-hop questions over S&P 500 ecosystem: who supplies whom, who sits on competing boards, which companies share regulatory exposure. Microsoft GraphRAG technique applied to public financials.

Use case: Visible Alpha, Tegus, AlphaSense, M&A advisory

Neo4j 5 + APOC + GDSneo4j-graphragClaude Sonnet 4.5Voyage AIPydantic v2Next.js + react-force-graph-2d

SEC EDGAR 10-K filings (S&P 500, last 5 years, ~10K docs)

→ Live demo → Repo on GitHub → Full spec

09 ◯ Planned

Voice AI Conversational Agent

Whisper STT · Claude reasoning · ElevenLabs TTS · Sub-second turn latency

Telephony customer service: caller talks, system transcribes, reasons, retrieves from KB, synthesizes natural voice response. Target: under 800ms end-to-end per turn to feel conversational.

Use case: Bland AI, Vapi, Retell, Hume — booking/support/commerce by voice

Whisper-large-v3Claude Sonnet 4.5ElevenLabs / XTTS-v2LiveKit / Twiliosilero-vadLangGraphNext.js + WebRTC

Mozilla Common Voice + LibriSpeech + Spoken-SQuAD + MultiWOZ 2.4

→ Live demo → Repo on GitHub → Full spec

I build AI systems that survive Monday morning in production.

What separates production AI from a demo

Reliability under failure beats flashy demos.

The right small model beats the big one every time.

Retrieval quality beats vector similarity.

Eval loops beat vibes.

Six axes of production AI

Capability map · self-assessed

Engineered for the boring parts of AI.

Reliability under failure

Fine-tuning when it pays

Advanced retrieval

Agent loops & planning

LLM safety & red-teaming

Production observability

Nine projects covering AI Engineering 2026

Conversational E-commerce Assistant

Customer Support Triage Agent

B2B Sales Intelligence Agent

Document Intelligence Pipeline

Computer Use Agent

Code Review Agent

AI Safety & Red Teaming Framework

GraphRAG over SEC EDGAR

Voice AI Conversational Agent

What I actually reach for

AI · Agents · Orchestration

LLMs · Embeddings · Reranking

Backend · Data

Frontend · Infra

The person behind the repos

How I learn · recent papers I keep returning to

Let's build something that holds up in production.