Founding AI Engineer
TunableLabs, LLC · San Francisco, USA (Remote)
Nov 2024 – Apr 2025
TunableLabs is an early-stage San Francisco startup building AI tooling for legal professionals. As the founding AI engineer, I designed and built the full-stack Legal-AI platform from the ground up — replacing a Gradio prototype with a production-grade system serving 200+ legal professionals across multiple law firms.
Platform Architecture
- Built the entire backend in FastAPI — async-first, with Pydantic v2 for strict schema validation across all API boundaries.
- Frontend in Next.js (App Router) with real-time streaming UI — SSE for LLM token streams, WebSocket for live agent status updates.
- Supabase (PostgreSQL) as the primary database — multi-tenant schema with row-level security for chat sessions, document collections, and user workspaces.
- Weaviate as the vector store — class-per-tenant schema design for document retrieval with hybrid (BM25 + vector) search.
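The hybrid retrieval bullet above can be illustrated with a small score-fusion sketch. Weaviate performs this fusion internally; the function name `fuse_hybrid`, the `alpha` weighting, and min-max normalization here are illustrative assumptions, not the production code.

```python
# Hypothetical sketch of hybrid (BM25 + vector) score fusion for retrieval.
# alpha=1.0 would be pure vector search, alpha=0.0 pure BM25.
def fuse_hybrid(bm25_scores, vector_scores, alpha=0.5):
    """Blend min-max-normalized keyword and vector scores per document id."""
    def normalize(scores):
        if not scores:
            return {}
        hi, lo = max(scores.values()), min(scores.values())
        span = (hi - lo) or 1.0  # guard against a single-score collection
        return {doc: (s - lo) / span for doc, s in scores.items()}

    b, v = normalize(bm25_scores), normalize(vector_scores)
    docs = set(b) | set(v)
    # Higher fused score ranks first; missing scores count as zero.
    return sorted(
        ((d, alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0)) for d in docs),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

Documents strong in either signal still surface, which is the practical appeal of hybrid search for legal text, where exact statute citations (keyword) and semantic similarity both matter.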
LangGraph Entity Extraction Agent
- Designed a multi-node LangGraph agent for entity extraction from legal documents — contracts, briefs, filings, depositions.
- Agent nodes: document chunking → per-chunk extraction → schema validation → deduplication → structured output → human review queue.
- Used Pydantic models as output schemas for each extraction pass — LLM outputs are coerced and validated before storage.
- Achieved 92% extraction accuracy across 5,000+ documents (measured against manually annotated ground truth).
- Built retry logic and fallback LLM routing — if the primary model fails schema validation after 2 retries, the request routes to a more capable (slower) model.
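The retry-then-fallback pattern in the last bullet can be sketched in a few lines. The `primary`, `fallback`, and `validate` callables below are stand-ins for the model clients and Pydantic validation described above, not the actual implementation.

```python
# Illustrative sketch: try the primary model up to (1 + max_retries) times,
# validating each response against the output schema; if every attempt fails
# validation, escalate the chunk to a slower, more capable fallback model.
def extract_with_fallback(chunk, primary, fallback, validate, max_retries=2):
    """Return the first schema-valid extraction, preferring the primary model."""
    for _attempt in range(max_retries + 1):  # initial call plus retries
        result = primary(chunk)
        if validate(result):
            return result
    # Primary kept producing schema-invalid output: route to the capable model.
    return fallback(chunk)
```

Keeping validation as an injected callable is what lets the same node wrap different Pydantic schemas per extraction pass.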
Multi-LLM Orchestration
- Implemented a model routing layer that selects LLMs based on task type, document size, and cost budget — fast/cheap models for classification, capable models for extraction and generation.
- LLMs used: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — each with strengths on different legal document types.
- Streaming responses piped from LLM → FastAPI → SSE → Next.js UI with token-level latency under 150ms.
- Integrated Tavily and Exa APIs for live legal web search — the agent can pull case law and statutes in real time during analysis.
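The routing layer described above can be pictured as a small decision function. The model identifiers match the three LLMs listed, but the thresholds, the `route()` signature, and the tiering are invented for illustration.

```python
# Hypothetical routing sketch: pick a model by task type, document size
# (in tokens), and a per-request cost budget in USD. All cutoffs are
# assumptions, not the production configuration.
def route(task: str, doc_tokens: int, budget_usd: float) -> str:
    """Cheap/fast tier for classification; large-context model for long
    documents; capable model for extraction and generation otherwise."""
    if task == "classification":
        return "gpt-4o"            # fast tier in this sketch
    if doc_tokens > 120_000:
        return "gemini-1.5-pro"    # largest context window of the three
    if budget_usd < 0.01:
        return "gpt-4o"            # budget-constrained fallback
    return "claude-3-5-sonnet"     # default capable model
```

Centralizing the decision in one function keeps per-task model choices auditable and easy to tune as pricing and model capabilities change.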
From Prototype to Production
- Inherited a Gradio monolith that embedded all business logic in UI callbacks — unusable for API consumers or multi-tenant deployments.
- Extracted all logic into a layered FastAPI architecture: routers → services → repositories → database, with strict one-way dependencies between layers.
- The new architecture enabled three law firm integrations in the last month of the engagement — each connecting via API keys with full tenant isolation.
- Deployed on Railway with GitHub Actions CI — zero-downtime deploys with database migrations gated behind feature flags.
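The layered shape described above (router → service → repository) can be sketched minimally. All class and function names here are invented for illustration; the point is that each layer depends only on the one below it, so business logic is reusable outside any particular UI.

```python
# Hypothetical illustration of the router -> service -> repository layering.
class DocumentRepository:
    """Data access only: no business rules, no HTTP concerns."""
    def __init__(self):
        self._rows = {}

    def save(self, doc_id, payload):
        self._rows[doc_id] = payload

    def get(self, doc_id):
        return self._rows.get(doc_id)


class DocumentService:
    """Business logic: validation and tenant scoping; knows nothing about HTTP."""
    def __init__(self, repo):
        self._repo = repo

    def ingest(self, tenant_id, doc_id, text):
        if not text.strip():
            raise ValueError("empty document")
        # Tenant-prefixed keys stand in for real row-level isolation here.
        self._repo.save(f"{tenant_id}:{doc_id}", {"text": text})
        return doc_id


# The router layer would only parse the request and delegate, e.g.:
def ingest_endpoint(service, tenant_id, doc_id, text):
    return {"status": "ok", "id": service.ingest(tenant_id, doc_id, text)}
```

Because the service layer has no UI dependency, the same logic that once lived in Gradio callbacks becomes callable from an API route, a background worker, or a test.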
Scale & Reliability
- System served 200+ legal professionals in production with 99.9% uptime over the engagement period.
- Handled 100+ concurrent users during peak filing seasons — async FastAPI + connection pooling kept p95 response times under 400ms.
- Document ingestion pipeline processed 5,000+ legal documents — chunked, embedded, and indexed without a single data loss incident.