
Founding AI Engineer

TunableLabs, LLC · San Francisco, USA (Remote)

Nov 2024 – Apr 2025

TunableLabs is an early-stage San Francisco startup building AI tooling for legal professionals. As the founding AI engineer, I designed and built the full-stack Legal-AI platform from the ground up — replacing a Gradio prototype with a production-grade system serving 200+ legal professionals across multiple law firms.

Platform Architecture

  • Built the entire backend in FastAPI — async-first, with Pydantic v2 for strict schema validation across all API boundaries.
  • Frontend in Next.js (App Router) with real-time streaming UI — SSE for LLM token streams, WebSocket for live agent status updates.
  • Supabase (PostgreSQL) as the primary database — multi-tenant schema with row-level security for chat sessions, document collections, and user workspaces.
  • Weaviate as the vector store — class-per-tenant schema design for document retrieval with hybrid (BM25 + vector) search.
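Weaviate computes hybrid scores internally, so the platform never re-implements this; purely as an illustration of the idea, here is a stdlib-only sketch of alpha-weighted fusion between normalized BM25 and vector rankings (the function name, min-max normalization, and example scores are my own assumptions, not Weaviate's actual implementation):

```python
def hybrid_scores(bm25, vector, alpha=0.5):
    """Blend keyword (BM25) and vector scores per document ID.

    bm25, vector: dicts mapping doc_id -> raw score.
    alpha: 1.0 = pure vector, 0.0 = pure keyword (mirrors the role of
    Weaviate's alpha parameter). Each score set is min-max normalized
    before blending so the two scales are comparable.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    nb, nv = normalize(bm25), normalize(vector)
    fused = {doc: (1 - alpha) * nb.get(doc, 0.0) + alpha * nv.get(doc, 0.0)
             for doc in set(nb) | set(nv)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

A document that ranks well on either signal surfaces near the top, which is why hybrid search is a good default for legal text mixing exact citations with paraphrased concepts.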

LangGraph Entity Extraction Agent

  • Designed a multi-node LangGraph agent for entity extraction from legal documents — contracts, briefs, filings, depositions.
  • Agent nodes: document chunking → per-chunk extraction → schema validation → deduplication → structured output → human review queue.
  • Used Pydantic models as output schemas for each extraction pass — LLM outputs are coerced and validated before storage.
  • Achieved 92% extraction accuracy across 5,000+ documents (measured against manually annotated ground truth).
  • Built retry logic and fallback LLM routing — if primary model fails schema validation after 2 retries, routes to a more capable (slower) model.
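The retry-and-fallback routing described above can be sketched in a few lines; this is a simplified stand-in (function names, the callable interfaces, and the validator are hypothetical), not the production agent code:

```python
from typing import Callable

def extract_with_fallback(
    prompt: str,
    call_primary: Callable[[str], dict],
    call_fallback: Callable[[str], dict],
    validate: Callable[[dict], bool],
    max_retries: int = 2,
) -> dict:
    """Try the fast primary model first (one attempt plus `max_retries`
    retries); if every attempt fails schema validation, route the request
    to the slower, more capable fallback model."""
    for _ in range(max_retries + 1):
        result = call_primary(prompt)
        if validate(result):
            return result
    result = call_fallback(prompt)
    if not validate(result):
        raise ValueError("fallback model also failed schema validation")
    return result
```

In the real pipeline the validator would be a Pydantic model's parse step, so "valid" means the LLM output coerces cleanly into the extraction schema.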

Multi-LLM Orchestration

  • Implemented a model routing layer that selects LLMs based on task type, document size, and cost budget — fast/cheap models for classification, capable models for extraction and generation.
  • LLMs used: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — each with strengths on different legal document types.
  • Streaming responses piped from LLM → FastAPI → SSE → Next.js UI with token-level latency under 150ms.
  • Integrated Tavily and Exa APIs for live legal web search — agent can pull case law and statutes in real-time during analysis.
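A routing layer like the one above reduces to a small decision function. The sketch below uses invented model names, prices, and context limits to show the shape of the logic only; real model metadata differs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # USD; illustrative numbers only
    max_context: int           # tokens

# Hypothetical registry standing in for GPT-4o / Claude / Gemini tiers.
FAST = Model("fast-classifier", 0.0005, 128_000)
CAPABLE = Model("capable-extractor", 0.01, 200_000)
LONG_CONTEXT = Model("long-context", 0.005, 1_000_000)

def route(task: str, doc_tokens: int, budget_usd: float) -> Model:
    """Pick a model by task type, document size, and cost budget."""
    if doc_tokens > CAPABLE.max_context:
        return LONG_CONTEXT               # only tier that fits the document
    if task == "classification":
        return FAST                       # cheap model is accurate enough
    est_cost = doc_tokens / 1000 * CAPABLE.cost_per_1k_tokens
    return CAPABLE if est_cost <= budget_usd else FAST
```

Keeping routing in one pure function makes the cost/quality trade-off testable and easy to retune as model pricing changes.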

From Prototype to Production

  • Inherited a Gradio monolith that embedded all business logic in UI callbacks — completely unusable for API consumers or multi-tenant deployments.
  • Extracted all logic into a layered FastAPI architecture: routers → services → repositories → database, with each layer depending only on the one beneath it.
  • New architecture enabled three law firm integrations in the last month of the engagement — each connecting via API keys with full tenant isolation.
  • Deployed on Railway with GitHub Actions CI — zero-downtime deploys with database migrations gated behind feature flags.
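The layering above can be illustrated with a minimal stand-in (class names and the dict-backed "database" are hypothetical; the production system uses FastAPI routers over Supabase):

```python
# Repository: the only layer that touches storage.
class DocumentRepository:
    def __init__(self, db: dict):
        self._db = db  # stand-in for a real Postgres session

    def list_for_tenant(self, tenant_id: str) -> list[dict]:
        return [d for d in self._db["documents"] if d["tenant_id"] == tenant_id]

# Service: business rules; depends only on the repository, never the DB.
class DocumentService:
    def __init__(self, repo: DocumentRepository):
        self._repo = repo

    def visible_documents(self, tenant_id: str) -> list[dict]:
        return [d for d in self._repo.list_for_tenant(tenant_id)
                if not d.get("archived")]

# A FastAPI router would sit on top and call DocumentService, so every
# request flows router -> service -> repository -> database, one way only.
```

Because each layer takes its dependency via the constructor, any layer can be swapped or unit-tested in isolation, which is what made the multi-tenant API integrations straightforward.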

Scale & Reliability

  • System served 200+ legal professionals in production with 99.9% uptime over the engagement period.
  • Handled 100+ concurrent users during peak filing seasons — async FastAPI + connection pooling kept p95 response times under 400ms.
  • Document ingestion pipeline processed 5,000+ legal documents — chunked, embedded, and indexed without a single data loss incident.
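The chunking step of an ingestion pipeline like this one can be sketched as a sliding window with overlap (the function and its default sizes are illustrative assumptions, not the production tokenizer-aware chunker):

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into fixed-size character windows with overlap,
    so an entity that straddles a chunk boundary still appears whole
    in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk would then be embedded and indexed; the overlap region is the usual guard against losing clauses or entity mentions split across boundaries.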