Backend AI Engineer
ezAIx, Inc · Thousand Oaks, USA (Remote)
May 2025 – Present
ezAIx is a US-based enterprise AI company trusted by Fortune 500 companies and government agencies, building a suite of production-grade platforms across voice AI (VoiceCoreIQ), Teams observability (TeamsCoreIQ), UC and network monitoring (NetmonIQ), Copilot ROI (CLA), and modular AI agents (ezAgents). I joined as the sole backend engineer and owned the entire server-side architecture of VoiceCoreIQ, the flagship product handling real-time voice, translation across 100+ languages, omnichannel messaging, and intelligent automation at enterprise scale.
System Architecture & Real-Time Presence
- Designed and built VoiceCoreIQ from scratch: a Django ASGI monolith split across 18 internal apps, including call_manager, queue_engine, agent_portal, analytics, omnichannel, rag_service, and billing.
- Built a node-based visual workflow engine for dynamic call routing, IVR, and omnichannel orchestration.
- Implemented all real-time features on Django Channels, with 5 distinct WebSocket consumers handling separate event streams (agent sessions, live transcripts, supervisor feeds, queue updates, voicebot state).
- Engineered a real-time Agent Presence Dashboard powered by a custom ASGI Server-Sent Events (SSE) middleware and Redis Pub/Sub, giving supervisors live visibility into agent states, active calls, and queue metrics.
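The presence fan-out described above can be sketched as follows. This is an illustrative sketch only: `sse_format` follows the standard Server-Sent Events wire format, while `PresenceBroadcaster` uses an in-process subscriber list as a stand-in for the real Redis Pub/Sub channel; the class and method names are hypothetical.

```python
import json

def sse_format(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame: a named event plus a JSON
    payload, terminated by a blank line as the SSE format requires."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

class PresenceBroadcaster:
    """Fans agent-presence updates out to every open supervisor stream.
    In production this sits behind Redis Pub/Sub; here an in-process
    subscriber list stands in for the Redis channel."""

    def __init__(self):
        self.subscribers = []          # one frame list per open SSE connection

    def subscribe(self) -> list:
        """Register a new supervisor stream and return its frame buffer."""
        frames = []
        self.subscribers.append(frames)
        return frames

    def publish(self, agent_id: str, state: str) -> None:
        """Push one presence change to all subscribed streams."""
        frame = sse_format("presence", {"agent": agent_id, "state": state})
        for frames in self.subscribers:
            frames.append(frame)
```

In the real middleware, each subscriber buffer would be drained by an ASGI response generator writing frames to the open HTTP connection.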
Production AI Voicebot & Multilingual Intelligence
- Architected the real-time voicebot by bridging the Azure OpenAI GPT-4o Realtime API with Azure Communication Services (ACS) media streams over WebSocket.
- Implemented real-time PCM audio resampling (16 kHz ↔ 24 kHz), tool calling (order lookup, CSAT collection), configurable personality profiles, and session-level token/cost tracking.
- Integrated Azure OpenAI GPT-4o-mini for real-time sentiment analysis, post-call summarization, and intelligence report emails.
- Built a 3-tier TTS caching strategy (in-memory LRU → Redis → Azure Speech) that cut TTS API costs by 85% and call setup latency by 70%.
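The 3-tier lookup above (local LRU, then shared cache, then synthesis) can be sketched like this. All names are illustrative; a plain dict stands in for Redis, and any callable stands in for the Azure Speech request.

```python
from collections import OrderedDict

class TieredTTSCache:
    """Three-tier lookup for synthesized speech: process-local LRU, then a
    shared cache (Redis in the real system; a dict here), then the expensive
    TTS backend. Sketch only; the production key scheme may differ."""

    def __init__(self, synthesize, shared_cache=None, lru_size=128):
        self.synthesize = synthesize      # expensive call (Azure Speech in prod)
        self.shared = shared_cache if shared_cache is not None else {}
        self.lru = OrderedDict()
        self.lru_size = lru_size
        self.backend_calls = 0            # counts tier-3 misses

    def get(self, text: str, voice: str) -> bytes:
        key = f"{voice}:{text}"
        # Tier 1: process-local LRU
        if key in self.lru:
            self.lru.move_to_end(key)
            return self.lru[key]
        # Tier 2: shared cache (a Redis GET in production)
        audio = self.shared.get(key)
        if audio is None:
            # Tier 3: synthesize, then backfill the shared tier
            audio = self.synthesize(text, voice)
            self.backend_calls += 1
            self.shared[key] = audio      # Redis SET with a TTL in production
        # Backfill tier 1 and evict the least recently used entry if full
        self.lru[key] = audio
        if len(self.lru) > self.lru_size:
            self.lru.popitem(last=False)
        return audio
```

Because common prompts (greetings, IVR menus) repeat across calls, most lookups resolve in tiers 1 or 2, which is what drives the cost and latency reductions claimed above.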
RAG Pipeline & Knowledge Integration
- Built a full document ingestion pipeline: Azure Blob Storage → Azure Document Intelligence (layout + OCR) and Firecrawl → 512-token chunk splitting → llama-text-embed-v2 embeddings → per-tenant Pinecone namespaces.
- Supported PDF, DOCX, PPTX, and plain text; Document Intelligence handled complex tables, headers, and multi-column layouts that naive text extraction would mangle.
- Designed namespace-based multi-tenancy in Pinecone so each client's knowledge base is fully isolated, with no cross-tenant contamination on retrieval.
- Maintained a separate index for semantic search over call transcripts, allowing supervisors to query historical conversations.
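The chunking and tenant-isolation steps in the pipeline above can be sketched as follows. This is a simplified stand-in: whitespace tokens approximate the real tokenizer, the overlap value is illustrative, and the namespace naming scheme is hypothetical.

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50) -> list:
    """Split extracted document text into overlapping chunks for embedding.
    Whitespace-split words approximate tokens here (illustrative only)."""
    tokens = text.split()
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap      # overlap preserves context across boundaries
    return chunks

def tenant_namespace(tenant_id: str) -> str:
    """Derive the per-tenant Pinecone namespace. Upserts and queries both pass
    the caller's namespace, so retrieval can never cross tenant boundaries.
    (The "kb-" prefix is a hypothetical naming convention.)"""
    return f"kb-{tenant_id}"
```

In the real pipeline, each chunk would be embedded with llama-text-embed-v2 and upserted into the namespace returned by `tenant_namespace` for the owning client.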
Event-Driven Azure Architecture & Integrations
- Architected a scalable, event-driven system using Azure Event Grid to handle incoming call webhooks, call state changes, and media streams.
- Implemented call recording webhooks that process, encrypt, and store recordings in Azure Blob Storage (AES-256 at rest, TLS 1.2/1.3 in transit).
- Designed human agent handoff with Redis-coordinated stop/lock flags and Celery tasks that cancel active media, stop DTMF recognition, and route calls to the next workflow node while preserving full conversation context.
- Integrated the Microsoft Graph API for presence webhooks, calendar scheduling, and Teams notifications.
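The stop/lock coordination behind the handoff flow can be sketched like this. A dict stands in for Redis (the lock claim maps to SETNX in production), and the class and key names are hypothetical.

```python
class HandoffCoordinator:
    """Coordinates voicebot-to-human handoff. A lock ensures exactly one
    worker performs the handoff; a stop flag tells media and DTMF tasks to
    halt. A dict stands in for Redis (SETNX/GET in production); sketch only."""

    def __init__(self, store=None):
        self.store = store if store is not None else {}

    def request_handoff(self, call_id: str, worker_id: str) -> bool:
        """Atomically claim the handoff (a SETNX in Redis). Returns True for
        the single worker that wins; that worker then cancels active media
        and routes the call to the next workflow node."""
        lock_key = f"handoff_lock:{call_id}"
        if lock_key in self.store:
            return False                          # another worker already won
        self.store[lock_key] = worker_id
        self.store[f"stop:{call_id}"] = True      # signals media tasks to halt
        return True

    def should_stop(self, call_id: str) -> bool:
        """Polled by media-streaming and DTMF tasks on each loop iteration."""
        return self.store.get(f"stop:{call_id}", False)
```

Because the stop flag lives in the shared store, Celery workers on other hosts see it on their next poll and wind down their media tasks without any direct RPC between workers.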
Infrastructure, DevOps & Scale
- Containerized the entire stack with Docker multi-stage builds (a bytecode-only Dockerfile for IP protection) and an nginx reverse proxy with WebSocket/SSE passthrough.
- Designed a multi-tenant MongoDB architecture with the MongoEngine ODM, lazy per-tenant connection aliases, and complex aggregation pipelines.
- Built analytics dashboards covering SLA thresholds, sentiment, multi-question CSAT, agent productivity, and smart disposition, with PDF export via ReportLab.
- Owned the backend architecture end-to-end and participated in client calls as Product Lead to scope features and drive iterative delivery.
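The lazy per-tenant alias pattern mentioned above can be sketched as follows. The register function is injected so the pattern is shown without a live database; in the real stack it would be `mongoengine.register_connection`, and the alias/database naming is hypothetical.

```python
class TenantConnections:
    """Lazily registers one MongoDB connection alias per tenant, so a process
    only sets up aliases for tenants it actually serves. The `register`
    callable is injected (mongoengine.register_connection in the real stack);
    this is an illustrative sketch, not the production implementation."""

    def __init__(self, register, db_prefix: str = "tenant_"):
        self.register = register
        self.db_prefix = db_prefix
        self._aliases = {}            # tenant_id -> registered alias

    def alias_for(self, tenant_id: str) -> str:
        """Return the connection alias for a tenant, registering it on first
        use so no alias exists until a request for that tenant arrives."""
        alias = self._aliases.get(tenant_id)
        if alias is None:
            alias = f"{self.db_prefix}{tenant_id}"
            self.register(alias=alias, db=alias)   # one-time, lazy registration
            self._aliases[tenant_id] = alias
        return alias
```

Documents can then be queried against the returned alias (e.g. via MongoEngine's per-query database switching), keeping each tenant's data in its own database.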
Company Kickoff — Dubai
- Flew to Dubai (Dec 17–22, 2025) for the company kickoff, where selected engineers and founders gathered in person for a week of strategy and planning.
- Worked on-site with the founding team to align on product vision, roadmap priorities, and the coming year's delivery strategy.
Impact & Business Outcomes
- Reduced TTS API costs by 85% through tiered caching, from ~$0.018/call to under $0.003/call at scale.
- Cut call setup latency by 70%: voicebot first response dropped from ~2.8s to under 0.85s.
- The platform processes thousands of calls daily for enterprise clients.