Michael Chen
From United States 03:47 PM (GMT-07:00)
$150,000/yr

Active over a week ago


Member since Jan 2026


Senior AI/ML Engineer

Artificial Intelligence Engineer
Available for hire
Years of experience: 7+ years
Experience level: Senior
Available for: Full-time, Part-time, Contract

Senior AI Engineer with 7+ years of experience building production-grade ML and LLM systems across healthcare, enterprise, and high-growth startups. Deep expertise in agentic AI, RAG architecture, and LLM reliability, including hybrid retrieval, multi-agent orchestration, provider fallbacks, evaluation-first releases, and hallucination mitigation. Proven track record of delivering high-traffic, regulated systems on AWS and Kubernetes, with strong discipline in MLOps, CI/CD, observability, and safe rollout. Known for designing AI systems that hold up under real-world conditions: latency spikes, partial failures, audits, and continuous iteration.

Employment History

Senior AI Engineer at Abridge 2024 - Now
• Architected Python-based multi-agent workflows using LangGraph and LangChain with explicit state, retries, and checkpointing for agentic clinical automation.
• Built agentic RAG pipelines with hybrid retrieval (dense + sparse) and multi-index routing to improve grounding and reduce hallucinations in clinical outputs.
• Deployed high-traffic LLM services on AWS EKS behind ALB and WAF, with autoscaling and graceful degradation paths for upstream model latency spikes.
• Implemented provider fallback strategies and structured tool-calling error handling to prevent cascading failures in multi-step agent executions.
• Managed infrastructure-as-code with Terraform for AWS EKS services, enabling repeatable environment setup and controlled configuration changes.
• Versioned prompts, tools, and policies through GitHub Actions CI/CD so that releases are gated by automated evaluation suites and scenario-based tests.
• Instrumented end-to-end observability with Prometheus and Grafana dashboards for agent step latency, tool error rates, and retrieval quality metrics.
• Added distributed tracing across agent steps and downstream services to accelerate root-cause analysis during incidents and regressions.
• Implemented safety filters and safe-refusal behaviors to enforce traceable, audit-friendly outputs in a regulated healthcare setting.
• Built a small Rust CLI tool to validate and normalize RAG artifacts (chunking and metadata) used in agentic RAG pipelines, improving reliability and reducing runtime parsing failures.
Senior Machine Learning Engineer at Deloitte 2022 - 2024
• Built enterprise RAG services in Python and FastAPI, using LangChain for retrieval and tool orchestration and LangGraph for stateful agent flows.
• Deployed production APIs on AWS EKS with autoscaling and multi-AZ patterns, integrating Amazon Bedrock model endpoints where applicable.
• Integrated Amazon SageMaker for managed ML workflows (training and hosting) alongside LLM components to support hybrid ML + GenAI solutions.
• Implemented Redis caching for embeddings, retrieval results, and session state to improve latency and reduce repeated LLM calls under load.
• Provisioned infrastructure with Terraform and delivered progressive rollouts via ArgoCD (GitOps) across client environments.
• Established GitHub Actions pipelines for build, test, and deploy, including canary releases and automated rollback triggers tied to quality and health checks.
• Operationalized evaluation-first release gates with RAGAS scoring (faithfulness and relevance) to detect hallucinations and regressions before rollout.
• Instrumented services with Prometheus metrics and Grafana dashboards for P95 latency, error rates, dependency health, and cost signals.
• Implemented circuit breakers, timeouts, and provider/model fallback tiers to sustain reliability during traffic spikes and upstream outages.
• Built a lightweight Go microservice to support document ingestion and processing for RAG pipelines, focusing on throughput and operational reliability.
Machine Learning Engineer at Cohere 2020 - 2022
• Developed Transformer models in Python and PyTorch for NLP classification and retrieval workloads serving enterprise use cases.
• Applied model distillation to create smaller student models that preserved quality while improving serving throughput.
• Implemented quantization paths for inference to reduce latency and compute cost in production deployments.
• Built semantic search pipelines combining learned embeddings with hybrid retrieval (BM25 + vector) to improve relevance and robustness on diverse queries.
• Owned a high-traffic inference API with rate limiting, request prioritization, and reliability patterns for bursty workloads.
• Ran training and evaluation workflows on AWS, including cost-aware scaling strategies for large experiments.
• Created regression testing harnesses spanning offline quality metrics and online latency and throughput checks to prevent risky model rollouts.
• Partnered with product and platform teams to ship retrieval and model updates with staged deployments and rollback readiness.
AI Engineer at Palantir Technologies 2018 - 2020
• Built Python and PySpark pipelines for forecasting and anomaly detection over large operational datasets.
• Productionized ML services on Kubernetes with autoscaling and fallback models to handle data delays and resource constraints.
• Implemented ingestion and feature pipelines using AWS S3, Kinesis, and Lambda with schema validation and quarantine handling.
• Monitored system health and model behavior via CloudWatch dashboards and alarms, and defined SLIs/SLOs for reliability.
• Implemented drift detection on key feature distributions with triggers for retraining and refresh workflows.
• Shipped CI/CD for training and inference containers, including unit tests for feature transforms and automated rollback on metric regression.
• Optimized PySpark feature engineering jobs (partitioning, caching, join strategy) to improve throughput for batch scoring.

Education

Master of Computer Science at Stanford University 2016 - 2018