This role is for an LLM Ops Engineer focused on Serverless & CI/CD within AWS, specifically for Agentic AI systems that enhance enterprise SaaS.
The position involves engineering the AWS serverless infrastructure that supports conversational interfaces, dynamic UIs, and intelligent agents.
The engineer will build and maintain CI/CD infrastructure for Agentic AI solutions using Terraform on AWS.
Responsibilities include developing, deploying, and debugging intelligent agents and their associated tools to ensure scalable and cost-effective delivery in production environments.
The role requires designing, implementing, and maintaining CI/CD pipelines for Agentic AI applications using tools like AWS CodePipeline and CodeBuild.
The engineer will collaborate with ML/NLP engineers to develop modular AI agents and create debuggable agent architectures.
Monitoring and managing cost metrics associated with agentic operations is also a key responsibility.
The position emphasizes collaboration with product, backend, and AI teams to improve agentic infrastructure design and tool orchestration workflows.
Requirements:
A minimum of 2 years of experience in DevOps, MLOps, or Cloud Infrastructure with exposure to AI/ML systems is required.
Deep expertise in AWS serverless architecture, including hands-on experience with AWS Lambda, Amazon API Gateway, and Step Functions, is required.
Strong proficiency in Terraform for building and managing serverless AWS environments is required.
Experience deploying and managing CI/CD pipelines for serverless applications using AWS CodePipeline, CodeBuild, or GitHub Actions is necessary.
Hands-on experience with agent and tool development in Python, including debugging and performance tuning, is required.
A solid understanding of IAM roles and policies, VPC configuration, and least-privilege access control is essential.
Knowledge of monitoring, alerting, and distributed tracing systems is required.
The ability to manage environment parity across development, staging, and production using automated infrastructure pipelines is necessary.
Excellent debugging, documentation, and cross-team communication skills are essential.
Benefits:
The position offers an equity participation program.
Health insurance, paid time off (PTO), and leave time are provided.
Ongoing paid professional training and certifications are available.
The role allows for fully remote work.
A strong onboarding and training program is included to support new hires.