Remote AI NLP Ops Engineer at Expedite Commerce

Description:

The AI NLP Ops Engineer will manage, deploy, and fine-tune NLP models and large language model (LLM) agents to address business challenges, primarily using Amazon SageMaker and Amazon Bedrock.
This role focuses on ensuring the smooth operation, scalability, and reliability of AI products, emphasizing automation, performance monitoring, and agent lifecycle management.
Responsibilities include conceptualizing and designing robust NLP solutions and LLM agents tailored to specific business needs, with a focus on user experience, interactivity, latency, failover, and functionality.
The engineer will write, test, and maintain clean, efficient, and scalable code for NLP models and LLM agents, with a strong emphasis on Python programming.
Performance monitoring, optimization, and maintenance of NLP solutions and LLM agents will be key tasks, including implementing model explainability and handling model drift.
The role involves developing comprehensive monitoring and logging solutions for LLM agents to track performance, errors, and usage patterns, as well as setting up alerting mechanisms for anomalies.
Proactive identification, diagnosis, and resolution of issues related to LLMs, including model inaccuracies and performance bottlenecks, will be required.
Staying updated with advancements in NLP and LLM technologies, experimenting with new techniques, and maintaining a proactive approach to learning are essential aspects of the role.

Candidates must have 1-2 years of experience in fine-tuning LLMs and deploying LLM agents, with practical experience in Amazon Bedrock, OpenAI function calling, and other relevant platforms.
A proven track record of developing high-quality, efficient Python code is required, including experience with advanced Python features and best practices.
Experience in integrating open-source and commercial NLP models and LLM agents, as well as developing and evaluating prompt engineering techniques, is necessary.
Strong skills in developing models and agents on cloud platforms, particularly AWS, and implementing serverless architectures are essential.
Expertise in debugging and fixing issues related to LLMs, including identifying root causes of errors and optimizing system performance, is required.
Strong development experience implementing monitoring for LLM-based agents in production is necessary.
Excellent written and verbal communication skills in English are required, with the ability to present technical concepts clearly to teams and clients.
Working experience with CI/CD pipelines built on AWS services such as CodeCommit, CodeBuild, and CodePipeline, for automated testing and deployment of NLP solutions, is required.

The position offers health insurance, paid time off (PTO), and leave time.
Ongoing paid professional training and certifications are provided to enhance skills.
The role is fully remote.
A strong onboarding and training program is in place to support new employees.