Description:

Tiger Analytics is seeking a skilled and innovative Machine Learning Engineer with hands-on experience in Google Cloud Platform (GCP) and Vertex AI to design, build, and deploy scalable ML solutions.
The role involves operationalizing machine learning models and managing the end-to-end ML lifecycle, from data ingestion to model serving and monitoring.
Key responsibilities include developing, training, and optimizing ML models using Vertex AI, designing and building scalable ML pipelines, and deploying models to production using Vertex AI endpoints.
The engineer will collaborate with data scientists, data engineers, and MLOps teams to ensure reproducible and reliable ML workflows.
The engineer will also monitor model performance and set up alerting, retraining triggers, and drift detection.
ML workflows will draw on GCP services such as BigQuery, Dataflow, Cloud Functions, Pub/Sub, and Cloud Storage (GCS).
CI/CD principles will be applied to ML models using Vertex AI Pipelines, Cloud Build, and GitOps practices.
The engineer will implement model governance, versioning, explainability, and security best practices within Vertex AI.
The engineer will also document architecture decisions, workflows, and the model lifecycle for internal stakeholders.

Requirements:

Candidates must have advanced knowledge of Generative AI, including advanced RAG and multimodal agents, as well as deep knowledge of the ADK and LangChain agentic frameworks.
Expertise in Python is required, with strong OOP and functional programming skills, and proficiency in ML/DL libraries such as TensorFlow, PyTorch, scikit-learn, pandas, NumPy, and PySpark.
Experience with production-grade code, testing, and performance optimization is essential.
Proficiency in GCP services, including Vertex AI, BigQuery, Cloud Storage, Cloud Run, Cloud Functions, Pub/Sub, Dataproc, and Dataflow, is necessary, along with an understanding of IAM and VPC.
Candidates should have experience in designing and building RESTful APIs using FastAPI or Flask, integrating ML models into APIs for real-time inference, and implementing authentication, logging, and performance optimization.
The role requires skill in designing scalable, fault-tolerant end-to-end AI systems, as well as hands-on experience in developing distributed systems, microservices, and asynchronous processing.

The position offers significant career development opportunities as the company grows.
It provides a unique chance to be part of a small, fast-growing, challenging, and entrepreneurial environment, with a high degree of individual responsibility.
