The company is a full-spectrum cloud integrator that helps hundreds of companies realize the value, efficiency, and productivity of the cloud.
They are looking for a seasoned Machine Learning Operations (MLOps) Engineer to build and optimize machine learning platforms.
This role requires deep expertise in machine learning engineering and infrastructure, focusing on developing scalable inference systems.
Proven experience in building and deploying ML platforms in production environments is essential.
The position is remote and requires excellent communication skills and the ability to independently tackle complex challenges with innovative solutions.
Responsibilities:
Build and optimize ML platforms.
Collaborate with cross-functional teams.
Develop CI/CD workflows for ML models.
Automate model training and deployment.
Monitor and maintain ML models in production.
Ensure reproducibility and traceability of experiments.
Manage model versioning.
Optimize model inference infrastructure.
Implement data and model governance policies.
Stay current with evolving GCP MLOps practices.
Requirements:
A Bachelor's degree in Computer Science, Information Technology, or a related field is required.
Candidates must have 3+ years of relevant industry experience.
A proven track record in designing and implementing cost-effective, scalable machine learning inference systems is necessary.
Hands-on experience with leading deep learning and LLM frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, and LangChain is required.
Proven experience in implementing MLOps solutions on Google Cloud Platform (GCP) using services such as Vertex AI, Cloud Storage, BigQuery, Cloud Functions, and Dataflow is essential.
A solid understanding of machine learning algorithms, natural language processing (NLP), and statistical modeling is needed.
Candidates should have a solid understanding of core computer science concepts, including algorithms, distributed systems, data structures, and database management.
Strong problem-solving skills and the ability to tackle complex challenges using critical thinking are required.
Effective communication skills in remote work environments are necessary, along with the ability to collaborate with team members and stakeholders.
Expertise in public cloud platforms, particularly Google Cloud Platform (GCP) and Vertex AI, is essential.
Proven experience in building and scaling agentic AI systems in production environments is required.
An in-depth understanding of large language model (LLM) architectures, parameter scaling, optimization strategies, and deployment trade-offs is necessary.
The position is remote, and candidates must be based in Egypt.
Benefits:
The company is committed to the professional and personal growth of its employees.
Employees will have the opportunity to work with cutting-edge technology and solve complex business problems.
The role offers a chance to collaborate with cross-functional teams and contribute to impactful technology solutions.
The remote work environment provides flexibility and the ability to work from anywhere in Egypt.