Articul8 AI is a leader in Generative AI innovation, providing advanced SaaS products that enhance business operations.
The company is looking for a Senior Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of its GenAI SaaS platform.
The SRE will act as a bridge between development and operations, focusing on automation and best practices to meet service reliability objectives while supporting rapid innovation.
Key responsibilities include architecting and maintaining scalable infrastructure, designing monitoring and observability solutions, automating deployment and management of cloud-native infrastructure, and defining and improving Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
The role involves participating in on-call rotations, collaborating with development teams, leading incident response efforts, optimizing infrastructure for performance and cost-effectiveness, enforcing security best practices, and creating comprehensive documentation.
Requirements:
A Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience, is required.
Candidates must have 5+ years of experience in DevOps, SRE, or similar roles.
Strong experience with cloud platforms such as AWS, GCP, or Azure is essential.
Proficiency in at least one programming or scripting language, such as Python, Go, or Bash, is required.
Hands-on experience with infrastructure as code tools like Terraform or CloudFormation is necessary.
A solid background in containerization technologies, including Docker and Kubernetes, is required.
Proven experience with monitoring and observability tools such as Prometheus, Grafana, or the ELK stack is essential.
A strong understanding of CI/CD pipelines and automation is required.
Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems, are necessary.
Benefits:
The opportunity to work at the forefront of Generative AI innovation and contribute to cutting-edge SaaS products.
A collaborative work environment that encourages continuous improvement and innovation.
The chance to shape the future of resilient software systems and drive the reliability of AI technologies.
Opportunities for professional growth and development in a rapidly evolving field.