This job post is closed and the position is probably filled. Please do not apply.
Description:
Procurement Sciences AI (PSci.AI) is seeking an experienced Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of its systems.
The role involves performing root cause analysis, designing and implementing automated testing, and monitoring key service level indicators (SLIs).
The SRE will ensure adherence to service level agreements (SLAs) and service level objectives (SLOs).
Responsibilities include collaborating with development and operations teams, developing monitoring and alerting systems, and implementing best practices for incident management and disaster recovery.
The successful candidate will have a strong background in Kubernetes, Helm, observability platforms, and cloud providers such as Azure.
Requirements:
Proficiency in Kubernetes, Helm, and troubleshooting in secure environments with limited or no remote access is required.
Expertise in observability and monitoring tools such as Prometheus, Grafana, ELK Stack, or Datadog is necessary.
Experience with cloud providers, particularly Azure and Azure Gov, is essential.
A strong understanding of microservices architecture and its supporting components, including Postgres and AI systems, is required.
Candidates must have expertise in automated testing frameworks and tools, including integration tests, synthetic tests, and load testing.
Experience with monitoring and analytics tools to track SLIs, SLAs, and SLOs is needed.
Excellent problem-solving skills, attention to detail, and a tenacious attitude are essential.
Strong communication skills and the ability to work effectively in a collaborative environment are required.
Proficiency in programming languages such as TypeScript and Python is necessary.
Strong scripting skills in Bash, PowerShell, or similar languages are required.
Experience with Infrastructure as Code (IaC) tools like Azure Bicep, AWS CDK, or Terraform is essential.
An understanding of networking principles and experience with network troubleshooting are required.
Strong communication and collaboration skills are necessary to work effectively with both technical and non-technical personnel.
Benefits:
The position offers the opportunity to work in a fast-paced, dynamic environment at the forefront of generative artificial intelligence.
Employees will be part of a team dedicated to revolutionizing government contracting with disruptive AI capabilities.
The role offers the chance to join a venture-backed company supported by a leading global technology venture capital firm.
Employees will have the opportunity to enhance their skills in a variety of cutting-edge technologies and methodologies.
The company promotes a culture of continuous improvement and innovation, allowing for personal and professional growth.