Remote Site Reliability Engineer (SRE)

This job is closed

This job post is closed and the position has probably been filled. Please do not apply. It was closed automatically after the application link was detected as broken.

Description:

  • Procurement Sciences AI (PSci.AI) is seeking an experienced Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of its systems.
  • The role involves performing root cause analysis, designing and implementing automated tests, and monitoring key service level indicators (SLIs).
  • The SRE will ensure adherence to service level agreements (SLAs) and service level objectives (SLOs).
  • Responsibilities include collaborating with development and operations teams, developing monitoring and alerting systems, and implementing best practices for incident management and disaster recovery.
  • The successful candidate will have a strong background in Kubernetes, Helm, observability platforms, and cloud providers such as Azure.

Requirements:

  • Proficiency in Kubernetes, Helm, and troubleshooting in secure environments with limited or no remote access is required.
  • Expertise in observability and monitoring tools such as Prometheus, Grafana, ELK Stack, or Datadog is necessary.
  • Experience with cloud providers, particularly Azure and Azure Gov, is essential.
  • A strong understanding of microservices architecture and related components, including Postgres and AI systems, is required.
  • Candidates must have expertise in automated testing frameworks and tools, including integration testing, synthetic testing, and load testing.
  • Experience with monitoring and analytics tools to track SLIs, SLAs, and SLOs is needed.
  • Excellent problem-solving skills, attention to detail, and a tenacious attitude are essential.
  • Strong communication skills and the ability to work effectively in a collaborative environment are required.
  • Proficiency in programming languages such as TypeScript and Python is necessary.
  • Strong scripting skills in Bash, PowerShell, or similar languages are required.
  • Experience with Infrastructure as Code (IaC) tools like Azure Bicep, AWS CDK, or Terraform is essential.
  • Understanding of networking principles and experience with network troubleshooting is required.
  • The ability to communicate and collaborate effectively with both technical and non-technical personnel is necessary.

Benefits:

  • The position offers the opportunity to work in a fast-paced, dynamic environment at the forefront of generative artificial intelligence.
  • Employees will be part of a team dedicated to revolutionizing government contracting with disruptive AI capabilities.
  • The role offers the chance to work at a venture-backed company supported by a leading global technology venture capital firm.
  • Employees will have the opportunity to enhance their skills in a variety of cutting-edge technologies and methodologies.
  • The company promotes a culture of continuous improvement and innovation, allowing for personal and professional growth.

About the job

Salary: $100,000 - $120,000 USD per year