Remote Senior/Staff SRE Engineer

at Stellar Cyber

Description:

  • Join Stellar Cyber, a fast-growing global leader in cybersecurity trusted by major enterprises and government agencies.
  • Nearly 30% of the world’s top MSSPs rely on our platform, and a growing number are recognizing the value of next-generation security solutions.
  • The company is at the forefront of protecting organizations against sophisticated cyber threats using cutting-edge AI and automation technologies.
  • The culture is built on diversity, openness, and collaboration, fostering creativity and innovation that drives real market impact.
  • The role involves driving reliability, scalability, and efficiency across production systems as a Senior/Staff Site Reliability Engineer (SRE).
  • The ideal candidate will have deep expertise in cloud infrastructure, Kubernetes administration, observability, and incident management.
  • Responsibilities include operating complex distributed systems and influencing architecture, tooling, and best practices for operational excellence.

Requirements:

  • A minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles is required.
  • Proven success in leading large-scale production systems in cloud environments such as AWS, GCP, Azure, or OCI is essential.
  • Demonstrated leadership in driving incident response, on-call best practices, and fostering a reliability-focused culture is necessary.
  • Strong experience with production on-call operations and incident management is required.
  • Advanced proficiency in Kubernetes administration and troubleshooting is a must.
  • Hands-on experience with observability tools such as Prometheus, Grafana, Loki, and Alertmanager is needed.
  • Knowledge of chat-based operations interfaces and/or auto-remediation controllers using AI agentic frameworks is preferred.
  • Understanding of AI agents for auto-triaging alerts, correlating signals, and suggesting root-cause hypotheses is beneficial.
  • Expertise in operating data platforms like Elasticsearch, MongoDB, Spark, Kafka, and Redis is required.
  • Proficiency with public cloud services (AWS, Azure, GCP, or OCI) is necessary.
  • Strong programming and automation skills in Python and Bash are essential.
  • A deep understanding of Infrastructure as Code (Terraform, Helm) is required.
  • Experience with CI/CD pipelines (GitHub Actions, Bitbucket, ArgoCD) is necessary.
  • A strong technical background in distributed systems, databases, networking, and Linux administration is required.
  • Excellent problem-solving, communication, and leadership abilities are essential.
  • A Bachelor's degree in Computer Science, Engineering, or a related technical field is required.
  • Certifications in AWS, GCP, Observability, Linux, or Kubernetes are a plus.

Benefits:

  • The position offers the opportunity to work with a fast-growing leader in cybersecurity.
  • Employees are part of a diverse and collaborative culture that fosters creativity and innovation.
  • The role provides the chance to influence architecture and best practices in a cutting-edge technology environment.
  • Employees can expect to work with advanced technologies and tools in the field of cybersecurity.
  • The company supports professional growth and development through various opportunities and resources.
