Remote Senior Site Reliability Engineer

Please let Rackspace know you found this job on RemoteYeah. This helps us grow 🌱.

Description:

  • Rackspace is building its Professional Services Center of Excellence on Application Performance Monitoring Suites.
  • The role involves solving complex business problems and contributing to the development of modern applications for customers.
  • The position focuses on helping customers understand the connections between application performance, user experience, and business outcomes.
  • Responsibilities include implementing observability solutions, building and maintaining scalable systems, and developing monitoring tools.
  • The engineer will proactively gather and analyze metric and log data for anomaly detection, performance tuning, capacity planning, and fault isolation.
  • Collaboration with development teams is essential to implement and deploy new features while meeting reliability, security, and performance standards.
  • The role requires maintaining a deep understanding of the customer’s business and technical environment.
  • Identifying performance bottlenecks and resolving root causes of service issues is a key responsibility.

Requirements:

  • Candidates must have 3+ years of experience designing, building, and maintaining AWS EKS and Azure AKS infrastructure with Terraform.
  • A minimum of 3 years' experience with Kafka in large-scale environments handling hundreds of terabytes to petabytes of data is required.
  • At least 3 years of experience designing, building, and maintaining SaaS environments is necessary.
  • Candidates should have 3+ years as a Site Reliability Engineer (SRE), with solid experience in tools such as Prometheus, Grafana, Datadog, and ELK.
  • Candidates need 3 years of experience building and running Kubernetes clusters, with expertise in scaling, operators, Istio, and troubleshooting.
  • A minimum of 3 years' experience with observability, including monitoring, logging, tracing, and metrics is essential.
  • Candidates must have 3 years' experience with GitOps CI/CD processes.
  • At least 3 years of scripting proficiency with Python, Go (Golang), Bash, and AWS CLI tools is required.
  • Candidates need 3 years of experience with security operations, including security policies, infrastructure, key management, and encryption setup.
  • Candidates should have 3 years of experience implementing and maintaining disaster recovery strategies, including for MySQL and ZooKeeper.

Benefits:

  • Rackspace offers a collaborative work environment that values diverse perspectives and innovation.
  • The company is recognized as one of the best places to work by Fortune, Forbes, and Glassdoor, attracting world-class talent.
  • Employees are encouraged to bring their whole selves to work, and their unique perspectives are supported.
  • Rackspace is committed to equal employment opportunities and provides accommodations for individuals with disabilities or special needs.