The Site Reliability Engineer position is a remote role based in the United States, posted by Jobgether on behalf of McAfee.
The engineer will maintain high service levels for availability, latency, and reliability to meet customer needs while reducing friction in change management.
Responsibilities include collaborating closely with DevOps, Engineering, and support teams to ensure services are scalable, secure, and performant.
The role involves monitoring critical production environments, troubleshooting incidents, automating processes, and continuously improving service reliability.
The engineer will support mission-critical applications with a focus on observability, incident response, and seamless integration with IT service operations.
Key accountabilities include proactively monitoring production environments, troubleshooting and escalating problems, managing the incident lifecycle, and collaborating with teams to maintain service reliability.
The engineer will automate processes to reduce Mean Time to Detect (MTTD) and Mean Time to Restore (MTTR), respond promptly to security events, and participate early in the software development lifecycle.
Documentation of processes and regular updates to operational knowledge bases are required.
Effective communication with stakeholders and leadership regarding high-priority incidents and service status is essential.
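For readers unfamiliar with the MTTD and MTTR metrics named above, the following is a minimal sketch of how they might be computed from incident timestamps. The `Incident` record, its field names, and the sample data are hypothetical and do not come from the posting. It also assumes MTTR is measured from detection to restoration; some teams measure it from the start of the incident instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record shape, for illustration only.
    started: datetime   # when the fault began
    detected: datetime  # when monitoring or a person noticed it
    restored: datetime  # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean Time to Detect: average of (detected - started)."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean Time to Restore, assumed here to run from detection to restoration."""
    return mean_delta([i.restored - i.detected for i in incidents])


# Made-up sample data.
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 4), datetime(2024, 1, 5, 10, 34)),
    Incident(datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 10), datetime(2024, 1, 9, 15, 0)),
]

print("MTTD:", mttd(incidents))  # MTTD: 0:07:00
print("MTTR:", mttr(incidents))  # MTTR: 0:40:00
```

Automation that shortens the gap between `started` and `detected` (better alerting) lowers MTTD, and automation that shortens the gap between `detected` and `restored` (runbooks, rollbacks) lowers MTTR.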
Requirements:
Candidates should have 1 to 3+ years of experience in software development, SRE, DevOps, or systems engineering roles.
A proven track record of managing large-scale, highly available production systems with an SLA above 99.95%, preferably in cloud environments, is required.
Strong troubleshooting, debugging, and root cause analysis skills are necessary.
Experience with monitoring, logging, and application performance management tools such as Grafana, CloudWatch, or similar is expected.
Familiarity with version control and CI/CD tools such as Git, Jenkins, or Harness is required.
Hands-on experience with container technologies, including Kubernetes and Docker, is essential.
Candidates should be comfortable working with both Windows and Linux operating systems.
A solid understanding of AWS cloud services, including serverless and containerized workloads, is necessary.
Excellent communication skills and the ability to collaborate across teams and time zones are required.
Preferred certifications include ITIL, HDI, AWS, and other cloud-related credentials.
Willingness to work some non-standard hours to support global teams is expected.
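To put the 99.95% availability figure in the requirements above into concrete terms, the short sketch below converts an availability target into a downtime budget. The function name and the 30-day and 365-day windows are illustrative assumptions; the posting does not say how the SLA is measured.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, for an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Assumed measurement windows: a 30-day month and a 365-day year.
monthly = allowed_downtime_minutes(99.95, 30)
yearly = allowed_downtime_minutes(99.95, 365)

print(f"30-day budget: {monthly:.1f} min")     # 30-day budget: 21.6 min
print(f"365-day budget: {yearly / 60:.2f} h")  # 365-day budget: 4.38 h
```

Under these assumptions, a 99.95% target leaves roughly 22 minutes of downtime per month.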
Benefits:
The position offers competitive compensation and a bonus program.
Comprehensive medical, dental, and vision coverage is provided.
Paid time off and paid parental leave are included in the benefits package.
Pension and retirement plans are available for employees.
Flexible work hours and support for community involvement are offered.
The company promotes an inclusive work environment that embraces diversity and encourages authenticity.