Remote Senior Site Reliability Engineer (SRE III) at LivePerson

Description:

LivePerson is seeking a Senior Site Reliability Engineer (SRE III) for the GPT Division, focusing on building and managing highly available, distributed systems.
The role involves designing, implementing, and maintaining the infrastructure and systems that support LivePerson's applications and services.
Responsibilities include leading a team to ensure high uptime and reliability of products 24/7, designing and maintaining scalable infrastructure across cloud and on-prem environments, and architecting automated solutions for provisioning and monitoring.
The engineer will provide technical leadership and support to various stakeholders, including development and data teams.
Daily operations will involve hands-on work with cloud environments, Kubernetes, data and messaging platforms, and security/compliance.
The role requires developing automation scripts, maintaining CI/CD pipelines, and ensuring system reliability through monitoring and incident response.
Participation in on-call rotations for 24/7 production support and leading post-incident reviews to identify improvements are also key responsibilities.
Establishing metrics for tracking incident frequency and response effectiveness is expected.

Candidates must have at least 7 years of experience managing cloud-based production environments (AWS, GCP, etc.).
Strong verbal and written communication skills are essential for collaboration with various stakeholders.
Solid hands-on experience with, and understanding of, system architecture is required.
Candidates should have extensive experience working in a Linux environment and be proficient in scripting with Bash and Python.
Experience with configuration management systems such as Puppet, Opscode Chef, and Ansible is necessary.
Proficiency in Infrastructure as Code (IaC) tools like Terraform and CloudFormation is required.
Strong experience with SQL and with managing SQL and NoSQL databases, including MongoDB, Elasticsearch, Postgres, and MySQL, is essential.
Candidates should have significant experience working with Kubernetes and Helm.
Experience in microservices architecture using message buses like Kafka and Pulsar is required.
Familiarity with monitoring and alerting systems such as Grafana, Prometheus, Kibana, Datadog, and PagerDuty is necessary.
Knowledge of CI/CD pipeline orchestrators like TeamCity, Jenkins, and GitLab is required.
Candidates should be highly motivated, independent, and possess excellent interpersonal skills.
A BS in Computer Science or a related field, or equivalent work experience, is required.
A strong background in cloud, network, and application security and compliance is necessary.
Experience with GPT or other LLMs is considered a strong advantage.

The salary range for this role is 130,000 to 154,000 CAD, with final compensation determined by various factors including location, skills, experience, and education.
Comprehensive health benefits include medical, dental, vision, and wellbeing support.
Time away from work includes 15 days of PTO, public holidays, 5 care days, and 10 sick days.
Financial benefits include an Employee Stock Purchase Plan (ESPP), basic life and AD&D insurance, and long-term and short-term disability coverage.
Family benefits include parental leave, maternity support, and fertility services.
Development opportunities include generous tuition reimbursement and access to internal professional development resources.
Additional benefits include a Health Service Navigator, counseling services, and resources to support overall health and wellness.