Cordial is seeking a motivated and talented Site Reliability Engineer to help monitor, develop, and scale the Cordial platform.
The goal is to provide clients with a delightful experience and ensure that expected jobs and background processes run without issues.
The engineer will work with DevOps and Product teams to optimize performance, squash bugs, and reveal blind spots through comprehensive monitoring.
Responsibilities include administering, monitoring, and troubleshooting application and network components in a cloud-based environment, specifically AWS.
The role involves designing, authoring, deploying, and monitoring manifests for Kubernetes clusters, Helm charts/repos, and service mesh configurations.
The engineer will actively contribute to platform infrastructure design and implementation discussions.
The engineer will use software engineering skills to trace and debug code and to identify the root causes of production data corruption and performance issues.
The position requires providing production support for Product Development teams and participating in an on-call rotation.
The engineer will design and deploy the monitoring and alerting architecture and implement the monitoring and logging solutions that support it.
Troubleshooting complex issues in a timely manner is essential to maintain the performance and stability of the Production Application environment.
The role also includes helping to build out SLOs and documenting and monitoring SLAs.
Requirements:
Candidates must have 5+ years of experience in UNIX/Linux Systems and Network Administration, including DNS, IPsec, VPN, Load Balancing, and process tracing.
Experience with AWS, specifically EC2 and EKS, is required.
Candidates should have experience deploying and/or maintaining Kubernetes/EKS clusters.
Hands-on experience writing and maintaining custom Helm charts is necessary.
Experience working with one or more service meshes such as AWS App Mesh, Istio, or Linkerd is required.
Familiarity with monitoring, logging, and alerting tools is essential.
Prior experience as a Site Reliability Engineer (SRE) and/or in a DevOps role is necessary.
Development experience in PHP is required.
Extensive experience with Docker/containers and Kubernetes is necessary.
Experience with HashiCorp products such as Consul and Vault is required.
Candidates must be comfortable working in a globally distributed team across time zones.
Strong teamwork and communication skills are essential.
A genuine desire to learn new technologies and grow is required.
Fluency in verbal and written English is necessary.
Experience with large-scale distributed systems is required.
Proficiency in infrastructure as code (IaC) tools such as Terraform or CloudFormation is necessary.
Understanding of observability principles, including distributed tracing, and of tools such as Prometheus, Grafana, and the ELK stack is required.
Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo CD is necessary.
A strong grasp of networking fundamentals and security best practices in a cloud environment is required.
Benefits:
The position offers a salary range of $140,000.00 to $180,000.00 annually, which may be adjusted based on experience and location.
In addition to the base salary, the compensation package includes equity and bonuses.
A robust benefits plan is provided, including medical, dental, vision, and life insurance.
The company offers a 401(k) match and flexible time off.
Additional perks include monthly wellness and cell phone stipends, childcare support, and yearly reimbursements for continued education.
Cordial emphasizes maintaining a healthy work/life balance and has a strong dedication to diversity, equity, and inclusion efforts.
The company fosters an overall respectful and open culture.