This is a full-time role in the product organization for an expert in systems design with considerable experience in large-scale software development in an Azure development environment.
The position involves designing and implementing Continuous Integration/Continuous Deployment (CI/CD) tooling using GitHub Actions, Azure DevOps, and related technologies.
Responsibilities include defining and implementing build and test pipelines for containerized architectures, infrastructure as code (IaC) for the stateful deployment of environments, Role-Based Access Control (RBAC), linting and other code quality controls, GitOps and Kubernetes pipelines, and managing SaaS deployment APIs.
The individual will assist in the design, engineering, development, planning, and administration of Azure Kubernetes Service (AKS) clusters for critical business applications.
This role requires close collaboration with application, engineering, security, and operations teams to engineer and build Kubernetes and Azure PaaS and IaaS solutions within an agile and modern enterprise-grade operating model.
Requirements:
Candidates must have a strong background as a Site Reliability Engineer (SRE) supporting a 24x7 highly available production environment for a SaaS or cloud service provider.
Solid experience with monitoring, APM, and observability tools such as Datadog, Application Insights, Prometheus, and Grafana is required.
A strong background with Azure resources such as Key Vault, Data Factory, Azure Databricks, and Storage Accounts is necessary.
Experience implementing observability plans around logs, metrics, and traces is essential.
Candidates should have experience developing software on an agile development team and implementing CI/CD best practices.
Experience with cloud infrastructure environments, preferably Azure, and infrastructure as code (Terraform, Bicep, ARM) is required.
Strong experience with containerization technology and/or Kubernetes is necessary.
Candidates should have experience with release automation, system administration, and configuration management.
Proficiency in programming languages such as Python and Go is required.
A strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts is essential.
Strong interpersonal and teamwork skills are necessary, including the ability to set and enforce processes and to influence engineers who are not direct reports.
Strong analytical and programming skills are required.
Bonus points for experience with MLflow and other MLOps pipeline technologies.
Benefits:
The base salary range for this role is $142,500 - $198,750 USD.
In addition to the base salary, the successful candidate may be eligible to participate in a bonus plan.
Medical, dental, and vision coverage is provided.
Life insurance and disability programs are included.
A retirement savings plan with company match is offered.
Paid time off is available.
Flexible work arrangements, including remote work, are provided.