We are seeking a Senior Observability Engineer with strong expertise in Grafana and Python to lead telemetry, monitoring, and automation efforts across our cloud-native infrastructure.
This role is critical in shaping our observability strategy, building real-time dashboards, and automating alerting pipelines to ensure high system availability and performance.
Key responsibilities include designing, developing, and maintaining Grafana dashboards for real-time infrastructure and application monitoring.
The engineer will build and enhance Python-based automation tools for telemetry data processing, health checks, and alerts.
Integration of observability solutions with Azure Monitor, Log Analytics, Prometheus, and OpenTelemetry is required.
The role involves defining and implementing SLIs, SLOs, and proactive alerting mechanisms.
Collaboration with SREs, DevOps, and developers to improve monitoring coverage and incident response is essential.
The engineer will contribute to infrastructure automation and CI/CD workflows using Python, Git, and DevOps tools.
The role also includes leading tool selection and driving the adoption of observability best practices across engineering teams.
Requirements:
Candidates must have 5+ years of experience in observability, DevOps, or SRE roles.
Strong hands-on experience with Grafana, including templating, alerting, and data source integration, is required.
Proficiency in Python scripting for automation and data processing is necessary.
Experience with Prometheus, Azure Monitor, Log Analytics, and Kubernetes is essential.
Familiarity with distributed systems, tracing, and telemetry pipelines is required.
Exposure to tools like Loki, OpenTelemetry, ArgoCD, or Terraform is a plus.
Experience with CI/CD pipelines (Jenkins, Azure DevOps, GitHub Actions) is nice to have.
Knowledge of containerized environments (Docker, Kubernetes, AKS) is beneficial.
The ability to design cost-efficient monitoring solutions and dashboards is advantageous.
Benefits:
The company promotes a fun, happy, and politics-free work culture built on the principles of lean and self-organisation.
Employees will work with large-scale systems powering global businesses.
A competitive salary and benefits package is offered.