Staff Site Reliability Engineer

at Nearsure

Posted 3 days ago

Description:

  • As a Staff Site Reliability Engineer, you will own and optimize OpenTelemetry pipelines, enabling scalable and efficient observability.
  • You will build tools that empower teams, support incident response, and drive best practices.
  • Your work ensures a reliable, secure infrastructure and actionable alerting across the organization.
  • Daily tasks include designing, implementing, and maintaining observability pipelines across logs, metrics, and traces.
  • You will optimize ingestion strategies to balance cost, performance, and usability.
  • You will build self-service automation and tooling that enables development teams to instrument and leverage observability without manual intervention.
  • You will design processes, playbooks, checklists, and automations for incident management.
  • You will work with members of various teams to understand their monitoring, alerting, and SLO/SLA requirements.
  • You will influence architectural decisions during initial design stages to ensure resiliency and scale.
  • You will leverage Infrastructure-as-Code (IaC) to manage monitoring tools and observability configurations.
  • You will take full ownership of client infrastructure reliability, ensuring adherence to key availability and security KPIs.

Requirements:

  • A Bachelor's Degree in Computer Science, Engineering, or a related field is required.
  • You must have 8+ years of experience working as an SRE or in a similar role focused on observability.
  • You should have 5+ years of experience working with cloud services, specifically AWS.
  • 5+ years of experience with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar) is necessary.
  • You need 4+ years of experience with open-source monitoring and logging tools such as Grafana, Prometheus, Elastic/OpenSearch, Loki, and Tempo.
  • 4+ years of experience working in Kubernetes, including its core components and monitoring best practices, is required.
  • Strong scripting abilities in Python, Go, or similar languages for automating observability tasks are essential.
  • Experience managing SLIs and SLOs and working with distributed tracing is necessary.
  • You should have experience with automated alerting workflows and exposure to OpenTelemetry pipelines.
  • An advanced level of English is required for effective communication with US clients.

Benefits:

  • A competitive USD salary is offered, valuing your skills and contributions.
  • The position allows for 100% remote work, with opportunities to connect with teammates at coworking spaces across LATAM.
  • Paid time off is provided according to your country’s regulations, allowing you to rest and recharge while receiving your full salary.
  • National holidays are celebrated, giving you time off to embrace important events and traditions with loved ones.
  • Sick leave is available to focus on your health without stress.
  • A refundable annual credit is provided to spend on perks that enhance your work-life balance.
  • Team-building activities such as coffee breaks, tech talks, and after-work gatherings are organized to foster community.
  • An extra day off during your birthday week is offered to celebrate with friends and family.