Remote Contractor: Senior-Level Site Reliability Engineering Services (Brazil or Argentina)

Description:

  • The position is for a Contractor based out of Brazil or Argentina for Senior-Level Site Reliability Engineering Services.
  • The contractor will be on an on-call rotation to respond to incidents impacting Newsela.com availability and provide support for developers during incidents.
  • Responsibilities include maintaining and extending infrastructure using Terraform, GitHub Actions CI/CD, Prefect, and AWS services.
  • The contractor will build monitoring systems that alert on symptoms rather than outages using tools like Datadog, Sentry, and CloudWatch.
  • They will seek to automate repeatable manual actions to reduce toil and improve operational processes such as deployments, releases, and migrations with fault tolerance in mind.
  • The role involves designing, building, and maintaining core cloud infrastructure on AWS and GCP to support thousands of concurrent users.
  • The contractor will debug production issues across services and levels of the stack and provide infrastructure and architectural planning support as an embedded team member.
  • They will plan the growth of Newsela’s infrastructure and influence the product roadmap to enhance the resiliency and reliability of the Newsela product.
  • Proactive efficiency and capacity planning will be required to set clear requirements and reduce system resource usage.
  • The contractor will identify non-scaling parts of the system, provide immediate solutions, and drive long-term resolutions.
  • They will identify Service Level Indicators (SLIs) to align the team with availability and latency objectives and maintain awareness of stage group plans and priorities.

Requirements:

  • A minimum of 5 years of experience in site reliability is required.
  • Advanced knowledge of Terraform syntax and CI/CD configuration, pipelines, and jobs is necessary.
  • Experience managing DAG tooling and data pipelines, such as Airflow, Dagster, or Prefect, is essential.
  • Candidates must have advanced knowledge and experience in maintaining data pipeline infrastructure and large-scale data migrations.
  • Proficiency in cloud infrastructure services, specifically AWS and GCP, is required.
  • Familiarity with container orchestration technologies, including ECS, Kubernetes, and Docker, is necessary.
  • Experience with service catalog metrics and alert recording rules using tools like Datadog, New Relic, Sentry, and CloudWatch is required.
  • Candidates should have experience with log shipping pipelines and incident debugging visualizations.
  • Familiarity with Linux operating system configuration, package management, and Bash/CLI scripting is essential.
  • Knowledge of block and object storage configuration and debugging is required.
  • The ability to identify significant projects that improve reliability, cost savings, or revenue is necessary.
  • Candidates must be able to identify architectural changes from reliability, performance, and availability perspectives using a data-driven approach.
  • Experience leading initiatives from problem definition through design and planning, using epics and blueprints, is required.
  • Deep domain knowledge and the ability to share that knowledge through documentation and presentations are essential.
  • Candidates should be able to perform blameless Root Cause Analyses (RCAs) on incidents and outages.

Benefits:

  • Please note that given the nature of the contract, this role will not be eligible to participate in company-sponsored benefits.