Remote Staff Cloud Platform Engineer - Core Infra

Description:

  • The Core Platform team maintains and optimizes the data, infrastructure, messaging, and services platform that powers Sift’s online systems.
  • The team ensures these systems are always available, reliable, and performing at their best to meet customer needs.
  • In the event of an outage or failure, the team follows well-practiced recovery plans to restore services swiftly.
  • Managing complex, large-scale systems requires continuous monitoring and proactive maintenance to uphold these standards.
  • Responsibilities include owning the availability, performance, and scalability of Sift’s primary online storage systems and infrastructure.
  • The role involves designing and building immutable infrastructure and fault-tolerant, multi-AZ/multi-region systems that are resilient and self-healing.
  • The engineer will design and implement multi-region deployments, such as Bigtable clusters spanning multiple regions, ensuring specific customers are routed to designated regions.
  • The position requires solving complex problems arising from unique data volume and request rates, which may involve deep dives into data store and messaging internals.
  • The engineer will optimize local development and testing workflows to be fast, efficient, and seamless.
  • Responsibilities also include designing and implementing services and libraries for components to interact with data stores, messaging layers, and services platforms.
  • The role involves developing tools for monitoring, detecting faults, and automatically repairing distributed systems.
  • The engineer will provide design support to internal engineering teams on optimal use of data stores, data growth planning, production workload optimization, messaging, caching, and the services platform.
  • Participation in on-call support and incident response activities is required, providing 12/7 coverage for one calendar week approximately once every 3-4 weeks.
  • The technical stack includes GCP, AWS, Airflow, Terraform, Kubernetes, Vault, Jenkins, Kafka, Snowflake, Spark, Java 11, Python 3, Ruby 2.7, and Ruby on Rails.

Requirements:

  • Candidates must have 8+ years of experience as a Software Engineer focused on infrastructure/platform services or in a Site Reliability Engineering (SRE) role.
  • Strong programming skills in languages such as Java, Scala, or Python are required.
  • Experience designing and implementing distributed systems is necessary.
  • Candidates must have experience building and managing cloud infrastructure on AWS or GCP.
  • Expertise in building infrastructure as code and automating provisioning processes using tools like CloudFormation or Terraform is essential.
  • Proficiency in setting up and managing monitoring and alerting systems, both open-source and commercial, is required.
  • Familiarity with Docker and container orchestration technologies like Kubernetes, GKE, or AWS ECS is necessary.
  • Strong experience troubleshooting and resolving production system issues, with a focus on building automated solutions to prevent future occurrences, is required.
  • Proven expertise in automation and a solid understanding of configuration management tools are essential.

Benefits:

  • The position offers a competitive total compensation package.
  • A 401(k) plan is provided.
  • Medical, dental, and vision coverage is included.
  • Wellness reimbursement is available.
  • Education reimbursement is offered.
  • Flexible time off is part of the benefits package.

Salary:

  • $164,200 - $222,200 USD per year