Description:
We are recruiting a seasoned DevOps Engineer for an innovative technology company specializing in AI-powered brand safety and contextual intelligence solutions.
This role focuses on building and maintaining the robust infrastructure foundation that supports advanced data analytics and content measurement platforms.
The ideal candidate will design, secure, and optimize operational systems that enable scalable, high-performance applications.
This is a pivotal role for someone passionate about automation, system ownership, and creating resilient infrastructure solutions from the ground up.
Key responsibilities include designing, implementing, and maintaining secure, scalable cloud infrastructure, mainly on AWS, for high-throughput, low-latency applications.
The candidate will develop and enhance CI/CD pipelines for efficient, reliable, and consistent deployment processes.
The candidate is expected to continuously identify and implement infrastructure improvements across the entire product ecosystem.
The role involves architecting and supporting serverless solutions using AWS Lambda, ECS Fargate, and event-driven architecture components.
Establishing comprehensive monitoring, alerting, and logging frameworks to ensure system reliability and visibility is crucial.
The candidate will manage infrastructure-as-code implementations using CloudFormation, Terraform, and related tools.
Partnering with development teams to determine infrastructure requirements, deployment approaches, and operational toolsets is essential.
The role includes overseeing secrets management and identity systems utilizing AWS IAM and similar platforms.
Maintaining compliance with security and privacy standards, including access controls, encryption protocols, and audit mechanisms, is required.
The candidate will resolve production incidents across all service layers with a focus on rapid response and thorough post-incident analysis.
Creating and maintaining automated backup, disaster recovery, and failover systems is part of the responsibilities.
Researching and adopting emerging DevOps methodologies and technologies to enhance platform performance and reliability is expected.
Requirements:
The candidate must have 10+ years of software engineering experience, including 6+ years focused on DevOps, Site Reliability Engineering, or Infrastructure Engineering.
Demonstrated experience managing production environments with comprehensive infrastructure responsibilities is required.
Extensive AWS expertise, including compute, storage, networking, and identity management services, is necessary.
Practical experience with serverless technologies such as AWS Lambda, Step Functions, EventBridge, API Gateway, and ECS Fargate is essential.
Advanced skills in Docker and container orchestration platforms like Kubernetes, ECS, and GKE are required.
Proficiency with CI/CD platforms such as GitHub Actions, CircleCI, ArgoCD, or Jenkins is necessary.
Strong scripting capabilities in Bash, Python, or Go for automation and tooling development are required.
Experience with observability solutions like Datadog, Prometheus, Grafana, or the ELK stack is essential.
A solid understanding of network design, security frameworks, and zero-trust access architectures is necessary.
Knowledge of secrets management systems and infrastructure-level access policy enforcement is required.
Exceptional troubleshooting and root cause analysis abilities are necessary.
Strong collaborative and communication skills across diverse technical and business teams are essential.
A continuous improvement mindset with a focus on automation, optimization, and security enhancement is required.
Benefits:
This position offers the opportunity to join an early-stage team with proven success in developing cutting-edge contextual intelligence technology.
The candidate will work alongside talented engineers and product leaders who prioritize innovation and measurable client impact.