Remote Senior DevOps Engineer

at Aldea

Posted 1 day ago 1 applied

Description:

  • The role involves building scalable AI infrastructure and managing complex multi-cluster Kubernetes deployments across five distinct environments: NMS, Sandbox, Development, Staging, and Production.
  • The candidate will design systems for production readiness while ensuring security and operational excellence.
  • Responsibilities include managing multi-environment Kubernetes architecture, designing redundancy and failover mechanisms for the centralized NMS hub, and developing Pulumi-based infrastructure using Python.
  • The role requires managing complex cross-environment dependencies, automating resource provisioning, and implementing zero-trust security measures.
  • The candidate will deploy and configure observability tools such as Prometheus, Grafana, and CloudWatch, and design alerting and incident response procedures.
  • The position also involves managing a centralized API for all environments and optimizing resource utilization across node groups.

Requirements:

  • The candidate must have 5+ years of experience in DevOps, SRE, or infrastructure engineering.
  • Expert-level Kubernetes experience with EKS and multi-cluster management is required.
  • Strong Python programming skills for infrastructure automation and API development are essential.
  • Expertise in Infrastructure as Code with Pulumi, Terraform, or similar tools is necessary.
  • Deep knowledge of AWS services including VPC, EKS, ECR, S3, CloudWatch, IAM, and networking is required.
  • Experience in Linux system administration and containerization with Docker is needed.
  • Hands-on experience with Prometheus, Grafana, and centralized logging systems is a must.
  • The candidate should have network security experience, including VPN, firewalls, and certificate management, along with an understanding of zero-trust architecture principles.
  • Nice-to-have qualifications include experience with machine learning infrastructure, HashiCorp Vault administration, GitOps, service mesh technologies, database administration, and CI/CD pipeline design.

Benefits:

  • The position offers a competitive base salary and a performance bonus tied to achieving goals.
  • Equity participation is included as part of the compensation package.
  • Comprehensive benefits are provided, including health, dental, vision, and paid time off.
  • A flexible work environment is available, with hybrid or remote arrangements considered.
  • There is an option to start on a contract basis with the potential for full-time hire.