Remote Network DevOps Engineer, RDMA Fabric Automation

Posted 4 months ago


Description:

  • Vultr is seeking a highly skilled and experienced NetDevOps Engineer to help evolve, automate, and operate RoCE-based Ethernet fabrics.
  • This role is highly visible in a high-growth technology company, working at the intersection of network engineering, operations, automation, and observability.
  • The engineer will build and operate tooling and telemetry pipelines to maintain fast, deterministic, and reliable network fabrics at a global scale.
  • Key responsibilities include automating deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers.
  • The engineer will build Ansible and Python-based frameworks for provisioning, validating, and remediating underlay and overlay networks.
  • The engineer will integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation.
  • The engineer will develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics.
  • The role involves collaborating with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics.
  • The engineer will implement CI/CD workflows for network configuration changes, including validation, pre-checks, and rollbacks.
  • The engineer will investigate complex network behaviors across layers, including flow hashing, congestion domains, ECMP, and overlay interactions.
  • The engineer will also contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture.

Requirements:

  • A solid understanding of modern data center networking, including EVPN-VXLAN, BGP, MLAG, QoS, and traffic engineering, is necessary.
  • Deep familiarity with RoCEv2, RDMA transport tuning, ECN/PFC, and lossless Ethernet design is required.
  • Strong experience with automation frameworks like Ansible and programming languages such as Python, Golang, Rust, or PHP is essential.
  • Comfort working with telemetry and monitoring stacks, including Prometheus, Grafana, Loki, ELK, or similar, is expected.
  • Previous experience integrating with NetBox, Nautobot, OpsMill, or similar for topology and configuration source-of-truth is required.
  • Familiarity with CI/CD systems (GitHub Actions, Jenkins, ArgoCD) for continuous delivery of network automation is necessary.
  • A strong Linux networking background, including knowledge of namespaces, netlink, and system-level debugging, is essential.

Benefits:

  • Vultr offers 100% company-paid insurance premiums for employee medical, dental, and vision plans.
  • A 401(k) plan that matches 100% up to 4%, with immediate vesting, is provided.
  • Professional Development Reimbursement of $2,500 each year is available for employees.
  • Employees receive 11 Holidays, Paid Time Off Accrual, and a Rollover Plan.
  • Increased PTO is offered at 3-year and 10-year anniversaries, along with a 1-month paid sabbatical every 5 years and an Anniversary Bonus each year.
  • A $500 stipend for remote office setup in the first year and $400 each following year is provided.
  • Internet reimbursement up to $75 per month is available.
  • Gym membership reimbursement up to $50 per month is offered.
  • A company-paid Wellable subscription is included as part of the benefits package.
