Remote Senior Platform Engineer - AI Team (100% remote-friendly within Spain)

at Docplanner


Description:

  • The Internal Platform is a crucial foundation that accelerates product development by providing a reliable, scalable, and self-service ecosystem.
  • It supports the entire software lifecycle and is tailored to meet the organization’s technological needs and strategic direction.
  • The platform enables development teams to operate autonomously in 80% of cases, reducing dependency on the Internal Platform area.
  • Compliance, security, and business continuity are integrated across the entire platform, ensuring the reliability of services and data integrity.
  • The Internal Platform area consists of 34 people, including 27 individual contributors, 2 Staff Engineers, 4 Engineering Managers, and the Head of Internal Platform.
  • The AI Platform Engineers team is part of the Internal Platform area and currently consists of 1 Platform Engineer, 1 Software Engineer, and 1 Engineering Manager.
  • The team is responsible for designing, building, and maintaining the company's AI infrastructure, ensuring high availability, scalability, security, and cost efficiency.
  • The team faces challenges such as understanding and integrating diverse technology stacks and ensuring scalability and reliability of systems as Docplanner grows.
  • The role involves working closely with Product Teams to understand their technical requirements and provide necessary tools, services, and infrastructure.
  • Responsibilities include designing and managing AI Platform architecture, controlling AI-related costs, ensuring high availability of AI services, collaborating with ML teams, troubleshooting issues, and providing support to team members.

Requirements:

  • Strong experience with Kubernetes is a must-have.
  • Knowledge of Terraform, Crossplane, and Helm charts is nice to have.
  • Experience with CI/CD tools like GH Actions, Argo CD, and Argo Rollouts is required.
  • Familiarity with tools like Karpenter, KEDA, Velero, and Cilium is beneficial.
  • Experience in building secure, scalable, and high-availability environments on AWS is necessary.
  • Understanding of Disaster Recovery Planning and strategies for cloud infrastructure is required.
  • Familiarity with cloud AI offerings such as AWS Bedrock or Azure OpenAI is expected.
  • Experience with Python applications at scale is necessary.
  • Experience working with GPUs and distributing workloads is required.
  • Candidates should be prepared to work in a startup-like environment with shifting priorities and evolving tasks.
  • Comfort with scripting or developing tools in Bash or Go is necessary.
  • Proficiency in English (both spoken and written, minimum B2 level) is required.
  • A growth mindset is valued: not every candidate will tick every box, but a willingness to learn is essential.

Benefits:

  • A salary commensurate with your experience and skills, with 100% transparency.
  • A flexible remuneration and benefits system via Flexoh, which includes a restaurant card, transportation card, kindergarten, and training tax savings.
  • Share options plan available after 6 months of employment.
  • A remote or hybrid work model with a hub in Barcelona.
  • Fully flexible working hours, with only a couple of meetings required weekly.
  • A summer intensive schedule during July and August, allowing for a shorter workday.
  • 23 paid holidays, with the option to exchange local bank holidays.
  • An additional paid holiday on your birthday or work anniversary.
  • A private healthcare plan with Adeslas for you, subsidized for your family, that covers medical and dental needs.
  • Access to hundreds of gyms for a symbolic fee through a partnership with Wellhub.
  • Access to iFeel, a technological platform for mental wellness offering online psychological support and counseling.
  • Free English and Spanish classes to promote continuous learning.