Nu is the world’s largest digital banking platform outside of Asia, serving over 105 million customers across Brazil, Mexico, and Colombia.
The company focuses on transforming the industry by leveraging data and proprietary technology to create innovative products and services.
The AI Infrastructure Squad within the AI Core BU builds and scales foundational cloud, data, and AI infrastructure for machine learning workloads.
The team designs and optimizes high-performance training, inference, and data processing systems while ensuring reliability, scalability, and efficiency.
As a Software Engineer in the AI Core BU, you will apply strong expertise in cloud infrastructure (AWS or GCP) and distributed computing.
You will work with Kubernetes, container orchestration, and infrastructure as code (Terraform, Pulumi).
Proficiency in programming languages, particularly Python and Go, is expected.
Experience writing ETL pipelines, preferably with Spark or BigQuery, is required.
You will have experience with ML infrastructure, including model training, batch and online inference, and monitoring.
You will optimize the performance and cost efficiency of AI workloads in cloud and on-prem environments.
You will have hands-on experience with monitoring, observability, and alerting for production systems.
Requirements:
Strong expertise in cloud infrastructure (AWS or GCP) and distributed computing is required.
Experience with Kubernetes, container orchestration, and infrastructure as code (Terraform, Pulumi) is necessary.
Proficiency in programming languages, particularly Python and Go, is a plus.
Experience writing ETL pipelines, with a preference for Spark or BigQuery, is required.
Experience with ML infrastructure, including model training, batch and online inference, and monitoring, is essential.
Strong knowledge of networking, storage, and security in large-scale systems is necessary.
Familiarity with workflow orchestration tools (e.g., Dagster, Airflow) and model-serving frameworks (e.g., Ray Serve, vLLM) is important.
Experience optimizing the performance and cost efficiency of AI workloads in cloud and on-prem environments is required.
Proven track record of leading complex infrastructure projects from design to production is essential.
You should be comfortable working on ambiguous and evolving projects, quickly identifying key challenges and driving solutions.
Experience in designing high-availability, fault-tolerant systems for AI/ML workloads is required.
Hands-on experience with monitoring, observability, and alerting for production systems is necessary.
Benefits:
Remote work is offered, with quarterly trips to Sao Paulo to build relationships with coworkers.
Top-tier medical insurance is provided.
Top-tier dental and vision insurance is included.
Employees receive 20 days of time off and 14 company holidays, within a culture that emphasizes work-life balance.
Life Insurance and AD&D are part of the benefits package.
Extended maternity and paternity leaves are available.
Nucleo, a course-based learning platform, is provided for employee development.
NuLanguage, a language learning program, is offered.
NuCare, a mental health and wellness assistance program, is available.
A 401(k) plan is included in the benefits.
Savings plans, including a Health Savings Account (HSA) and a Flexible Spending Account (FSA), are offered.