Please let CentML know you found this job on RemoteYeah. This helps us grow.
Description:
CentML is seeking highly motivated and skilled systems engineers to develop the CentML platform, which provides cost-effective infrastructure for serving and training large-scale machine learning models.
The role involves designing and building solutions for scheduling large-scale ML training and inference workloads on GPU clusters across multiple cloud service providers (CSPs).
The engineer will communicate with product teams to define use cases and develop methodologies and benchmarks to evaluate different approaches.
The position is available in Toronto, San Francisco Bay Area, or remotely in the USA.
Requirements:
A Bachelor's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience, is required. A graduate degree with research experience is a plus.
Candidates must have experience building large-scale systems from scratch; prior experience with container-based deployment systems such as Kubernetes is a significant advantage.
Strong coding skills in Python, C++, or both are essential.
Solid fundamentals in computer science and engineering topics such as algorithms and data structures, operating systems, and computer architecture are required.
Benefits:
The company offers an open and inclusive work environment.
Employees receive stock options as part of their compensation.
Best-in-class medical and dental benefits are provided.
Parental leave pay is topped up for 6 months.
A professional development budget is available for employees.
Flexible vacation time is offered to promote a healthy work-life balance.