The company is building Protege to address a significant unmet need in AI: access to the right training data.
The Protege platform aims to facilitate the secure, efficient, and privacy-centric exchange of AI training data.
The role is for a Senior Member of the Core Data Team / Principal Scientist, who will lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models.
The candidate will define what "high-quality data" means in practice, using statistical, computational, and ML-driven methods to ensure data is diverse, representative, and impactful.
The position involves collaboration with research and engineering teams to enhance model performance through improved data quality.
Key responsibilities include designing statistical and machine learning methods for curating large-scale unstructured datasets, developing frameworks to assess data quality, and providing leadership on data quality strategy.
Requirements:
A PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field is required.
A strong understanding of AI model training pipelines, including pre-processing and evaluation, is essential.
Experience working with large, unstructured datasets, particularly text, is necessary.
A background in statistical analysis, bias detection, and data validation is required.
The candidate must be able to identify high-impact problems and drive solutions to them independently.
Benefits:
The position offers the opportunity to work at the forefront of AI data solutions, contributing to a generational opportunity in the tech industry.
The role provides a chance to shape the future of AI training data and influence the quality of AI models.
Employees will be part of a collaborative environment, working closely with research and engineering teams.
The company promotes a culture of innovation and values contributions to research and development in the field of AI.