Remote Member of Technical Staff, Pre-Training Data Engineer

at Cohere

Posted 1 day ago, 3 applied

Description:

  • As a Pre-Training Data Engineer at Cohere, you will be responsible for developing the data infrastructure that supports advanced language models.
  • Your role includes end-to-end management of training data, which involves ingestion, cleaning, filtering, and optimization.
  • You will work with diverse data sources such as web data, code data, multilingual corpora, and synthetic data to ensure their quality, diversity, and reliability.
  • You will design and implement scalable and robust pipelines for data processing and conduct data ablations to evaluate data quality.
  • Experimenting with data mixtures to enhance model performance will also be part of your responsibilities.
  • Your work will bridge the gap between raw data and cutting-edge AI models, contributing to improvements in training metrics like throughput and accelerator utilization.
  • This position is remote-friendly, with no restrictions on location.
  • You will collaborate with cross-functional teams to meet the demands of language models.

Requirements:

  • Strong software engineering skills are required, with proficiency in Python and experience in building data pipelines.
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools is necessary.
  • Experience working with large-scale datasets, including web data, code data, and multilingual corpora, is essential.
  • Knowledge of data quality assessment techniques and experience experimenting with data mixtures are required.
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training is important.
  • Publications at top-tier venues such as NeurIPS, ICML, or ICLR are a bonus.

Benefits:

  • Cohere offers an open and inclusive culture and work environment.
  • Employees work closely with a team on the cutting edge of AI research.
  • A weekly lunch stipend, in-office lunches, and snacks are provided.
  • Full health and dental benefits are included, along with a separate budget for mental health care.
  • Employees based in Canada, the US, and the UK receive a 100% parental leave top-up for 6 months.
  • Personal enrichment benefits are available for arts and culture, fitness and well-being, quality time, and workspace improvement.
  • The position is remote-flexible, with offices in Toronto, New York, San Francisco, and London, along with a co-working stipend.
  • Employees enjoy 6 weeks of vacation.