Remote Research Engineer, Agentic AI Evals

at HUD

Posted 1 week ago, 5 applied

Description:

  • HUD (YC W25) builds agentic evals for Computer Use Agents (CUAs) that browse the web. Its mission is to create detailed evaluations that ensure AI agents work effectively in the real world.
  • The company is funded by Y Combinator and a16z and collaborates with frontier AI labs to provide scalable agent evaluation infrastructure.
  • The role of a research engineer involves building task configurations and environments for evaluation datasets on HUD's CUA evaluation framework.
  • Responsibilities include creating environments for HUD's CUA evaluation datasets, which cover safety red-teaming, general business tasks, and long-horizon agentic tasks, and, in the future, developing custom CUA datasets and evaluation pipelines.

Requirements:

  • Proficiency in Python, Docker, and Linux environments is required.
  • Experience with React for frontend development is necessary.
  • Production-level software development experience is preferred.
  • A strong technical aptitude and demonstrated problem-solving ability are essential.
  • Hands-on experience with LLM evaluation frameworks and methodologies is a plus.
  • Contributions to evaluation harnesses (such as EleutherAI's LM Evaluation Harness or Inspect) and experience building custom evaluation pipelines or datasets are desirable.
  • Familiarity with agentic or multimodal AI evaluation systems is beneficial.
  • Candidates should have startup experience in early-stage technology companies and the ability to work independently in fast-paced environments.
  • Strong communication skills for remote collaboration across time zones are important.
  • Understanding of safety and alignment considerations in AI systems is advantageous.
  • Evidence of rapid learning and adaptability in technical environments is preferred.

Benefits:

  • Full-time candidates are preferred, though internships are also considered.
  • The role is remote-friendly, with an office available in the San Francisco Bay Area for those who prefer in-person collaboration.
  • Visa sponsorship and relocation support are provided for strong full-time candidates.
  • The application process is rolling, typically involving 1-2 interviews and taking less than a week.