Remote Research Engineer, Agentic AI Evals

at HUD

Posted 1 month ago; 5 applied

Description:

HUD (YC W25) builds agentic evals for Computer Use Agents (CUAs) that browse the web. Its mission is to create detailed evaluations that ensure AI agents work effectively in the real world.
The company is funded by Y Combinator and a16z and collaborates with frontier AI labs to provide scalable agent evaluation infrastructure.
The research engineer builds task configurations and environments for evaluation datasets on HUD's CUA evaluation framework.
These datasets cover safety red-teaming, general business tasks, and long-horizon agentic tasks. The role is also expected to grow to include developing custom CUA datasets and evaluation pipelines.

Requirements:

Proficiency in Python, Docker, and Linux environments is required.
Experience with React for frontend development is necessary.
Production-level software development experience is preferred.
A strong technical aptitude and demonstrated problem-solving ability are essential.
Hands-on experience with LLM evaluation frameworks and methodologies is a plus.
Contributions to evaluation harnesses (such as EleutherAI's LM Evaluation Harness or Inspect) and experience in building custom evaluation pipelines or datasets are desirable.
Familiarity with agentic or multimodal AI evaluation systems is beneficial.
Candidates should have experience at early-stage technology startups and the ability to work independently in fast-paced environments.
Strong communication skills for remote collaboration across time zones are important.
Understanding of safety and alignment considerations in AI systems is advantageous.
Evidence of rapid learning and adaptability in technical environments is preferred.

Benefits:

Full-time candidates are preferred, but internships are also considered.
The role is remote-friendly, with an office available in the San Francisco Bay Area for those who prefer in-person collaboration.
Visa sponsorship and relocation support are provided for strong full-time candidates.
The application process is rolling, typically involving 1-2 interviews and taking less than a week.

Hiring company

HUD (www.hud.so)

About the job

Posted on

June 26, 2025

Job type

Full-time, Part-time, Contract, Internship

Salary

-

Location requirements

🇨🇳 China, 🇸🇬 Singapore, 🇺🇸 United States

Job title

Other

Experience level

Mid-level

Degree requirement

🎓🚫 No degree required

Skills

React.js, Docker, LESS, Large Language Models, Python
