The role involves helping to train large language models (LLMs) to write production-grade code across various programming languages.
Responsibilities include comparing and ranking multiple code snippets, explaining which is best and why.
The engineer will repair and refactor AI-generated code for correctness, efficiency, and style.
The position requires injecting feedback (ratings, edits, test results) into the RLHF pipeline and ensuring it runs smoothly.
The end goal is for the model to learn to propose, critique, and improve code in a manner similar to the engineer's approach.
The process works as follows: the model generates code; expert engineers rank, edit, and justify it; that feedback is converted into reward signals; and reinforcement learning tunes the model toward deployable code.
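As a rough illustration of the ranking-to-reward step described above, a minimal sketch in Python might look like the following. The function name and the linear reward scale are hypothetical simplifications for illustration, not details from this posting's actual pipeline:

```python
# Hypothetical sketch: convert an engineer's ordered ranking of
# model-generated code snippets into scalar reward signals.

def rankings_to_rewards(ranking):
    """Map an ordered list of snippet IDs (best first) to rewards in [0, 1].

    ranking: list of snippet identifiers, best-ranked first.
    Returns a dict mapping each snippet ID to a reward, linearly spaced
    so the top-ranked snippet gets 1.0 and the worst gets 0.0.
    """
    n = len(ranking)
    if n == 1:
        return {ranking[0]: 1.0}
    return {sid: 1.0 - i / (n - 1) for i, sid in enumerate(ranking)}

# Example: an engineer ranked three candidate snippets, best first.
rewards = rankings_to_rewards(["snippet_b", "snippet_a", "snippet_c"])
# rewards == {"snippet_b": 1.0, "snippet_a": 0.5, "snippet_c": 0.0}
```

A real pipeline would typically feed such scores into a reward model rather than use them directly, but the sketch shows the shape of the feedback being injected.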
Requirements:
Candidates must have 4+ years of professional software engineering experience in Python. Constraint programming experience is a bonus but not required.
Strong code-review instincts are necessary, with the ability to quickly spot logic errors, performance traps, and security issues.
Extreme attention to detail and excellent written communication skills are essential, as much of the role involves explaining the reasoning behind code choices.
Candidates should enjoy reading documentation and language specifications and thrive in an asynchronous, low-oversight environment.
No prior RLHF or AI training experience is required, nor is deep machine learning knowledge; the ability to review and critique code clearly is sufficient.
Benefits:
The position is fully remote, allowing candidates to work from anywhere.
Compensation ranges from $30/hr to $70/hr, depending on location and seniority.
The role offers flexible hours, with a minimum of 15 hours per week and up to 40 hours available.
Engagement is through a 1099 independent-contractor agreement, keeping the working arrangement simple and flexible.