LMArena is an engineering-first startup focused on evaluating large language models, created in 2023 by UC Berkeley researchers.
The company runs a community-driven benchmarking platform that attracts over one million monthly users and compares leading models to provide real-time insights.
The position is for an experienced Site Reliability Engineer (SRE) who will take ownership of infrastructure, processes, and operational security.
Responsibilities include managing infrastructure operations across Cloudflare, Vercel, and CI/CD pipelines, embedding security best practices, and mentoring team members.
The role requires a seasoned SRE who can balance reliability, performance, and security while supporting fast-moving product teams.
Requirements:
Candidates must have 7+ years of experience in SRE/DevOps roles for high-traffic SaaS or consumer web products.
Proven expertise in securing and scaling Cloudflare and Vercel or similar platforms is required.
A deep understanding of web application security, networking, TLS, and zero-trust principles is essential.
Strong proficiency with infrastructure-as-code tools such as Terraform or Pulumi, and with serverless build pipelines such as GitHub Actions, is necessary.
Candidates should possess strong programming abilities in Golang, Python, or TypeScript, along with scripting skills.
Demonstrated success in designing and enforcing change-management workflows is required.
Excellent written communication skills are necessary for producing clear runbooks and architecture documentation.
A track record of mentoring or leading junior engineers is essential.
Benefits:
Employees will have the opportunity to lay the foundation for reliability and security in a rapidly growing AI benchmarking platform.
The company promotes a culture that is engineering-first, documentation-driven, and community-obsessed.
Compensation includes a competitive salary, meaningful equity, comprehensive benefits, and a professional-development budget.