Are you a Python engineer with a background in AI/ML who wants to work on the systems that train large language models?
Project Overview:
You’ll be part of a team creating reinforcement learning environments used directly in advanced LLM training pipelines. Your work will influence how models learn complex behaviors and how those behaviors are evaluated and improved over time.
What You’ll Be Doing:
Design and implement RL environments for LLM training
Create task prompts that specify desired model behavior
Build judges and automated evaluation logic
Design and integrate tool interfaces for model interaction
Work with data to create diverse, high-quality tasks
Run, debug, and improve environments inside virtualized execution setups
Collaborate with AI/ML engineers and researchers on reward signals and evaluation criteria
Improve robustness, reproducibility, and diversity of the environment suite
Requirements:
5+ years of professional Python programming experience
Background in AI/ML (industry, research, or advanced projects)
Solid understanding of reinforcement learning concepts
Experience with RL environments or frameworks (e.g., OpenAI Gym)
Experience building systems involving automated evaluation, validation, or judging logic
Experience working with large language models
Experience designing custom RL environments for training
Experience with virtual machines, sandboxed execution, or containerized environments
Background in AI research, ML infrastructure, or training pipelines
What do we offer?
Fully remote contractor role with flexible hours
Competitive compensation
Opportunity to transition into a full-time role
Career development opportunities within Camplight’s cooperative structure
Supportive, collaborative, and innovative team culture
What does the interview process look like?
Initial Interview: A 45-minute cultural and technical conversation with two Camplight team members.
Technical Deep Dive: Choose between:
A homework assignment (2 hours) followed by a 1-hour discussion.
A pair programming session (2 hours) focused on real-world problem-solving.
Regardless of the outcome, you’ll receive constructive feedback to help you grow.