Research Engineer Job at Delphi, San Francisco, CA

  • Delphi
  • San Francisco, CA

Job Description

Our “Clone Brain” architecture allows you to create a digital representation of your mind—reflecting your knowledge, tone, ways of thinking, and even the purpose that drives your conversations. (For example, a leadership coach might direct their clone to mentor emerging managers, while a consultant might want their clone to focus on sales strategy and client onboarding.)

Up until now, many of our improvements have come from intuition, first principles, and a very basic testing suite. We want to increase the fidelity of each Clone Brain, ensuring it captures its owner’s unique style, knowledge, and conversational aims, while also being able to reason in new situations. But to do that, we need rigorous measurements and interpretability tools that transform “it feels right” into “we have metrics & benchmarks that prove it.”

Enter the Research Engineer – Evals & Interpretability. You’ll develop frameworks that quantify how well each digital clone mirrors the authenticity and expertise of its human counterpart, while also building the tooling to open the black box and figure out why the clone behaves the way it does. If you’re curious about cognitive science, neural network interpretability, and the essence of what makes a human mind unique—this role has your name on it.

What You Will Work On

1. Frontier Eval Systems & Metrics

  • Design, implement, and manage robust evaluation frameworks that measure how faithfully a clone reflects its owner’s tone, style, purpose, and reasoning.
  • Develop automated tests and analysis pipelines to compare new models and architectures, ensuring we’re always improving the fidelity of our Clone Brain.

2. Interpretability & Debugging

  • Build interpretability tools that shine a light on the internal workings of our clone models, from attention heads to knowledge graph structures.
  • Investigate model behaviors and anomalies, surfacing insights that guide algorithmic improvements and mitigate unexpected outcomes.

3. Collaboration & Deployment

  • Work closely with our AI, product, and engineering teams to integrate your evaluation suites into production workflows.
  • Contribute to real-time feedback loops that help experts refine their clone’s knowledge and style with confidence.

4. Infrastructure & Tooling

  • Develop the technical infrastructure for large-scale experimentation and analysis, ensuring that interpretability and eval frameworks can scale across thousands of clones.
  • Help define our data schemas, retrieval strategies, and model instrumentation in collaboration with data and infra engineers.

Preferred Abilities

  • Hands-On Research Experience: A track record of designing experiments and running them end-to-end, whether in AI, ML, or another scientific domain.
  • LLM Familiarity: Experience evaluating or fine-tuning large language models, with an emphasis on measuring alignment, style transfer, or interpretability.
  • Python Proficiency: Strong coding skills to build robust pipelines and experiment frameworks.
  • Evals & Benchmarking: Familiarity with common language model benchmarks and an eagerness to develop new ones.
  • Interpretability Fundamentals: Knowledge of mechanistic interpretability, feature attribution, or circuit-level analysis is a huge plus.
  • Infrastructure & Tools: Comfort with containers, scaling experiments on clusters, and building internal tools.
  • Experimental Mindset: Ability to pivot quickly when an approach doesn’t pan out, and a relentless drive to find creative solutions to open-ended questions.

Why You Might Like This Role

  • Evaluating AI systems is frontier research, and how to do evals correctly is still an open question. People who thrive in this role are excited by that challenge and by the opportunity to be at the forefront of research.
  • High level of ownership and impact on product, technical architecture, and company culture
  • Opportunity to define the future of digital cloning, ultimately enabling digital immortality and 1-1 mentorship for the masses.
  • Challenging work that pushes you to your limits
  • Collaboration with a team passionate about scaling human potential and personalized learning
  • Chance to join a fast-growing startup creating a new market, approaching problems from first principles while valuing design and brand

Why You Might Not Like This Role

  • Not a 9-to-5: We move fast, iterate often, and tackle ambitious challenges. This isn’t a clock-in/clock-out environment.
  • No Existing Blueprint: If you prefer well-trodden paths and established frameworks, be warned: we’re creating something that’s never existed before.
  • Applied AI Over Foundation Research: Our focus is on building and optimizing real products for end users, not on training new LLMs from scratch.
  • Fully On-Site: We believe in-person collaboration drives better ideas. If you’re looking for remote, this might not be for you.
