Senior Machine Learning Engineer

Job Description

Summary

The Senior Machine Learning Engineer designs and builds scalable, high-performance machine learning training infrastructure to support advanced AI research and model development. This role focuses on distributed training, performance optimization, and platform reliability to enable next-generation intelligent systems.

Key Responsibilities

  • Design and develop scalable, reliable, and high-performance ML training frameworks
  • Analyze and optimize model training performance to scale distributed workflows and maximize hardware utilization
  • Improve system observability, debuggability, operational excellence, and overall user experience
  • Collaborate with machine learning engineers, research scientists, and cross-functional teams to integrate new platform features
  • Support distributed training across heterogeneous hardware environments to improve efficiency and reduce cost
  • Contribute hands-on technical solutions in highly dynamic and evolving environments

Required Qualifications

  • Bachelor’s degree or higher in Computer Science or a related field, or equivalent experience
  • 5+ years of professional software engineering experience
  • 2+ years of experience in AI or ML infrastructure, including distributed training systems
  • Strong programming skills in Python
  • Experience with machine learning frameworks such as PyTorch or TensorFlow
  • Experience with distributed computing, GPU computing, and cloud platforms
  • Willingness to travel to Sunnyvale, CA as needed
  • Ability to work effectively in ambiguous and fast-changing environments

Preferred Qualifications

  • Extensive experience with PyTorch 2.x and distributed training frameworks
  • Experience designing training systems supporting FSDP, pipeline parallelism, or large-scale model training
  • Strong background in profiling, debugging, and optimizing training and data loading performance
  • Excellent communication skills, with the ability to align stakeholders and bring complex technical discussions to a resolution

Tech Stack

Python, PyTorch, TensorFlow, Distributed Training, GPU Computing, Cloud Platforms, Machine Learning Infrastructure

Compensation & Benefits

Competitive compensation aligned with market standards, with bonus eligibility and comprehensive benefits, including health coverage, retirement plans, paid time off, and tuition assistance.

Work Schedule

Monday–Friday with hybrid or remote work depending on location and proximity to designated offices.

About the Client

Our client is a global automotive and technology organization advancing intelligent mobility solutions through cutting-edge AI and machine learning platforms.

Why Join the Team?

  • Build foundational ML infrastructure supporting large-scale AI research
  • Work on high-impact technologies shaping the future of intelligent systems
  • Collaborate with world-class engineers and researchers

How to Apply

Submit your resume to people@ignitetalentpartners.com. Shortlisted candidates will be contacted.