Job Description
Summary
The Senior Machine Learning Engineer designs and builds scalable, high-performance machine learning training infrastructure to support advanced AI research and model development. This role focuses on distributed training, performance optimization, and platform reliability to enable next-generation intelligent systems.
Key Responsibilities
- Design and develop scalable, reliable, and high-performance ML training frameworks
- Analyze and optimize model training performance to scale distributed workflows and maximize hardware utilization
- Improve system observability, debuggability, operational excellence, and overall user experience
- Collaborate with machine learning engineers, research scientists, and cross-functional teams to integrate new platform features
- Support distributed training across heterogeneous hardware environments to improve efficiency and reduce cost
- Deliver hands-on technical solutions in dynamic, rapidly evolving environments
Required Qualifications
- Bachelor’s degree or higher in Computer Science or a related field, or equivalent experience
- 5+ years of professional software engineering experience
- 2+ years of experience in AI or ML infrastructure, including distributed training systems
- Strong programming skills in Python
- Experience with machine learning frameworks such as PyTorch or TensorFlow
- Experience with distributed computing, GPU computing, and cloud platforms
- Willingness to travel to Sunnyvale, CA as needed
- Ability to work effectively in ambiguous and fast-changing environments
Preferred Qualifications
- Extensive experience with PyTorch 2.x and distributed training frameworks
- Experience designing training systems that support Fully Sharded Data Parallel (FSDP), pipeline parallelism, or large-scale model training
- Strong background in profiling, debugging, and optimizing training and data loading performance
- Excellent communication skills, with the ability to align stakeholders and drive complex technical discussions to resolution
Tech Stack
Python, PyTorch, TensorFlow, Distributed Training, GPU Computing, Cloud Platforms, Machine Learning Infrastructure
Compensation & Benefits
Competitive compensation aligned with market standards, plus bonus eligibility and comprehensive benefits, including health coverage, retirement plans, paid time off, and tuition assistance.
Work Schedule
Monday–Friday with hybrid or remote work depending on location and proximity to designated offices.
About the Client
Our client is a global automotive and technology organization advancing intelligent mobility solutions through cutting-edge AI and machine learning platforms.
Why Join the Team?
- Build foundational ML infrastructure supporting large-scale AI research
- Work on high-impact technologies shaping the future of intelligent systems
- Collaborate with world-class engineers and researchers
How to Apply
Submit your resume to people@ignitetalentpartners.com. Shortlisted candidates will be contacted.