NVIDIA

Deep Learning Engineer - Distributed Task-Based Backends

Posted 4 Days Ago
Remote
2 Locations
Senior level
Develop next generation distributed backends for Deep Learning frameworks, optimizing performance of AI models using task-based runtime systems. Collaborate with teams at NVIDIA to enhance training workloads and support enterprise customers.
The summary above was generated by AI

We are looking for experienced software professionals at the Senior to Principal level to help build the next generation of distributed backends for premier Deep Learning frameworks like PyTorch, JAX and TensorFlow. You will build on top of validated task-based runtime systems like Legate, Legion and Realm to develop a platform that can scale a wide range of model architectures to thousands of GPUs!

What You Will Be Doing:

  • Develop extensions to popular Deep Learning frameworks that enable easy experimentation with various parallelization strategies

  • Develop compiler optimizations and parallelization heuristics to improve the performance of AI models at extreme scales

  • Develop tools that enable performance debugging of AI models at large scales

  • Study and tune Deep Learning training workloads at large scale, including important enterprise and academic models

  • Support enterprise customers and partners to scale novel models using our platform

  • Collaborate with Deep Learning software and hardware teams across NVIDIA to drive development of future Deep Learning libraries

  • Contribute to the development of runtime systems that form the foundation of all distributed GPU computing at NVIDIA

What We Need To See:

  • BS, MS or PhD degree in Computer Science, Electrical Engineering or related field (or equivalent experience)

  • 5+ years of relevant industry experience or equivalent academic experience after BS

  • Proficient with Python and C++ programming

  • Strong background with parallel and distributed programming, preferably on GPUs

  • Hands-on development skills using Machine Learning frameworks (e.g. PyTorch, TensorFlow, JAX, MXNet, scikit-learn)

  • Understanding of Deep Learning training in distributed contexts (multi-GPU, multi-node)

Ways To Stand Out From The Crowd:

  • Experience with deep-learning compiler stacks such as XLA, MLIR and TorchDynamo

  • Background in performance analysis, profiling and tuning of HPC/AI workloads

  • Experience with CUDA programming and GPU performance optimization

  • Background with tasking or asynchronous runtimes, especially data-centric initiatives such as Legion

  • Experience building, debugging, profiling and optimizing multi-node applications on supercomputers or in the cloud

The base salary range is 148,000 USD - 287,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

C++
CUDA
JAX
MLIR
MXNet
Python
PyTorch
scikit-learn
TensorFlow
TorchDynamo
XLA
HQ

NVIDIA Seattle, Washington, USA Office

4545 Roosevelt Way NE 6th Floor, Seattle, Washington, United States, 98105

