NVIDIA

Senior System Software Engineer, Distributed Systems - DGX Cloud

Posted 25 Days Ago
Remote
3 Locations
Senior level
As a Senior System Software Engineer at NVIDIA, you'll design and architect a platform for automating GPU asset management across cloud providers. You'll create solutions for datacenter firmware, ensure seamless integration of software with hardware, and collaborate with multi-functional teams to address reliability and performance in distributed systems.
The summary above was generated by AI

NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, familiarity with software testing and deployment, and excellent communication and planning abilities. We also welcome out-of-the-box thinkers who can provide new ideas with a strong bias toward execution. Expect to be constantly challenged, improving, and evolving for the better. You and other engineers on this team will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. If you're creative, passionate about what you do, and love having fun, what are you waiting for? Apply today!

What you’ll be doing:

  • Design and architect a comprehensive platform that automates GPU asset provisioning, configuration, and lifecycle management across cloud providers.

  • Design, develop, test, debug, and optimize creative solutions for datacenter firmware throughout its lifecycle.

  • Work closely with hardware, software, infrastructure, and business teams to transform new firmware features from idea to reality.

  • Define server-level reliability, availability, and serviceability requirements in collaboration with customers such as CSPs, and deliver fault-resilient solutions at scale that meet customer expectations.

  • Collaborate with hardware, software, and firmware teams to drive failure analysis and large-scale solution deployment.

  • Work with engineering teams across NVIDIA to ensure your software integrates seamlessly from the hardware all the way up to the AI training applications.

What we need to see:

  • BS, MS, or PhD in EE/CS or a related field (or equivalent experience) with 6+ years of active development experience using Python as the primary programming language on Linux.

  • Highly motivated, with strong communication skills and the ability to work successfully with multi-functional teams, principals, and architects, and to coordinate effectively across organizational boundaries and geographies.

  • Familiarity with industry standards and specifications such as SPI, I2C, PCIe, UEFI and PLDM.

  • System knowledge of how platform management works, in areas such as BMC-BIOS communication, thermal management, power management, firmware update, device monitoring, and firmware security.

  • Expert-level knowledge of a systems programming language (Go, Python) and a solid understanding of data structures and algorithms.

  • Understanding of performance, security, and reliability in complex distributed systems. Familiarity with system-level architecture, data synchronization, fault tolerance, and state management.

Ways to stand out from the crowd:

  • In-depth understanding of how machine check architecture and error flows interact with system firmware/software.

  • Familiarity with Linux server design, x86/ARM system architecture, interconnects like PCI, and other I/O buses.

  • Proven operational excellence in designing and maintaining cloud AI infrastructure. Proficiency in architecting and running large-scale distributed systems, independent of cloud providers.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hard-working people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you.

The base salary range is 148,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

AI-Based Applications
Datacenter Firmware
Distributed Systems
Go
GPU Asset Provisioning
Linux
PCIe
PLDM
Python
UEFI
HQ

NVIDIA Seattle, Washington, USA Office

4545 Roosevelt Way NE 6th Floor, Seattle, Washington, United States, 98105
