Keywords: Senior Site Reliability Engineer - GPU Clusters, Location: Santa Clara, CA

Senior Site Reliability Engineer - Internal AI Research Clusters

Compute Clusters. As a Site Reliability Engineer, you will help us with the strategic challenges we encounter including... and implementation of groundbreaking GPU compute clusters that run demanding deep learning, high performance computing...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 25 Sep 2024

Senior Site Reliability Engineer - GPU Clusters

Engineer to lead the design, deployment, and management of our large-scale GPU clusters. These clusters will power AI workloads...: Design, deploy and support large-scale, distributed GPU clusters to run high-performance AI and machine learning workloads...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 13 Nov 2024

Senior Site Reliability Engineer - AI Research Clusters

to improve researchers' productivity. As a Site Reliability Engineer, you will help us with the strategic challenges we encounter... and implementation of groundbreaking GPU compute clusters that power all AI research across NVIDIA. We seek an expert to build...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 13 Sep 2024

Senior Site Reliability Engineer - Storage

like you to help us accelerate the next wave of artificial intelligence. Join our team at NVIDIA as a Senior Site Reliability..., and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 31 Aug 2024

Senior Site Reliability Engineer - DGX Cloud

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale... at NVIDIA ensures that our internal and external facing GPU cloud services run with maximum reliability and uptime as promised to the...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 24 Oct 2024

Senior Production SRE Engineer - Storage

Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining... that our internal and external facing GPU cloud services have reliability and uptime as promised to the users and at the same time...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 30 Oct 2024