Keywords: Senior Site Reliability Engineer, AI Infrastructure, Location: Santa Clara, CA

Senior Site Reliability Engineer, AI Infrastructure

What You Will Be Doing: Develop and maintain large-scale systems supporting critical use cases for AI Infrastructure, driving reliability.... Come join the AI Infrastructure Production engineering team and see how you can make a lasting impact on the world...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 11 Jan 2025

Senior Site Reliability Engineer - AI Research Clusters

to improve researchers productivity. As a Site Reliability Engineer, you are responsible for the big picture of how our systems... infrastructure Proven experience in site reliability engineering for high-performance computing environments with operational...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 03 Jan 2025

Senior Staff Site Reliability Engineer, Database Platform

operating our existing infrastructure to the highest level of reliability and security. You will work side by side with NVIDIA...+ years of experience in Software Development and/or Site Reliability Engineering/Production Engineering. Strong software...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 04 Dec 2024

Senior Site Reliability Engineer - DGX Cloud

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale... at NVIDIA ensures that our internal and external facing GPU cloud services run maximum reliability and uptime as promised to the...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 12 Jan 2025

Senior Site Reliability Engineer - Observability and Telemetry Platform

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale... at NVIDIA ensures that our internal and external facing GPU cloud services run maximum reliability and uptime as promised to the...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 03 Jan 2025

Senior Site Reliability Engineer - GPU Clusters

Engineer to lead the design, deployment, and management of our large-scale GPU clusters. These clusters will power AI workloads... at NVIDIA. Join our engineering team and collaborate with researchers, AI engineers, and infrastructure teams to ensure our GPU...

Company: Nvidia
Location: Santa Clara, CA
Posted Date: 13 Nov 2024

Sr Staff Site Reliability Engineer (Cortex Data Lake)

Alto Networks runs a large infrastructure and is one of the largest GCP customers. As a Senior Staff DevOps Engineer for the CDL/SLS..., architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes...

Location: Santa Clara, CA
Posted Date: 14 Dec 2024
Salary: $126,000 - $203,500 per year

Senior Distinguished Memory Architect Engineer

and innovative Senior Distinguished Custom Memory Architect Engineer to push the boundaries on next generation memory architectures... next generation AI leading SoCs. What You Can Expect As a Senior Distinguished Memory Architect Engineer in the CTO Office...

Company: Marvell
Location: Santa Clara, CA
Posted Date: 14 Dec 2024

Senior Software Development Engineer, Customer Engagement Technology

languages. Key job responsibilities - Architect and lead the development of robust inference infrastructure for Amazon... with at least one software programming language experience - 5+ years of leading design or architecture (design patterns, reliability...

Company: Amazon
Location: Santa Clara, CA
Posted Date: 09 Jan 2025
Salary: $151,300 per year

Principal DevOps Engineer

Your Experience Incident and Alerts Management - Clear understanding of incident and alerts management in Site Reliability... into our systems’ performance and health. Your Impact As a Senior Staff SRE with the Cortex Cloud Security Posture Management team...

Location: Santa Clara, CA
Posted Date: 12 Jan 2025
Salary: $147,000 - $237,500 per year

Systems Development Engineer, Amazon Elastic VMware Service(EVS)

Systems development engineer and a self-starter who is excited to build something new and work at cloud scale? If the answer.... We are looking for a Systems Development Engineer to build new capabilities to help customers run VMware-based workloads on AWS. The AWS Commercial...

Company: Amazon
Location: Santa Clara, CA
Posted Date: 04 Jan 2025
Salary: $116,300 per year

Software Development Engineer II, Network Lifecycle Management

facing routers. In this software dev engineer position, you will be designing, building and owning highly distributed, large... and overhead for customers, while driving up performance, availability and reliability. Why now? We want to expand scope...

Company: Amazon
Location: Santa Clara, CA
Posted Date: 21 Nov 2024
Salary: $129,300 per year