Job Search Results

Senior Site Reliability Engineer, AI Infrastructure

What You Will Be Doing: Develop and maintain large-scale systems supporting critical use cases for AI Infrastructure, driving reliability.... Come join the AI Infrastructure Production engineering team and see how you can make a lasting impact on the world...

Company: Nvidia

Location: Santa Clara, CA

Posted Date: 11 Jan 2025

Senior Site Reliability Engineer - AI Research Clusters

to improve researchers' productivity. As a Site Reliability Engineer, you are responsible for the big picture of how our systems... infrastructure Proven experience in site reliability engineering for high-performance computing environments with operational...

Company: Nvidia

Location: Santa Clara, CA

Posted Date: 03 Jan 2025

Senior Staff Site Reliability Engineer, Database Platform

operating our existing infrastructure to the highest level of reliability and security. You will work side by side with NVIDIA...+ years of experience in Software Development and/or Site Reliability Engineering/Production Engineering. Strong software...

Company: Nvidia

Location: Santa Clara, CA

Posted Date: 04 Dec 2024

Senior Site Reliability Engineer - DGX Cloud

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale... at NVIDIA ensures that our internal and external facing GPU cloud services run maximum reliability and uptime as promised to the...

Company: Nvidia

Location: Santa Clara, CA

Posted Date: 12 Jan 2025

Senior Site Reliability Engineer - Observability and Telemetry Platform

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale... at NVIDIA ensures that our internal and external facing GPU cloud services run maximum reliability and uptime as promised to the...

Company: Nvidia

Location: Santa Clara, CA

Posted Date: 03 Jan 2025

Senior Site Reliability Engineer - GPU Clusters

Engineer to lead the design, deployment, and management of our large-scale GPU clusters. These clusters will power AI workloads... at NVIDIA. Join our engineering team and collaborate with researchers, AI engineers, and infrastructure teams to ensure our GPU...

Company: Nvidia

Location: Santa Clara, CA

Posted Date: 13 Nov 2024

Sr Staff Site Reliability Engineer (Cortex Data Lake)

Alto Networks runs a large infrastructure and is one of the largest GCP customers. As a Senior Staff DevOps Engineer for the CDL/SLS..., architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes...

Company: Palo Alto Networks

Location: Santa Clara, CA

Posted Date: 14 Dec 2024

Salary: $126,000 - $203,500 per year

Senior Distinguished Memory Architect Engineer

and innovative Senior Distinguished Custom Memory Architect Engineer to push the boundaries on next generation memory architectures... next generation AI leading SoCs. What You Can Expect As a Senior Distinguished Memory Architect Engineer in the CTO Office...

Company: Marvell

Location: Santa Clara, CA

Posted Date: 14 Dec 2024

Senior Software Development Engineer, Customer Engagement Technology

languages. Key job responsibilities - Architect and lead the development of robust inference infrastructure for Amazon... with at least one software programming language experience - 5+ years of leading design or architecture (design patterns, reliability...

Company: Amazon

Location: Santa Clara, CA

Posted Date: 09 Jan 2025

Salary: $151,300 per year

Principal DevOps Engineer

Your Experience Incident and Alerts Management - Clear understanding of incident and alerts management in Site Reliability... into our systems’ performance and health. Your Impact As a Senior Staff SRE with the Cortex Cloud Security Posture Management team...

Company: Palo Alto Networks

Location: Santa Clara, CA

Posted Date: 12 Jan 2025

Salary: $147,000 - $237,500 per year

Systems Development Engineer, Amazon Elastic VMware Service (EVS)

Systems development engineer and a self-starter who is excited to build something new and work at cloud scale? If the answer.... We are looking for a Systems Development Engineer to build new capabilities to help customers run VMware-based workloads on AWS. The AWS Commercial...

Company: Amazon

Location: Santa Clara, CA

Posted Date: 04 Jan 2025

Salary: $116,300 per year

Software Development Engineer II, Network Lifecycle Management

facing routers. In this software dev engineer position, you will be designing, building and owning highly distributed, large... and overhead for customers, while driving up performance, availability and reliability. Why now? We want to expand scope...

Company: Amazon

Location: Santa Clara, CA

Posted Date: 21 Nov 2024

Salary: $129,300 per year

Keywords: Senior Site Reliability Engineer, AI Infrastructure, Location: Santa Clara, CA
