Virtual Tech Gurus
Published
July 9, 2024
Location
Ashburn, VA
Category
Default
Job Type
C2H
Duration
6 Months

Description

Position Overview:
  • We are looking for a skilled NVIDIA DGX Infrastructure Engineer to join our team. In this role, you will manage NVIDIA DGX-based infrastructure and ensure its performance, reliability, and scalability.
Key Responsibilities: 
  • Managing NVIDIA DGX systems and related infrastructure. 
  • Configuring and optimizing DGX clusters for performance, reliability, and scalability. 
  • Collaborating with data scientists, AI engineers, and IT teams to integrate DGX systems into AI and deep learning workflows.
  • Monitoring system performance and implementing proactive measures to maintain optimal operation. 
  • Troubleshooting and resolving issues related to DGX systems, including hardware, software, and network components. 
  • Implementing security measures and best practices to ensure the integrity and confidentiality of DGX-based data and workflows. 
  • Documenting infrastructure configurations, processes, and procedures. 
  • Providing technical guidance and training to team members on DGX-related technologies and best practices. 
  • Staying current with NVIDIA DGX hardware and software advancements and recommending upgrades or enhancements as needed. 
Requirements: 
  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.  
  • Proven experience in managing NVIDIA DGX systems in production environments. 
  • Understanding of AI and deep learning frameworks and their integration with NVIDIA DGX systems. 
  • Proficiency in scripting languages such as Python for automation and configuration management. 
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes) in conjunction with DGX systems.
  • Knowledge of storage solutions (e.g., NFS, Ceph) and their integration with DGX clusters. 
  • Familiarity with networking principles, protocols, and configurations related to DGX infrastructure. 
  • Excellent troubleshooting and problem-solving skills. 
  • Ability to work independently and collaboratively in a team environment. 
  • Effective communication skills, both verbal and written. 

JOBID: 11717
