
Senior Software Engineer, AI Training & Infrastructure

ChatGPT Jobs Palo Alto, California, US

About the Role

Senior Software Engineer, AI Training & Infrastructure

Location: San Mateo, CA
Employment Type: On-site, Remote
Company: Skild AI, Inc.

Key Responsibilities

  • Architecting, building, and maintaining distributed training pipelines and frameworks.
  • Optimizing training performance and resource utilization.
  • Integrating state-of-the-art ML techniques into production training systems.
  • Implementing monitoring, logging, alerting, automated testing, and CI/CD for reliable training operations.
  • Developing developer tooling and documentation.

Minimum Requirements

  • Master's degree (or foreign equivalent) in Computer Science, Robotics, Engineering, or a related field.
  • 2 years of experience in machine learning infrastructure.
  • 2 years of experience designing and operating distributed training pipelines at scale.
  • Experience with Python or C++ and at least one deep learning library (e.g., PyTorch, TensorFlow, JAX).
  • Experience with CI/CD and automated testing for ML/infra services.
  • Knowledge of optimizing data loading and I/O for deep learning workloads.
  • Knowledge of processing multimodal datasets and formats, and image processing/compression.
  • Experience with cloud-based training (AWS, Google Cloud, or Azure).
  • Experience implementing monitoring, logging, and alerting for training systems.
  • Knowledge of Linux OS fundamentals, distributed systems, and ML training techniques/models.
  • Solid understanding of core software engineering principles.

Responsibilities

  • Architect, build and maintain distributed training pipelines
  • Optimize training performance and resources
  • Implement monitoring and CI/CD for training operations

Qualifications

  • Master's degree in CS, Engineering, or a related field
  • 2+ years in ML infra
  • Experience with Python or C++ and DL libraries

Required Skills

Python or C++, PyTorch/TensorFlow/JAX, CI/CD for ML/infra, distributed training, cloud platforms (AWS/GCP/Azure)

Keywords

AI, ML Infrastructure, Distributed Training, MLOps, Software Engineering

Job Overview

Date Posted: 5 days ago
Location: Palo Alto, California, US
Job Type: Full-time
Work Mode: Onsite
Experience: 2+ years
Category: Engineering, Artificial Intelligence, ML Infrastructure, Training Pipelines

About the Company

ChatGPT Jobs