Senior Software Engineer, AI Training & Infrastructure
About the Role
Location: San Mateo, CA
Employment Type: On-site, Remote
Company: Skild AI, Inc.
Key Responsibilities
- Architecting, building, and maintaining distributed training pipelines and frameworks.
- Optimizing training performance and resource utilization.
- Integrating state-of-the-art ML techniques into production training systems.
- Implementing monitoring, logging, alerting, automated testing, and CI/CD for reliable training operations.
- Building developer tooling and documentation.
Minimum Requirements
- Master's degree (or foreign equivalent) in Computer Science, Robotics, Engineering, or a related field.
- 2 years of experience in machine learning infrastructure.
- 2 years of experience designing and operating distributed training pipelines at scale.
- Experience with Python or C++ and at least one deep learning library (e.g., PyTorch, TensorFlow, JAX).
- Experience with CI/CD and automated testing for ML/infrastructure services.
- Knowledge of optimizing data loading and I/O for deep learning workloads.
- Knowledge of processing multimodal datasets and formats, and of image processing and compression.
- Experience with cloud-based training (AWS, Google Cloud, or Azure).
- Experience implementing monitoring, logging, and alerting for training systems.
- Knowledge of Linux OS fundamentals, distributed systems, and ML training techniques and models.
- Solid understanding of core software engineering principles.
Responsibilities
- Architect, build and maintain distributed training pipelines
- Optimize training performance and resource utilization
- Implement monitoring and CI/CD for training operations
Qualifications
- Master's degree in CS/Engineering or a related field
- 2+ years in ML infra
- Experience with Python or C++ and DL libraries
