Developersummit
© 2026-2027 Saltmarch. All rights reserved.

Orchestrating Thousands of GPUs: Engineering Patterns for Large-Scale Model Training

Tue, April 21 at 2:00 PM - 3:00 PM GMT+5:30
Architecture | DeepTech | OpsTech

Training large AI models requires more than raw compute. It demands careful orchestration of multi-node GPU systems, robust communication, and disciplined engineering trade-offs. This session traces the shift from traditional computing models to large-scale parallel training, explaining how distributed training works beneath the surface and what it takes to make it reliable in production. The talk examines real-world challenges in distributed data processing, introduces the five dimensions of parallelism, and walks through practical heuristics and trade-off decisions used to scale AI training architectures across diverse hardware environments.

What You Will Learn

  • How gradient synchronization, collective operations, and fault tolerance operate in practice, including the role of communication libraries and backends such as NCCL, Gloo, and MPI

  • The five dimensions of parallelism and how data, tensor, pipeline, expert, and context parallelism are applied at scale

  • Engineering trade-offs across communication patterns, memory management, network topology, and resource utilization in distributed training systems
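As a rough illustration of the gradient synchronization and collective operations mentioned above, the sketch below simulates a ring all-reduce in plain Python. This is an assumption-laden teaching example, not material from the session and not how NCCL, Gloo, or MPI are implemented internally: the workers are simulated as lists in one process, and the names `ring_all_reduce` and `average_gradients` are hypothetical. It assumes every worker holds a gradient vector of the same length, and that the length is divisible by the number of workers.

```python
from typing import List


def ring_all_reduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) over per-worker gradient vectors.

    Illustrative only: each inner list stands in for one worker's buffer.
    Assumes equal-length vectors whose length is divisible by the worker count.
    """
    n = len(grads)
    length = len(grads[0])
    if any(len(g) != length for g in grads):
        raise ValueError("all workers must hold vectors of the same length")
    if length % n != 0:
        raise ValueError("vector length must be divisible by the worker count")
    chunk = length // n
    bufs = [list(g) for g in grads]  # copy so the inputs are not mutated

    def span(c: int) -> range:
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. After n - 1 steps, worker r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot what every worker sends so all sends in a step are
        # based on the state at the start of that step.
        outgoing = []
        for rank in range(n):
            c = (rank - step) % n
            outgoing.append((c, [bufs[rank][i] for i in span(c)]))
        for rank in range(n):
            dst = (rank + 1) % n  # right-hand neighbour in the ring
            c, payload = outgoing[rank]
            for off, i in enumerate(span(c)):
                bufs[dst][i] += payload[off]

    # Phase 2: all-gather. Each completed chunk is passed around the ring
    # until every worker has every summed chunk.
    for step in range(n - 1):
        outgoing = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            outgoing.append((c, [bufs[rank][i] for i in span(c)]))
        for rank in range(n):
            dst = (rank + 1) % n
            c, payload = outgoing[rank]
            for off, i in enumerate(span(c)):
                bufs[dst][i] = payload[off]

    return bufs


def average_gradients(grads: List[List[float]]) -> List[List[float]]:
    """Data-parallel style synchronization: all-reduce sum, then divide by world size."""
    n = len(grads)
    return [[value / n for value in buf] for buf in ring_all_reduce(grads)]
```

Calling `average_gradients` with one gradient list per simulated worker returns the same averaged vector for every worker, which is the property data-parallel training relies on. In this scheme each worker exchanges messages only with its ring neighbours, one chunk per step, which is one reason ring-style collectives are often discussed alongside network topology trade-offs.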

Who Should Attend

  • Software Architects

  • Platform Engineers

  • Distributed Systems Engineers

  • Infrastructure and Systems Practitioners

  • Technical Leads working on large-scale compute platforms

About the speaker

Krishnaswamy Subramanian

Principal Consultant, Thoughtworks

Krishnaswamy Subramanian is a Principal Consultant at Thoughtworks with over 18 years of experience in custom software development. As an "expert generalist," he specializes in solving complex technical challenges across full-stack development, mobile applications, and DevOps. His expertise encompasses databases, infrastructure, and Kubernetes, with a proven track record of leading large-scale infrastructure projects.

Throughout his career, Krishnaswamy has served as technical leader, advisor, and principal architect. He is passionate about empowering teams and delivering impactful, scalable solutions. A dedicated knowledge sharer, he has presented at multiple conferences and actively contributes to open-source projects, demonstrating his commitment to technological innovation and community collaboration.

His technical approach focuses on understanding system architectures and creating innovative solutions through strategic development.

Related Talks

  • The Intersection of Architecture and AI (Neal Ford; Thu, April 23)

  • Shaping Intelligent APIs: Scaling LLMs, Open Ecosystems, Enterprise AI (Daniel Oh; Thu, April 23)

  • Architecture as Code (Neal Ford; Fri, April 24)
