I am working on an open-source tool focused on a simple question:
why is this PyTorch training run slower than it should be, and what is actually bottlenecking it?
I want to make this easier to answer for ML engineers and researchers, without requiring them to jump straight into heavy profilers or stitch together multiple low-level tools. I would really value input from people running real workloads:
- what is missing from current tooling?
- what part of the debugging workflow is still too manual or unclear?
- what would make a tool like this genuinely useful to you?
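To make "too manual" concrete, here is a minimal sketch of the kind of hand-rolled phase timing people often write today before reaching for a full profiler. All names and the simulated step costs are illustrative, not traceml's API:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical hand-rolled phase timer: the ad-hoc instrumentation
# people typically write before reaching for torch.profiler.
phase_totals = defaultdict(float)

@contextmanager
def timed(phase):
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_totals[phase] += time.perf_counter() - start

# Stand-in "training steps": sleeps model the relative cost of each phase.
for _ in range(3):
    with timed("data_loading"):
        time.sleep(0.02)   # pretend the dataloader is the bottleneck
    with timed("forward_backward"):
        time.sleep(0.005)
    with timed("optimizer_step"):
        time.sleep(0.001)

slowest = max(phase_totals, key=phase_totals.get)
print(f"slowest phase: {slowest}")
```

This works, but it has to be threaded through the training loop by hand, redone for every project, and it says nothing about GPU utilization or overlap; that gap is roughly what the tool is trying to close.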
I am also open to collaborating with people who care deeply about this problem and may want to contribute to the project over time. My main goal right now is to learn from real users and shape the tool around actual pain points rather than assumptions.
Repo: https://github.com/traceopt-ai/traceml (find why PyTorch training is slow while it's still running)