PyTorch is the biggest codebase I have ever had to deal with, and the first time I opened it I wasn’t trying to contribute at all; I was just trying to figure out why my own PyTorch code was not working. A NumPy upgrade had broken code I relied on, error messages from torch.compile were long and cryptic, and a downstream library (pyhf) had started pinning dependencies just to keep CI running. Chasing those problems pulled me into parts of PyTorch I’d never expected to touch, and along the way I ended up shipping three upstream fixes and effecting a small change in pyhf itself. This post is a reflection on that journey: what I learned about working inside a huge codebase, the pattern I now use to structure my contributions, and how you can apply the same ideas to your own first OSS PRs.
As I worked on my contributions, I kept coming back to three essential steps: Stabilize, Isolate, Generalize.
The rest of this post walks through that loop using three contributions to PyTorch and one related change in the high‑energy‑physics library pyhf. Together they show how small, well‑scoped changes in a giant codebase can ripple outward into the wider ecosystem.
When NumPy 2.0 was released, its numpy.bool_ scalar type could no longer be implicitly treated as an integer. PyTorch’s NumPy interop code still tried to interpret these booleans as integers, which blew up with:
TypeError: 'numpy.bool' object cannot be interpreted as an integer
This wasn’t just a theoretical bug. PyTorch’s own NumPy tests started failing (tracked in issue #157973), and any downstream project that used NumPy scalars with the PyTorch backend saw the same error. One of those projects was pyhf, a statistical modelling library used in high‑energy physics. Its maintainers ended up pinning numpy<2.0 in the “PyTorch” extra and referencing the PyTorch issue in the commit message, just to keep CI green.
In PR #158036 I fixed the root cause inside PyTorch’s Python C API. The change is small but very targeted: when PyTorch sees a NumPy scalar, it now checks explicitly if it’s a numpy.bool_ (via torch::utils::is_numpy_bool) and, if so, calls PyObject_IsTrue to get its truth value instead of treating it as an integer. The accompanying tests exercise this path directly, so future refactors won’t silently re‑break it.
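The C++ change is small enough that its logic can be sketched in pure Python. The function name below is mine, not PyTorch’s; the branch is only an illustration of what the torch::utils::is_numpy_bool check plus PyObject_IsTrue accomplish:

```python
import operator
import numpy as np

def scalar_to_python(value):
    # Illustrative sketch (not PyTorch's actual code) of the fix in PR #158036:
    # check for NumPy boolean scalars *before* the integer fallback.
    if isinstance(value, np.bool_):
        # Equivalent of PyObject_IsTrue: take the truth value directly,
        # instead of asking the object to act as an integer.
        return bool(value)
    # Integer fallback: operator.index is the Python-level analogue of the
    # C API call that produced "'numpy.bool' object cannot be interpreted
    # as an integer" on NumPy >= 2.0 before the check above existed.
    return operator.index(value)

print(scalar_to_python(np.bool_(True)))  # True
print(scalar_to_python(np.int64(3)))     # 3
```

The key point is the ordering: the boolean case is handled explicitly first, so the integer path never sees a numpy.bool_ at all.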
This is the Stabilize step in action:
- Fixing the root cause upstream, rather than in every downstream project, allows pyhf to eventually remove their numpy<2.0 pin and test against modern NumPy.

## Turning ndarray.astype(object) into a clear error

The other class of failures I kept hitting weren’t about versions; they were about mysterious compiler errors. If you called torch.compile on code that used ndarray.astype("O") or astype(object), Dynamo would try to trace it and eventually die deep inside fake tensor propagation with something like:
torch.dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: ... got TypeError("data type 'O' not understood")
This is technically “correct” since PyTorch can’t compile dynamic Python objects, but from a user’s perspective it’s terrible: there’s no clear statement that object‑dtype arrays are unsupported, and the error looks like a random internal failure.
In PR #157810 I added a small but opinionated guardrail inside Dynamo’s NumPy handling. When NumpyNdarrayVariable.call_method sees .astype("O") or .astype(object), it now immediately raises torch._dynamo.exc.Unsupported with an explicit explanation that object‑dtype NumPy arrays are not supported by torch.compile. A new test, test_ndarray_astype_object_graph_break, asserts that this error is raised both when compiling and when calling astype directly. During review, this even turned up a stale xfail: once the behaviour was fixed, a previously “expected to fail” test started passing and the xfail could be removed.
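The guardrail itself boils down to a dtype check up front. Here is a hedged Python sketch of the idea; the helper name and error class are mine, not Dynamo’s internals, which raise torch._dynamo.exc.Unsupported from inside NumpyNdarrayVariable.call_method:

```python
import numpy as np

class Unsupported(RuntimeError):
    """Stand-in for torch._dynamo.exc.Unsupported in this sketch."""

def check_astype(dtype):
    # Sketch of the guard from PR #157810: reject object dtypes before
    # tracing, with a message the user can act on. Both astype("O") and
    # astype(object) normalize to the same NumPy dtype, so one comparison
    # covers both spellings.
    if np.dtype(dtype) == np.dtype(object):
        raise Unsupported(
            "ndarray.astype(object) is not supported by torch.compile: "
            "object-dtype NumPy arrays cannot be traced"
        )
    return np.dtype(dtype)

print(check_astype("float32"))  # float32
```

Failing fast with an explicit message is the whole fix: the user sees *why* tracing stopped instead of a fake-tensor stack trace.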
This is the Isolate step in practice:
- Dynamo now fails fast with a structured Unsupported error.
- By explicitly flagging ndarray.astype(object) as unsupported, we give users something they can actually act on.
- The new test (and the removed xfail) make it obvious to future contributors that this pattern should never silently “work” again.

## Making F.one_hot work with transforms

The third category of issues I ran into was more subtle. torch.func.jacfwd would fail when it encountered torch.nn.functional.one_hot inside torch.compile(dynamic=True), often only once dynamic shapes and fake tensors were involved. The existing vmap rule for one_hot expanded the operation into a pattern that allocated a zeros tensor and then scattered indices into it, which confused shape inference and did not play well with dynamic tracing.
In PR #160837 I rewrote this behaviour in terms of a purely functional comparison. Instead of building a zeros tensor and scattering, the vmap rule now constructs class labels with arange(num_classes) and compares each index against that range with an expression like eq(self.unsqueeze(-1), arange(num_classes)).to(kLong). This avoids the problematic scatter step entirely and gives Dynamo a simple, elementwise view of the operation that is easier to reason about under dynamic shapes. The change also updates the C++ batch rules and extends tests for dynamic tracing, JIT, and eager execution.
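The scatter-free formulation is easy to show with NumPy broadcasting. This is a sketch of the idea behind the new rule, not the C++ batch rule itself:

```python
import numpy as np

def one_hot_functional(indices, num_classes):
    # Scatter-free one-hot, mirroring the formulation from PR #160837:
    # eq(self.unsqueeze(-1), arange(num_classes)).to(kLong).
    classes = np.arange(num_classes)
    # Broadcasting (..., 1) against (num_classes,) yields (..., num_classes):
    # each position is 1 exactly where the index matches the class label.
    return (np.expand_dims(indices, -1) == classes).astype(np.int64)

print(one_hot_functional(np.array([0, 2, 1]), 3))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```

Because there is no in-place scatter, every output element is a pure elementwise function of the inputs, which is exactly what makes the operation easy for vmap and dynamic-shape tracing to reason about.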
This is the Generalize step. Rather than patching one failing test, the goal is to design F.one_hot so that it behaves predictably across the whole transform ecosystem: vmap, jacfwd, torch.compile(dynamic=True), and friends. Thinking this way turns a local bug fix into a broader improvement in how the library composes.
Working inside PyTorch did not feel glamorous most of the time. It felt like reading unfamiliar subsystems until my brain couldn’t take it anymore, running the same failing test again and again, and slowly building a mental map of where things lived. What helped most was treating every review comment and CI failure as a clue, not a verdict. One important lesson is that the maintainers were not asking for perfection; they were asking for clarity, good tests, and changes that fit the design of the project.
Another essential point to keep in mind is the importance of communication. After submitting your pull request, making the effort to communicate your intentions and decisions clearly can make a big difference in how the review process goes. For example, my very first pull request attracted more than forty comments from PyTorch maintainers. At first, it felt overwhelming, but actively engaging with their feedback (asking clarifying questions, explaining my thought process, and being open to suggestions) really helped move the discussion forward.
Good communication also means being willing to revise your changes, to explain the reasoning behind what you have done, and to acknowledge misunderstandings or mistakes as they come up. Building this kind of open dialogue not only improves your code but also builds trust with reviewers and maintainers, making collaboration much more productive and enjoyable. I learned a lot from the comments left on my first PR, and subsequent PRs received significantly fewer comments.
Looking back, a few themes stand out:
If you want to contribute to a large project like PyTorch, you do not need to wait until you feel like an expert. You can start from the same place I did, with a real bug that affects you and a desire to make it go away in a principled way.
- Generalize each fix: think about how it behaves under torch.compile, vmap, or different backends, and add tests that cover those cases so the improvement sticks.

The core idea is simple. Stabilize what you can see. Isolate the behavior you want to change. Generalize the fix so that it helps more than one user. This loop scales from tiny bug fixes to significant features, and it is a good way to navigate any codebase that feels larger than you are.
Stepping back, this story started with my own code breaking and with me not quite believing that I was ready to contribute to a codebase like PyTorch. Following those threads through PyTorch, guided by the errors I was hitting in my own work, taught me that you can make meaningful changes even when the system feels too big to hold in your head. It may not work for everyone, but if you apply the same loop of stabilizing what you depend on, isolating the behavior you care about, and generalizing each fix so that it helps more than your own code, you can slowly build both your understanding and your impact. This is the path from a confusing error message to a pull request that quietly improves life for other people.
(Listed in order of completion)