What’s the simplest neural network I can build with an RTX 5090?

I use uv for all my Python projects, so I started there: https://docs.astral.sh/uv/guides/integration/pytorch/#installing-pytorch

The instructions are long but I went with the simplest option:

To start, consider the following (default) configuration, which would be generated by running uv init --python 3.14 followed by uv add torch torchvision.

Now a quick check to confirm we are using the GPU:

import torch
 
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

The above prints out cuda. Beautiful.
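
If you want a bit more reassurance than the string cuda, torch can also report which GPU it found and which CUDA build it ships with. This is my own extra check, not part of the setup guide; the exact device string depends on your driver, but on this box it should name the RTX 5090:

import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 5090"
    print(torch.__version__, torch.version.cuda)  # torch version and the CUDA version it was built against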

I asked Claude for the simplest neural network I could build and it gave me this code:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
 
 
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
 
train = DataLoader(
    datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=128,
    shuffle=True,
)
 
 
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 32, 3)
        self.c2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)
 
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.c1(x)), 2)
        x = F.max_pool2d(F.relu(self.c2(x)), 2)
        return self.fc2(F.relu(self.fc1(x.flatten(1))))
 
 
model = Net().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
 
for epoch in range(3):
    for x, y in train:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    print(epoch, loss.item())

Explanation:

  • Import several dependencies
  • Sanity check to confirm we are using GPU
  • Load the MNIST training set as a Dataset (downloaded on the first run, converted to tensors)
  • The outer DataLoader wraps that Dataset to yield mini-batches: 128 examples at a time, reshuffled each epoch. When you iterate over it, each step gives you x of shape [128, 1, 28, 28] and y of shape [128]
  • We define the network from nn.Module
  • The four layers:
    • c1: convolution, 1 input channel (grayscale) → 32 output channels (feature maps), 3×3 kernel
    • c2: 32 → 64 channels, 3×3 kernel
    • fc1: fully-connected layer, 1600 → 128. The 1600 is the flattened size after the convolutions and pooling (64 × 5 × 5). I’ll explain where the 5×5 comes from in a sec.
    • fc2: 128 → 10. Ten outputs, one per digit class.
  • Perform forward pass:
    • c1(x) → [128, 32, 26, 26] (a 3×3 conv with no padding shrinks each spatial dim by 2). relu zeros out negatives. max_pool2d(…, 2) takes the max over each 2×2 block, halving spatial dims → [128, 32, 13, 13].
    • c2 → [128, 64, 11, 11]. Pool → [128, 64, 5, 5]. That’s where the 64 × 5 × 5 = 1600 in fc1 came from (there’s a quick shape check right after this list).
    • flatten(1) collapses everything from dim 1 onward, so [128, 64, 5, 5] → [128, 1600]. Through fc1, ReLU, fc2 → final shape [128, 10]. Ten raw scores (logits) per image.
  • Build the model and move all its parameters onto the GPU. Use Adam as the optimiser; it’ll update the weights using their gradients. model.parameters() hands it every learnable tensor. lr=1e-3 is the learning rate, a sensible default for Adam.
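
To see those shapes for real rather than on paper, here’s a quick sanity check I’d run in a REPL after the script above (my own addition, not part of Claude’s snippet): grab one batch from the DataLoader and push it through the layers by hand, printing shapes as you go. The shapes are the same whether or not the model has been trained yet.

x, y = next(iter(train))
print(x.shape, y.shape)  # torch.Size([128, 1, 28, 28]) torch.Size([128])

with torch.no_grad():
    h = F.relu(model.c1(x.to(device)))        # [128, 32, 26, 26]
    h = F.max_pool2d(h, 2)                    # [128, 32, 13, 13]
    h = F.max_pool2d(F.relu(model.c2(h)), 2)  # [128, 64, 5, 5]
    print(h.shape)
    print(h.flatten(1).shape)                 # [128, 1600], which matches fc1's input size
    print(model(x.to(device)).shape)          # [128, 10], ten logits per image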

The four-step heartbeat of basically every PyTorch training loop:

  1. opt.zero_grad() — clear out gradients from the previous batch. PyTorch accumulates gradients by default, so without this they’d pile up across batches and ruin training (there’s a small demo of this after the list).
  2. model(x) — forward pass, calls your forward method, returns logits [128, 10]. F.cross_entropy(logits, y) computes the loss: how wrong the predictions are vs. the true labels. Returns a single scalar (the mean over the batch).
  3. loss.backward() — autograd walks backward through every operation that produced loss and computes the gradient of the loss with respect to each parameter. Stores those gradients on the parameters themselves.
  4. opt.step() — Adam reads those gradients and nudges each parameter in the direction that reduces loss.
  • Once per epoch we print the loss from the last batch: print(epoch, loss.item()). It’s a single batch’s loss, not an epoch average, but it’s enough to tell whether training is heading in the right direction.
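
The accumulation behaviour in step 1 is easy to demonstrate in isolation. This little example is mine, not from the original script: call backward() twice on the same tiny computation without zeroing in between and watch the gradient double.

import torch

w = torch.tensor([1.0], requires_grad=True)

(2 * w).sum().backward()
print(w.grad)  # tensor([2.])

(2 * w).sum().backward()
print(w.grad)  # tensor([4.]) because the new gradient was added to the old one

w.grad.zero_()  # opt.zero_grad() clears this for every parameter before the next batch
print(w.grad)  # tensor([0.])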

That’s it.

Excited for what’s to come.