Daniel Han co-created Unsloth, the reason much of the open-source community can fine-tune a model on one GPU instead of a cluster.
He didn't approximate the math. He rewrote the kernels by hand, found bugs in widely used model code, and made training about 2 to 3 times faster with no loss in accuracy.
Millions of fine-tunes run through his code every month. Most people training a model locally are standing on it without knowing.
Everyone talks about who has the most GPUs. He made yours enough.
h100envy (@h100envy)