Thoughts on software engineering, AI, side projects, and the things I learn along the way.
A rough guide to what nats actually mean in LLM training loss, because I kept seeing this unit everywhere and finally decided to understand it.
I spent a day fine-tuning Qwen3-VL-2B for PDF-to-markdown conversion using SFT + GRPO. Total cost: $4. Here's what worked, what didn't, and why GRPO alone fails on vision models.
Three weeks of running Claude 24/7 taught me how to make coding agents actually work: verification loops, team standards, and the right tooling setup.
Scattered thoughts on the nature of trust in LLMs, how context gives meaning, and language as a facilitator of senses.
Striving for the best in the age of Gen AI, and preparing for the unknown future.
Notes on the DeepSeek-R1 paper — how pure reinforcement learning with GRPO enables emergent reasoning in LLMs without supervised data.