New research by AlexWortega shows that different alignment (post-training optimization) methods produce fundamentally different internal weight structures in large language models, even when external quality metrics are identical.


What Happened
AlexWortega's research found that SFT, RFT, DFT, and offline GRPO produce similar weight landscapes when models are trained on the same data. By contrast, DPO, GRPO, and DAPO create fundamentally different weight structures. This effect on weight geometry is stable: it holds across hyperparameter settings such as learning rate and across random seeds.
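This summary does not say how weight geometry was measured. As a minimal sketch, assuming one compares each method's weight delta (fine-tuned weights minus base weights) by cosine similarity, the comparison could look like the code below. The function names, the method labels, and all numbers are hypothetical and are not taken from the research.

```python
import math

def delta(base, tuned):
    """Element-wise weight update produced by fine-tuning: tuned - base."""
    return [t - b for b, t in zip(base, tuned)]

def cosine(u, v):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def similarity_matrix(base, checkpoints):
    """Pairwise cosine similarity of weight deltas, keyed by (method, method)."""
    deltas = {name: delta(base, w) for name, w in checkpoints.items()}
    return {(a, b): cosine(deltas[a], deltas[b]) for a in deltas for b in deltas}

# Toy, made-up weights purely for illustration (assumption, not research data).
base = [0.0, 0.0, 0.0, 0.0]
checkpoints = {
    "method_a": [1.0, 1.0, 0.0, 0.0],   # moves the weights in one direction
    "method_b": [2.0, 2.0, 0.0, 0.0],   # same direction, larger step
    "method_c": [0.0, 0.0, 1.0, -1.0],  # an orthogonal direction
}

sims = similarity_matrix(base, checkpoints)
for (a, b), value in sorted(sims.items()):
    print(f"{a} vs {b}: {value:.2f}")
```

In this toy setup, methods that update the weights along the same direction score close to 1, while a method that moves them along an orthogonal direction scores close to 0. Real checkpoints have billions of parameters, so a practical analysis would likely work per layer or on low-dimensional projections.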
Context
During LLM training, alignment methods are used to fine-tune model behavior. It is traditionally assumed that models with the same benchmark results also work similarly on the inside. This work challenges that assumption by analyzing the weight geometry itself.
Why It Matters for the Industry
For the industry, this means that high benchmark metrics can mask deep differences in internal knowledge representations. This matters both for developing transfer learning methods and for assessing model reliability. Understanding these differences could also let developers build technological advantages (moats) through specialized training methods.
Why It Matters for Users
For users and developers, this explains why models with identical accuracy scores may behave differently in non-standard tasks or edge-case scenarios. Their "internal map" of knowledge is constructed differently depending on the chosen training method, which directly affects the predictability of model behavior.
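A toy illustration of this point, not drawn from the research: the sketch below builds two tiny linear classifiers with orthogonal weights that score identically on a small benchmark yet disagree on an input unlike anything in it. The dataset, the model weights, and the function names are all hypothetical.

```python
def predict(weights, x):
    """Return 1 if the dot product of weights and input is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0

def accuracy(weights, data):
    """Fraction of (input, label) pairs the model classifies correctly."""
    correct = sum(predict(weights, x) == y for x, y in data)
    return correct / len(data)

# Toy benchmark in which both features agree with the label (made-up data).
benchmark = [
    ((1.0, 1.0), 1),
    ((2.0, 3.0), 1),
    ((-1.0, -2.0), 0),
    ((-3.0, -1.0), 0),
]

model_a = (1.0, 0.0)  # relies only on the first feature
model_b = (0.0, 1.0)  # relies only on the second feature

acc_a = accuracy(model_a, benchmark)
acc_b = accuracy(model_b, benchmark)

# An edge-case input where the two features point in opposite directions.
edge_case = (1.0, -1.0)
pred_a = predict(model_a, edge_case)
pred_b = predict(model_b, edge_case)

print(f"benchmark accuracy: a={acc_a}, b={acc_b}")
print(f"edge-case prediction: a={pred_a}, b={pred_b}")
```

Both models reach the same benchmark accuracy, but because each encodes the task in a different part of its weights, they give opposite answers on the edge case.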
What Remains Unknown / Limitations
No direct technical objections to the research results have been identified.
Sources
Author
Look at AI, Editorial Staff
