🧠 Latent Space Geometry Persists Across LLM Architecture Changes
In the paper "Thinking in Different Spaces," the authors show that models with different architectures (Mixture-of-Experts and dense) develop similar latent-space geometries. Using a ridge-regression projection, they transferred "teacher" activations into the "student" model's activation space, improving accuracy on TruthfulQA by 25.2% and on GSM8K by 25.5% without fine-tuning any weights.
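The ridge projection can be illustrated with a minimal sketch. It assumes paired activations: the teacher and student run on the same n prompts, and a ridge regression maps mean-centered teacher hidden states T to student hidden states S, giving W = (TᵀT + λI)⁻¹TᵀS plus a bias. The post does not say which layers the paper uses, how the pairs are built, or what regularization strength λ is chosen, so `fit_ridge_projection`, `lam`, and the synthetic data below are hypothetical.

```python
import numpy as np

def fit_ridge_projection(teacher_acts, student_acts, lam=1.0):
    """Hypothetical helper: closed-form ridge regression from teacher to student space.

    teacher_acts: (n, d_teacher) teacher hidden states on n prompts
    student_acts: (n, d_student) student hidden states on the same n prompts
    Returns (W, b) so that teacher_acts @ W + b approximates student_acts.
    """
    # Center both sides so the bias b absorbs any offset between the spaces.
    t_mean = teacher_acts.mean(axis=0)
    s_mean = student_acts.mean(axis=0)
    Tc = teacher_acts - t_mean
    Sc = student_acts - s_mean
    # Solve (Tc^T Tc + lam * I) W = Tc^T Sc instead of forming an explicit inverse.
    gram = Tc.T @ Tc + lam * np.eye(Tc.shape[1])
    W = np.linalg.solve(gram, Tc.T @ Sc)
    b = s_mean - t_mean @ W
    return W, b

# Synthetic stand-in data: student activations are an unknown linear map of
# teacher activations plus a little noise (an assumption for illustration only).
rng = np.random.default_rng(0)
n, d_teacher, d_student = 500, 64, 32
true_map = rng.normal(size=(d_teacher, d_student))
T = rng.normal(size=(n, d_teacher))
S = T @ true_map + 0.01 * rng.normal(size=(n, d_student))

W, b = fit_ridge_projection(T, S, lam=1e-3)
projected = T @ W + b  # teacher activations expressed in the student's space
rel_err = np.linalg.norm(projected - S) / np.linalg.norm(S)
print(W.shape, round(float(rel_err), 4))
```

With more prompts than teacher dimensions the fit is well-determined and `lam` has little effect; it matters mainly when activations are high-dimensional relative to the number of paired prompts.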
🌍 This opens the way to effective zero-shot steering: using linear transformations to "boost" the capabilities of small models, which could significantly reduce inference costs.
👤 Small, fast models could be brought closer to the reasoning level of large models simply by "guiding" their activations toward the right directions within their own internal representation space.
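To make the "guiding" idea concrete, here is a sketch of one common steering recipe: activation addition with a difference-of-means direction. It is a swapped-in technique, since the post does not say how the paper builds its directions. A direction found in the teacher's space is mapped into the student's space through a linear map (a random stand-in for a fitted ridge map here) and then added, scaled by a hypothetical strength `alpha`, to the student's hidden states. The helper names `steering_direction` and `steer` and the positive/negative activation sets are illustrative assumptions.

```python
import numpy as np

def steering_direction(pos_acts, neg_acts):
    """Hypothetical helper: difference-of-means direction between two activation sets."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, direction, alpha):
    """Hypothetical helper: add alpha * unit(direction) to every row of hidden."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(1)
d_teacher, d_student = 64, 32

# Assumption: a random matrix stands in for a fitted teacher->student ridge map.
W = rng.normal(size=(d_teacher, d_student))

# Hypothetical teacher activations on "desired" vs "undesired" behavior.
concept = rng.normal(size=d_teacher)
pos = rng.normal(size=(200, d_teacher)) + concept
neg = rng.normal(size=(200, d_teacher))
v_teacher = steering_direction(pos, neg)

# A direction is a difference of activations, so an affine map's bias cancels
# and only the linear part W is applied.
v_student = v_teacher @ W

student_hidden = rng.normal(size=(8, d_student))
steered = steer(student_hidden, v_student, alpha=4.0)

# Each steered row moved by exactly alpha along the projected unit direction.
unit = v_student / np.linalg.norm(v_student)
shift = (steered - student_hidden) @ unit
print(shift)
```

In practice both `alpha` and the layer where the vector is added would need tuning, since an overly large shift can degrade the model's outputs.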
Source 1: https://arxiv.org/abs/2603.20406
