Subquadratic Unveils SubQ Model to Accelerate LLMs

🚀 Breakthrough in LLM Architecture from Subquadratic

The startup has introduced SubQ, a model that uses Dynamic Sparse Attention to avoid the quadratic cost of standard transformer attention, where compute grows with the square of the context length. According to the company, it runs 56 times faster than FlashAttention and supports a context window of up to 12 million tokens, with 98% accuracy in needle-in-a-haystack tests.

🌍 Dynamic sparse attention sharply reduces the computational cost of processing long contexts, enabling the analysis of massive codebases and archives without the quadratic cost growth of standard attention.
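
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. It only counts query-key score computations. The function names and the per-query budget of 4,096 keys are hypothetical, chosen for illustration; the source does not describe how SubQ selects keys, and the sketch ignores whatever that selection step costs.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key scores computed when every token attends to every token."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Scores computed when each query attends to at most keys_per_query keys."""
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    # Hypothetical per-query budget, not a figure published for SubQ.
    KEYS_PER_QUERY = 4096
    for n in (100_000, 1_000_000, 12_000_000):
        dense = dense_attention_pairs(n)
        sparse = sparse_attention_pairs(n, KEYS_PER_QUERY)
        print(f"{n:>12,} tokens: dense={dense:.2e}  sparse={sparse:.2e}  "
              f"ratio={dense / sparse:,.0f}x")
```

Under these assumptions, doubling the context quadruples the dense count but only doubles the sparse one, which is why the gap widens as contexts grow.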

👤 This brings us closer to an era where neural networks can analyze entire libraries or thousands of code files in a single pass, working faster and more cheaply than current models such as GPT-4.

Source 1: https://www.technologyreview.com/2026/06/19/1139313/a-startup-claims-it-broke-through-a-bottleneck-thats-holding-back-llms/
