🚀 Qwen 3.6 with MTP reaches up to 187 tokens/sec on dual RTX 3090s

Experimental tests of Qwen 3.6 27B using Multi-Token Prediction (MTP) on two RTX 3090 cards linked via NVLink showed speeds of up to 187 tokens per second.

🌍 MTP builds on the speculative decoding approach: instead of producing one token per step, the model predicts several tokens at once, and the predictions that check out are kept, so a single step can yield more than one token.
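
To make the draft-and-verify idea concrete, here is a minimal toy sketch in Python. It is not Qwen's implementation or any inference engine's API: `target_next`, `draft_next`, `speculative_generate` and `greedy_generate` are hypothetical names, and both "models" are simple deterministic functions standing in for the full model and a cheap multi-token predictor.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    # Toy stand-in for the full model: the next token is a fixed function of the context.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    # Toy stand-in for a cheap predictor (assumption): it agrees with the
    # target except when the context length is a multiple of 4.
    tok = target_next(ctx)
    return (tok + 1) % 50 if len(ctx) % 4 == 0 else tok


def greedy_generate(prompt: List[Token], n_new: int, target: NextFn) -> List[Token]:
    # Baseline: one token per step.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_generate(
    prompt: List[Token], n_new: int, k: int, target: NextFn, draft: NextFn
) -> Tuple[List[Token], int]:
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_new:
        # 1) Draft k tokens cheaply.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) Verify the drafts against the target. A real system would do this
        #    in one batched forward pass; here it is a plain loop.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first wrong draft
                break
        else:
            # Every draft matched, so the verification also yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
        steps += 1
    return out[: len(prompt) + n_new], steps


tokens, steps = speculative_generate([1, 2, 3], n_new=20, k=4,
                                     target=target_next, draft=draft_next)
baseline = greedy_generate([1, 2, 3], 20, target_next)
print(tokens == baseline, steps)
```

In this sketch the output matches plain one-token-at-a-time decoding, but it takes fewer steps whenever the drafts are mostly right. That is where the speedup comes from; the real gain depends on how often the drafts are accepted.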

👤 This makes running powerful LLMs on home GPUs a reality with near-instant response times.

Source 1: https://insiderllm.com/guides/wicked-fast-qwen-3-6-27b-mtp-rtx-3090/
Source 2: https://insiderllm.com/guides/best-way-2x-token-output-rtx-3090-qwen-3-6-dflash/