Qwen 3.6 with MTP Technology Delivers 187 Tokens/sec on Dual RTX 3090s

Running the Qwen 3.6 model with Multi-Token Prediction (MTP) enables very high generation speeds on consumer hardware.

Compiled by Sergey Kostenchuk | Published 2026-06-14 | Updated 2026-06-14

2026-06-14 Research Meta


Source: Wicked Fast Qwen 3.6 27B: 60 tok/s with MTP on RTX 3090 (2026) | InsiderLLM

🚀 Qwen 3.6 with MTP technology delivers 187 tokens/sec on dual RTX 3090s

Experimental tests of Qwen 3.6 93B using Multi-Token Prediction (MTP) on two RTX 3090 cards connected via NVLink reached speeds of up to 187 tokens per second.

🌍 MTP changes how speculative decoding is done: the model predicts several tokens in a single step instead of one token at a time.

👤 This makes it practical to run powerful LLMs on home GPUs with near-instant response times.
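
To make the multi-token idea above concrete, here is a minimal Python sketch of a draft-and-verify loop, the general pattern behind speculative decoding. It is an illustration under stated assumptions, not the implementation used in the tests: the names `speculative_step`, `generate`, `toy_draft_next`, and `toy_target_next` are hypothetical, the "models" are toy functions over integers, and in an MTP setup the draft tokens would presumably come from extra prediction heads on the main model rather than from a separate function.

```python
from typing import Callable, List, Tuple

NextTokenFn = Callable[[List[int]], int]


def speculative_step(context: List[int], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[int]:
    """Run one draft-and-verify step and return the tokens it accepts."""
    # Draft phase: cheaply propose k tokens ahead of the current context.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft_next(context + proposed))

    # Verify phase: a real engine would score all k positions in one batched
    # pass of the full model; this toy checks them one by one for clarity.
    accepted: List[int] = []
    for tok in proposed:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the full model's token and stop this step.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the verification yields one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(prompt: List[int], draft_next: NextTokenFn,
             target_next: NextTokenFn, k: int,
             max_new_tokens: int) -> Tuple[List[int], int]:
    """Generate max_new_tokens tokens; also return the number of verify steps."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prompt) + max_new_tokens], steps


# Toy stand-ins (assumptions for illustration only).
def toy_target_next(context: List[int]) -> int:
    # "Full model": always counts upward modulo 10.
    return (context[-1] + 1) % 10


def toy_draft_next(context: List[int]) -> int:
    # "Draft": agrees with the full model except right after the token 6.
    return 0 if context[-1] == 6 else (context[-1] + 1) % 10


if __name__ == "__main__":
    tokens, steps = generate([0], toy_draft_next, toy_target_next,
                             k=3, max_new_tokens=12)
    print(tokens, steps)
```

With these toy functions, the 12 new tokens are produced in 4 verify steps instead of 12, and the output is identical to what the full model alone would generate. The speedup in a real system depends on how often the drafted tokens are accepted, which this sketch does not attempt to model.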

Source 1: https://insiderllm.com/guides/wicked-fast-qwen-3-6-27b-mtp-rtx-3090/
Source 2: https://insiderllm.com/guides/best-way-2x-token-output-rtx-3090-qwen-3-6-dflash/

Sources