Paying for LLM Inference per Kilowatt-Hour Instead of Tokens...

Startup NeuralWatt is introducing a pricing model for large language model (LLM) inference that bills for the electricity actually consumed, measured in kilowatt-hours (kWh), rather than for the number of tokens generated. This approach can sharply reduce operating costs, especially with efficient open-weights models.

What Happened

NeuralWatt has proposed a shift from traditional pay-per-token pricing to pricing tied to energy consumption. According to user data, energy-based billing for the Qwen and Kimi model families reduced costs by an average of 82.9%. In specific cases, such as the qwen3.6-35b-fast model, savings reached 95.2%.
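To put the reported percentages in perspective, a saving of 82.9% means paying 17.1% of the token-based bill, and a saving of 95.2% means paying 4.8% of it. The short Python sketch below does only that arithmetic; the function names are illustrative and not part of any NeuralWatt tooling.

```python
def remaining_fraction(savings_pct: float) -> float:
    """Fraction of the original bill still paid after a percentage saving."""
    return 1.0 - savings_pct / 100.0


def cost_ratio(savings_pct: float) -> float:
    """How many times smaller the new bill is compared with the old one."""
    return 1.0 / remaining_fraction(savings_pct)


if __name__ == "__main__":
    # Savings figures as reported in the article.
    reported = [("average (Qwen and Kimi)", 82.9), ("qwen3.6-35b-fast", 95.2)]
    for label, pct in reported:
        print(
            f"{label}: pays {remaining_fraction(pct):.1%} of the token-based bill, "
            f"about {cost_ratio(pct):.1f}x cheaper"
        )
```

In other words, the reported figures correspond to bills roughly 5.8 times and 20.8 times smaller than under per-token pricing.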

Context

The traditional pay-per-token model does not always reflect the actual computational load or the efficiency of the provider's infrastructure. Energy-based pricing ties the cost directly to how well the provider caches requests and how efficiently it uses its GPUs and NPUs, which creates an incentive to develop more advanced hosting methods.
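The difference between the two billing schemes can be sketched in a few lines of Python. This is a minimal illustration, not NeuralWatt's actual metering method, which the source does not describe. The `Request` fields, the prices, and the power and timing figures are all hypothetical values chosen only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Request:
    output_tokens: int
    avg_power_watts: float   # hypothetical average power drawn while serving
    duration_seconds: float  # hypothetical time the hardware was busy


def per_token_cost(req: Request, price_per_million_tokens: float) -> float:
    """Bill based only on how many tokens were generated."""
    return req.output_tokens / 1_000_000 * price_per_million_tokens


def energy_kwh(req: Request) -> float:
    """Energy used: watts * seconds = joules; 3,600,000 joules = 1 kWh."""
    return req.avg_power_watts * req.duration_seconds / 3_600_000


def per_kwh_cost(req: Request, price_per_kwh: float) -> float:
    """Bill based only on the energy consumed while serving the request."""
    return energy_kwh(req) * price_per_kwh


if __name__ == "__main__":
    # Two requests with identical output length; the second is assumed to
    # benefit from caching and therefore keeps the hardware busy for less time.
    uncached = Request(output_tokens=2_000, avg_power_watts=700.0, duration_seconds=40.0)
    cached = Request(output_tokens=2_000, avg_power_watts=700.0, duration_seconds=10.0)

    for name, req in [("uncached", uncached), ("cached", cached)]:
        print(
            f"{name}: per-token = {per_token_cost(req, 2.00):.6f}, "
            f"per-kWh = {per_kwh_cost(req, 0.50):.6f}"
        )
```

Under these assumptions, the per-token bill is identical for both requests, while the per-kWh bill shrinks in proportion to the time the hardware is busy. That is the mechanism by which caching and utilization improvements would show up directly in the price.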

Why It Matters for the Industry

For the industry, this points to a potential shift in the economics of cloud inference. Energy-based pricing encourages providers to invest in energy efficiency and "green" AI hosting. It could also increase competition and destabilize standard token prices as new players emerge that focus on minimizing kWh per request.

Why It Matters for Users

Developers and companies with heavy LLM workloads can significantly reduce their operating expenses (OPEX), especially when working with open-weights models. Energy-based billing also makes costs more predictable for intensive, repetitive requests, where caching efficiency plays a decisive role.

What Is Not Yet Known / Limitations

Technical specialists and architects are moderately skeptical. They point out that the accuracy of the energy consumption measurements still needs to be verified and that integrating such a pricing model into existing workflows may prove complicated.

Sources

Coinerella

Author

Look at AI, Editorial Team