⚡️ Paying for LLMs via electricity instead of tokens
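Startup NeuralWatt is introducing a new billing model for LLM inference that charges for electricity consumed (kWh) rather than for tokens processed. According to the source, this cut the cost of running Qwen and Kimi models by an average of 82.9%, to roughly a sixth of the token-based bill.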
🌍 Switching to energy-based billing creates an incentive to optimize energy efficiency and caching in cloud inference.
👤 Developers can get significantly cheaper inference during periods of heavy request volume.
Source 1: https://www.coinerella.com/energy-based-llm-billing-cut-my-bill-to-a-sixth/
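The arithmetic behind the headline figure can be checked with a minimal sketch. It shows that an 82.9% average reduction leaves about a sixth of the original bill, and it contrasts the two billing formulas. The function names are hypothetical and chosen for illustration only; no real prices or measurements from NeuralWatt are assumed.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Share of the original bill still paid after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


def token_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Token-based billing: pay per million tokens processed (hypothetical helper)."""
    return tokens / 1_000_000 * usd_per_million_tokens


def energy_cost(kwh: float, usd_per_kwh: float) -> float:
    """Energy-based billing: pay per kWh consumed (hypothetical helper)."""
    return kwh * usd_per_kwh


# 82.9% is the average reduction reported in the post.
fraction = remaining_fraction(82.9)
print(f"remaining share of bill: {fraction:.3f} (about 1/{1 / fraction:.1f})")
# prints: remaining share of bill: 0.171 (about 1/5.8)
```

Under energy-based billing the charge depends on measured kWh rather than token count, so a request served largely from cache would cost less even if it returns the same number of tokens.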
