🤖 Small LLM Benchmark on NVIDIA Jetson Orin Nano Super
The benchmark found that the 25W power mode works best for llama.cpp, delivering 35–47% higher throughput than the 15W mode. The SmolLM2-135M model reached 165.2 tok/s.
🌍 The results underline how much CUDA kernel optimization matters on new architectures: unoptimized backends such as Ollama can be up to 4 times slower than llama.cpp. That gap matters most on edge devices, where compute and power are tightly constrained.
👤 When running local neural networks on compact NVIDIA hardware, the 25W mode on llama.cpp provides the best balance of speed and energy efficiency.
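To check the speed versus energy trade-off on your own device, a minimal Python sketch along these lines may help. It assumes you have already measured throughput (tok/s) and average board power (W) for each power mode. The function names and the sample numbers are hypothetical and are not taken from the benchmark. Note that the 15W and 25W mode names are power budgets, so the sketch expects measured draw rather than the mode label.

```python
def throughput_gain_pct(baseline_tps: float, candidate_tps: float) -> float:
    """Percentage change in throughput of the candidate relative to the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (candidate_tps / baseline_tps - 1.0) * 100.0


def tokens_per_joule(tps: float, avg_power_w: float) -> float:
    """Energy efficiency: tokens per second divided by watts gives tokens per joule."""
    if avg_power_w <= 0:
        raise ValueError("avg_power_w must be positive")
    return tps / avg_power_w


if __name__ == "__main__":
    # Hypothetical placeholder measurements, NOT results from the benchmark.
    low_mode = {"tps": 100.0, "avg_power_w": 10.0}   # e.g. measured in 15W mode
    high_mode = {"tps": 140.0, "avg_power_w": 13.0}  # e.g. measured in 25W mode

    gain = throughput_gain_pct(low_mode["tps"], high_mode["tps"])
    print(f"Throughput gain: {gain:.1f}%")
    for name, m in (("low", low_mode), ("high", high_mode)):
        eff = tokens_per_joule(m["tps"], m["avg_power_w"])
        print(f"{name} mode: {eff:.2f} tok/J")
```

A higher power mode only wins on efficiency if throughput rises faster than measured power draw, so comparing tokens per joule alongside the raw gain shows whether both improve together.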
Source 1: https://www.smolhub.com/posts/jetson-nano-super-benchmark-non-reasoning/
