🤖 Running LLMs on ESP32-S3 Microcontrollers

A developer has presented a way to run Llama-architecture models on ESP32-S3 chips. Because a single chip does not have enough memory, the model's layers are split between two microcontrollers that communicate over UART at 460800 baud. The system supports 15M- and 42M-parameter models, using INT4 quantization and memory-mapped flash.
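To put those figures in perspective, here is a rough back-of-the-envelope sketch. It is not taken from the project: it assumes weights-only INT4 storage with no scale or metadata overhead, standard 8N1 UART framing (10 bits on the wire per payload byte), and a hypothetical hidden size of 512 float32 values passed between the two chips.

```python
# Rough sizing for the setup described above.
# Assumptions (NOT from the source): weights-only INT4 size with no scale
# overhead, 8N1 UART framing, and a hypothetical hidden size of 512.

def int4_weight_bytes(n_params: int) -> int:
    """Two 4-bit weights pack into one byte (rounding up)."""
    return (n_params + 1) // 2

def uart_bytes_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """With 8N1 framing, each payload byte costs 10 bits on the wire."""
    return baud / bits_per_frame

def transfer_ms(n_bytes: int, baud: int) -> float:
    """Time in milliseconds to push n_bytes through the UART link."""
    return 1000.0 * n_bytes / uart_bytes_per_second(baud)

BAUD = 460_800      # from the post
HIDDEN_SIZE = 512   # hypothetical activation width, for illustration only

for name, params in (("15M", 15_000_000), ("42M", 42_000_000)):
    size_mb = int4_weight_bytes(params) / 1e6
    print(f"{name} model: ~{size_mb:.1f} MB of INT4 weights")

activation_bytes = HIDDEN_SIZE * 4  # float32 activations
print(f"UART payload rate: {uart_bytes_per_second(BAUD):.0f} bytes/s")
print(f"One activation vector: {transfer_ms(activation_bytes, BAUD):.1f} ms")
```

Under these assumptions the weights are far larger than on-chip RAM, which is consistent with reading them from flash, and the UART hop between the chips adds a noticeable delay per token.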

🌍 The project demonstrates that distributed inference is feasible on extremely constrained edge hardware, opening the way to local LLMs in cheap IoT devices with no cloud dependency.

👤 This is a practical example of how simple components and careful resource partitioning make it possible to run a language model on hardware costing only a few dollars.

Source 1: https://github.com/harmansingh4163-ai/ESP-32-s3-Story-maker-LLM