VK's AI team has developed Discovery AI, a neural search and recommendation system for the company's ecosystem, which includes VK, Dzen, and Mail.ru. The technology combines hybrid retrieval with optimized language models, allowing it to process very large datasets in real time with low latency.
What Happened
VK developers built Discovery AI on a combination of classical BM25 keyword search and vector search to improve retrieval accuracy. At the core of the system is a custom 8-billion-parameter LLM based on LLaMA, trained with knowledge distillation and Quantization-Aware Training (QAT). According to the team, this yields a throughput of up to 30,000 document chunks per second on a single GPU, with response latency under 500 ms.
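The article does not say how Discovery AI merges the BM25 and vector results. One common option is reciprocal rank fusion (RRF), and the minimal Python sketch below assumes it, together with toy tokenized documents and hand-made 2-dimensional embeddings standing in for real model outputs. The names `bm25_scores`, `rrf_fuse`, and `hybrid_search` are illustrative, not VK's API.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for the query."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter()  # document frequency of each term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list adds 1 / (k + rank), with rank starting at 1."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query_tokens, query_vec, docs_tokens, doc_vecs, top_k=3):
    """Rank documents by BM25 and by cosine similarity, then fuse the two rankings."""
    lexical = bm25_scores(query_tokens, docs_tokens)
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    ids = range(len(docs_tokens))
    bm25_rank = sorted(ids, key=lambda i: lexical[i], reverse=True)
    vec_rank = sorted(ids, key=lambda i: semantic[i], reverse=True)
    return rrf_fuse([bm25_rank, vec_rank])[:top_k]


# Toy corpus and hypothetical 2-d "embeddings" (axis 0 ~ travel/price, axis 1 ~ weather).
DOCS = [
    "flights delayed by moscow weather".split(),
    "budget air travel deals".split(),
    "cheap flights to moscow".split(),
]
DOC_VECS = [[0.3, 1.0], [1.0, 0.1], [0.9, 0.2]]
QUERY = "cheap flights".split()
QUERY_VEC = [1.0, 0.0]

if __name__ == "__main__":
    print(hybrid_search(QUERY, QUERY_VEC, DOCS, DOC_VECS))  # [2, 1, 0]
```

In the toy data, document 1 shares no keywords with the query yet ranks above document 0, which matches only "flights", because the vector ranking pulls it up. RRF combines ranks rather than raw scores, so BM25 and cosine scores never need to be put on a common scale.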
Context
The project marks a shift from simple demo chatbots to full-scale, industrial-grade RAG (Retrieval-Augmented Generation) systems. For training and quality assessment, the team used the LLM-as-a-Judge methodology, in which a language model evaluates model outputs; this helped the team generate synthetic training data and manage model-preparation pipelines efficiently.
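The article names LLM-as-a-Judge but gives no prompt, model, or scoring scale. As a hedged illustration of how a judge can gate synthetic training data, the Python sketch below assumes a 1-to-5 score, a keep-threshold of 4, and a `call_llm` callable. `Sample`, `JUDGE_PROMPT`, and `toy_judge` are hypothetical, and `toy_judge` is a trivial heuristic standing in for a real model so the example runs offline.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    """One synthetic RAG training example (hypothetical schema)."""
    query: str
    context: str
    answer: str


# Hypothetical judge prompt; a production prompt would be far more detailed.
JUDGE_PROMPT = (
    "Rate from 1 to 5 how well the answer responds to the query "
    "using only the given context. Reply with a single digit.\n"
    "Query: {query}\nContext: {context}\nAnswer: {answer}\nScore:"
)


def parse_score(reply: str) -> Optional[int]:
    """Extract the first digit 1-5 from the judge's reply, or None if absent."""
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None


def filter_with_judge(
    samples: List[Sample], call_llm: Callable[[str], str], threshold: int = 4
) -> List[Sample]:
    """Keep only samples whose judge score is at or above the threshold."""
    kept = []
    for s in samples:
        prompt = JUDGE_PROMPT.format(query=s.query, context=s.context, answer=s.answer)
        score = parse_score(call_llm(prompt))
        if score is not None and score >= threshold:
            kept.append(s)
    return kept


def toy_judge(prompt: str) -> str:
    """Stand-in for a real LLM call: scores 5 if the answer appears in the context."""
    context = prompt.split("Context: ")[1].split("\n")[0]
    answer = prompt.split("Answer: ")[1].split("\n")[0]
    return "5" if answer.lower() in context.lower() else "2"


if __name__ == "__main__":
    candidates = [
        Sample("What is the capital of France?", "Paris is the capital of France.", "Paris"),
        Sample("What is the capital of France?", "Paris is the capital of France.", "Lyon"),
    ]
    print([s.answer for s in filter_with_judge(candidates, toy_judge)])  # ['Paris']
```

Passing the model client in as a plain callable keeps the filtering logic testable without network access; the same loop could also record scores to track data quality across pipeline runs.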
Why It Matters for the Industry
The solution shows that custom LLMs can run in high-load production at the scale of a major content platform while keeping latency acceptable. It sets a new efficiency benchmark for the local AI market, shifting the focus from simple interfaces to complex search engines that can work with fragmented data across large ecosystems.
Why It Matters for Users
Users of VK, Dzen, and Mail.ru services will get search that understands the meaning of a query rather than only its keywords. This should yield more accurate, relevant, and detailed answers grounded in real content, and improve the quality of recommendations and personalized interactions.
What Is Not Yet Known / Limitations
Views on the project's risks differ: technical specialists focus on architectural efficiency, while legal experts raise ethical questions about the use of synthetic data and point to the risks of processing data across services within the ecosystem.
Sources
Author
Look at AI, Editorial Staff
