🛠 PageToMD: Converting Web Pages to Clean Markdown

PageToMD has been released: a Python-based CLI tool that converts web pages into clean Markdown optimized for AI agents and RAG systems. It supports both fast HTTP requests via httpx and rendering of complex single-page applications (SPAs) with Playwright.
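
A minimal sketch of how such a two-path fetch could work, assuming a simple heuristic: request the page with httpx first, and fall back to headless Chromium via Playwright only when the returned HTML has almost no visible text (a typical SPA shell that fills itself in with JavaScript). The `needs_browser` and `fetch_html` helpers and the 200-character threshold are hypothetical illustrations, not PageToMD's actual API or logic.

```python
from html.parser import HTMLParser

SKIP_TAGS = ("script", "style", "noscript")


class _VisibleText(HTMLParser):
    """Collects text that is not inside <script>, <style>, or <noscript>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []  # visible text fragments

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def needs_browser(html, min_chars=200):
    """Hypothetical heuristic: treat a page as an SPA shell if it has little visible text."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk) for chunk in parser.chunks) < min_chars


def fetch_html(url):
    """Fast path with httpx; fall back to a real browser only when needed."""
    # Imported lazily so needs_browser() works without these packages installed.
    import httpx

    response = httpx.get(url, follow_redirects=True, timeout=15.0)
    response.raise_for_status()
    html = response.text
    if not needs_browser(html):
        return html

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            # Wait until network activity settles so client-side rendering finishes.
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

Trying HTTP first keeps the common case cheap: launching a browser costs far more than a single request, so it is reserved for pages that actually need JavaScript.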

🌍 Specialized data ingestion tools simplify the creation of high-quality RAG systems by minimizing noise (ads, navigation) and providing structured context for LLMs.
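
To illustrate what minimizing noise can mean in practice, here is a stdlib-only sketch that drops common boilerplate containers (`nav`, `header`, `footer`, `aside`, scripts, forms) and emits headings, paragraphs, list items, and links as Markdown. The `html_to_markdown` function and the `NOISE_TAGS` set are hypothetical; the post does not describe PageToMD's own extraction rules, which are likely more sophisticated.

```python
import re
from html.parser import HTMLParser

# Hypothetical noise list for illustration; not PageToMD's actual rules.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form", "noscript"}
BLOCK_TAGS = {"p", "div", "section", "article", "main", "ul", "ol", "pre", "blockquote"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkdownExtractor(HTMLParser):
    """Converts a small subset of HTML to Markdown, skipping noise containers."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # > 0 while inside a NOISE_TAGS element
        self.out = []         # Markdown fragments in document order
        self.in_link = False
        self.href = None
        self.link_text = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif self.noise_depth:
            return  # everything inside a noise container is dropped
        elif tag in HEADINGS:
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag in BLOCK_TAGS:
            self.out.append("\n\n")
        elif tag == "a":
            self.in_link = True
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif self.noise_depth:
            return
        elif tag in HEADINGS or tag in BLOCK_TAGS:
            self.out.append("\n\n")
        elif tag == "a" and self.in_link:
            text = "".join(self.link_text).strip()
            self.out.append(f"[{text}]({self.href})" if self.href else text)
            self.in_link = False

    def handle_data(self, data):
        if self.noise_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse HTML whitespace
        (self.link_text if self.in_link else self.out).append(text)

    def markdown(self):
        # Trim each line, squeeze repeated spaces, and cap blank lines at one.
        lines = [re.sub(r" {2,}", " ", line).strip()
                 for line in "".join(self.out).split("\n")]
        return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return parser.markdown()
```

For example, `html_to_markdown("<nav>Menu</nav><h2>Title</h2><p>Body</p>")` returns `"## Title\n\nBody"`. Filtering by tag name alone misses ads rendered inside ordinary `div` elements, which is why dedicated extractors usually add class/ID or text-density heuristics on top.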

👤 You can use this tool to quickly build a local knowledge base from documentation or articles, stored as Markdown, a format that LLMs handle well.

Source 1: https://github.com/gs202/PageToMD