🛠 Monitoring Reward Hacking in RL
rewardspy, an open-source tool by AvAdiii, has been released for debugging and visualizing reward functions in RL. The library flags likely reward hacking by tracking reward anomalies and variance collapse, and presents the results in a terminal dashboard.
🌍 It targets Goodhart's Law (when a measure becomes a target, it stops being a good measure): training runs can be audited automatically in CI/CD, so agent degradation is caught early.
👤 It replaces ad hoc print(reward) debugging with statistical diagnostics of the reward signal, giving a clearer picture of agent behavior.
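To make "anomalies and variance collapse" concrete, here is a minimal Python sketch of what such diagnostics could look like. This is not rewardspy's API: the RewardMonitor class, its parameters, and its thresholds are hypothetical and chosen only for illustration. It keeps a rolling window of rewards, flags a reward whose z-score against that window is extreme, and flags variance collapse when the window's standard deviation falls far below the first full window's.

```python
from collections import deque
from statistics import mean, pstdev


class RewardMonitor:
    """Hypothetical helper (not part of rewardspy): rolling reward diagnostics."""

    def __init__(self, window=50, z_threshold=4.0, collapse_ratio=0.05):
        self.window = window
        self.z_threshold = z_threshold        # assumed cutoff for "anomalous"
        self.collapse_ratio = collapse_ratio  # assumed fraction of baseline std
        self.recent = deque(maxlen=window)
        self.baseline_std = None              # std of the first full window

    def update(self, reward):
        """Record one reward and return a list of warning labels."""
        warnings = []

        # Anomaly: the new reward lies far outside the recent distribution.
        if len(self.recent) == self.window:
            mu, sigma = mean(self.recent), pstdev(self.recent)
            if sigma > 0 and abs(reward - mu) / sigma > self.z_threshold:
                warnings.append("anomaly")

        self.recent.append(reward)

        # Variance collapse: the reward spread shrinks far below its baseline.
        if len(self.recent) == self.window:
            sigma = pstdev(self.recent)
            if self.baseline_std is None:
                self.baseline_std = sigma
            elif self.baseline_std > 0 and sigma < self.collapse_ratio * self.baseline_std:
                warnings.append("variance_collapse")

        return warnings
```

In a training loop, one would call monitor.update(reward) wherever print(reward) used to be and log or fail the run on any returned warning. The window size and both thresholds are assumptions and would need tuning per environment.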
Source 1: https://github.com/AvAdiii/rewardspy
