Abstract
A groundbreaking technology has been developed that enables artificial intelligence (AI) to make more accurate value judgments in situations where real-time data acquisition is not feasible.
Professor Seungyul Han and his research team from the Graduate School of Artificial Intelligence at UNIST have introduced a novel technique, called Exclusively Penalized Q-learning (EPQ), that enhances the reliability of value functions in offline reinforcement learning (RL). This achievement was announced at NeurIPS 2024, one of the premier academic conferences in AI and machine learning, where the work was recognized as a spotlight paper.
Offline RL is a critical component of AI that learns optimal policies using only pre-collected data, particularly in scenarios where real-time data acquisition is challenging. This approach is essential for applications such as drones and autonomous vehicles operating in disaster-stricken areas, where unexpected variables can arise.
Maintaining stable learning performance is crucial, especially when offline RL methods encounter data distributions that differ from real-world situations. Because the agent cannot gather new data to verify its estimates, offline RL methods apply penalties to value estimates for actions poorly covered by the dataset. Previous methods, however, applied these penalties uniformly across all states, dragging down value estimates even in well-covered states. This caused systematic underestimation and hindered accurate value judgment.
The team's EPQ method instead selectively penalizes only those states that exhibit high distributional deviation from the dataset, significantly reducing estimation errors and enabling more accurate learning outcomes.
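The contrast between uniform and selective penalization can be illustrated with a toy sketch. Note this is a simplified, hypothetical illustration and not the paper's actual algorithm: the function `penalized_q_target`, the scalar `deviation` score, and the `threshold` cutoff are all invented here for exposition, assuming a tabular Bellman-target setting.

```python
import numpy as np

def penalized_q_target(rewards, next_q, deviation, alpha=1.0,
                       threshold=0.5, selective=True, gamma=0.99):
    """Toy penalized Bellman targets for offline Q-learning.

    `deviation` is a hypothetical per-state score of how far the
    evaluated actions stray from the dataset distribution
    (higher = more out-of-distribution).
    """
    if selective:
        # EPQ-style idea (simplified): penalize only states whose
        # distributional deviation exceeds a threshold.
        penalty = alpha * deviation * (deviation > threshold)
    else:
        # Uniform scheme: every state is penalized, including
        # well-covered ones, which lowers their targets and can
        # lead to underestimated value functions.
        penalty = alpha * deviation
    return rewards + gamma * next_q - penalty

# Two states: one well covered by the data (deviation 0.1),
# one far outside it (deviation 0.9).
rewards = np.array([1.0, 1.0])
next_q = np.array([2.0, 2.0])
deviation = np.array([0.1, 0.9])

selective = penalized_q_target(rewards, next_q, deviation, selective=True)
uniform = penalized_q_target(rewards, next_q, deviation, selective=False)
```

In this sketch the well-covered state keeps its unpenalized target under the selective scheme but is needlessly reduced under the uniform one, while the out-of-distribution state is penalized equally by both.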
In practical applications, the research team tested the EPQ technology on a task requiring AI to hammer nails. The existing method struggled to achieve optimal performance due to its indiscriminate penalties, whereas the application of EPQ technology resulted in successful task execution.
Professor Han stated, "This study has significantly broadened the applicability of reinforcement learning across various industries, including autonomous driving, robotic control, and smart manufacturing."
This research was conducted with the support of the Ministry of Science and ICT (MSIT), the Institute of Information & Communications Technology Planning & Evaluation (IITP), and UNIST.
Meanwhile, NeurIPS is recognized as one of the top three AI conferences globally, alongside the International Conference on Learning Representations (ICLR) and the International Conference on Machine Learning (ICML). The 2024 edition of NeurIPS was held in Vancouver, Canada, from December 10 to 15, with only about 4,500 of the 15,671 papers submitted worldwide being accepted.
Journal Reference
Junghyuk Yeom, Yonghyeon Jo, Jeongmo Kim, et al., "Exclusively Penalized Q-learning for Offline Reinforcement Learning," Advances in Neural Information Processing Systems (NeurIPS), 2024.