Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates
2510.14900v1
cs.AI, cs.CR
2025-10-18
Авторы:
Wen-Kwang Tsao, Yao-Ching Yu, Chien-Ming Huang
Abstract
The Enterprise Intelligence Platform must integrate logs from numerous
third-party vendors in order to perform various downstream tasks. However,
vendor documentation is often unavailable at test time. It is either misplaced,
mismatched, poorly formatted, or incomplete, which makes schema mapping
challenging. We introduce a reinforcement learning agent that can self-improve
without labeled examples or model weight updates. During inference, the agent:
1) Identifies ambiguous field-mapping attempts. 2) Generates targeted
web-search queries to gather external evidence. 3) Applies a confidence-based
reward to iteratively refine its mappings. To demonstrate this concept, we
converted Microsoft Defender for Endpoint logs into a common schema. Our method
increased mapping accuracy from 56.4\%(LLM-only) to 72.73\%(RAG) to 93.94\%
over 100 iterations using GPT-4o. At the same time, it reduced the number of
low-confidence mappings requiring expert review by 85\%. This new approach
provides an evidence-driven, transparent method for solving future industry
problems, paving the way for more robust, accountable, scalable, efficient,
flexible, adaptable, and collaborative solutions.
Ссылки и действия
Дополнительные ресурсы: