BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models
2511.04919v1
cs.CL, cs.AI, I.2.7; I.2.6; H.3.3
2025-11-11
Авторы:
Chandra Vamsi Krishna Alla, Harish Naidu Gaddam, Manohar Kommi
Abstract
Large Language Models (LLMs) face significant computational and memory
constraints when processing long contexts, despite growing demand for
applications requiring reasoning over extensive documents, multi-session
dialogues, and book length texts. While recent advances have extended context
windows to 100K-1M tokens, such approaches incur prohibitive costs for resource
constrained deployments. We propose BudgetMem, a novel memory augmented
architecture that learns what to remember rather than remembering everything.
Our system combines selective memory policies with feature based salience
scoring (entity density, TF-IDF, discourse markers, position bias) to decide
which information merits storage under strict budget constraints. Unlike
existing retrieval augmented generation (RAG) systems that store all chunks,
BudgetMem employs learned gating mechanisms coupled with BM25 sparse retrieval
for efficient information access. Through comprehensive experiments on 700
question answer pairs across short (237 tokens) and long (5K-10K tokens)
documents with Llama-3.2-3B-Instruct, we demonstrate that BudgetMem achieves
remarkable results on long documents: only 1.0% F1 score degradation while
saving 72.4% memory compared to baseline RAG. We validate our approach through
budget sensitivity analysis (testing 7 budget ratios), naive baseline
comparisons, and document length analysis, showing that BudgetMem's benefits
increase with document length. Our work provides a practical pathway for
deploying capable long context systems on modest hardware, democratizing access
to advanced language understanding capabilities.