Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation
2510.10676v1
cs.AR, cs.CL, cs.RO, eess.AS
2025-10-15
Авторы:
Mukul Lokhande, Tanushree Dewangan, Mohd Sharik Mansoori, Tejas Chaudhari, Akarsh J., Damayanti Lokhande, Adam Teman, Santosh Kumar Vishvakarma
Abstract
This paper introduces Bhasha-Rupantarika, a light and efficient multilingual
translation system tailored through algorithm-hardware codesign for
resource-limited settings. The method investigates model deployment at
sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental
results indicating a 4.1x reduction in model size (FP4) and a 4.2x speedup in
inference speed, which correlates with an increased throughput of 66 tokens/s
(improvement by 4.8x). This underscores the importance of ultra-low precision
quantization for real-time deployment in IoT devices using FPGA accelerators,
achieving performance on par with expectations. Our evaluation covers
bidirectional translation between Indian and international languages,
showcasing its adaptability in low-resource linguistic contexts. The FPGA
deployment demonstrated a 1.96x reduction in LUTs and a 1.65x decrease in FFs,
resulting in a 2.2x enhancement in throughput compared to OPU and a 4.6x
enhancement compared to HPTA. Overall, the evaluation provides a viable
solution based on quantisation-aware translation along with hardware efficiency
suitable for deployable multilingual AI systems. The entire codes
[https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset for
reproducibility are publicly available, facilitating rapid integration and
further development by researchers.