Serverless GPU Architecture for Enterprise HR Analytics: A Production-Scale BDaaS Implementation
2510.19689v1
cs.DC, cs.AI, cs.LG, C.2.4; H.3.4; I.2.6
2025-10-24
Авторы:
Guilin Zhang, Wulan Guo, Ziqi Tan, Srinivas Vippagunta, Suchitra Raman, Shreeshankar Chatterjee, Ju Lin, Shang Liu, Mary Schladenhauffen, Jeffrey Luo, Hailong Jiang
Abstract
Industrial and government organizations increasingly depend on data-driven
analytics for workforce, finance, and regulated decision processes, where
timeliness, cost efficiency, and compliance are critical. Distributed
frameworks such as Spark and Flink remain effective for massive-scale batch or
streaming analytics but introduce coordination complexity and auditing
overheads that misalign with moderate-scale, latency-sensitive inference.
Meanwhile, cloud providers now offer serverless GPUs, and models such as TabNet
enable interpretable tabular ML, motivating new deployment blueprints for
regulated environments. In this paper, we present a production-oriented Big
Data as a Service (BDaaS) blueprint that integrates a single-node serverless
GPU runtime with TabNet. The design leverages GPU acceleration for throughput,
serverless elasticity for cost reduction, and feature-mask interpretability for
IL4/FIPS compliance. We conduct benchmarks on the HR, Adult, and BLS datasets,
comparing our approach against Spark and CPU baselines. Our results show that
GPU pipelines achieve up to 4.5x higher throughput, 98x lower latency, and 90%
lower cost per 1K inferences compared to Spark baselines, while compliance
mechanisms add only ~5.7 ms latency with p99 < 22 ms. Interpretability remains
stable under peak load, ensuring reliable auditability. Taken together, these
findings provide a compliance-aware benchmark, a reproducible Helm-packaged
blueprint, and a decision framework that demonstrate the practicality of
secure, interpretable, and cost-efficient serverless GPU analytics for
regulated enterprise and government settings.