Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
2510.06010v1
quant-ph, cs.AI, cs.LG, cs.RO, cs.SY, eess.SY
2025-10-09
Authors:
Aueaphum Aueawatthanaphisut, Nyi Wunna Tun
Abstract
A comparative evaluation of classical and quantum reinforcement
learning (QRL) paradigms was conducted to investigate their convergence
behavior, robustness under observational noise, and computational efficiency in
a benchmark control environment. The study employed a multilayer perceptron
(MLP) agent as a classical baseline and a parameterized variational quantum
circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1
environment over 500 episodes. Empirical results demonstrated that the
classical MLP achieved near-optimal policy convergence with a mean return of
498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast,
the VQC exhibited limited learning capability, with an average return of 14.6
+/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise
robustness analysis further revealed that the MLP policy deteriorated
gracefully under Gaussian perturbations, while the VQC displayed higher
sensitivity at equivalent noise levels. Despite the lower asymptotic
performance, the VQC had a significantly lower parameter count and a
marginally longer training time, highlighting its potential scalability for
low-resource quantum processors. The results suggest that while classical
neural policies remain dominant in current control benchmarks, quantum-enhanced
architectures could offer promising efficiency advantages once hardware noise
and expressivity limitations are mitigated.
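
To make two of the quantities compared above concrete, the following minimal sketch counts trainable parameters for an MLP and for a layered VQC, and applies zero-mean Gaussian noise to an observation. The layer sizes, qubit count, layer count, rotations per qubit, noise level, and all function names are illustrative assumptions; the abstract does not specify the authors' architectures or noise settings.

```python
import random


def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


def vqc_param_count(n_qubits, n_layers, rotations_per_qubit=3):
    """Trainable rotation angles in a layered ansatz.

    Assumes the entangling gates carry no trainable parameters.
    """
    return n_qubits * n_layers * rotations_per_qubit


def perturb_observation(obs, sigma, rng):
    """Add zero-mean Gaussian noise (std sigma) to each observation component."""
    return [x + rng.gauss(0.0, sigma) for x in obs]


# Hypothetical sizes, NOT taken from the paper: CartPole has 4 observation
# components and 2 actions; the hidden width and circuit shape are assumed.
mlp_params = mlp_param_count([4, 64, 2])
vqc_params = vqc_param_count(n_qubits=4, n_layers=2)

rng = random.Random(0)  # seeded for reproducibility
clean = [0.0, 0.1, -0.02, 0.3]
noisy = perturb_observation(clean, sigma=0.05, rng=rng)

print("MLP parameters:", mlp_params)
print("VQC parameters:", vqc_params)
print("Noisy observation:", noisy)
```

Under these assumed sizes the MLP has 450 parameters and the VQC has 24, which illustrates the kind of gap the abstract describes without reproducing the paper's actual figures.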