Signal-Based Malware Classification Using 1D CNNs
2509.06548v1
cs.CR, cs.AI, cs.CV, cs.LG, I.2.6; K.6.5
2025-09-10
Authors:
Jack Wilkie, Hanan Hindy, Ivan Andonovic, Christos Tachtatzis, Robert Atkinson
Summary
## Context
Modern malware detection is challenged by advanced obfuscation techniques, which can evade traditional static analysis. Dynamic analysis is effective but too resource-intensive to deploy at large scale. To address these issues, existing research converts malware binaries into 2D images by heuristically reshaping their bytes into a grid and then resizing the grid with Lanczos resampling. The images are classified from their textural information using computer vision techniques, which detect obfuscated malware more effectively than static analysis. However, the conversion causes significant information loss in two ways. The first is quantization noise, from rounding resampled values to integer pixel values. The second is the introduction of artificial 2D dependencies that do not exist in the original binary. This loss of signal limits the performance of downstream classifiers. This study instead resizes malware binaries into 1D signals. This removes the need for heuristic reshaping, and because the signals are stored in floating point, it also avoids quantization noise.
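The two sources of loss described above can be illustrated with a minimal Python sketch of a simplified image pipeline. It rests on two assumptions. First, the near-square width rule in the hypothetical `heuristic_width` helper is an assumption; published pipelines typically pick the width from a table keyed on file size. Second, the Lanczos resizing step is omitted. The sketch shows that pixels which are vertically adjacent in the image lie `width` bytes apart in the file. It also shows that saving resampled intensities as 8-bit pixels rounds and clips them to integers.

```python
import math


def heuristic_width(n_bytes: int) -> int:
    """Hypothetical width rule (assumption): side length of a near-square grid."""
    return max(1, math.isqrt(n_bytes))


def reshape_to_grid(data: bytes, width: int) -> list[list[int]]:
    """Lay bytes out row by row, zero-padding the final row to the full width."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def quantise(values: list[float]) -> list[int]:
    """Round resampled intensities to 8-bit pixel values, as when saving a grayscale image."""
    return [min(255, max(0, round(v))) for v in values]


if __name__ == "__main__":
    data = bytes(range(256)) * 4                 # 1024 bytes of toy "binary" content
    width = heuristic_width(len(data))           # 32
    grid = reshape_to_grid(data, width)
    # Pixel (1, 0) sits directly below pixel (0, 0) in the image,
    # yet the two bytes are `width` positions apart in the file.
    print(width, grid[0][0], grid[1][0])         # 32 0 32
    # Fractional values produced by resampling are rounded (and clipped) to integers.
    print(quantise([12.4, 12.6, 300.2, -3.0]))   # [12, 13, 255, 0]
```

The vertical "neighbour" relationship is purely an artefact of the chosen width. A different width rule would pair different bytes, which is the kind of artificial 2D dependency the summary refers to.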
## Method
The method resizes each malware binary directly into a 1D signal. This keeps the byte order of the file intact and removes the need for heuristic reshaping into a grid. Because the signal is stored in floating point, it also avoids the quantization noise of integer pixel values. The authors show that existing 2D CNN architectures can be readily adapted to classify these signals with improved performance. They also develop a bespoke 1D convolutional neural network (1D CNN) based on the ResNet architecture. The network incorporates squeeze-and-excitation layers, which adaptively reweight feature channels to strengthen the learned representation. The model is evaluated on MalNet, a large-scale malware dataset, at three levels of granularity: binary, type, and family classification. Compared with the established image-based modality, the approach aims to improve classification accuracy by giving the model a more faithful representation of each file.
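A minimal sketch of the 1D conversion follows. It assumes Lanczos resampling with a = 3, mirroring the 2D pipeline, and a target length chosen by the caller. The abstract does not state the exact resampling kernel or signal length, so both are assumptions here, as are the helper names `lanczos_kernel` and `resize_to_signal`. Unlike the image pipeline, the output keeps fractional sample values instead of rounding them to integer pixels.

```python
import math


def lanczos_kernel(t: float, a: int = 3) -> float:
    """Lanczos window sinc(t) * sinc(t / a), zero outside |t| < a."""
    if t == 0.0:
        return 1.0
    if abs(t) >= a:
        return 0.0
    pt = math.pi * t
    return a * math.sin(pt) * math.sin(pt / a) / (pt * pt)


def resize_to_signal(data: bytes, length: int, a: int = 3) -> list[float]:
    """Resample a byte sequence to `length` floating-point samples (no rounding)."""
    n = len(data)
    if n == 0 or length <= 0:
        raise ValueError("data must be non-empty and length positive")
    scale = n / length               # input bytes per output sample
    support = max(scale, 1.0)        # stretch the kernel when downsampling (anti-aliasing)
    signal = []
    for i in range(length):
        centre = (i + 0.5) * scale - 0.5          # output sample i in input coordinates
        lo = max(0, math.floor(centre - a * support) + 1)
        hi = min(n - 1, math.ceil(centre + a * support) - 1)
        acc = total = 0.0
        for j in range(lo, hi + 1):
            w = lanczos_kernel((j - centre) / support, a)
            acc += w * data[j]
            total += w
        # Normalise by the weight sum so a constant input maps to the same constant.
        signal.append(acc / total)
    return signal


if __name__ == "__main__":
    binary = bytes(range(256)) * 16          # toy 4096-byte "file"
    sig = resize_to_signal(binary, 1024)     # hypothetical target length
    print(len(sig), type(sig[0]).__name__)   # 1024 float
```

Every file maps to a signal of the same length without choosing a grid width. In practice the samples would likely also be scaled (for example to [0, 1]) before entering a network; that step is left out here.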
## Results
The experiments demonstrate the efficacy of the 1D signal-based approach for malware classification. The bespoke 1D CNN achieves state-of-the-art performance on MalNet, with F1 scores of 0.874 for binary classification, 0.503 for type-level classification, and 0.507 for family-level classification. These results exceed those of existing 2D CNN models on the same dataset, supporting the proposed signal-based modality. Because the signals are stored in floating point, they avoid quantization noise, so the models receive a more faithful representation of each file. The authors attribute the improved classification performance to this gain in signal fidelity. They argue that the results pave the way for future models operating on the proposed signal modality.
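The reported results are F1 scores. For each class, F1 = 2TP / (2TP + FP + FN). For the multi-class type and family tasks, a common choice is to macro-average the per-class scores. The averaging scheme used in the paper is not stated in this abstract, so the sketch below simply assumes macro averaging. The helper names `per_class_f1` and `macro_f1` are illustrative.

```python
from collections import Counter


def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 = 2TP / (2TP + FP + FN) for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1    # this label was predicted but is wrong
            fn[truth] += 1   # the true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    truth = ["benign", "benign", "malware", "malware"]
    pred = ["benign", "malware", "malware", "malware"]
    print(per_class_f1(truth, pred))        # {'benign': 0.6666666666666666, 'malware': 0.8}
    print(round(macro_f1(truth, pred), 4))  # 0.7333
```

Macro averaging gives every class equal weight regardless of its frequency. A weighted average or a single positive-class F1 would generally give different numbers for the same predictions.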
## Significance
The 1D signal-based approach offers several advantages over 2D image-based methods. By avoiding heuristic reshaping and quantization noise, it preserves more of the information in the original binary, which leads to more accurate classification. Like image-based methods, it classifies files without executing them. It therefore avoids the resource cost that makes dynamic analysis impractical at scale, which makes it a candidate for large-scale deployment in real-world cybersecurity systems. The signal-based modality may also be applicable beyond malware classification, in other domains that require robust signal processing. Potential benefits include stronger malware detection and improved system security. Relative to dynamic analysis, it may also lower resource consumption in large-scale deployments.
## Conclusions
The study demonstrates that converting malware binaries into 1D signals is effective for classification with 1D CNNs. The bespoke 1D CNN, based on the ResNet architecture with squeeze-and-excitation layers, achieves state-of-the-art performance on MalNet and outperforms existing 2D CNN models. By avoiding heuristic reshaping and quantization noise, the approach addresses two key limitations of 2D image-based methods and yields higher classification accuracy. Future research directions include exploring signal processing techniques that further improve signal fidelity. Another direction is investigating whether the methodology applies to other cybersecurity and signal processing tasks.
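The squeeze-and-excitation step mentioned above can be sketched in pure Python for a 1D feature map of shape [channels][length]. Each channel is first squeezed to its average over time. A small bottleneck network then turns these averages into per-channel gates in (0, 1), which rescale the channels. In SE-ResNet-style blocks the gate is applied to the residual branch before the identity shortcut is added. The paper's exact block layout is not given in this abstract. The reduction ratio of 4, the random weights and the helper names (`squeeze_excite_1d`, `se_residual`, `random_se_params`) are all hypothetical.

```python
import math
import random


def _dense(weights: list[list[float]], v: list[float], bias: list[float]) -> list[float]:
    """Fully connected layer: weights @ v + bias."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def squeeze_excite_1d(x, w1, b1, w2, b2):
    """x: [channels][length]. Returns x with each channel scaled by a gate in (0, 1)."""
    z = [sum(ch) / len(ch) for ch in x]                               # squeeze: average over time
    h = [max(0.0, v) for v in _dense(w1, z, b1)]                      # bottleneck + ReLU
    gates = [1.0 / (1.0 + math.exp(-v)) for v in _dense(w2, h, b2)]   # expand + sigmoid
    return [[g * v for v in ch] for g, ch in zip(gates, x)]           # excite: rescale channels


def se_residual(x, branch, se_params):
    """SE-ResNet-style merge: gate the residual branch, add the shortcut, apply ReLU."""
    gated = squeeze_excite_1d(branch, *se_params)
    return [[max(0.0, a + b) for a, b in zip(xc, gc)] for xc, gc in zip(x, gated)]


def random_se_params(channels: int, reduction: int = 4, seed: int = 0):
    """Hypothetical initialisation; trained weights would replace these."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(channels)]
    return w1, [0.0] * hidden, w2, [0.0] * channels


if __name__ == "__main__":
    feat = [[math.sin(0.1 * c * t) for t in range(32)] for c in range(8)]
    params = random_se_params(channels=8)
    out = se_residual(feat, feat, params)   # branch = input only for illustration
    print(len(out), len(out[0]))            # 8 32
```

In a real block, `branch` would be the output of the residual path's convolutions rather than the input itself.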
Abstract
Malware classification is a contemporary and ongoing challenge in
cyber-security: modern obfuscation techniques are able to evade traditional
static analysis, while dynamic analysis is too resource intensive to be
deployed at a large scale. One prominent line of research addresses these
limitations by converting malware binaries into 2D images: each file is
heuristically reshaped into a 2D grid, which is then resized using Lanczos
resampling. These images can then be classified based on their textural
information using
computer vision approaches. While this approach can detect obfuscated malware
more effectively than static analysis, the process of converting files into 2D
images results in significant information loss due to both quantisation noise,
caused by rounding to integer pixel values, and the introduction of 2D
dependencies which do not exist in the original data. This loss of signal
limits the classification performance of the downstream model. This work
addresses these weaknesses by instead resizing the files into 1D signals, which
avoids the need for heuristic reshaping; additionally, these signals do not
suffer from quantisation noise because they are stored in a floating-point format.
It is shown that existing 2D CNN architectures can be readily adapted to
classify these 1D signals for improved performance. Furthermore, a bespoke 1D
convolutional neural network, based on the ResNet architecture and
squeeze-and-excitation layers, was developed to classify these signals and
evaluated on the MalNet dataset. It was found to achieve state-of-the-art
performance on binary, type, and family level classification with F1 scores of
0.874, 0.503, and 0.507, respectively, paving the way for future models to
operate on the proposed signal modality.
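The abstract reports that existing 2D CNN architectures can be readily adapted to 1D signals. One natural way to do this, assumed here rather than taken from the paper, is to replace each 2D convolution with a 1D convolution. A 2D kernel slides over image rows and columns, whereas a 1D kernel slides along the signal. The pure-Python sketch below implements such a layer as cross-correlation with stride and zero padding, which is how common deep-learning frameworks define convolution. The function name `conv1d` and all weights are hypothetical, and the paper's kernel sizes and strides are not given in this abstract.

```python
def conv1d(x, weight, bias, stride=1, padding=0):
    """1D convolution (cross-correlation).

    x:      [in_channels][length]
    weight: [out_channels][in_channels][kernel_size]
    bias:   [out_channels]
    Returns [out_channels][out_length].
    """
    kernel = len(weight[0][0])
    padded = [[0.0] * padding + list(ch) + [0.0] * padding for ch in x]
    out_len = (len(x[0]) + 2 * padding - kernel) // stride + 1
    out = []
    for w_o, b_o in zip(weight, bias):                 # one output channel at a time
        row = []
        for t in range(out_len):
            start = t * stride
            acc = b_o
            for w_c, ch in zip(w_o, padded):           # sum over input channels
                acc += sum(w * v for w, v in zip(w_c, ch[start:start + kernel]))
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    signal = [[1.0, 2.0, 3.0, 4.0]]                  # one input channel
    identity = [[[0.0, 1.0, 0.0]]]                   # 1 output channel, kernel size 3
    moving_sum = [[[1.0, 1.0, 1.0]]]
    print(conv1d(signal, identity, [0.0], padding=1))               # [[1.0, 2.0, 3.0, 4.0]]
    print(conv1d(signal, moving_sum, [0.0]))                        # [[6.0, 9.0]]
    print(conv1d(signal, moving_sum, [0.0], stride=2, padding=1))   # [[3.0, 9.0]]
```

The output length follows the usual formula floor((L + 2p - k) / s) + 1. This is the same arithmetic as one spatial axis of a 2D convolution, which is why 2D architectures can be carried over to the signal modality with little structural change.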