
Learning to Look Benign: Targeted Evasion of Malware Detectors via API Import Injection

Juozas Dautartas, Olga Kurasova, Juozapas Rokas Čypas, Viktor Medvedev

Published: May 18, 2026 — 16:32 UTC

Problem
This paper examines the vulnerability of machine learning-based malware detectors to targeted evasion. Rather than merely evading detection, the authors explore whether malware samples can be made to classify as a specific benign software category by injecting selected Win32 API imports. The work is a preprint and has not yet undergone peer review.

Method
The authors propose a framework built on a Conditional Variational Autoencoder (CVAE) with a strictly additive decoder: it can introduce new API imports but never removes existing ones, so the malware’s functionality is preserved. For each malware sample, the framework automatically identifies the benign category the sample most closely resembles and uses it as the evasion target. Because the ensemble detector is non-differentiable, a differentiable proxy derived through knowledge distillation supplies the gradients for training. The experiments use binary Win32 API import vectors from 3,799 Windows executables, labeled with five benign classes and one malware class. The number of injected API imports (k) is varied from 5 to 50.
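The two constraints described above, picking the closest benign category and only ever adding imports, can be illustrated with a minimal sketch. It is not the authors' CVAE: the Jaccard similarity, the per-category prototype vectors, the `scores` list standing in for decoder output, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity between two sets of import indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_benign_category(
    x: Sequence[int], prototypes: Dict[str, Sequence[int]]
) -> str:
    """Return the benign category whose prototype vector is most similar to x.

    Jaccard similarity over prototype vectors is an assumption of this sketch,
    not the similarity measure used in the paper.
    """
    present = {i for i, bit in enumerate(x) if bit}

    def score(name: str) -> float:
        proto = {i for i, bit in enumerate(prototypes[name]) if bit}
        return jaccard(present, proto)

    # Iterating over sorted names makes ties resolve alphabetically.
    return max(sorted(prototypes), key=score)


def inject_top_k(x: Sequence[int], scores: Sequence[float], k: int) -> List[int]:
    """Set up to k new imports in a copy of x, never clearing existing ones.

    `scores` is a hypothetical stand-in for per-API decoder output.
    """
    if len(x) != len(scores):
        raise ValueError("x and scores must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Only APIs the sample does not already import are eligible.
    absent = [i for i, bit in enumerate(x) if bit == 0]
    # Highest score first; ties broken by index for determinism.
    chosen = sorted(absent, key=lambda i: (-scores[i], i))[:k]
    out = list(x)
    for i in chosen:
        out[i] = 1
    return out


def is_strictly_additive(original: Sequence[int], modified: Sequence[int]) -> bool:
    """True if every import present in `original` is still present in `modified`."""
    return all(m >= o for o, m in zip(original, modified))


# Toy example with six hypothetical API slots.
malware = [1, 0, 0, 1, 0, 0]
prototypes = {"browser": [1, 0, 1, 1, 1, 0], "editor": [0, 1, 0, 0, 0, 1]}
target = closest_benign_category(malware, prototypes)
modified = inject_top_k(malware, [0.9, 0.2, 0.8, 0.1, 0.7, 0.3], k=2)
print(target, modified, is_strictly_additive(malware, modified))
```

In the paper's setting the scores would come from the trained decoder conditioned on the target category. The point here is only that restricting candidates to absent imports guarantees the original import set survives.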

Results
With 20 injected API imports, the proposed method reduces malware recall from 87.5% to 30%. Among the samples that successfully evaded detection, 99% were classified as the intended benign target category. The CVAE outperformed both a frequency-based baseline and random selection at every tested injection size. In addition, validation on real Portable Executable (PE) files submitted to VirusTotal showed an average 54.5% reduction in flagging by commercial static detection engines, indicating that the attack carries over to practical scenarios.
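To put the recall figures in perspective, the short calculation below separates the absolute drop (in percentage points) from the relative drop (as a share of the original recall). The function name is illustrative; the two input values are the ones reported above.

```python
from typing import Tuple


def recall_reduction(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


absolute, relative = recall_reduction(87.5, 30.0)
print(f"absolute drop: {absolute:.1f} percentage points")   # 57.5
print(f"relative drop: {relative:.1f}% of original recall")  # 65.7
```

In other words, roughly two thirds of the malware the detector previously caught is no longer caught at k = 20.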

Limitations
The authors acknowledge that their approach depends on the availability of benign categories that can be mimicked through API import injection, so it may not apply to all malware types. They also note that the attack's effectiveness may vary with the architecture of the targeted malware detector. One limitation not discussed is that detection mechanisms may evolve in response to such attacks, which could reduce the method's effectiveness over time.

Why it matters
This research highlights a critical vulnerability in API-based malware classifiers: targeted evasion into a chosen benign category is feasible with minimal modifications that preserve malware functionality. The findings bear on the design of more robust malware detection systems, which will need to incorporate dynamic analysis and behavioral features to counter such adversarial strategies. The work also underscores the need for continued research into adversarial machine learning within cybersecurity.

Authors: Juozas Dautartas, Olga Kurasova, Juozapas Rokas Čypas, Viktor Medvedev
Source: arXiv:2605.18624
URL: https://arxiv.org/abs/2605.18624v1

By Callan Zhang · May 18, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG