Keynote | Technical | talkReduce()

Attacking Machine Learning used in AntiVirus with Reinforcement Learning

Friday 17th | 14:40 - 14:50 | Theatre 19

One-liner summary:

In recent years, Machine Learning (ML) and especially Deep Learning (DL) have achieved great success in areas such as visual recognition, NLP, and even medical research. The computer security sector has sought to use these algorithms to build classifiers that detect malware. Unfortunately, ML and DL systems often exhibit incorrect behavior for several reasons, such as overfitting, biased training data, or excessive linearity. The objective of this short talk is to show how an attacker could evade an ML-based malware detection system using Reinforcement Learning.

Keywords defining the session:


- Cybersecurity

- Malware Evasion


In Reinforcement Learning there is no clearly defined correct answer for each input, as there is in Supervised Learning. Instead, there is an agent that must decide which action to take from its current state in order to maximize the reward it will receive. To do this, the agent learns from experience, collecting training examples that reveal, by trial and error, which actions were good and which were bad.

A specific type of Reinforcement Learning is Q-Learning, which evaluates which action to take based on an "action-value" function that determines the value of performing a concrete action from a given state. More specifically, there is a "Q function" that receives a state and an action as input and returns the expected reward of taking that action in that state. Before the environment has been explored, Q returns the same arbitrary fixed value, but as the environment is explored more and more, Q provides a better approximation of the value of an action "a" taken from a state "s". The Q function is updated as training progresses. An improvement on this type of learning that is producing very good results is the Deep Q-Network (DQN), which uses Deep Learning techniques to approximate non-linear Q functions.

To create undetectable malware using Reinforcement Learning, the environment is a malware sample and the agent is the algorithm whose job is to modify that environment. The action policy and the Q function determine which action to take. The set of available actions consists of modifications that can be made to the malware without corrupting it or altering its functionality. The reward function uses the output of the antivirus malware classifier to compute its result: it returns 1 if the modified malware sample evaded the classifier, and 0 if the antivirus still detected it as dangerous.
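The Q-Learning update described above can be sketched in a few lines. This is a generic, toy illustration of the standard tabular update rule, not the malware-evasion setting itself; the states, actions, and constants below are placeholders chosen for the example.

```python
# Minimal sketch of the tabular Q-Learning update rule:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

from collections import defaultdict

ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor

# Before any exploration, Q returns the same fixed value for every
# (state, action) pair -- here, 0.0.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions):
    """Move Q(state, action) toward the observed reward plus the
    discounted value of the best action from the next state."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One trial-and-error step: from state "s0", action "a1" earned reward 1.
actions = ["a0", "a1"]
q_update("s0", "a1", reward=1.0, next_state="s1", actions=actions)
print(Q[("s0", "a1")])  # 0.5: halfway toward the target of 1.0
```

A DQN replaces the table `Q` with a neural network that approximates the same function, which is what makes the method practical when the state space (here, all possible mutated binaries) is far too large to enumerate.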
This study (based on the paper "Evading next-gen AV using A.I.") focuses on attacking antivirus engines that use malware classifiers for Windows PE files during the static analysis of those files. The Portable Executable (PE) format is a file format for executables and DLLs used in 32-bit and 64-bit versions of Windows.
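The evasion loop described above can be sketched as follows. Note that `classifier_detects` and the two mutation actions are hypothetical placeholders standing in for a real static PE classifier and real functionality-preserving PE modifications (such as appending overlay bytes or adding an unused section), and the policy here is random rather than a trained DQN agent:

```python
import random

random.seed(0)

# Hypothetical stand-in for the antivirus static classifier; here it just
# "detects" samples shorter than a threshold so the example is runnable.
def classifier_detects(sample: bytes) -> bool:
    return len(sample) < 120

# Functionality-preserving mutations are modelled as simple byte padding.
ACTIONS = {
    "append_overlay": lambda s: s + bytes(16),
    "add_section_padding": lambda s: s + b"\x00" * 32,
}

def evade(sample: bytes, max_steps: int = 50):
    """Apply random mutations until the classifier stops detecting the
    sample (reward 1) or the step budget runs out (reward 0 each step)."""
    for step in range(max_steps):
        if not classifier_detects(sample):
            return sample, step          # evasion succeeded
        name = random.choice(list(ACTIONS))
        sample = ACTIONS[name](sample)   # still detected: keep mutating
    return sample, max_steps

evaded, steps = evade(b"MZ" + bytes(60))   # toy 62-byte "PE" sample
print(classifier_detects(evaded))          # False once enough bytes added
```

A trained agent would replace `random.choice` with the action that the learned Q function values most highly in the current state, which is what lets it evade with far fewer mutations than random search.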