Hi everyone!

In this post, I will try to summarize some of the ideas presented in the paper Adversarial Training with Contrastive Learning in NLP. I will go through some of the concepts that I needed to clarify for myself, and I hope this is useful as a very brief introduction for anybody else!

Notes

Main ideas to discuss:

  • Adversarial Training
  • Contrastive Learning

Similar inputs => Semantically similar outcomes.

Adversarial Training

Good old Wikipedia defines Adversarial learning as a machine learning technique that attempts to exploit models by taking advantage of obtainable model information and using it to create malicious attacks, usually to cause a malfunction in a machine learning model.

The most common way is to create adversarial examples, which are inputs designed to fool the model. These examples are created by introducing an adversarial perturbation into one of the dataset examples. Once these new examples have been created, our machine learning model should be trained on them, with the aim of making it robust to adversarial examples.
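To make this concrete, here is a minimal PyTorch sketch of one adversarial training step using FGSM (the Fast Gradient Sign Method), one of the simplest ways to build a perturbation. The model, loss, optimizer, and batch are hypothetical placeholders, and the paper itself may use a different perturbation method; this only illustrates the general loop of "perturb, then train on both versions".

```python
import torch

def fgsm_adversarial_step(model, loss_fn, optimizer, x, y, epsilon=0.01):
    """One training step on a clean batch (x, y) plus its FGSM adversarial
    version. All arguments are placeholders for illustration; epsilon
    bounds the size of the perturbation."""
    # 1. Compute the gradient of the loss with respect to the input x.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()

    # 2. Build the adversarial example: move x a small step in the
    #    direction that *increases* the loss (the sign of the gradient).
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # 3. Train on the clean and adversarial inputs together, so the
    #    model becomes robust to this kind of perturbation.
    optimizer.zero_grad()
    total_loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```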

Adversarial examples in NLP

Defining adversarial examples in the Natural Language Processing field is not as easy as it is in other data science fields, such as computer vision. For instance, it is easy to have two indistinguishable but different images, but two sentences of text cannot be indistinguishable without being the same. Also, if we wanted to apply these perturbations directly to the text tokens, we would face the following problem: they belong to a discrete space, so applying small perturbations is unfeasible.
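A toy snippet makes the discreteness problem visible (the token IDs below are made up, just for illustration): a pixel tolerates a tiny nudge and still forms a valid image, while a token ID does not survive one.

```python
import torch

# Discrete tokens: a small perturbation makes no sense.
token_ids = torch.tensor([101, 2023, 2003, 1037, 7953])  # a tokenized sentence
perturbed = token_ids + 0.01  # 101.01 is not a valid vocabulary index;
# it cannot be looked up in an embedding table (indices must be integers),
# and rounding it back simply recovers the original tokens.

# Continuous inputs, like image pixels, tolerate the same nudge.
pixels = torch.rand(3, 224, 224)
pixels_adv = pixels + 0.01 * torch.randn_like(pixels)  # still a valid image
```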

So, what is our alternative? A few techniques used to generate adversarial examples in NLP are: