
How does the attention mechanism work in nlp?


Asked by Dorothy Clements on Dec 08, 2021



Interpreted another way, the attention mechanism simply gives the network access to its internal memory, namely the hidden states of the encoder. In this interpretation, instead of choosing what to “attend” to, the network chooses what to retrieve from memory.
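As a rough sketch of this “soft memory read” view (plain NumPy with made-up array names; not taken from any particular library), a query vector is scored against every stored hidden state and the result is a weighted sum of those states:

import numpy as np

def retrieve_from_memory(query, memory):
    # memory: encoder hidden states stacked as rows, shape (seq_len, d)
    # query:  the current decoder state, shape (d,)
    scores = memory @ query                  # similarity of the query to each stored state
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                  # soft "read": weighted sum of the stored states

# toy example: 4 stored hidden states of size 3
memory = np.random.randn(4, 3)
query = np.random.randn(3)
context = retrieve_from_memory(query, memory)
print(context.shape)  # (3,)

Because the read is a softmax-weighted average rather than a hard lookup, it stays differentiable and the network can learn what to retrieve.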
One may also ask,
Mathematically, the self-attention output for input matrices (Q, K, V) is calculated as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are the matrices of stacked query, key, and value vectors and d_k is the key dimension. In the “Attention Is All You Need” paper, the authors also proposed another type of attention mechanism called multi-head attention.
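A minimal NumPy sketch of this formula and of multi-head attention, assuming single-sequence self-attention (Q, K, V all come from the same input X) and ignoring masking and biases:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (..., seq_q, seq_k)
    return softmax(scores) @ V                      # (..., seq_q, d_v)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # X: (seq_len, d_model); each W_*: (d_model, d_model)
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def project_and_split(W):
        # project X, then split the feature dimension into separate heads
        return (X @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project_and_split(W_q), project_and_split(W_k), project_and_split(W_v)
    heads = scaled_dot_product_attention(Q, K, V)           # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                     # final output projection

# toy usage: 5 tokens, model dimension 8, 2 heads
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=2)
print(out.shape)  # (5, 8)

Each head attends over the same sequence but in its own projected subspace, which is why the head outputs are concatenated and passed through a final projection.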
In this manner, how was the attention mechanism introduced in deep learning? The attention mechanism emerged as an improvement over the encoder-decoder-based neural machine translation system in natural language processing (NLP). Later, this mechanism, or its variants, was used in other applications, including computer vision and speech processing.
Moreover,
“Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence.” Consider the classic example sentence “The animal didn’t cross the street because it was too tired.” Can you figure out what the term “it” in this sentence refers to? Is it referring to the street or to the animal?
Also Know,
The basic idea in attention is that each time the model tries to predict an output word, it uses only the parts of the input where the most relevant information is concentrated, instead of the entire sentence; that is, it gives more importance to a few input words. Let’s see how it works:
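As a hedged sketch of this idea in an encoder-decoder setting: at each output step the decoder scores every encoder hidden state, turns the scores into importance weights with a softmax, and forms a context vector from the most relevant input words. The dot-product scoring below is a simplification (the original encoder-decoder attention work used a small learned scoring network), and all names are illustrative:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    # encoder_states: one hidden vector per input word, shape (src_len, d)
    # decoder_state:  hidden state just before predicting the next output word, shape (d,)
    scores = encoder_states @ decoder_state   # one relevance score per input word
    weights = softmax(scores)                 # "importance" of each input word, sums to 1
    context = weights @ encoder_states        # weighted summary of the relevant words
    return context, weights

# toy example: a 6-word source sentence, hidden size 4
encoder_states = np.random.randn(6, 4)
decoder_state = np.random.randn(4)
context, weights = attend(decoder_state, encoder_states)
print(np.round(weights, 3))  # larger weight = that input word matters more for this output step

The context vector is then combined with the decoder state to predict the next output word, so each prediction can focus on a different part of the input sentence.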