Self-attention
I. Self-attention Overview

Input: a sequence of vectors (the length is not fixed)
Output: one of three cases:
- Each vector gets a label (e.g., POS tagging)
- The whole sequence gets a single label (e.g., sentiment analysis)
- The model decides the number of output labels itself (seq2seq)

Self-attention can handle global information, while a fully connected (FC) layer handles local information. Self-attention is the key module of the Transformer, which will be covered in other articles.
II. How does it work?

First, we need to figure out the relevance between each pair of vectors, as sketched below.
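As a minimal sketch of this step, the snippet below computes pairwise relevance scores with scaled dot-product attention (the common choice in the Transformer). The projection matrices are random placeholders standing in for learned weights, and all names and shapes are illustrative assumptions, not the exact formulation used later in these notes.

```python
import numpy as np

def relevance_scores(x, d_k=4, seed=0):
    """x: (seq_len, d_model) input vectors; returns (seq_len, seq_len) relevance weights."""
    rng = np.random.default_rng(seed)
    d_model = x.shape[1]
    # W_q and W_k would normally be learned; here they are random, purely for illustration.
    W_q = rng.standard_normal((d_model, d_k))
    W_k = rng.standard_normal((d_model, d_k))
    Q = x @ W_q                         # queries, one per input vector
    K = x @ W_k                         # keys, one per input vector
    scores = Q @ K.T / np.sqrt(d_k)     # raw relevance alpha(i, j) between vector i and vector j
    # Softmax over each row so the relevance weights for every vector sum to 1.
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return scores / scores.sum(axis=-1, keepdims=True)

# Example: a sequence of 3 vectors, each of dimension 8.
x = np.random.default_rng(1).standard_normal((3, 8))
print(relevance_scores(x))              # 3x3 matrix of attention weights
```

Each row of the resulting matrix tells us how much every other vector in the sequence matters to that position; these weights are what self-attention later uses to mix information across the whole sequence.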