NOTION_PAGE:afdc392d-d83e-44fa-ab91-f7c1b8f563a0
Word Embedding (Word Vector)
*Word Embeddings* originated in the field of Natural Language Processing as a statistical approach to representing words as vectors based on the co-occurrence of words in sentences. The main advantage of these representations is the strong correlation between word-sense similarity and embedding similarity.
In simpler terms, a word vector is a row of real-valued numbers (as opposed to dummy, one-hot encodings) where each dimension captures an aspect of the word’s meaning and where semantically similar words have similar vectors.
- This means that words such as *wheel* and *engine* should have word vectors similar to that of *car* (because of the similarity of their meanings), whereas the word *banana* should be quite distant.
- Words that are used in similar contexts are mapped to nearby points in the vector space.
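A minimal sketch of this idea, measuring similarity between embeddings with cosine similarity. The 3-dimensional vectors below are made up for illustration, not taken from any trained model:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: each dimension loosely captures some aspect of meaning.
vectors = {
    "car":    [0.9, 0.8, 0.1],
    "wheel":  [0.8, 0.7, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

print(cosine_similarity(vectors["car"], vectors["wheel"]))   # high similarity
print(cosine_similarity(vectors["car"], vectors["banana"]))  # low similarity
```

With real trained embeddings the vectors have hundreds of dimensions, but the comparison works the same way.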
(figure: Untitled-358.png)
Manually assigning values on these scales for ALL the words in a corpus (a collection of texts) would be very laborious, right? So how do we actually obtain these embeddings?
- We have the computer ‘learn’ them, i.e., we use some machine learning algorithm to generate them from context (unsupervised learning → autoencoders).
The intuition behind this is that the meaning of a word is closely related to the words that generally appear alongside it.
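One simple way to make that intuition concrete is to count co-occurrences: how often each word appears near each other word within a small window. This is a toy sketch of that counting step, not a full embedding algorithm:

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered pair of words co-occurs within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("cat", "sat")])  # 1
```

Methods like word2vec go one step further: instead of storing raw counts, they learn dense vectors that predict these co-occurrences.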
Generally, word2vec is trained using something called the skip-gram model. The skip-gram model, pictured in the figures below, uses the vector representation that it learns to predict the words that appear around a given word in the corpus. Essentially, it uses the context in which a word is used across a variety of books and other literature to derive a meaningful set of numbers. If the “context” of two words is similar, they will have similar vector representations.
‘Continuous Bag of Words’ (CBOW) vs. ‘Skip-gram’:
CBOW and Skip-gram are just mirrored versions of each other: CBOW is trained to predict a single word from a fixed-size window of context words, whereas Skip-gram does the opposite and tries to predict several context words from a single input word.
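The mirrored relationship shows up directly in the training examples each model generates. This sketch builds both kinds of (input, target) pairs from a toy sentence with a window of 1; it mirrors the descriptions above, not any specific library’s implementation:

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: (input word, one context word) pairs -- several pairs per word."""
    pairs = []
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                pairs.append((word, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """CBOW: (all context words, target word) pairs -- one target per window."""
    pairs = []
    for i, word in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, word))
    return pairs

tokens = "the quick brown fox".split()
print(skipgram_pairs(tokens))  # e.g. ('brown', 'quick'), ('brown', 'fox'), ...
print(cbow_pairs(tokens))      # e.g. (['quick', 'fox'], 'brown'), ...
```

Training then fits the embedding matrix so that these predictions succeed: from one word to many context words (Skip-gram) or from a summed context to one word (CBOW).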
(figure: Untitled-359.png)
(figure: Untitled-360.png)
==By the same logic about task difficulty, CBOW learns better syntactic relationships between words, while Skip-gram is better at capturing semantic relationships.==
- In practice, this means that for the word ‘cat’, CBOW would retrieve as closest vectors morphologically similar words such as plurals (e.g. ‘cats’), while Skip-gram would place morphologically different but semantically related words such as ‘dog’ much closer to ‘cat’ in comparison.
https://jalammar.github.io/illustrated-word2vec/
https://adventuresinmachinelearning.com/word2vec-keras-tutorial/
https://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/