

Word Embedding (Word Vector)

*Word Embeddings* originated in the field of Natural Language Processing as a statistical approach to representing words as vectors based on the co-occurrence of words in sentences. The main advantage of these representations is the high correlation between word-sense similarity and embedding similarity.

In simpler terms, a word vector is a row of real-valued numbers (as opposed to dummy/one-hot values), where each number captures a dimension of the word’s meaning and where semantically similar words have similar vectors.

  • This means that words such as wheel and engine should have word vectors similar to that of the word *car* (because of the similarity of their meanings), whereas the word banana should be quite distant.
  • Words that are used in similar contexts will be mapped to nearby points in the vector space.
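This similarity is usually measured with cosine similarity. A minimal sketch with hypothetical, hand-picked 3-dimensional vectors (real embeddings have hundreds of learned dimensions):

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only):
# each number loosely stands in for one "dimension of meaning".
vectors = {
    "car":    [0.9, 0.8, 0.1],
    "engine": [0.8, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["car"], vectors["engine"]))  # close to 1
print(cosine_similarity(vectors["car"], vectors["banana"]))  # much smaller
```

With these toy values, car/engine score about 0.99 while car/banana score about 0.16, mirroring the intuition above.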

Manually assigning values on these scales for ALL the words in a corpus (a collection of texts) would be far too laborious, right? So how do we actually obtain these embeddings?

  • We make the computer ‘learn’ them, i.e., we use some machine learning algorithm to generate them from each word’s context. (unsupervised learning → autoencoders)

The intuition behind this is that the meaning of a word is closely related to the words that usually appear alongside it.

Generally, word2vec is trained using something called a skip-gram model. The skip-gram model attempts to use the vector representation it learns to predict the words that appear around a given word in the corpus. Essentially, it uses the context in which a word appears across a variety of books and other literature to derive a meaningful set of numbers. If the “context” of two words is similar, they will have similar vector representations.
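The training data for skip-gram is just (center word, context word) pairs harvested from a sliding window. A minimal sketch, assuming a toy one-sentence corpus:

```python
# Generate skip-gram training pairs: for each center word, the model
# is trained to predict every word within a fixed window around it.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
print(skipgram_pairs(sentence, window=1))
# includes ('quick', 'the'), ('quick', 'brown'), ...
```

Each pair becomes one training example for the network; the learned input-layer weights are the embeddings themselves.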


‘Continuous Bag of Words’ (CBOW) VS ‘Skip-Gram’:

CBOW and Skip-gram are just mirrored versions of each other. CBOW is trained to predict a single word from a fixed window size of context words, whereas Skip-gram does the opposite, and tries to predict several context words from a single input word.

==By the same logic regarding task difficulty, CBOW learns better syntactic relationships between words, while Skip-gram is better at capturing semantic relationships.==

  • In practice, this means that for the word ‘cat’, CBOW would retrieve as closest vectors morphologically similar words such as plurals (i.e. ‘cats’), while Skip-gram would place morphologically different (but semantically relevant) words such as ‘dog’ much closer to ‘cat’ in comparison.
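The “mirrored” relationship shows up directly in how training examples are built. A sketch contrasting the two setups on a single toy window:

```python
# One window of a toy sentence; center word is "sat".
tokens = "the cat sat on the mat".split()
center_index, window = 2, 2
context = [tokens[j]
           for j in range(max(0, center_index - window),
                          min(len(tokens), center_index + window + 1))
           if j != center_index]

# CBOW: many context words -> predict the single center word.
cbow_example = (context, tokens[center_index])

# Skip-gram: the single center word -> predict each context word in turn.
skipgram_examples = [(tokens[center_index], c) for c in context]

print(cbow_example)       # (['the', 'cat', 'on', 'the'], 'sat')
print(skipgram_examples)  # [('sat', 'the'), ('sat', 'cat'), ('sat', 'on'), ('sat', 'the')]
```

Same window, opposite direction of prediction: CBOW averages the context into one input, while skip-gram turns one input into several harder prediction tasks, which is one intuition for why it captures finer semantic distinctions.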

https://jalammar.github.io/illustrated-word2vec/

https://towardsdatascience.com/creating-word-embeddings-coding-the-word2vec-algorithm-in-python-using-deep-learning-b337d0ba17a8

https://medium.com/turing-talks/word-embedding-fazendo-o-computador-entender-o-significado-das-palavras-92fe22745057

https://adventuresinmachinelearning.com/word2vec-keras-tutorial/

https://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/


🌱 Back to Garden