In short, Word2Vec takes in a corpus and churns out vectors for each of the words in it. This process, in NLP voodoo, is called word embedding. What's so special about these vectors, you ask? Well, similar words are near each other. Furthermore, these vectors represent how we use the words. For example, v_man - v_woman is approximately equal to v_king - v_queen, illustrating the relationship that "man is to woman as king is to queen".

Imagine being able to represent an entire sentence using a fixed-length vector and proceeding to run all your standard classification algorithms. This is made even more awesome with the introduction of Doc2Vec, which represents not only words, but entire sentences and documents. These representations have been applied widely.

This tutorial aims to help other users get off the ground using Word2Vec for their own research. I personally spent a lot of time untangling Doc2Vec and crashing into ~50% accuracies due to implementation mistakes. The original C code is nigh unreadable: 700 lines of highly optimized, and sometimes weirdly optimized, code.
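To make the king/queen arithmetic concrete, here's a minimal sketch using gensim's Word2Vec (gensim is my assumption; this excerpt doesn't name a library, and the toy corpus below is far too small for the analogy to actually emerge, so treat it as a shape-of-the-API demo only):

```python
# Minimal Word2Vec sketch (assumes gensim 4.x). The corpus here is a
# hypothetical placeholder -- real analogies need millions of tokens.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "his", "kingdom"],
    ["the", "queen", "rules", "her", "kingdom"],
    ["a", "man", "walked", "down", "the", "road"],
    ["a", "woman", "walked", "down", "the", "road"],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# On a real corpus, v_king - v_man + v_woman lands nearest to v_queen.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```

The `most_similar` call is doing exactly the vector arithmetic described above: it adds the `positive` vectors, subtracts the `negative` ones, and returns the nearest words by cosine similarity.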
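And for the sentence- and document-level vectors mentioned above, a similarly hedged Doc2Vec sketch (again assuming gensim 4.x; the documents and tags are invented for illustration):

```python
# Minimal Doc2Vec sketch (assumes gensim 4.x). Each training document
# gets a tag; after training, infer_vector gives a fixed-length vector
# for unseen text, which you can feed to any standard classifier.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["this", "movie", "was", "great"], tags=["doc_0"]),
    TaggedDocument(words=["this", "movie", "was", "terrible"], tags=["doc_1"]),
]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

# A fixed-length representation of a new sentence.
vec = model.infer_vector(["an", "unseen", "sentence"])
print(vec.shape)  # (50,)
```

That fixed-length vector is the whole point: once every document is a 50-dimensional array, "run all your standard classification algorithms" is literally a call to scikit-learn or whatever classifier you already use.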