In order to put the words into the machine learning algorithm the text data should be converted into a vector representations.
There are three major ways of doing that.The are listed as follows:
I will focus on one-hot encoding in the following post.
A one hot encoding is a representation of categorical variables as binary vectors. Each integer value is represented as a binary vector that is all zero values except the index of the integer, which is marked with a 1.
Example implementation of One-hot encoding is shown below.
Evaluation of One-Hot encoding
- Simplest method to implement
- Un-ordered,therefore the context of words are lost.
- The vector representation is in binary form, therefore no frequency information is taken into account.