Vector Representations of Text for Machine Learning

Athif Shaffy
2 min readNov 8, 2017

--

In order to put the words into the machine learning algorithm the text data should be converted into a vector representations.

There are three major ways of doing that.The are listed as follows:

Source:(Ravi, 2017)

I will focus on one-hot encoding in the following post.

One-Hot encoding

A one hot encoding is a representation of categorical variables as binary vectors. Each integer value is represented as a binary vector that is all zero values except the index of the integer, which is marked with a 1.

Example implementation of One-hot encoding is shown below.

Source :(Marco Bonzanini, 2017)

Evaluation of One-Hot encoding

  • Simplest method to implement
  • Un-ordered,therefore the context of words are lost.
  • The vector representation is in binary form, therefore no frequency information is taken into account.

Further Readings

--

--

Athif Shaffy
Athif Shaffy

Written by Athif Shaffy

Senior Software Engineer | Freelance developer | Interested in History and Philosophy