Vector Representations of Text for Machine Learning

In order to put the words into the machine learning algorithm the text data should be converted into a vector representations.

There are three major ways of doing that.The are listed as follows:

Source:(Ravi, 2017)

I will focus on one-hot encoding in the following post.

A one hot encoding is a representation of categorical variables as binary vectors. Each integer value is represented as a binary vector that is all zero values except the index of the integer, which is marked with a 1.

Example implementation of One-hot encoding is shown below.

Source :(Marco Bonzanini, 2017)
  • Simplest method to implement
  • Un-ordered,therefore the context of words are lost.
  • The vector representation is in binary form, therefore no frequency information is taken into account.

Senior Software Engineer at 99x | Freelance developer | Interested in History and Philosophy

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store