Vector Representations of Text for Machine Learning

2 min readNov 8, 2017

In order to put the words into the machine learning algorithm the text data should be converted into a vector representations.

There are three major ways of doing that.The are listed as follows:

I will focus on one-hot encoding in the following post.

One-Hot encoding

A one hot encoding is a representation of categorical variables as binary vectors. Each integer value is represented as a binary vector that is all zero values except the index of the integer, which is marked with a 1.

Example implementation of One-hot encoding is shown below.

Evaluation of One-Hot encoding

Simplest method to implement
Un-ordered,therefore the context of words are lost.
The vector representation is in binary form, therefore no frequency information is taken into account.

How to One Hot Encode Sequence Data in Python - Machine Learning Mastery

Machine learning algorithms cannot work with categorical data directly. Categorical data must be converted to numbers…

machinelearningmastery.com

Vector Representations of Text for Machine Learning

One-Hot encoding

Evaluation of One-Hot encoding

Further Readings

How to One Hot Encode Sequence Data in Python - Machine Learning Mastery

Machine learning algorithms cannot work with categorical data directly. Categorical data must be converted to numbers…

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Athif Shaffy

Responses (2)