Building a sentiment classification model

Abinaya Jayaprakash
5 min read · Nov 1, 2020

Did the movie get a positive or negative review?

Sentiment classification uses natural language processing and machine learning to interpret the emotion expressed in text. It is a text analysis technique that detects polarity, that is, whether a piece of text is positive or negative.

This article explains the steps for building a sentiment classification model using the “IMDB dataset of 50k movie reviews”.

Understanding the dataset

We load the data from a CSV file into a pandas data frame and note the data type of each column.
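
As a rough sketch of this step, the snippet below reads CSV data into a data frame and prints the type of each column. The two sample rows and the column names `review` and `sentiment` are assumptions made for illustration; with the real dataset you would pass the path of the downloaded file to `pd.read_csv` instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real CSV file (assumed columns:
# "review" and "sentiment"). With the actual dataset, pass its file path
# to pd.read_csv instead of this in-memory buffer.
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A wonderful little film, beautifully acted.",positive\n'
    '"Dull plot and wooden dialogue.",negative\n'
)

df = pd.read_csv(sample_csv)

# Inspect the shape and the data type of each column.
print(df.shape)
print(df.dtypes)
```

Both columns are read as text at this point, which is why the sentiment column is converted later on.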

Importing libraries and packages

All necessary libraries and packages for text preprocessing and building the model are imported.

Text preprocessing

This process refers to the steps taken to convert text from human language into a machine-readable format for further analysis. The better the text is preprocessed, the more accurate the results tend to be.

It includes the following steps:

  • Changing data types

The sentiment column needs a data type that suits a classification model, so we convert it with the pandas apply function.
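
One way this conversion might look is sketched below. The article does not state the target encoding, so mapping "positive" to 1 and "negative" to 0 is an assumption, as are the helper name `encode_sentiment` and the toy rows.

```python
import pandas as pd

# Toy data frame standing in for the real dataset (assumed column names).
df = pd.DataFrame(
    {
        "review": ["Loved it.", "Hated it.", "A real gem."],
        "sentiment": ["positive", "negative", "positive"],
    }
)


def encode_sentiment(label: str) -> int:
    """Map a text label to an integer class (assumed: positive=1, negative=0)."""
    return 1 if label == "positive" else 0


# apply calls encode_sentiment on every value in the column.
df["sentiment"] = df["sentiment"].apply(encode_sentiment)
print(df["sentiment"].tolist())
```

After this step the column holds integers rather than strings, which is the form most classifiers expect for a target.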

  • Removing punctuation
  • Removing stopwords

Stop words are commonly used words such as ‘a’, ‘the’, and ‘an’ that add little meaning to a sentence. Removing them reduces the size of the dataset and the time needed to train the model, usually without affecting accuracy significantly. Because the input is left with fewer, more essential tokens, classification accuracy can even improve.
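
The two steps above, removing punctuation and removing stop words, can be sketched with the standard library alone. The short `STOP_WORDS` set here is a hand-picked assumption for illustration; a real pipeline would more likely use a full list from an NLP library. The function names are also hypothetical.

```python
import string

# Tiny illustrative stop word list (an assumption, not a complete list).
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "to", "in", "it"}


def remove_punctuation(text: str) -> str:
    """Delete every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stopwords(text: str) -> str:
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return " ".join(token for token in tokens if token not in STOP_WORDS)


review = "The movie was a delight, and the cast is superb!"
cleaned = remove_stopwords(remove_punctuation(review))
print(cleaned)  # prints: movie delight cast superb
```

Ten tokens shrink to four here, which illustrates how much shorter the model input can become.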

  • Stemming