Friday, 9 December 2016

Sentiment Analysis with Python

In this post, I will demonstrate how quick and easy it is to run sentiment analysis on text data. Inspiration for this post came from Sirajology; many thanks for your awesome videos!

Sentiment analysis is the process of determining the emotional tone behind a series of words, in order to understand the attitudes, opinions and emotions they express. It is extremely useful in social media monitoring, as it gives us an overview of wider public opinion on a topic.

For this example, we will use Twitter as a text source, searching specifically for opinions about the latest Star Wars film, Rogue One. We can do this using only twenty lines of Python code, which will run on Windows or Linux!

First, we must install a couple of Python libraries, if not already present, using pip:

  • Tweepy for accessing the Twitter API
  • TextBlob for natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more

These are the required commands (the last one downloads the corpora that TextBlob relies on):

> pip install tweepy
> pip install textblob
> python -m textblob.download_corpora

Should we wish, we can try TextBlob directly from the Python interactive interpreter:

> python

>>> from textblob import TextBlob

>>> blob = TextBlob("DarkMind is possibly the worst company I've ever known")

>>> blob.tags
[('DarkMind', 'NNP'), ('is', 'VBZ'), ('possibly', 'RB'), ('the', 'DT'), ('worst', 'JJS'), ('company', 'NN'), ('I', 'PRP'), ("'ve", 'VBP'), ('ever', 'RB'), ('known', 'VBN')]

>>> blob.sentiment
Sentiment(polarity=-0.5, subjectivity=1.0)

>>> blob = TextBlob("Robosoup is one of the most inspirational organisations of our time")

>>> blob.tags
[('Robosoup', 'NNP'), ('is', 'VBZ'), ('one', 'CD'), ('of', 'IN'), ('the', 'DT'), ('most', 'RBS'), ('inspirational', 'JJ'), ('organisations', 'NNS'), ('of', 'IN'), ('our', 'PRP$'), ('time', 'NN')]

>>> blob.sentiment
Sentiment(polarity=0.5, subjectivity=0.75)

As you can see, TextBlob has accurately parsed the two sentences and calculated a sentiment score for each: the first phrase has a negative polarity of -0.5 and the second a positive polarity of +0.5. Polarity ranges from -1.0 (most negative) to 1.0 (most positive), and subjectivity from 0.0 (objective) to 1.0 (subjective).
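As a rough sketch of how these scores might be used downstream, the snippet below buckets a polarity value into a coarse label. The function name `label_polarity` and the 0.1 neutral band are assumptions chosen for illustration; they are not part of TextBlob.

```python
def label_polarity(polarity, neutral_band=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    neutral_band is an assumed cut-off, not a TextBlob setting.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Polarity values copied from the interpreter session above.
for name, polarity in [("DarkMind", -0.5), ("Robosoup", 0.5)]:
    print(name, label_polarity(polarity))
```

The width of the neutral band is a judgment call; a wider band treats more borderline text as neutral.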

Next, let us incorporate this into a simple Python program. To use the Twitter API, you will first need to generate some authorisation keys with Twitter Apps, which is quick and free.

import tweepy
from textblob import TextBlob

consumer_key = '...YOUR_KEY_HERE...'
consumer_secret = '...YOUR_KEY_HERE...'

access_token = '...YOUR_KEY_HERE...'
access_token_secret = '...YOUR_KEY_HERE...'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

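# Search for English tweets, excluding those with links or media.
# Note: Tweepy 4.0 and later renamed api.search to api.search_tweets.
tweets = api.search(q="#RogueOne OR #StarWars -filter:links -filter:media", lang="en")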

for tweet in tweets:
    # Print the tweet, then the sentiment TextBlob assigns to it.
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis.sentiment)
    print("")
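The search string used in the program joins two hashtags with OR and then excludes tweets that contain links or media. A small helper can make that structure explicit. `build_query` is a hypothetical name used only for this sketch, and it assumes nothing beyond the operators that already appear in the original query.

```python
def build_query(hashtags, exclude_filters=()):
    """Build a Twitter search string such as
    '#RogueOne OR #StarWars -filter:links -filter:media'.

    build_query is a hypothetical helper, not part of Tweepy.
    """
    # Any one of the hashtags may match.
    terms = " OR ".join(hashtags)
    # Each exclusion becomes a '-filter:<name>' operator.
    exclusions = " ".join("-filter:" + name for name in exclude_filters)
    return (terms + " " + exclusions).strip()


query = build_query(["#RogueOne", "#StarWars"], ["links", "media"])
print(query)
```

The resulting string could then be passed as the `q` argument of the search call.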

Running the program printed the following tweets and their sentiment scores:

RT @GodfreyElfwick: For 39 years #StarWars hasn't had a single disabled genderqueer black feminist female jedi. This franchise is so out of…
Sentiment(polarity=-0.10952380952380952, subjectivity=0.2785714285714286)

I woke up to find #dumpstarwars was trending and just assumed it must mean Jar Jar Binks appears in #RogueOne
Sentiment(polarity=-0.3125, subjectivity=0.6875)

This time next week, we will all be riding a high of enjoyment after seeing #RogueOne
Sentiment(polarity=0.08, subjectivity=0.26999999999999996)

RT @shaunduke: That's one thing TFA made clear by injecting much needed diversity into the #StarWars universe: this is a story for people w…
Sentiment(polarity=0.15000000000000002, subjectivity=0.29166666666666663)

May the force be with you… but don't force too hard. Or you'll find the droids you're looking for. #StarWars #RogueOne #JediCouncil
Sentiment(polarity=-0.2916666666666667, subjectivity=0.5416666666666666)

RT @chrisjallan: Genuinely so glad none of these #DumpStarWars lot will be at any #RogueOne screenings like. They seem like an absolute nig…
Sentiment(polarity=0.35, subjectivity=0.95)

#DumpStarWars lead female character, evil government, futuristic ideas, unity against oppression #rogueone full of hate from Trumpsters
Sentiment(polarity=-0.36250000000000004, subjectivity=0.6541666666666667)

I'm literally so excited for @swidentities exhibition! This should keep me going til next week! Ahhhhhhhhh #rogueone #StarWarsIdentities
Sentiment(polarity=0.234375, subjectivity=0.375)

RT @ryan_mceachern: I've seen #RogueOne & there aren't any allusions to Trump but this new character Cheetodust McDaughtergroper was pretty…
Sentiment(polarity=0.018181818181818174, subjectivity=0.2772727272727273)
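To get a feel for the overall mood rather than reading tweet by tweet, the polarity scores above can be summarised. This is a minimal sketch that hard-codes the nine values printed in the results; in a real run they would be collected inside the loop instead.

```python
from statistics import mean

# Polarity scores copied from the results above, in order.
polarities = [
    -0.10952380952380952, -0.3125, 0.08, 0.15000000000000002,
    -0.2916666666666667, 0.35, -0.36250000000000004, 0.234375,
    0.018181818181818174,
]

average = mean(polarities)
negative = sum(1 for p in polarities if p < 0)
positive = sum(1 for p in polarities if p > 0)

print("tweets:", len(polarities))
print("average polarity:", round(average, 3))
print("negative:", negative, "positive:", positive)
```

On this tiny sample the average sits just below zero (about -0.03), with four tweets scoring negative and five positive. Nine tweets is far too few to draw any conclusion from.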

Considering this is only twenty lines of code, the results are impressive. Of course, this was a bit of fun to see how quickly we could set up a rough-and-ready sentiment analysis system. In production, the sentiment model could be trained specifically on your domain, which should yield greater accuracy. The text data could come from emails, product reviews, or almost any other source of written text.
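One practical step before moving a script like this towards production is to keep the Twitter keys out of the source file. The sketch below reads them from environment variables. The function name `load_twitter_keys` and the variable names are assumptions made for this example, not anything Tweepy requires.

```python
import os


def load_twitter_keys(env=os.environ):
    """Read the four Twitter credentials from environment variables.

    The variable names below are assumptions for this sketch; use
    whatever names suit your deployment.
    """
    names = [
        "TWITTER_CONSUMER_KEY",
        "TWITTER_CONSUMER_SECRET",
        "TWITTER_ACCESS_TOKEN",
        "TWITTER_ACCESS_TOKEN_SECRET",
    ]
    # Fail early with a clear message if anything is unset or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the hard-coded strings.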

With a little digging, you will find there are a huge number of open source libraries available to accelerate your business with machine learning.

Follow me on Twitter for more updates like this.