Sentiment Analysis on Twitter

How to do a Twitter Sentiment Analysis?

Or: What´s the mood on Twitter?

Hello there!

Today I want to show you how to do a so-called Sentiment Analysis. It is about analyzing the mood on Twitter about a certain Keyword. You get a number of tweets which contain a keyword you can define, filter out the text of these tweets and then see if there are more positive or negative words. Of course you can´t just do it by hand; you need a tool doing the work for you.

Our tool:

Our main tool is called R. (yes just R, it´s not a typo)

It is a free “software environment for statistical computing and graphics” and is available for Unix platforms, Windows and MacOS.

It´s available here: http://www.r-project.org/

It has a comfortable installer, so this step shouldn´t be a problem.

After installing you can open the GUI and get the following screen:

1

Ok now we can download our other tool: twitteR

It´s a script written for R.

You don´t have to download it from a website, you can do it directly from within R.

You can to it with:


Install.packages(‘twitteR’, dependencies=T)

You then have to select a CRAN mirror, from where you want to download it and click ok. (you can show what ever mirror you want)

R will now download the package and install it.

Then we have to activate it for our current session with:


library(twitteR)

library(plyr)

Your screen should look like this now:

2

Ok now we come to a tricky part:

The Twitter Authentification

 

Since Twitter released the Version 1.1 of their API a OAuth handshake is necessary for every request you do. So we have to verify our app.

First we need to create an app at Twitter.

Got to https://dev.twitter.com/ and log in with your Twitter Account.

Now you can see your Profile picture in the upper right corner and a drop-down menu. In this menu you can find “My Applications”.

Click on it and then on “Create new application”.

You can name your Application whatever you want and also set Description on whatever you want. Twitter requires a valid URL for the website, you can just type in http://test.de/ ; you won´t need it anymore.

And just leave the Callback URL blank.

3

Click on Create you´ll get redirected to a screen with all the OAuth setting of your new App. Just leave this window in the background; we´ll need it later

Continue to R and type in the following lines (on separate lines):


reqURL <- "https://api.twitter.com/oauth/request_token"

accessURL <- "http://api.twitter.com/oauth/access_token"

authURL <- "http://api.twitter.com/oauth/authorize"

consumerKey <- "yourconsumerkey"

consumerSecret <- "yourconsumersecret"

twitCred <- OAuthFactory$new(consumerKey=consumerKey,consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)

download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

twitCred$handshake(cainfo="cacert.pem")

registerTwitterOAuth(twitCred)

You have to replace yourconsumerkey and yourconsumersecret with the data provided on your app page on Twitter, still opened in your webbrowser.

The command twitCred$handshake(cainfo=”cacert.pem”) will ask you to go a certain URL and entert he PIN you receive on this page.

The Data Mining:

Ok we passed the authentication and can now go on with getting the tweets we want from Twitter.

Type in:


tweets = searchTwitter("#apple", n=200, cainfo="cacert.pem")

This makes twitteR get 200 Tweets with the keyword #apple in it (you can change the keyword of course).

After waiting a few seconds you can use length(tweets) to see how many tweets were actually saved; maybe for some keywords the number existing is actual smaller than our sample size n.

4

Now we have our Tweets.

The Analyzing:

To be able to analyze our tweets, we have to extract their text and save it into the variable tweets.text by typing:


Tweets.text = laply(tweets,function(t)t$getText())

What we also need are our lists with the positive and the negative words.

We can find them here:

https://github.com/mjhea0/twitter-sentiment-analysis/tree/master/wordbanks

After downloading the ZIP you can put them in a folder on your Computer; you should just keep the absolute path in mind.

We now have to load the words in variables to use them by typing:


pos = scan('/Users/julian/Documents/positive-words.txt', what='character', comment.char=';')

neg = scan('/Users/julian/Documents/negative-words.txt', what='character', comment.char=';')

Of course you have to change the path, but we have our two lists: pos and neg

Now we have to insert a small algorhytm written by Jeffrey Breen analyzing our words.

Just copy-paste the following lines and hit enter:


score.sentiment = function(sentences, pos.words, neg.words, .progress='none')

{

require(plyr)

require(stringr)

# we got a vector of sentences. plyr will handle a list

# or a vector as an "l" for us

# we want a simple array ("a") of scores back, so we use

# "l" + "a" + "ply" = "laply":

scores = laply(sentences, function(sentence, pos.words, neg.words) {

# clean up sentences with R's regex-driven global substitute, gsub():

sentence = gsub('[[:punct:]]', '', sentence)

sentence = gsub('[[:cntrl:]]', '', sentence)

sentence = gsub('\\d+', '', sentence)

# and convert to lower case:

sentence = tolower(sentence)

# split into words. str_split is in the stringr package

word.list = str_split(sentence, '\\s+')

# sometimes a list() is one level of hierarchy too much

words = unlist(word.list)

# compare our words to the dictionaries of positive & negative terms

pos.matches = match(words, pos.words)

neg.matches = match(words, neg.words)

# match() returns the position of the matched term or NA

# we just want a TRUE/FALSE:

pos.matches = !is.na(pos.matches)

neg.matches = !is.na(neg.matches)

# and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():

score = sum(pos.matches) - sum(neg.matches)

return(score)

}, pos.words, neg.words, .progress=.progress )

scores.df = data.frame(score=scores, text=sentences)

return(scores.df)

}

The final steps:

Type in:


analysis = score.sentiment(Tweets.text, pos, neg)

Congrats, your first sentiment Analysis was now saved.

You can get a table by typing:


table(analysis$score)

Or the mean by typing:


mean(analysis$score)

Or get a histogram with:


hist(analysis$score)

The positive values stand for positive tweets and the negative values for negative tweets. The mean tells you about the overall mood of your sample.

 

 

 

 

 

Note:

Sometimes it doesn´t work because there are some tweets with invalid characters in it. Then you have to do the data mining again or change the keyword. As soon an update is available I will update this article.

32 thoughts on “Sentiment Analysis on Twitter

  1. Pingback: Create a wordcloud with your Twitter Data | julianhi's Blog

  2. I seem to have a problem with the authetication.
    I get the following error in R (using RStudio)
    > twitCred$handshake(cainfo=”cacert.pem”)
    To enable the connection, please direct your web browser to:
    http://api.twitter.com/oauth/authorize?oauth_token=g6ivyRUEckJoGcniqV8yo2pR0qcYwwvsJjF7BibzU
    When complete, record the PIN given to you and provide it here: 3204922
    > registerTwitterOAuth(twitCred)
    [1] TRUE
    > tweets = searchTwitter(“#python”, n=200, cainfo=”cacert.pem”)
    [1] “Unauthorized”
    Error in twInterfaceObj$doAPICall(cmd, params, “GET”, …) :
    Error: Unauthorized

  3. Yup just tried it with the normal R console and get the same error.
    tweets = searchTwitter(“#apple”, n=200, cainfo=”cacert.pem”)
    Error in twInterfaceObj$doAPICall(cmd, params, “GET”, …) :
    OAuth authentication is required with Twitter’s API v1.1

  4. I did and i got passed it. My new favourite error that i don’t understand is this…! I’m close to shooting myself right now!!!

    Loading required package: stringr
    Error in FUN(X[[1L]], …) : could not find function “str_split”
    In addition: Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
    there is no package called ‘stringr’

  5. Hi ,

    Thanks for the blog. Actually I am trying to do sentiment analysis of telecom operators, but I get for every tweet there is some 15 duplicates. So, if I pull 1500 tweets, there are only 100 unique tweets. How to remove the duplicates in such cases.

    Also, for tweets in languages other than English – is there a way to get them translated in English from twitter or should we do it after saving the tweets.

    • Hey Uthra,
      nice to see you on my blog.
      If everything is working correctly there you shouldn´t receive duplicates. Or better: if you get all the tweets with just one search, the Twitter API does not return duplicates. Please check your code if everything is correct.

      And there is no way to get them translated directly from Twitter. You should save the tweets as you receive them and then think about translating them.

      Please give me an answer if you could find the problem with the duplicates.

      Regards
      Julian

  6. Hi

    I am facing a Problem here with Rstudio.

    after Completing the Authentication Process I am trying to get the Tweets but its showing some kind of error.

    Athletics.list Athletics.df = twListToDF(Athletics.list)
    Error in lapply(X = X, FUN = FUN, …) :
    object ‘Athletics.list’ not found
    > write.csv(Athletics.df, file=’C:/temp/AthleticsTweets.csv’, row.names=F)
    Error in is.data.frame(x) : object ‘Athletics.df’ not found

    Help me..

    • Hey Sourabh,
      there seems to be a problem with the lists and dataframes you are using in your code and not with the Twitter Authentication.
      Could you please show me your whole code?

      Regards

    • Hey Sourabh,
      no you can´t because the LinkedIn API is structured completely than the Twitter API. LinkedIn focuses on contacts and
      there is no way to search LinkedIn for public posts like you could do with Twitter.
      I hope I could help you.

      Regards

  7. Read all the comments, I still get this error:
    > tweets = searchTwitter(“#apple”, n=200, cainfo=”cacert.pem”)
    [1] “Unauthorized”
    Error in twInterfaceObj$doAPICall(cmd, params, “GET”, …) :
    Error: Unauthorized

    Cant figure out a way. Can you help?

  8. I have gone completely bananas over the twitter authentication and PIN generation.

    After executing the following command –
    Cred$handshake(cainfo = system.file(“CurlSSL”,”cacert.pem”,package=”RCurl”))
    OR
    Cred$handshake(cainfo=”cacert.pem”)

    I get this –
    To enable the connection, please direct your web browser to:
    http://api.twitter.com/oauth/authorize?oauth_token=V0W4WSrgKg7s336bMv6o2kCPmunzEToyW2UhnTCCcpM
    When complete, record the PIN given to you and provide it here:

    On redirecting, i get the “Authorize App” page. After that, i get either of the messages –
    “The web page is not available” OR
    “Could not connect to 127.0.0.1:8000/twitter_callback”

    I have tried 4 different Callback URL’s –
    1) 127.0.0.1:8000/twitter_callback
    2) 127.0.0.1:8080/twitter_callback
    3) 127.0.0.1:8000/twitter/oauth
    4) 127.0.0.1:8080/twitter/oauth
    Note – i have even tried the shortened versions of the URL’s mentioned above through Bitly

    I have even tried changing the accessURL and authURL from http to https.

    After struggling with it for days, I still see no signs of moving ahead.
    Please guide me.

    • Hey Nitisch,
      there is no need for providing a callback URL in the app setting. Just leave that field blank as you get redirected to a Twitter page automatically and you just have to copy paste the pin code.
      But I will update the Twitter Auth post in a few minutes as the whole login process got much much more easier in the newest version of the twitteR package.

      Regards

  9. Hi Julianhi,

    is it possible to remove tweets from the company while forming the corpus. I want to do this because most of the tweets from the company are not reflecting the sentiments of the consumers and hence would result in some noise. for eg- offers, new products etc do not add value to the sentiment.

    please help it’s urgent for my project.

    regards,
    Abhishek

  10. ya, sort of. I mean, I want to remove tweets from the admin of that twitter account. so suppose apple is tweeting about anything on their account with #apple, I want to remove that. And I want to retain all those other tweets with #apple from other users.

  11. hi,
    Do you know any method to do the same(this is regarding the above question)? please reply. i also have another question, can i extract facebook comments also and add it to the tweets corpus to have a big set of data?

    regards,

  12. Hi Julianhi.

    I know that I am too late, but could you explain me how this code works?

    function(t)t$getText()

    I understand “lapply” function, but I don’t found “getText” function.

    Thanks in advance!!

    • Hey,
      it is part of the status-class of the twitteR package. So the tweets are stored in a status-class which provides the function getText() to return the text of the tweets.

      Regards

Leave a reply to Chrisfs Cancel reply