Sentiment analysis, my approach

My TvSeriesTweet app collects popular tweets from the best TV series. I’ve recently added a Sentiment Analysis tab that shows how each series’ communication on Twitter is polarized (positive, negative or neutral content) and what type of content best suits its followers. Unlike other analysts, I prefer to study what the Twitter influencers themselves post rather than the sentiments of the users reacting to them.

The Roots Dictionary

After studying and trying the best current approaches to sentiment analysis (syntactic-semantic grammar rules, heuristic methods, combined methods), I decided to develop a different solution that better suits the content written by Twitter users.

Twitter users tend to use a lot of hashtags and to run words together, so a classic dictionary-based algorithm fails to recognize many words. I created a new dictionary that contains only the root of every word (e.g. the root “alert” for the words alerts, alerting, alerted), and I look for those roots inside every word of a given tweet, so that the root is also recognized within the #spoileralert hashtag.
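To illustrate why roots help with hashtags, here is a minimal sketch that compares an exact whole-word lookup with a substring lookup on the same text. The helper names count_whole_word and count_root and the sample tweet are made up for this example; they are not part of the app.

```php
<?php
// Hypothetical helpers, for illustration only.

// Classic approach: split the text into tokens and count exact matches.
function count_whole_word($text, $word) {
  $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
  return count(array_keys($tokens, strtolower($word), true));
}

// Roots approach: count the root anywhere in the text, even inside a hashtag.
function count_root($text, $root) {
  return substr_count(strtolower($text), strtolower($root));
}

// Sample tweet invented for this example.
$tweet = 'No #spoileralert needed, fans were alerted early';

echo count_whole_word($tweet, 'alert'), "\n"; // 0: "alert" never appears as a standalone word
echo count_root($tweet, 'alert'), "\n";       // 2: found in "#spoileralert" and "alerted"
```

The trade-off is that a substring lookup can also match roots inside unrelated longer words, so the choice of roots in the dictionary matters.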

The roots dictionary is derived from the AFINN-111 file. It’s a text file in which every row contains a word (in my case the root of a word) and a positive or negative score (in the range -5 to +5), separated by a comma. The full dictionary is available here. An extract:

aghast,-2
agog,2
agonis,-3
agoniz,-3
agree,1
alarm,-2
alas,-1
alert,-1

I use this code to import the dictionary data from my dictionary-roots.txt file:

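function get_dictionary () {
  $dictionary = array();
  // READ THE FILE LINE BY LINE, SKIPPING EMPTY LINES
  $rows = file('./dictionary-roots.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  if ($rows === false) {
    return $dictionary;
  }
  foreach ($rows as $val) {
    $parts = explode(',', trim($val));
    // IGNORE MALFORMED ROWS (MISSING WORD OR SCORE)
    if (count($parts) != 2 || $parts[0] === '') {
      continue;
    }
    $dictionary[$parts[0]] = (int) $parts[1];
  }
  return $dictionary;
}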

The dictionary is used by my get_sentiment function, which looks for every root of the dictionary inside the given text and computes a sentiment score. Roots that also appear in the Twitter account name are ignored. The total is then divided by the number of words in the text (plus 2, as in the code below) and multiplied by 100, to compensate for the higher raw scores that longer content tends to get.

function get_sentiment ($text='', $twitter_account='', $dictionary=null) {
  $word_count = str_word_count($text);
  // CHECK FOR EMPTY CONTENT
  if ($word_count == 0) {
    return 0;
  }
  // CHECK FOR EMPTY DICTIONARY
  if (empty($dictionary)) {
    $dictionary = get_dictionary();
  }
  // LOWERCASE THE TEXT ONCE, OUTSIDE THE LOOP
  $text = strtolower($text);
  $total_score = 0;
  foreach ($dictionary as $word => $score) {
    // SKIP EMPTY KEYS (substr_count REJECTS AN EMPTY NEEDLE)
    if ($word === '') {
      continue;
    }
    // COUNTERACT TWITTER ACCOUNT NAME INFLUENCE
    if ($twitter_account!='' && stripos($twitter_account, $word) !== false) {
      continue;
    }
    // COUNT EVERY OCCURRENCE OF THE ROOT, EVEN INSIDE HASHTAGS
    $total_score += $score * substr_count($text, strtolower($word));
  }
  // COMPARE TOTAL SCORE TO TOTAL WORD COUNT
  $final_score = round( $total_score * 100 / ($word_count + 2) );
  return $final_score;
}
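
As a worked example of the scoring arithmetic, the sketch below repeats the same steps in a compact helper so that it can run on its own. The name score_with_roots, the sample tweet and the account name AlertTV are assumptions made for this illustration; the three dictionary entries are taken from the extract above.

```php
<?php
// Hypothetical compact version of the scoring logic, for illustration only.
function score_with_roots($text, $account, array $dictionary) {
  $word_count = str_word_count($text);
  if ($word_count == 0) {
    return 0;
  }
  $text = strtolower($text);
  $total = 0;
  foreach ($dictionary as $root => $score) {
    // Skip roots that also appear in the account name.
    if ($account != '' && stripos($account, $root) !== false) {
      continue;
    }
    $total += $score * substr_count($text, $root);
  }
  // Same normalization as get_sentiment: total * 100 / (word count + 2).
  return round($total * 100 / ($word_count + 2));
}

// Three entries from the dictionary extract.
$dictionary = array('agree' => 1, 'alarm' => -2, 'alert' => -1);

// Invented tweet: 7 words, with "agree" (+1), "alert" (-1) and "alarm" (-2).
$tweet = 'Fans agree: no #spoileralert, no false alarm';

// Total is 1 - 1 - 2 = -2, so the score is round(-200 / 9) = -22.
echo score_with_roots($tweet, '', $dictionary), "\n";

// With the invented account "AlertTV" the root "alert" is skipped:
// total is 1 - 2 = -1, so the score is round(-100 / 9) = -11.
echo score_with_roots($tweet, 'AlertTV', $dictionary), "\n";
```

The second call shows the effect of the account name check: a root that is part of the account name no longer contributes to the score of that account's tweets.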

Works using the Roots Dictionary

This work built a dictionary combining my Roots Dictionary and two other sources.
