
Opinion Analysis of Text using Plutchik

I was inspired to write this blog by a post I saw today on the French presidential election. Plutchik is the strongest framework I know for sentiment analysis that goes beyond pedestrian positive/negative classification. It was good to see it being adopted by others in the industry. So, here we go.

To recap some of my earlier posts, Sirocco parses each news article or user-generated content into subjects and opinions. The subject and the opinion, together with the author of the news article, form a triad that allows us to answer the question: “Who (author) thinks about what (the subject) in what way (the opinion)?”

Author-Opinion-Subject Triad

Finding out who authored a news piece tends to be straightforward. Their names are either on the byline of the article and can be parsed out of the text, or are in the metadata of the record that is part of an imported dataset.

We think of news articles as collections of opinions of various degrees of factuality. The individual opinions can be in a single sentence or in a sequence of sentences, and they are often, but not always, separated by paragraphs. Sirocco analyzes individual sentences and then “chunks” related sentences together, to form an “opinion”.
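To make the chunking idea concrete, here is a minimal sketch of one plausible strategy: merge consecutive sentences into a single opinion while they keep mentioning a shared subject. The data shapes and the merge rule are my own illustration, not Sirocco's actual implementation.

```python
# Hypothetical sketch of sentence "chunking": group consecutive
# sentences into one opinion when they mention a shared subject.
# The (text, subjects) tuples are stand-ins for Sirocco's real types.

def chunk_opinions(sentences):
    """sentences: list of (text, set_of_subjects) tuples."""
    opinions = []
    current = []
    current_subjects = set()
    for text, subjects in sentences:
        # Start a new opinion when the sentence shares no subject
        # with the chunk built so far.
        if current and not (subjects & current_subjects):
            opinions.append(" ".join(current))
            current, current_subjects = [], set()
        current.append(text)
        current_subjects |= subjects
    if current:
        opinions.append(" ".join(current))
    return opinions

sents = [
    ("Besiktas won again.", {"Besiktas"}),
    ("The club now tops the group.", {"Besiktas"}),
    ("Meanwhile, Napoli drew in Lisbon.", {"Napoli"}),
]
print(chunk_opinions(sents))
# the first two sentences merge into one opinion; the third stands alone
```

A real system would also consider paragraph boundaries and pronoun resolution, but the shared-subject test captures the basic intuition.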

To determine the subjects of a sentence, we take the sentence parse tree (the constituency-based parse tree, to be precise) and extract Proper Nouns (aka Named Entities) and Noun Phrases. Named Entities are names of persons, organizations, and locations, expressions of time, quantities, and so on. Typically, they can be spotted by looking at the first letter of a word: if it is capitalized in the middle of a sentence, that's a good sign it's a Named Entity. There are many more rules for when a combination of words represents a name, but that's why we have machine learning and training data sets, and the quality of Named Entity extraction has reached pretty high levels. In the terminology of the Penn Treebank Project, Named Entities are Proper Nouns, carrying the NNP part-of-speech tag.

Noun Phrases are 1-to-N-word phrases of nouns, adjectives, etc. For example, "missile test" and "air defense force" are both noun phrases. The NLP algorithms group words together into Noun Phrases based on their role in the sentence and on patterns observed in the training data sets. It's helpful to process both Named Entities and Noun Phrases as potential subjects, because in many cases a sentence won't have a name mentioned in it, but it will usually have a Noun Phrase.
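As a rough illustration of both extraction steps, here is a sketch that pulls candidate subjects out of a POS-tagged sentence: maximal NNP runs become Named Entities, and simple adjective-plus-noun runs become Noun Phrases. The tagged input is hard-coded; in practice it would come from a trained tagger, and this chunking rule is far simpler than what a real constituency parser does.

```python
# Sketch: extract candidate subjects from a Penn Treebank-tagged
# sentence (NNP = proper noun, NN = common noun, JJ = adjective).

def extract_subjects(tagged):
    """tagged: list of (word, penn_tag). Returns named entities
    (maximal NNP runs) and simple noun phrases (JJ*/NN+ runs)."""
    entities, phrases, i = [], [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag.startswith("NNP"):            # proper-noun run -> Named Entity
            j = i
            while j < len(tagged) and tagged[j][1].startswith("NNP"):
                j += 1
            entities.append(" ".join(w for w, _ in tagged[i:j]))
            i = j
        elif tag == "JJ" or tag.startswith("NN"):  # adjectives + common nouns
            j = i
            while j < len(tagged) and (tagged[j][1] == "JJ"
                                       or tagged[j][1].startswith("NN")):
                j += 1
            run = tagged[i:j]
            if any(t.startswith("NN") for _, t in run):
                phrases.append(" ".join(w for w, _ in run))
            i = j
        else:
            i += 1
    return entities, phrases

tagged = [("North", "NNP"), ("Korea", "NNP"), ("conducted", "VBD"),
          ("a", "DT"), ("missile", "NN"), ("test", "NN"), (".", ".")]
print(extract_subjects(tagged))
# (['North Korea'], ['missile test'])
```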

Sentiment or opinion extraction is where the bulk of the work is done. Opinions can be roughly divided into emotions ("very interesting") and expressions of qualities ("nice", "big"). The Sirocco emotion detection technology uses a framework for human emotions originally developed by Robert Plutchik, a professor at the Albert Einstein College of Medicine.

Plutchik's Wheel of Emotions identifies 8 basic emotions: Joy, Trust (originally called Acceptance), Fear, Surprise, Sadness, Disgust, Anger, and Anticipation.

Plutchik’s Wheel of Emotions. Source: Wikipedia (released in Public Domain)

The 8 basic emotions are divided into 4 pairs of opposites: Fear is the opposite of Anger, Disgust is the opposite of Trust, Sadness is the opposite of Joy, and Surprise is the opposite of Anticipation. Emotional intensity can vary: taking Sadness as an example, on the extreme end we are looking at Grief, while low-intensity Sadness is Pensiveness.

Plutchik emotion pairs

More complex human emotions, according to Plutchik’s research, can be thought of as combinations of two basic ones. For example, Love is Joy + Trust, while Pessimism is Sadness + Anticipation.

Plutchik's emotion theory is not the only game in town for sentiment analysis; there are other emotion frameworks used by psychologists and researchers. However, Plutchik has something that the other frameworks lack: it is perfectly suited for algorithmic implementation. Because we have 4 pairs of opposite emotions, we can use relatively simple arithmetic for processing negations. The composability of derived emotions allows us to build search patterns that consist of simple AND operations. Lastly, we can use the emotional intensity gradings, e.g. ranging from Pensiveness to Sadness to Grief along the Sadness dimension, to process degree adverbs: "very sad" maps to Grief.
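The "simple arithmetic" can be sketched in a few lines. The opposite pairs, the dyads (Love, Pessimism), and the intensity scale come straight from Plutchik's wheel; the data-structure design is my own illustration, not Sirocco's code.

```python
# Plutchik's structure as data: 4 opposite pairs, composite dyads,
# and an intensity scale along one dimension.

OPPOSITE = {"joy": "sadness", "trust": "disgust",
            "fear": "anger", "surprise": "anticipation"}
OPPOSITE.update({v: k for k, v in OPPOSITE.items()})

# Composite ("dyad") emotions as unordered pairs of basic ones.
DYADS = {frozenset({"joy", "trust"}): "love",
         frozenset({"sadness", "anticipation"}): "pessimism"}

# Intensity grades along the Sadness dimension, low to high.
SADNESS_SCALE = ["pensiveness", "sadness", "grief"]

def negate(emotion):
    """'not joyful' flips to the paired opposite: sadness."""
    return OPPOSITE[emotion]

def combine(a, b):
    """Two co-occurring basic emotions may name a dyad."""
    return DYADS.get(frozenset({a, b}))

def intensify(emotion, degree_adverb):
    """'very sad' climbs the intensity scale to 'grief'."""
    if emotion == "sadness" and degree_adverb in ("very", "extremely"):
        return SADNESS_SCALE[-1]
    return emotion

print(negate("joy"))                 # sadness
print(combine("joy", "trust"))       # love
print(intensify("sadness", "very"))  # grief
```

Negation handling becomes a dictionary lookup, and detecting a dyad is a single set-membership test, which is exactly why the framework composes so well with pattern matching.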

When a computer program attempts to understand the emotions people try to express in text, it faces dozens of signals that indicate the presence of emotions. In addition to adjectives and nouns such as “cranky” or “merry” representing emotions, strong signals are sent by linguistic constructs such as idioms, adjective phrases, adverbs and others, including:

Signals of opinions in text

The ability to process abbreviations, interjections, capitalization, and emoticons is very handy for dealing with informal text such as tweets and blog comments. In fact, when processing the informal text so common on Twitter, the goals of sentiment analysis are a bit different than when processing the fully formed sentences of a WSJ editorial. Building sentence trees is almost useless for tweets. There, after we tokenize sentences into words and run part-of-speech tagging, we stop at the chunking step and do not proceed to full parsing at all.
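A shallow pipeline along those lines might look like the sketch below: tokenize (keeping emoticons whole), tag, chunk, and deliberately skip parsing. The tiny tag lexicon is illustrative only; a real system would use a trained tagger.

```python
import re

# Sketch of a shallow pipeline for tweets: tokenize, tag, chunk,
# and stop there -- no parse tree is built for informal text.

TAG_LEXICON = {"great": "JJ", "game": "NN", "tonight": "NN",
               "lol": "UH", ":)": "EMOTICON"}

def shallow_pipeline(tweet):
    # 1. Tokenize, keeping emoticons as single tokens.
    tokens = re.findall(r":\)|:\(|\w+", tweet.lower())
    # 2. Part-of-speech tag from the lexicon (a stand-in for a
    #    trained tagger); unknown words default to NN.
    tagged = [(t, TAG_LEXICON.get(t, "NN")) for t in tokens]
    # 3. Chunk adjective/noun runs into phrases and stop here.
    chunks, run = [], []
    for tok, tag in tagged:
        if tag in ("JJ", "NN"):
            run.append(tok)
        else:
            if run:
                chunks.append(" ".join(run))
            run = []
    if run:
        chunks.append(" ".join(run))
    return tagged, chunks

tagged, chunks = shallow_pipeline("Great game tonight lol :)")
print(chunks)   # ['great game tonight']
```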

The flat, sequential nature of written text also poses unique challenges for a computer algorithm. For example, should emotion signals in quotes be attributed to the author of the news post being analyzed? (In our opinion, they should not.) How should we deal with emotions in questions? (The person asking a question seldom holds that emotion.) What about the favorite example in every sentiment analysis textbook, irony? The words say one thing, while both the reader and the writer know they mean something else, based on the context of the previous conversation. To handle questions, quotes, and irony, we introduced a special sentiment dimension called "Ambiguous". It is a catch-all sentiment that we use when we find phrases that would normally map to one of the 8 basic emotions (according to Plutchik's theory), but are located in quotes, appear in questions, or are possibly ironic.
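The reclassification rule itself is straightforward; here is a minimal sketch of the idea, with illustrative function and parameter names that are not Sirocco's actual API:

```python
# Sketch: demote a detected emotion to "Ambiguous" when the phrase
# sits inside quotes or in a question sentence.

def classify(sentence, raw_emotion, phrase_in_quotes=False):
    """raw_emotion: one of Plutchik's 8 basics found in the text."""
    if phrase_in_quotes or sentence.strip().endswith("?"):
        # The author may not hold this emotion themselves.
        return "Ambiguous"
    return raw_emotion

print(classify("Are you angry about the result?", "anger"))  # Ambiguous
print(classify('"A stunning win," said the coach.', "joy",
               phrase_in_quotes=True))                       # Ambiguous
print(classify("A stunning win for Besiktas.", "joy"))       # joy
```

Irony detection is much harder than these two checks, since it requires conversational context; in practice it also ends up in the Ambiguous bucket rather than being resolved.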

So far we talked about emotions in text, but there are other interesting signals that could be considered opinions: word phrases representing qualities, such as "correct" or "elegant". Expressions of quality cannot typically be mapped to one of Plutchik's emotions, but they do usually carry a positive or negative connotation. Sometimes quality words have a different meaning depending on the context, e.g. "massive flood" is negative, while "massive investment" is positive. We use our idiom dictionary to determine the semantic orientation, and when we can't find a known idiom, we again use the Ambiguous dimension to flag such a phrase occurrence.
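The lookup-with-fallback behavior can be sketched as a dictionary keyed on (modifier, noun) pairs. The entries below are illustrative stand-ins, not Sirocco's real idiom data.

```python
# Sketch: context-dependent quality words resolved through an idiom
# dictionary; unknown combinations fall back to Ambiguous.

IDIOM_ORIENTATION = {("massive", "flood"): "negative",
                     ("massive", "investment"): "positive"}

def orientation(modifier, noun):
    return IDIOM_ORIENTATION.get((modifier, noun), "Ambiguous")

print(orientation("massive", "flood"))       # negative
print(orientation("massive", "investment"))  # positive
print(orientation("massive", "turnout"))     # Ambiguous
```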

Lastly, when processing text sourced from the Internet, especially from non-moderated sites, one comes across profanity that one would not want to republish or repost in a research paper or site. We’ve added the ability to recognize several thousand words and idioms that are considered racist, pornographic, and profane, and flag such occurrences under the “profane” dimension. We also added the “unsafe” sentiment dimension to flag phrases that refer to sexuality, drug use, criminal activity and other potentially delicate opinions.
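The flagging pass itself is a simple dictionary scan. The word lists below are tiny, deliberately mild placeholders for the several-thousand-entry dictionaries described above.

```python
# Sketch of the profane/unsafe flagging pass over tokenized text.

PROFANE = {"damn"}               # placeholder entries only
UNSAFE = {"narcotics", "heist"}  # placeholder entries only

def flag_dimensions(tokens):
    flags = set()
    for t in tokens:
        if t in PROFANE:
            flags.add("profane")
        if t in UNSAFE:
            flags.add("unsafe")
    return flags

print(sorted(flag_dimensions(["the", "damn", "narcotics", "ring"])))
# ['profane', 'unsafe']
```

A production version would also need to match multi-word idioms and obfuscated spellings, which a plain set lookup does not handle.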

Let's also talk about what happens to the entire news article once we have processed all of its constituent sentences. Knowing the subjects of each sentence allows us to calculate, for the entire article, statistics on what the most important subjects are, taking into account the frequency of their occurrence and their placement in sentences with strong sentiment. Our algorithm produces up to 7 tags to classify the topic of the article (we borrowed the term Tag from social media). The tags, together with all the opinions we extracted from the article, form what we call the Content Index of the article. Here is an example of a Content Index for an article reporting on a UEFA soccer game between the Besiktas and Napoli soccer clubs.

TEXT STATS Length [3408] Paragraphs [18] Unique Entities [118]
TAGS Begin:
8.5 "Besiktas" [Good Topic]
6.0 "14-time national champion refusal" [Good Topic]
5.9 "Liverpool historic 2005 Istanbul triumph" [Good Topic]
5.5 "Napoli" [Good Topic]
4.9 "quarter-final appearance" [Good Topic]
4.7 "runners-up Bayern Munich" [Good Topic]
4.6 "Turkish" [Good Topic]
Sentiment {0} Tags: Besiktas, quarter-final appearance
Sentiment {0} Dominant Valence: positive
Sentiment {0} Total Sentiment Score: 34.0
Sentiment {0} Annotated Text: {{Besiktas}} will <<hope>> that their <<best>> previous showing— a 1987 European Cup {{quarter-final appearance}}— will not prove a <<poor>> omen.
Sentiment {0} Serialized Representation: V2|2|Besiktas|0|7|1|equarter-final appearance|73|96|1|e3|14|17|1|t116|119|1|n30|33|1|p
Sentiment {1} Tags: Besiktas, Liverpool historic 2005 Istanbul triumph, Turkish
Sentiment {1} Dominant Valence: positive
Sentiment {1} Total Sentiment Score: 30.0
Sentiment {1} Annotated Text: That {{Turkish}} comeback, rekindling memories of {{Liverpool's historic 2005 Istanbul <<triumph>>}}, has left the group outcome on a knife-edge with {{Besiktas}} knowing that a <<win>> at <<eliminated>> Dynamo Kiev will send them through.
Sentiment {1} Serialized Representation: V2|3|Besiktas|138|145|1|eLiverpool historic 2005 Istanbul triumph|46|87|1|eTurkish|5|11|1|e3|169|178|1|n81|87|1|j162|164|1|j
Sentiment {2} Tags: Besiktas
Sentiment {2} Dominant Valence: positive
Sentiment {2} Total Sentiment Score: 20.0
Sentiment {2} Annotated Text: The winner of Benfica-Napoli in Lisbon will <<qualify>> as group winners, while a draw would be enough to send the latter through given their head-to-head edge. If {{Besiktas}} lose, both will <<progress>> regardless.
Sentiment {2} Serialized Representation: V2|1|Besiktas|160|167|1|e2|44|50|1|p185|192|1|p
Sentiment {3} Tags: Besiktas
Sentiment {3} Dominant Valence: negative
Sentiment {3} Total Sentiment Score: 20.0
Sentiment {3} Annotated Text: Portugal's Benfica, European champions back in 1961 and 1962 in their <<halcyon>> <<days>> but who have underachieved since, should have secured their passage on matchday five but <<threw>> away a three-goal lead at {{Besiktas}}.
Sentiment {3} Serialized Representation: V2|1|Besiktas|204|211|1|e3|172|176|1|s78|81|1|g70|76|1|g
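A tag ranking like the one in the Content Index above could be computed by crediting each subject for how often it appears, weighted by the sentiment strength of the sentences it appears in. The weighting formula below is my own guess for illustration, not Sirocco's actual scoring.

```python
from collections import defaultdict

# Sketch: rank article subjects by frequency, boosted by the
# sentiment scores of the sentences they occur in.

def score_tags(sentences, top_n=7):
    """sentences: list of (subjects, sentiment_score) tuples."""
    scores = defaultdict(float)
    for subjects, sentiment in sentences:
        for s in subjects:
            # Base credit for appearing, plus a bonus that grows
            # with the sentence's total sentiment score.
            scores[s] += 1.0 + 0.1 * sentiment
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return ranked[:top_n]

# Subjects and scores loosely modeled on the example output above.
article = [({"Besiktas", "quarter-final appearance"}, 34.0),
           ({"Besiktas", "Turkish"}, 30.0),
           ({"Besiktas"}, 20.0),
           ({"Besiktas"}, 20.0)]
print(score_tags(article))
```

With this toy input, "Besiktas" comes out on top, matching its 8.5 score in the real index; the cutoff at 7 tags mirrors the "up to 7 tags" rule described earlier.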

I hope I was able to provide you with more insight into how Sirocco works. As a parting thought, here is why I think the knowledge of how a person feels towards a subject matters.

Commercial/practical uses of sentiment/opinion analysis of text

When you know that someone is anticipating or is interested in something, you can work with that person to see if you can offer them what they are looking for. Alternatively, when someone is experiencing sadness, disgust, or anger towards a subject, that's a good indicator that they are looking for an alternative. When someone is positive, or experiences joy about a subject, you can use that to positively influence other people in their circle. Conversely, when someone is negative about something, you can use that to promote the alternative.
