Our media landscape is shaped by different perspectives on any given issue. What choice of words frames political debate in online media?
Perspectives manifest in the way language is used around an issue. By examining the emotional affect of that language, we can surface patterns that shape how we think about the issue, often more subtly than the positions stated in the very same article.
Building an excerpt of the media landscape
To build a rough representation of the variety of news outlets, we handpicked four websites ranging from liberal to conservative.
We scraped 1,000 articles from each outlet by starting on the front page and then randomly following links to other articles. The full text and metadata of each article were then written into a database.
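The random-walk collection step can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the crawler used for this project. `fetch` is a hypothetical callable that returns a page's HTML. Every non-front page is treated as an article. Only the URL, outlet and raw HTML are stored, so extracting the text and metadata is left out.

```python
import random
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def random_walk_crawl(front_page, fetch, db, outlet, limit=1000, seed=None):
    """Random-walk from the front page, storing up to `limit` distinct pages.

    `fetch(url) -> str` is a hypothetical callable returning a page's HTML;
    in practice it would wrap a real HTTP client with rate limiting.
    """
    rng = random.Random(seed)
    host = urlparse(front_page).netloc
    db.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, outlet TEXT, html TEXT)"
    )
    stored = set()
    current = front_page
    max_steps = limit * 20  # guard against walks that keep revisiting pages
    for _ in range(max_steps):
        if len(stored) >= limit:
            break
        html = fetch(current)
        # Assumption: every page except the front page counts as an article.
        if current != front_page and current not in stored:
            stored.add(current)
            db.execute(
                "INSERT OR IGNORE INTO articles VALUES (?, ?, ?)",
                (current, outlet, html),
            )
        parser = LinkParser()
        parser.feed(html)
        # Resolve relative links and stay on the outlet's own domain.
        links = [urljoin(current, href) for href in parser.links]
        links = [u for u in links if urlparse(u).netloc == host]
        # Follow a random link; restart from the front page at dead ends.
        current = rng.choice(links) if links else front_page
    db.commit()
    return len(stored)
```

Passing `fetch` in as a parameter keeps the walk logic testable with an in-memory dictionary of pages.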
We then ran a quantitative analysis on this dataset. For example, we counted the articles that contain a certain combination of words and grouped them by outlet. This reveals first patterns in what the outlets tend to write about:
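A count like this can be sketched as follows. The sketch assumes each article is available as an `(outlet, text)` pair and that matching is case-insensitive on whole words. Under that assumption "emails" does not match "email". The original analysis may have matched substrings instead.

```python
import re
from collections import Counter


def count_articles_with_all(articles, terms):
    """Count, per outlet, the articles whose text contains every term.

    `articles` is assumed to be an iterable of (outlet, text) pairs.
    Matching is case-insensitive and on whole words only.
    """
    terms = [t.lower() for t in terms]
    counts = Counter()
    for outlet, text in articles:
        words = set(re.findall(r"\w+", text.lower()))
        if all(t in words for t in terms):
            counts[outlet] += 1
    return counts
```

For example, `count_articles_with_all(articles, ["hillary", "email"])` returns one count per outlet. Outlets with no matches are simply absent from the result.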
[Chart: articles containing the words „hillary“ and „email“]
[Chart: articles containing the words „trump“ and „grab“]
How do words affect us?
Based on the „norms of valence, arousal, and dominance“ (Warriner et al.), we analyzed each article to gather information about the affective meaning of its words. Valence is the pleasantness of a stimulus, arousal the intensity of emotion it provokes, and dominance the degree of control it exerts. As the following examples show, we can then calculate a mean value for each dimension:
„Obama’s decision to commute Chelsea Manning’s sentence is a reasonable act of mercy.“
Mean valence: 5.6, mean arousal: 3.42, mean dominance: 5.92
„To give Manning a commutation is a slap in the face to those in uniform.“
Mean valence: 5.3, mean arousal: 4.32, mean dominance: 5.28
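The per-text means above could be computed roughly as follows. This is a sketch, not the project's code. The column names `Word`, `V.Mean.Sum`, `A.Mean.Sum` and `D.Mean.Sum` are an assumption about the layout of the norms CSV. Words are looked up as-is without lemmatization. Words that are not in the lexicon are skipped.

```python
import csv
import re
from statistics import mean


def load_vad_lexicon(lines):
    """Read a Warriner-style norms CSV into {word: (valence, arousal, dominance)}.

    `lines` is any iterable of CSV lines (e.g. an open file). The column
    names used here are an assumption; adjust them to the actual header.
    """
    lexicon = {}
    for row in csv.DictReader(lines):
        lexicon[row["Word"].lower()] = (
            float(row["V.Mean.Sum"]),
            float(row["A.Mean.Sum"]),
            float(row["D.Mean.Sum"]),
        )
    return lexicon


def mean_vad(text, lexicon):
    """Average valence, arousal and dominance over the words found in the lexicon.

    Words missing from the lexicon are skipped; returns None if none match.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None
    # zip(*scores) regroups the tuples into one sequence per dimension.
    return tuple(mean(dim) for dim in zip(*scores))
```

Because unmatched words are skipped, two texts of very different lengths can get similar means if only a few of their words are rated.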
The big picture: conservative media score lower on valence
With a mean value for each dimension per article, we can compare the affective ratings of all articles. Plotting them in a scatter plot reveals a pattern: conservative media write in slightly more negative and less dominant language, while the top right corner holds only the blue dots of liberal media, indicating more positive word choices.
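A pattern read off a scatter plot can be checked numerically by comparing each outlet's centroid, that is, its mean valence and mean dominance. The sketch assumes one `(outlet, valence, dominance)` tuple per article. The centroid is a swapped-in summary and is not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean


def outlet_centroids(article_scores):
    """Mean valence and mean dominance per outlet.

    `article_scores` is assumed to be an iterable of
    (outlet, valence, dominance) tuples, one per article.
    """
    by_outlet = defaultdict(list)
    for outlet, valence, dominance in article_scores:
        by_outlet[outlet].append((valence, dominance))
    return {
        outlet: (mean(v for v, _ in pts), mean(d for _, d in pts))
        for outlet, pts in by_outlet.items()
    }
```

Centroids hide the spread, so they complement the scatter plot and do not replace it.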
To get a clearer view of the crowded middle section of the scatter plot, we show the valence dimension alone in a streamgraph that counts how many articles were published at each valence value.
Overlaying the outlets highlights their differences in valence: while conservative media tend to use less positive language, the more liberal outlets lean in the opposite direction and use more positive language.
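The data behind such a streamgraph can be prepared as aligned per-outlet counts. This is a sketch that assumes one `(outlet, valence)` pair per article and bins valence by rounding it to one decimal place. The bin width is an assumption.

```python
from collections import Counter, defaultdict


def valence_layers(article_valences, decimals=1):
    """Count articles per outlet and rounded valence value.

    Returns the sorted bin values and, for each outlet, a list of counts
    aligned to those bins (zeros where an outlet has no articles), which is
    the shape a stacked area chart or streamgraph layer expects.
    """
    counts = defaultdict(Counter)
    for outlet, valence in article_valences:
        counts[outlet][round(valence, decimals)] += 1
    bins = sorted({b for c in counts.values() for b in c})
    layers = {outlet: [c[b] for b in bins] for outlet, c in counts.items()}
    return bins, layers
```

Aligning every outlet to the same bins matters here, because stacked layers must share one x-axis for the overlay to be comparable.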
Getting into detail: who talks about which topic in what language?
A more granular approach is to look not at the full text of each article, but at the affective context of given topics. For the following chart, we take the valence ratings of the words that appear in the same sentence as a given topic. Words further to the left indicate a less positive context.
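Collecting the ratings of words that share a sentence with a topic word could look like the sketch below. It assumes a lexicon mapping lowercase words to valence and a naive sentence split on `.`, `!` and `?`. With this split, abbreviations such as "U.S." break sentences. The topic word itself is left out of its own context.

```python
import re


def topic_context_valences(texts, topic, lexicon):
    """Collect valence ratings of words sharing a sentence with `topic`.

    `lexicon` maps lowercase words to a valence rating. Sentences are split
    naively on ., ! and ?, so abbreviations like "U.S." cause false splits.
    The topic word itself is excluded from its own context.
    """
    topic = topic.lower()
    ratings = []
    for text in texts:
        for sentence in re.split(r"[.!?]+", text):
            words = re.findall(r"[a-z']+", sentence.lower())
            if topic in words:
                ratings.extend(
                    lexicon[w] for w in words if w != topic and w in lexicon
                )
    return ratings
```

The mean of the returned list, computed per outlet, gives one position per topic on a valence axis.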
Discussion & next steps
Scaling and refining the data set
A bigger haystack reveals more needles. Our data set consists of 4,000 articles, which is only a fraction of the articles available. We assume that scraping more articles would make the big picture clearer.
A more deliberate choice of what data to collect would also thin out false positives. For example, we randomly followed links to articles in various departments of each outlet. Sticking strictly to one department, e.g. politics, should improve our results.
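The expectation that more articles sharpen the picture can be made concrete under one simplifying assumption: that articles are roughly independent samples. The standard error of an outlet's mean valence then shrinks with the square root of the number of articles $n$, where $s$ is the standard deviation of article valences. Quadrupling from 1,000 to 4,000 articles per outlet would halve it. This does not remove any bias introduced by how the articles were sampled.

```latex
\mathrm{SE}(\bar{v}) = \frac{s}{\sqrt{n}}, \qquad
\frac{\mathrm{SE}_{n=4000}}{\mathrm{SE}_{n=1000}} = \sqrt{\frac{1000}{4000}} = \frac{1}{2}
```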
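Restricting a crawl to one department could be done with a URL filter applied to each link before following it. The `/politics/` prefix in this sketch is a hypothetical URL scheme. Each outlet structures its paths differently, so the prefix would have to be set per outlet.

```python
from urllib.parse import urlparse


def in_section(url, host, section="/politics/"):
    """True if `url` is on `host` and its path lies under `section`.

    The "/politics/" default is a hypothetical path prefix; real outlets
    use different URL structures, so set it per outlet.
    """
    parts = urlparse(url)
    return parts.netloc == host and parts.path.startswith(section)
```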
Natural Language Processing
Our analysis relies on rather simple methods, such as counting words or looking up the affective rating of individual words. We assume that this blurs our results, especially when we work with mean values of affective ratings.
To capture the subtler workings of human language, such as negation, sentence structure or even irony, more sophisticated Natural Language Processing methods are needed.
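A small example shows how word-level means miss negation, along with a crude heuristic that partly compensates. The heuristic mirrors a word's valence around the scale midpoint (5 on a 1 to 9 scale) when the word directly follows "not", "no" or "never". This heuristic is an illustrative assumption and not a substitute for proper NLP. The toy lexicon values in the test are hypothetical.

```python
import re
from statistics import mean

# Hypothetical, deliberately small set of negation words.
NEGATORS = {"not", "no", "never"}


def mean_valence(text, lexicon, handle_negation=False, midpoint=5.0):
    """Mean valence of lexicon words in `text`.

    With handle_negation=True, negators are skipped and a word directly
    preceded by one has its rating mirrored around `midpoint`. This crude
    heuristic ignores scope, sentence structure and irony.
    """
    words = re.findall(r"[a-z']+", text.lower())
    ratings = []
    for i, word in enumerate(words):
        if word not in lexicon or (handle_negation and word in NEGATORS):
            continue
        rating = lexicon[word]
        if handle_negation and i > 0 and words[i - 1] in NEGATORS:
            rating = 2 * midpoint - rating
        ratings.append(rating)
    return mean(ratings) if ratings else None
```

Without the heuristic, "not good" averages the two words' ratings and lands near neutral. With it, the phrase reads as negative. Neither result captures what a reader actually understands.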
Critically review our paradigm
Our view of the data was biased by the expectation that the compared outlets would differ in their language. As of now, we don’t know whether a larger, better structured data set and more sophisticated methods of analysis will confirm the emerging signs that we’re onto something.
We must critically review how we look at the data, how we aggregate it and how we analyze it before we can judge whether there is a systematic pattern in the language of news outlets.
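One way to test whether a difference in mean valence between two outlets exceeds chance is a permutation test. This technique is not part of the original analysis. The sketch treats articles as exchangeable between outlets, which is an assumption that clustered sampling from a random walk may violate.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_permutations=10000, seed=None):
    """Two-sided permutation test for a difference in mean valence.

    Shuffles the pooled article scores, re-splits them into groups of the
    original sizes, and returns the share of shuffles whose absolute mean
    difference is at least the observed one (with +1 smoothing).
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

A small p-value suggests that the gap between two outlets is unlikely under random relabeling. It does not rule out confounds such as which departments the crawl happened to reach.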