Big Data Analysis | December 15, 2016
We undertook the following analysis with the aim of retesting the thesis that there exists a close correlation between public opinion and a critical amount of content in online media, which was the subject of our initial analysis.
This study leverages two starting points repeatedly proven by Semantic Visions. Judging from the Big Data, which is transformed into Smart Data in our semantic system, two principal factors influence the election result:
a) Frequency of mentions of the respective candidates (this essentially amounts to the extent of the media profile of the candidates). Candidates with a significantly lower media profile do not have a chance of success, whereas candidates with a significantly higher amount of mentions in the media have far higher chances.
b) When the media profiles of candidates are relatively equal, Sentiment Balance is the decisive factor with trends thereof playing an important role in the final weeks and days prior to the vote.
Types of Sources Analyzed
In addition to articles from established online media sources, Semantic Visions also collects and analyzes the content of webpages which publish news reports focused on specific topics: politics, the economy, business, security and science, among others. Generally speaking, the authors of such articles and analyses publish facts, detailed information and answers to questions such as “who, what, when, where, why and how”.
Logically structured informative articles of this type contain an average of 3,100 characters. When processed by Semantic Visions’ semantic analytical system, such articles provide much more informative content for analysis than simple tweets, which often lack logical structure and which have an average length of between 70 and 120 characters (source: MIT – Massachusetts Institute of Technology).
For Semantic Visions, online social networks are an indivisible part of cyberspace, but they provide only a limited amount of information useful for the purposes of deeper analysis. We monitor online social networks including Facebook and Twitter more effectively by using the collective knowledge and intelligence of hundreds of thousands of editors and authors of articles who decide what is important and what is not.
However, in order to better understand the results of the US presidential election we conducted additional reverse analysis of Twitter, which produced some surprising results.
1. Input Data
Period of data collection and analysis: March 1, 2016 – November 7, 2016
Number of English-language documents analyzed: 116,291,957
Number of sources monitored: 277,604
Throughout the analysis period Semantic Visions processed over 232 million documents in 11 languages.
This report focuses on English-language sources only and all documents acquired were semantically processed in the Semantic Visions system. The output semantic metadata enabled us to conduct thorough analysis of the documents which were relevant for our purposes; in this case the subject being the US presidential election, which we detected with our predefined semantic concept.
The report also comprises quantitative analysis based upon the total of so-called “fragments” about the individual candidates. Sentences and phrases in close proximity to the person subject to analysis qualify as fragments. Several fragments can be found in a single document and therefore the quantity of fragments is more relevant than the quantity of documents.
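The fragment count described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fragment is any sentence that mentions the candidate by name; the actual proximity rules of the Semantic Visions system are not described in this report, and the function name `extract_fragments` and the sample text are hypothetical.

```python
import re


def extract_fragments(document: str, names: list[str]) -> list[str]:
    """Return the sentences of `document` that mention any of `names`.

    Assumption: a "fragment" is approximated here as a whole sentence
    containing the name; the real system's proximity rule is not known.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    lowered = [name.lower() for name in names]
    return [s for s in sentences if any(name in s.lower() for name in lowered)]


# Invented sample text, not taken from the analyzed corpus.
doc = (
    "Donald Trump spoke in Ohio. The crowd was large. "
    "Later, Trump met local officials."
)
fragments = extract_fragments(doc, ["Trump"])
print(len(fragments))  # one document, several fragments
```

Because a single document can yield several fragments, counting fragments rather than documents weights long, candidate-focused articles more heavily than passing mentions.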
The analysis period includes the primaries of the two main political parties in the US, the Democrats and Republicans. The primaries of both parties culminated in the conventions of both parties, which were held in July 2016 and at which the presidential candidates of both parties were nominated. The candidates for Vice President were also nominated at the conventions (it is traditional practice for the conventions to nominate the vice presidential candidates proposed by the nominated presidential candidate). The following part of the analysis was the pre-election campaign (including detailed analysis of the presidential debates) and Election Day.
The four-month marathon of the parties’ primaries began on February 1, 2016 with the caucuses of both the Republican and Democratic parties in Iowa, which was the first real comparison of strengths. The primaries are conducted differently by each party and also have different procedures on the state level. The aim of the primaries is to select delegates who will vote for the party’s presidential candidate at the respective party conventions in July. As such the aspiring delegates proclaim their support for their candidate of choice. In addition to the delegates, so-called super delegates also vote for the presidential candidates at the conventions and the latter are free to vote for the candidate of their choice at their own discretion. Super delegates include congressmen, governors and other party functionaries. Of the two main parties it is the super delegates of the Democratic Party who have a greater influence in selecting their party’s presidential candidate.
In this year’s primaries, the Republicans had more candidates, though from the outset there were three clear favorites: Donald Trump, Ted Cruz and Marco Rubio. With the Democrats, there were two main contenders: Hillary Clinton and Bernie Sanders.
Already in the first week of the primaries in Iowa the real chances of the individual candidates became apparent and following the primaries in New Hampshire, the first candidates dropped out of the race and declared their support for a party colleague still in the race. The next milestone in the primaries was so-called Super Tuesday, March 1, 2016, when 15 states selected their candidate of choice.
The Republicans Carly Fiorina and Jeb Bush had already dropped out in February and as a result of Super Tuesday, Ben Carson renounced his candidacy. On the basis of the documents analyzed, it is evident that media coverage of a candidate falls considerably after renouncing their candidacy. In terms of amount of media coverage, Donald Trump was the leader among Republicans for the whole period of the primaries.
Despite the fact that the results of sentiment analysis show overall coverage of Donald Trump was negative, while especially Ted Cruz enjoyed more positive coverage, Cruz was ultimately unsuccessful and renounced his candidacy on May 4, 2016.
The battle for the presidential candidacy among the Democrats was limited to two candidates, Hillary Clinton and Bernie Sanders. The Democratic Party primaries were considerably closer than those of the Republicans and were not decided until the final stages when Sanders eventually dropped out on June 17, 2016.
From this graph it is evident that Hillary Clinton received greater media coverage than Sanders, but for the primaries overall, this advantage was not as large as that of Donald Trump compared to his Republican rivals.
The primaries conclude with the party conventions where the parties’ presidential candidates are elected.
- Republican National Convention: July 18 – July 21, 2016 (Cleveland, Ohio)
- Candidate for the Presidency of the USA – Donald John Trump
- Candidate for Vice President – Mike Pence
- Democratic National Convention: July 25 – July 28, 2016 (Philadelphia, Pennsylvania)
- Candidate for the Presidency of the USA – Hillary Diane Rodham Clinton
- Candidate for Vice President – Timothy Michael Kaine
Other candidates campaigned for the US presidency but with little chance of success:
- Gary Johnson – Libertarian Party
- Jill Stein – Green Party
- Darrell Castle – Constitution Party
- Evan McMullin – Independent
This analysis focuses on the candidates of the two main political parties in the USA – Donald Trump and Hillary Clinton.
Hillary Clinton vs. Donald Trump
From the analysis we can observe that quantity of media coverage was a decisive factor. From the beginning of March through to Election Day, Donald Trump’s media presence was significantly higher than that of Hillary Clinton. And in the preceding primaries this factor proved decisive.
This trend can also be observed during the campaign proper following the national party conventions. In terms of quantity of media coverage, Hillary Clinton trailed Donald Trump for the entire campaign except for the final days, when both candidates received roughly the same amount of coverage.
For the entire period analyzed from March 1, 2016 to November 7, 2016, Donald Trump received almost twice as much media coverage as Hillary Clinton.
The following graph displays the development of media coverage of the two main candidates over the entire analysis period.
The following graphs illustrate the development of positive and negative sentiment and the resulting Sentiment Balance (percentage difference between positive and negative sentiment).
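As a rough illustration of the Sentiment Balance defined above, the sketch below computes the percentage difference between positive and negative sentiment. It assumes that both inputs are counts of positive and negative fragments and that the percentages are taken over their sum; the report does not state the exact denominator, so this normalization and the function name `sentiment_balance` are assumptions.

```python
def sentiment_balance(positive: int, negative: int) -> float:
    """Percentage difference between positive and negative sentiment.

    Assumption: shares are computed over positive + negative fragments
    only (neutral fragments are ignored); the report does not specify this.
    """
    total = positive + negative
    if total == 0:
        # No sentiment-bearing fragments: treat the balance as neutral.
        return 0.0
    return 100.0 * (positive - negative) / total


# Hypothetical counts: 60 positive and 40 negative fragments on a given day.
print(sentiment_balance(60, 40))  # 20.0, i.e. a positive aggregate
print(sentiment_balance(30, 70))  # -40.0, i.e. a negative aggregate
```

Under this reading, a balance above zero corresponds to what the report calls a positive aggregate, and a balance below zero to sentiment being in negative figures.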
During the national party conventions both main parties experienced a large growth in positive sentiment, but a week after the conventions ended, the sentiment returned to previous levels. A similar scenario is identifiable during the presidential debates but in this case there was a growth in negative sentiment.
As for the “positive sentiment peaks” for Donald Trump, we can identify the period around May 4, 2016 when he was first named as the leading Republican candidate. The case was similar for Hillary Clinton around June 8, 2016, when she was tipped as the victor of the Democratic Party primaries. Both candidates were officially nominated as their parties’ candidates at their respective conventions in July.
For the entire analysis period Donald Trump received more mentions, but with a few exceptions he received more negative sentiment than Hillary Clinton. Here we can observe an objective reflection of the fraught nature of the election campaign in which the supporters of both candidates were extremely critical of the opposition.
Sentiment analysis returned positive values for both Hillary Clinton and Donald Trump during the Democratic Party and Republican Party conventions respectively, and also for the latter at the beginning of September when studies first emerged about his potential victory. At that point various polls and studies indicated that pre-election preferences had evened out.
Development of the Pre-Election Campaign
The three debates between the main candidates and the one between the two vice-presidential candidates are integral elements of the US election campaign. Candidates polling over 15% participate in the debates, though in this year’s campaign only Donald Trump and Hillary Clinton passed this threshold.
First Presidential Debate
September 26, 2016 – Hofstra University, Hempstead, New York
The debate was hosted by Lester Holt and the candidates responded to questions concerning national security, the future course of the USA, and the prosperity of the USA. According to the mainstream media, Hillary Clinton won this debate. The results of our analysis including “long-tail” web news yielded a similar result, corresponding with the predominant opinion of mainstream media analysts and commentators.
The first debate pushed sentiment for both candidates into negative figures. Twenty-four hours later, however, the impact of the debate subsided and sentiment returned to pre-debate values, albeit with a slight rise in positive sentiment for Hillary Clinton and a slightly more negative sentiment for Donald Trump. In this regard Hillary Clinton can be considered the winner of the first debate.
Taking a closer look we can analyze how each issue discussed influenced sentiment during the course of the debates.
Taking as an example the subject of national security and related cyber-security, which was raised in the 62nd minute of the debate, we can see from the graph that the subject caused a growth in negative sentiment towards both candidates.
Second Presidential Debate
October 9, 2016 – Washington University, St. Louis, Missouri
The second debate, hosted by Martha Raddatz and Anderson Cooper, was highly fraught, with Donald Trump being forced to face criticism arising from the publication of recordings of his vulgar comments about women, while Hillary Clinton had to face accusations about her use of a private email server for work purposes when she was Secretary of State. Other subjects included healthcare reform, taxation, national security and the threat of cyber-attacks. According to the media, Hillary Clinton was again the winner.
The second presidential debate again pushed both candidates’ sentiment ratings into negative figures. We can deduce that this debate caused reactions before it began, which in turn indicates a level of tense anticipation greater than prior to the first debate. Sentiment returned more or less to pre-debate levels again after 24 hours.
We can observe that in the second debate several subjects caused greater reactions. The sentiment “peak” around the 30th minute relates to the reopening of the issue of Hillary Clinton’s private email server, which correlates to the growth in negative sentiment towards her.
Third Presidential Debate
October 19, 2016 – University of Nevada, Las Vegas, Paradise, Nevada
The debate was hosted by Chris Wallace. The main subject was immigration. Another key issue was Hillary Clinton’s emails published shortly beforehand by Wikileaks, which she attempted to deflect by criticizing Vladimir Putin and Russia. Donald Trump, however, used this to criticize Hillary Clinton’s foreign policy when she was Secretary of State.
A key element in this debate was the change in tone from Donald Trump, who avoided personal attacks on Hillary Clinton and instead emphasized that she had been in politics for 30 years already and thus had had plenty of time to implement her program. Donald Trump managed to coherently formulate his main message to voters: calling for reform in Washington, he presented himself as the force required to bring change to politics. Media polls, however, again indicated that Hillary Clinton was the winner of the debate.
The development of the graph of resulting sentiment indicates that Donald Trump was portrayed more negatively than Hillary Clinton by the media both during the debate and in the immediate period thereafter. Nevertheless, we can deduce that in the space of one day Donald Trump’s ratings returned to levels on a par with those of Hillary Clinton.
In the third debate we can also identify subjects which had a greater influence upon sentiment. One example is the segment after the 45th minute, when host Chris Wallace reopened the issue of the recording of Trump’s vulgar comments about women, which resulted in a growth in negative sentiment towards the latter. The discussion about the invasion of Iraq at around the 70th minute had a similarly large impact on sentiment in media reports.
Eve of Elections
In the final days following the third debate Hillary Clinton gained positive sentiment ratings.
However, following the announcement by the Director of the FBI of the reopening of the investigation into her use of a private email server for work purposes, her sentiment ratings again fell. Several days thereafter, growth in Clinton’s positive sentiment ratings resumed and again reached a positive aggregate.
A rise in positive sentiment for Donald Trump can also be observed in the last 14 days of the campaign. Although he did not attain a positive aggregate in this period, the rising trend of sentiment towards him is clear.
Both candidates began Election Day with close positive and negative sentiment ratings. A fundamental shift occurred shortly after 19:00 EDT when a very sharp growth in positive sentiment in the media for Donald Trump began.
This had a major impact upon overall sentiment, which followed a similar pattern: while the development of negative sentiment was similar for both Hillary Clinton and Donald Trump, the growth in positive sentiment for Trump was crucial.
In his campaign Donald Trump effectively declared war on traditional media. From the outset of the battle for the White House the media favored Hillary Clinton, and Trump reacted by making greater use of social media.
In our analysis of social media networks we focused on Twitter and individual tweets which the candidates posted from their accounts in the final four weeks prior to the election. In all, approximately 850 tweets were sent from each of the main candidates’ accounts. These tweets were analyzed on the basis of frequency of words and phrases used.
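A frequency analysis of this kind can be sketched with a simple n-gram counter. This is a minimal illustration, not the method actually used for the report: the tokenization rule, the function name `top_phrases` and the three sample tweets are all invented for the example.

```python
import re
from collections import Counter


def top_phrases(tweets: list[str], n: int = 2, k: int = 3) -> list[tuple[str, int]]:
    """Return the k most frequent n-word phrases across all tweets."""
    counts: Counter[str] = Counter()
    for tweet in tweets:
        # Assumption: lowercase the text and keep only letters and apostrophes.
        words = re.findall(r"[a-z']+", tweet.lower())
        # Slide a window of n words over the tweet to form phrases.
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts.most_common(k)


# Invented sample tweets, not real posts from either candidate.
sample = ["Thank you Ohio!", "Thank you Florida!", "Join me tonight"]
print(top_phrases(sample, n=2, k=1))  # [('thank you', 2)]
```

Running the same counter with `n=1` gives single-word frequencies; in practice common stop words would likely need to be filtered out before the results become informative.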
Significant words and phrases most used by the candidates:
The statistics for the most-used words and phrases show that Donald Trump’s campaign opted for positive messages such as “Join me”, “Thank you” and “Make America Great Again”, unlike Hillary Clinton, whose core message was “Don’t vote for Donald Trump”.
We consider that both candidates used Twitter as their primary tool for communicating their messages to voters directly, without passing through the traditional media, which naturally introduce a degree of distortion and, to a significant extent, impose the opinions of journalists or the leanings of a given media outlet.
On the basis of the graphs and information presented above we are able to draw the following conclusions:
- When monitoring a large quantity of online news sources, in aggregate they publish articles almost as quickly as Twitter conversations develop.
- To effectively analyze the presidential debates, which typically play a major role in the election campaign, it is important to monitor sentiment not only during the course of the debates, but also the “reverberations” lasting a number of hours thereafter. This is because the debates take place in evening hours and detailed analysis and typically more detailed reports are not published before the following morning.
- Our results based purely on Big Data from the presidential debates correlate with the conclusions of analysts and commentators within the mainstream media.
- Contrary to the widely reported conclusions of analysts, Hillary Clinton appeared as more inconsistent and divisive than Donald Trump.
- Following the debates, the sentiment balance for both candidates soon returned to its pre-debate values; this indicates that in this year’s election the debates did not have any fundamental influence upon the final result.
- From the examples chosen it transpires that scandals such as the reopening of the FBI investigation into Hillary Clinton’s private email server and the publication of the sexist recording of Donald Trump have only a short-term influence on sentiment towards both candidates. While these scandals resulted in a growth in negative sentiment, the effect was short-term and the sentiment soon returned to prior levels.
- Under Semantic Visions’ methodology (whereby in the case of a very large quantity of similar mentions, the resulting sentiment and the trend thereof in the final stage of the campaign is the decisive factor), our analytical data, which takes the USA as a single entity (as opposed to a model based on the results in individual states), indicated that Hillary Clinton would win by a slim margin. And indeed she did win the popular vote by over 2.5 million votes, although she lost the battle for the White House due to the system based on the Electoral College.
- Further analysis of Twitter activity by Donald Trump and Hillary Clinton shows the fundamental difference in style and content of the two candidates; in our opinion this difference greatly helped Trump to win the election. While Hillary Clinton’s tweets were for the most part aimed against Donald Trump (her tweets were essentially negative), by contrast Donald Trump’s tweets were more positive and his core message was “Join me and make America great again”.
- We believe that voters generally prefer the bearer of a positive message, and that this was also the reason why Donald Trump triumphed in the U.S. Presidential Election.