Analysis methods (2021-02-08)

Tagged as: blog, analysis methods
Group: G_20/21 Description of the methods for the analyses

To evaluate the collected data, we perform several analyses. Each is carried out quantitatively, with an additional in-depth qualitative analysis of important data sets.

First, we ran general analyses such as the share of COVID tweets among all tweets, the number of verified accounts per party, and the use of hashtags, mentions and links per politician and party. Now we are going deeper with sentiment analysis, content analysis and network analysis.
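A minimal sketch of how such general counts could be computed with the Python standard library. The keyword list, the tweet format (dicts with `politician`, `party` and `text`) and the sample tweets are assumptions made for illustration; they are not the project's actual filter or data.

```python
import re
from collections import Counter

# Hypothetical keyword list; the project's real COVID filter is not given in the post.
COVID_KEYWORDS = ("covid", "corona", "pandemie", "lockdown")

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
LINK_RE = re.compile(r"https?://\S+")


def is_covid_tweet(text):
    """Return True if the tweet contains any of the assumed COVID keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in COVID_KEYWORDS)


def covid_share(tweets):
    """Fraction of tweets classified as COVID-related (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    return sum(is_covid_tweet(t["text"]) for t in tweets) / len(tweets)


def usage_by(tweets, key):
    """Count hashtags, mentions and links, grouped by `key` ("party" or "politician")."""
    counts = {}
    for tweet in tweets:
        group = counts.setdefault(tweet[key], Counter())
        group["hashtags"] += len(HASHTAG_RE.findall(tweet["text"]))
        group["mentions"] += len(MENTION_RE.findall(tweet["text"]))
        group["links"] += len(LINK_RE.findall(tweet["text"]))
    return counts


# Made-up sample tweets for illustration only.
sample_tweets = [
    {"politician": "P1", "party": "A",
     "text": "Neue #Corona Regeln ab Montag https://example.org/regeln"},
    {"politician": "P1", "party": "A",
     "text": "Danke @beispiel für das Gespräch!"},
    {"politician": "P2", "party": "B",
     "text": "Der #Lockdown wird verlängert #COVID19"},
    {"politician": "P3", "party": "B",
     "text": "Heute im Wahlkreis unterwegs."},
]

print(covid_share(sample_tweets))
print(usage_by(sample_tweets, "party"))
```

A plain keyword match is only one possible way to decide what counts as a COVID tweet; the same grouping function works for politicians by passing `"politician"` as the key.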

Sentiment Analysis

In our first approach we wanted to use VADER [1] for the sentiment analysis because it is said to work well on social media texts. However, because of translation issues we decided to use textblob [2], for which a German language extension (textblob-de) is already available. This lets us obtain the sentiment of the tweets, that is, their polarity and subjectivity.
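The following sketch shows one way polarity and subjectivity could be averaged per party. It assumes textblob-de is installed and that tweets are available as `(party, text)` pairs, which is a hypothetical input format. The scorer is passed in as a parameter so that the aggregation can also be tried with a different sentiment tool.

```python
from statistics import mean


def textblob_de_scores(text):
    """Score one German text with textblob-de; returns (polarity, subjectivity)."""
    # Imported here so the aggregation helper below also works without textblob-de.
    from textblob_de import TextBlobDE

    sentiment = TextBlobDE(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def mean_sentiment_by_party(tweets, scorer=textblob_de_scores):
    """Average polarity and subjectivity per party.

    `tweets` is an iterable of (party, text) pairs (assumed format).
    `scorer` maps a text to a (polarity, subjectivity) tuple.
    """
    scores = {}
    for party, text in tweets:
        scores.setdefault(party, []).append(scorer(text))
    return {
        party: {
            "polarity": mean(p for p, _ in values),
            "subjectivity": mean(s for _, s in values),
        }
        for party, values in scores.items()
    }
```

Calling `mean_sentiment_by_party(tweets)` without a second argument uses textblob-de; averaging is just one possible aggregation, and medians or per-week averages would follow the same pattern.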

Content Analysis

We decided to identify the topics in the tweets with the probabilistic topic model LDA (Latent Dirichlet Allocation). With the library gensim [3] we can generate an LDA model. By varying the number of topics and comparing the resulting coherence values, we determine the optimal number of topics. With the library pyLDAvis [4] the topics can be visualized and clustered. Our aims are, for example, to show changes over time or the dominant topics of each party.
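The search for the number of topics could look roughly like the sketch below. It assumes gensim is installed and that the tweets are already tokenized into lists of words; the `c_v` coherence measure, the number of passes and the fixed seed are assumptions, since the post does not say which settings were used.

```python
def coherence_by_num_topics(tokenized_tweets, candidates, seed=0):
    """Train one LDA model per candidate topic count and return {k: coherence}."""
    # Imported here so select_num_topics below can be used without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(tokenized_tweets)
    corpus = [dictionary.doc2bow(tokens) for tokens in tokenized_tweets]
    scores = {}
    for k in candidates:
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                       random_state=seed, passes=10)
        coherence = CoherenceModel(model=lda, texts=tokenized_tweets,
                                   dictionary=dictionary, coherence="c_v")
        scores[k] = coherence.get_coherence()
    return scores


def select_num_topics(scores):
    """Return the topic count with the highest coherence (smallest k on ties)."""
    return max(sorted(scores), key=lambda k: scores[k])
```

A typical call would be `select_num_topics(coherence_by_num_topics(tokenized_tweets, range(5, 31, 5)))`, where the candidate range is again only an example. For the visualization, the gensim adapter of pyLDAvis is found in `pyLDAvis.gensim_models` in newer releases and in `pyLDAvis.gensim` in older ones.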

Network Analysis

Follower networks can be represented by nodes (Twitter accounts) and edges (follower relations between the accounts). networkx [5] is a Python library suited to this kind of data modelling. We plan to analyze the network built from our collected data further, but for the time being we have simply created a first graph that shows (very poorly) the relations between the Twitter accounts. Each colored dot represents an account (the color signifies party affiliation), while the arrows show the follower relations between the accounts.
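As an illustration of this model, the sketch below builds a small directed graph with networkx. The account names, parties, colors and follower relations are invented; only the structure (accounts as nodes with a party attribute, follower relations as directed edges) follows the description above.

```python
import networkx as nx

# Made-up accounts and follower relations for illustration only.
accounts = {"alice": "Party A", "bob": "Party A", "carol": "Party B"}
follows = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]  # (follower, followed)

# Hypothetical color per party, used only for drawing.
party_colors = {"Party A": "tab:red", "Party B": "tab:blue"}


def build_follower_graph(accounts, follows):
    """Directed graph: an edge u -> v means account u follows account v."""
    graph = nx.DiGraph()
    for name, party in accounts.items():
        graph.add_node(name, party=party)
    graph.add_edges_from(follows)
    return graph


graph = build_follower_graph(accounts, follows)

# In-degree = number of followers inside the collected network.
follower_counts = dict(graph.in_degree())

# One color per node, in node order, ready to pass to a drawing function.
node_colors = [party_colors[graph.nodes[n]["party"]] for n in graph.nodes]
```

With matplotlib installed, `nx.draw_networkx(graph, node_color=node_colors, arrows=True)` would draw the colored dots and arrows; a directed graph is used because following is not necessarily mutual.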

Additionally, we want to relate COVID data (such as case numbers and deaths) to the Twitter data. For this purpose, we will use data provided by OurWorldInData.org.
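One simple way to relate the two sources is to join them by date, as in the sketch below. It assumes a CSV with the columns `location`, `date` and `new_cases`, as in the OurWorldInData COVID data set, and uses Germany as an example location; the rows and tweet dates are invented placeholders, not real figures.

```python
import csv
import io
from collections import Counter

# Placeholder rows in the shape of the OurWorldInData COVID CSV
# (assumed columns: location, date, new_cases); the numbers are invented.
OWID_SAMPLE = """location,date,new_cases
Germany,2021-01-01,100
Germany,2021-01-02,200
France,2021-01-01,300
"""

# Made-up tweet dates (ISO format) for illustration only.
tweet_dates = ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"]


def cases_by_date(csv_file, location):
    """Map date -> new cases for one location; rows with empty values are skipped."""
    cases = {}
    for row in csv.DictReader(csv_file):
        if row["location"] == location and row["new_cases"]:
            cases[row["date"]] = float(row["new_cases"])
    return cases


def join_tweets_with_cases(tweet_dates, cases):
    """Return (date, tweet_count, new_cases) rows; new_cases is None if unknown."""
    counts = Counter(tweet_dates)
    return [(day, counts[day], cases.get(day)) for day in sorted(counts)]


cases = cases_by_date(io.StringIO(OWID_SAMPLE), "Germany")
rows = join_tweets_with_cases(tweet_dates, cases)
print(rows)
```

For the real data, the in-memory sample would be replaced by an opened CSV file; days without a case number are kept with `None` so that gaps stay visible instead of being silently dropped.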


[1] https://pypi.org/project/vader-sentiment/

[2] https://pypi.org/project/textblob-de/

[3] https://pypi.org/project/gensim/

[4] https://pypi.org/project/pyLDAvis/

[5] https://pypi.org/project/networkx/