Initial literature research (2020-11-17)

Tagged as: blog, literature research
Group: G_20/21

A short overview of the research process and the first findings of our initial literature research, including the current state of research we found.

Research process

To structure our initial search we defined six different directions/topics to investigate:

  • Debate structures on Twitter
  • Social Network Analysis (in general)
  • Misinformation/Fake News on Twitter
  • Sentiment Analysis of Tweets
  • Categorisation of Tweets
  • Datasets of Tweets on COVID-19

We used the following search engines and digital libraries for the initial search:

  • Google Scholar
  • ACM Digital Library
  • ResearchGate
  • Google (for datasets)

We use Zotero to store and organise our findings, with folders and tags to group and filter the search results.

Initial findings and current state of research

The popularity of Twitter as a social network, as well as its concise forms of interaction, makes it a popular subject for research in data science, informatics, and the social sciences. Research on Twitter has been done for some time now, which gives a good basis for this project. The oldest research on COVID-19, on the other hand, can by its nature only be about one year old. Due to the pandemic's enormous impact on nearly all aspects of life, there is nonetheless no shortage of published papers on the subject. The combination of these two highly researched topics gives us plenty of papers, articles and datasets to work with.

Debate structures on Twitter

An essential part of using Twitter is the interaction with other users. The limited means of interaction (mainly following, liking, retweeting, and commenting) allow for a very structured analysis of discussions and debates on the platform. There is a plethora of research on how users act when confronted with like-minded and opposing opinions, how they can be grouped, and how these online discussions can shape the public perception of the debated topics.

Social Network Analysis

Twitter is particularly suitable for social network analysis, as its use centres on interaction with other users. One area of research is concerned with mapping the relationships between users as a network, in which key individuals, groups within the network, and/or associations between persons can be identified.
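To make the idea of mapping users as a network more concrete, here is a minimal sketch in plain Python. It is not taken from any of the papers we found: the retweet pairs are invented, and the function names most_retweeted and find_groups are our own. It counts how often each user is retweeted (a simple stand-in for "key individuals") and groups users who are connected through retweets (a simple stand-in for "groups within the network").

```python
from collections import Counter, defaultdict

# Hypothetical (retweeter, original_author) pairs; invented for illustration.
retweets = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("alice", "bob"),
    ("erin", "frank"),
]


def most_retweeted(edges):
    """Count how often each user is retweeted (in-degree of the directed graph)."""
    return Counter(author for _, author in edges)


def find_groups(edges):
    """Return sets of users linked by any retweet (connected components)."""
    neighbours = defaultdict(set)
    for retweeter, author in edges:
        # Treat the link as undirected when looking for groups.
        neighbours[retweeter].add(author)
        neighbours[author].add(retweeter)

    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from `start`.
        group, stack = set(), [start]
        while stack:
            user = stack.pop()
            if user in group:
                continue
            group.add(user)
            stack.extend(neighbours[user] - group)
        seen |= group
        groups.append(group)
    return groups


in_degree = most_retweeted(retweets)
groups = find_groups(retweets)
print(in_degree.most_common(1))
print(groups)
```

Real studies typically use richer measures (for example betweenness centrality or community detection) and dedicated graph libraries; the sketch only shows the basic shape of the data and the questions asked of it.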

Misinformation/Fake News on Twitter

The spread of false information through social networks has long been a problem, but its effects have become more serious. To give just two examples: riots were caused by misinformed tweets, and false and dangerous “cures” for COVID-19 were spread around the world. This wide and rapid spread of misinformation, called an infodemic, has a great reach in social networks like Twitter. Especially in the last year, many researchers have developed methods to identify misinformation on Twitter and have fact-checked tweets regarding COVID-19.

Sentiment Analysis of Tweets

With natural language processing and text analysis, it is possible to estimate the mood of a Twitter user. Combined with checks for swearing, cyberbullying, code words and smileys, the results can be quite accurate. A lot of research focuses on the sentiment analysis of tweets, and lately much of it on COVID-19 tweets. Knowing the sentiment of the population is also important for governments, considering the rising suicide rates that accompany the crisis. It can also be useful for evaluating reactions to government decisions.
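As a rough illustration of the simplest form of this, the following sketch scores a tweet with a word list plus smileys. Everything in it is an assumption for demonstration purposes: the tiny LEXICON and EMOTICONS tables and the functions sentiment_score and label are made up by us, and the approaches in the literature rely on much larger lexicons or trained models.

```python
import re

# Tiny hypothetical lexicon; real systems use far larger resources or trained models.
LEXICON = {"good": 1, "great": 2, "safe": 1, "bad": -1, "scared": -2, "terrible": -2}
# Smileys carry sentiment too, so they are scored separately from words.
EMOTICONS = {":)": 1, ":(": -1}


def sentiment_score(tweet):
    """Sum the scores of known words and smileys; unknown tokens count as 0."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    score += sum(value * tweet.count(smiley) for smiley, value in EMOTICONS.items())
    return score


def label(tweet):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["Great news, feeling safe :)", "So scared, this is terrible :(", "Stay home"]:
    print(text, "->", label(text))
```

A word list like this cannot handle negation, irony or context, which is one reason why the research we found tends to combine several signals.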

Categorisation of Tweets

There is plenty of research on Twitter that deals with categorising users' tweets, and the methods used for this vary widely. For example, tweets can be classified by the public's reactions to the COVID-19 pandemic or by prevailing myths about COVID-19.

Datasets of Tweets on COVID-19

There are a lot of datasets that aggregate tweets concerning COVID-19. Many of these datasets focus on the initial phase of the pandemic, early to mid 2020, although some projects update their data regularly. The datasets also differ in how the data is annotated: some focus on geolocation data, while others already mark NLP entities. Most of the tweets in these datasets are in English, but large enough samples for other languages can also be found. Depending on the final assignment for this project, it is likely that an adequate dataset can be found.

References

  • Durmus, E., & Cardie, C. (2019). Modeling the Factors of User Success in Online Debate. The World Wide Web Conference, 2701–2707. https://doi.org/10.1145/3308558.3313676
  • Chowdhury, F. A., Allen, L., Yousuf, M., & Mueen, A. (2020). On Twitter Purge: A Retrospective Analysis of Suspended Users. Companion Proceedings of the Web Conference 2020, 371–378. https://doi.org/10.1145/3366424.3383298
  • Al-Rakhami, M., & Al-Amri, A. (2020). Lies Kill, Facts Save: Detecting COVID-19 Misinformation in Twitter. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3019600
  • Lwin, M., Lu, J., Sheldenkar, A., Schulz, P., Shin, W., Gupta, R., & Yang, Y. (2020). Global sentiments surrounding the COVID-19 pandemic on Twitter. JMIR Public Health and Surveillance. https://doi.org/10.2196/19447
  • Mustafa, A., Ansari, I., Mohanta, S., & Balla, S. (2020). Public Reaction to COVID-19 on Twitter: A Thematic Analysis. EPRA International Journal of Multidisciplinary Research (IJMR). https://doi.org/10.36713/epra4518
  • Dimitrov, D., Baran, E., Fafalios, P., Yu, R., Zhu, X., Zloch, M., & Dietze, S. (2020). TweetsCOV19—A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2991–2998. https://doi.org/10.1145/3340531.3412765