
Exploration of Image Classification with Multimodal LLMs in the Context of Political Communication with Visual Frames

Topic:
Exploration of Image Classification with Multimodal LLMs in the Context of Political Communication with Visual Frames
Type:
BA
Supervisor:
Michael Achmann
Student:
Viktoria Stasinski
Status:
in progress
Keywords:
Computational Social Science, Image Classification, LLM, Annotation, Evaluation, Political Communication
Created:
2024-04-04
Initial presentation:
2024-07-29

Background

Political communication increasingly takes place on social media, with Instagram being one of the most important platforms. Instagram serves as a key tool for conveying political ideologies through image-based, emotional communication, constructing an online identity that represents the visual frame of the political candidate[1]. Voters hold a mental image of what an ideal political candidate should be like. Because visual information is processed via emotional pathways in the brain, it is inherently affect-laden and thus significantly shapes public opinion[2]. To study the image of political candidates on Instagram, this work draws on Grabe and Bucy's framing theory[2]. Particularly in election campaigns, two visual frames stand out, each identifiable by specific characteristics: the ideal candidate and the populist campaigner[2]. A comparative study analyzing these visual frames across seven democratic countries provides valuable insight into how visual strategies are employed in political communication[3].

Because this communication relies on visual content, the data can be analyzed with multimodal Large Language Models (MLLMs), which makes them an interesting research area for image classification. MLLMs process visual data in addition to text, using vision transformers[4]. Models like CLIP and GPT-4 Vision, combined with prompt engineering, are already being tested for classifying various kinds of images. CLIP is trained on 400 million image-text pairs collected from the internet to match each input image to the most relevant text description. CLIP is designed for the classification of single features[5], while GPT-4 can classify multiple features at once[5, 6]. Models can also be combined: one version of GPT creates the prompts and CLIP performs the classification, taking advantage of the LLM-based knowledge for image classification[7]. Prompts, which are specific instructions or inputs directed at a model, play a crucial role in achieving precise and relevant results[4]. To evaluate the results, it is important to compare them with human annotations to ensure accuracy and reliability[8].
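
One common way to compare model output with human annotations is a chance-corrected agreement score such as Cohen's kappa. The following is a minimal sketch, assuming one binary label per image from each source. The function name cohens_kappa and the example labels are hypothetical and are not taken from the thesis data; the proposal itself does not name a specific agreement measure.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both sources give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both sources labeled independently,
    # each with its own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight images: 1 = feature present, 0 = absent.
human = [1, 0, 1, 1, 0, 0, 1, 0]
model = [1, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

Raw accuracy alone can look high when one label dominates the dataset, which is why a chance-corrected measure may be the more informative choice here.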

Objective of the Work

Since political communication increasingly takes place via social media, it is crucial to examine this form of communication and identify specific patterns. Social media allows candidates to present a distinct identity. In this context, visual frames, which capture how candidates are represented, are a promising subject of investigation. This study will classify images from an Instagram dataset of the 2021 German federal election campaign using GPT-4 Vision. The objective is to classify images based on the characteristics of two visual frames using prompts. Two approaches will be compared: classifying a single feature of a visual frame per prompt versus classifying all features of the visual frames at once. To validate the quality of the computer-based approach, the results will be compared with human annotations.
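
The two approaches could be prepared as follows. This is only a sketch under stated assumptions: the feature names and descriptions in FEATURES are illustrative placeholders and not the codebook of this thesis, the prompt wording is hypothetical, and the call to the model itself is deliberately left out.

```python
import json

# Placeholder features for illustration only; the actual codebook may differ.
FEATURES = {
    "formal_attire": "The candidate wears formal clothing such as a suit.",
    "patriotic_symbols": "National flags or other patriotic symbols are visible.",
    "large_audience": "The candidate appears in front of a large crowd.",
    "casual_dress": "The candidate wears casual or sports clothing.",
}

INTRO = "You are annotating an Instagram image from an election campaign.\n"


def single_feature_prompt(name):
    """Approach 1: one prompt (and one request) per feature."""
    return (
        INTRO
        + f"Feature: {FEATURES[name]}\n"
        + f'Answer with JSON only: {{"{name}": true}} or {{"{name}": false}}.'
    )


def all_features_prompt():
    """Approach 2: one prompt covering every feature at once."""
    lines = [f"- {name}: {text}" for name, text in FEATURES.items()]
    return (
        INTRO
        + "Decide for each feature whether it is present:\n"
        + "\n".join(lines)
        + "\nAnswer with JSON only, using one boolean per feature name."
    )


def parse_reply(reply, expected_names):
    """Parse a JSON reply; missing or non-boolean features become None."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        name: data[name] if isinstance(data.get(name), bool) else None
        for name in expected_names
    }


print(single_feature_prompt("formal_attire"))
print(all_features_prompt())
```

Mapping unparseable or incomplete replies to None keeps them visible in the later evaluation instead of silently counting them as "absent".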

Specific Tasks

  1. Literature research
  2. Development of an annotation study to establish a Ground Truth dataset: Development of annotation guidelines and software-supported collection of annotation data using the VP hours system.
  3. Quality control of annotations and potential revision of annotation guidelines
  4. Implementation of the classification model / development of a suitable prompt
  5. Quality control of classification and potential revision of prompts / model
  6. Reporting and interpretation of annotation and classification quality (Results). Integration of results into existing literature (Discussion).
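
Tasks 2 to 6 could be connected as in the following sketch, which derives a ground truth label per image by majority vote and then scores the classifier for one feature. It assumes three annotators per image and binary labels; the function names majority_label and precision_recall_f1 and all example values are hypothetical, and the proposal does not prescribe these particular metrics.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label, or None when the top labels tie."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def precision_recall_f1(gold, predicted):
    """Binary metrics for one feature; items without a gold label are skipped."""
    pairs = [(g, p) for g, p in zip(gold, predicted) if g is not None]
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical votes of three annotators for five images (True = feature present).
annotations = [
    [True, True, False],
    [False, False, False],
    [True, True, True],
    [False, True, False],
    [True, False, True],
]
gold = [majority_label(votes) for votes in annotations]

# Hypothetical classifier output for the same five images.
predicted = [True, False, True, True, False]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Images on which the annotators tie receive no gold label and are excluded from scoring; such cases may also be useful input when revising the annotation guidelines.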

Expected Prerequisites

  • Proficiency in Python
  • Experience with Large Language Models (LLMs) and prompting
  • Implementation in Jupyter Notebooks using Python and pandas

Sources

  • [1] Gordillo-Rodríguez, M. T., & Bellido-Pérez, E. (2023). The visual frame of the political candidate on Instagram: the 2021 Catalan regional elections. Dígitos. Revista de Comunicación Digital, (9). https://doi.org/10.7203/drdcd.v0i9.260
  • [2] Grabe, M. E., & Bucy, E. P. (2009). Image Bite Politics: News and the Visual Framing of Elections. Oxford University Press, USA.
  • [3] Steffan, D. (2020). Visual self-presentation strategies of political candidates on social media platforms: A comparative study. International Journal of Communication.
  • [4] Wu, T., Ma, K., Liang, J., Yang, Y., & Zhang, L. (2024). A comprehensive study of multimodal large language models for image quality assessment. arXiv preprint arXiv:2403.10854. https://doi.org/10.48550/arXiv.2403.10854
  • [5] Deng, S., Wu, L., Shi, G., Xing, L., Jian, M., Xiang, Y., & Dong, R. (2024). Learning to compose diversified prompts for image emotion classification. Computational Visual Media, 1-15. https://doi.org/10.1007/s41095-023-0389-6
  • [6] Achmann-Denkler, M. (2024, January 22). Image Classification. https://doi.org/10.5281/zenodo.10039756
  • [7] Tzelepi, M., & Mezaris, V. (2024, June). Exploiting LMM-based knowledge for image classification tasks. In International Conference on Engineering Applications of Neural Networks (pp. 166-177). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-62495-7_13
  • [8] Achmann-Denkler, M. (2023, December 18). Human Annotations. https://doi.org/10.5281/zenodo.10039756

Visual Frames

  • [2] Grabe, M. E., & Bucy, E. P. (2009). Image Bite Politics: News and the Visual Framing of Elections. Oxford University Press, USA.
  • [1] Gordillo-Rodríguez, M. T., & Bellido-Pérez, E. (2023). The visual frame of the political candidate on Instagram: the 2021 Catalan regional elections. Dígitos. Revista de Comunicación Digital, (9). https://doi.org/10.7203/drdcd.v0i9.260