Acquisition and Exploration of Text Corpora for the Supernatural TV Series and Fanfictions via Computational Text Analysis

Thomas Schmidt
Nina Kleindienst
Christian Wolff
Keywords: Text Mining, Fan Fiction, Digital Humanities, Computational Literary Studies, Fan Studies, Internet Studies


Fan fictions are an important part of today's reading culture and have attracted growing interest as data material in the Natural Language Processing community. Beyond that, fan fictions themselves have become objects of research in various areas of the humanities, such as gender studies, fan studies, literary studies and internet studies. Furthermore, various projects in Digital Humanities explore fan fictions via computational text analysis. One interesting question is how the fan community transforms the original work when creating fan fictions.

Thesis Objective

In this thesis, we investigate how fan fictions transform the original work, using methods of computational text analysis on large-scale corpora. As a case study, we select the popular TV show "Supernatural", which is among the most popular and best-known source material for fan fictions. To investigate differences between the original and the fan fictions, we look at the following variables:

  • character mentions and character networks
  • sentiment and emotion expression
  • frequencies of various word types, such as gender-specific words
  • other metrics of intertextuality
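The first of the variables listed above, character mentions and character networks, could be approached roughly as in the following minimal sketch. It assumes plain-text input and a hand-made character list; the names in `CHARACTERS`, the function names and the sample text are illustrative assumptions, not part of the thesis specification. Sentence-level co-occurrence is only one of several possible ways to define network edges.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical character list for illustration; a real study would also
# need aliases and nicknames (e.g. "Cas" for "Castiel").
CHARACTERS = ["Dean", "Sam", "Castiel"]

# One whole-word pattern per character name.
PATTERNS = {c: re.compile(rf"\b{re.escape(c)}\b") for c in CHARACTERS}


def mention_counts(text):
    """Count how often each character name occurs in the text."""
    return Counter({c: len(p.findall(text)) for c, p in PATTERNS.items()})


def cooccurrence_edges(text):
    """Count character pairs that are named within the same sentence."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    edges = Counter()
    for sentence in sentences:
        present = sorted(c for c, p in PATTERNS.items() if p.search(sentence))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Invented toy text, not taken from any script or fan fiction.
sample = "Dean looked at Sam. Castiel appeared behind Dean. Sam said nothing."

print(mention_counts(sample))
print(cooccurrence_edges(sample))
```

The edge counts can be passed to a graph library as weighted edges, so that networks built from the scripts and from the fan fictions can be compared.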

Specific Tasks

  • Review of related work on this topic
  • Formulation of research questions and a task agenda
  • Acquisition and preparation of an adequately sized corpus of fan fictions
  • Acquisition and preparation of a corpus of Supernatural scripts
  • Analysis of the corpora via text mining methods to investigate differences between the material
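As a starting point for the last task, the two corpora could be compared by the relative frequency of word categories such as gender-specific words. The sketch below is only an illustration under stated assumptions: the tiny word lists in `WORD_LISTS`, the regex tokenizer and the two sample texts are invented for this example and would have to be replaced by validated lexicons and the actual corpora.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real analysis would rely
# on a validated lexicon.
WORD_LISTS = {
    "female": {"she", "her", "hers", "woman", "women"},
    "male": {"he", "him", "his", "man", "men"},
}


def tokenize(text):
    """Lowercase the text and extract simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text):
    """Return the share of tokens that belong to each word list."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {
        name: (sum(counts[w] for w in words) / total if total else 0.0)
        for name, words in WORD_LISTS.items()
    }


# Invented toy texts standing in for the script and fan fiction corpora.
script_text = "He took his gun. She watched him."
fanfic_text = "She smiled at her brother. He smiled back."

script_freq = relative_frequencies(script_text)
fanfic_freq = relative_frequencies(fanfic_text)

for name in WORD_LISTS:
    print(name, round(script_freq[name], 3), round(fanfic_freq[name], 3))
```

Normalizing by the token count keeps the comparison meaningful when the script corpus and the fan fiction corpus differ greatly in size.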

Expected Prior Knowledge

Experience in

  • Python
  • computational text analysis
  • statistics

Further Reading

  • Barcellos, P. da S. C. C., Reategui, E. B., Polonia, E., & Black, R. (2020). Digital Literacy in Foreign Language Through Text Mining and Fanfiction Writing. Revista X, 15(3), 101–125.
  • Büchler, M. (2018). The Re-Creation Of Harry Potter: Tracing Style And Content Across Novels, Movie Scripts And Fanfiction. 4.
  • Busse, K. (2006). Fan Fiction and Fan Communities in the Age of the Internet: New Essays (Illustrated ed.). McFarland and Company, Inc.
  • Campbell, J., Aragon, C., Davis, K., Evans, S., Evans, A., & Randall, D. (2016). Thousands of Positive Reviews: Distributed Mentoring in Online Fan Communities. Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, 691–704.
  • Carvallo, A., & Parra, D. (2020). Analyzing Network Effects on a Fanfiction Community. arXiv:1909.02886 [cs].
  • Cuntz-Leng, V., & Meintzinger, J. (2014). A brief history of fan fiction in Germany. Transformative Works and Cultures, 19.
  • De Kosnik, A., El Ghaoui, L., Cuntz-Leng, V., Godbehere, A., Horbinski, A., Hutz, A., Pastel, R., & Pham, V. (2015). Watching, creating, and archiving: Observations on the quantity and temporality of fannish productivity in online fan fiction archives. Convergence, 21(1), 145–164.
  • Duggan, J. (2017). Revising Hegemonic Masculinity: Homosexuality, Masculinity, and Youth-Authored Harry Potter Fanfiction. Bookbird: A Journal of International Children’s Literature, 55(2), 38–45.
  • Duggan, J. (2020). Who writes Harry Potter fan fiction? Passionate detachment, "zooming out," and fan fiction paratexts on AO3. Transformative Works and Cultures, 34.
  • Dym, B., Aragon, C., Bullard, J., Davis, R., & Fiesler, C. (2018). Online Fandom: Boldly Going Where Few CSCW Researchers Have Gone Before. Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, 121–124.
  • Fast, E., Chen, B., & Bernstein, M. (2016). Empath: Understanding Topic Signals in Large-Scale Text. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 4647–4657.
  • Fast, E., Vachovsky, T., & Bernstein, M. S. (n.d.). Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community. 9.
  • Frens, J., Davis, R., Lee, J., Zhang, D., & Aragon, C. (n.d.). Reviews Matter: How Distributed Mentoring Predicts Lexical Diversity on 7.
  • Gunderson, M. (2017). What is an omega? Rewriting sex and gender in omegaverse fanfiction.
  • Jamison, A., & Grossman, L. (2013). Fic: Why Fanfiction Is Taking Over the World (Illustrated ed.). Smart Pop.
  • Jing, E., DeDeo, S., & Ahn, Y.-Y. (2019). Sameness Attracts, Novelty Disturbs, but Outliers Flourish in Fanfiction Online. arXiv:1904.07741 [cs].
  • Kim, E., & Klinger, R. (n.d.). Frowning Frodo, Wincing Leia, and a Seriously Great Friendship: Learning to Classify Emotional Relationships of Fictional Characters. 10.
  • Liu, C., Osama, M., & de Andrade, A. (2019). DENS: A Dataset for Multi-class Emotion Analysis. arXiv:1910.11769 [cs].
  • Macskassy, S. A. (2011). Contextual linking behavior of bloggers: Leveraging text mining to enable topic-based analysis. Social Network Analysis and Mining, 1(4), 355–375.
  • Milli, S., & Bamman, D. (2016). Beyond Canonical Texts: A Computational Analysis of Fanfiction. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2048–2053.
  • Muttenthaler, L., Lucas, G., & Amann, J. (n.d.). Authorship Attribution in Fan-Fictional Texts given variable length Character and Word N-Grams. 9.
  • Pianzola, F., Rebora, S., & Lauer, G. (2020). Wattpad as a resource for literary studies. Quantitative and qualitative examples of the importance of digital social reading and readers’ comments in the margins. PLOS ONE, 15(1), e0226708.
  • Pope, J. H. (2020). “the rest is …”: Shakespeare and Online Fan Fiction. In J. H. Pope (Ed.), Shakespeare’s Fans: Adapting the Bard in the Age of Media Fandom (pp. 99–132). Springer International Publishing.
  • Rebora, S., Boot, P., Pianzola, F., Gasser, B., Herrmann, J. B., Kraxenberger, M., Kuijpers, M. M., Lauer, G., Lendvai, P., Messerli, T. C., & Sorrentino, P. (2019). Digital Humanities and Digital Social Reading [Preprint]. Open Science Framework.
  • Rebora, S., & Pianzola, F. (2018). A New Research Programme for Reading Research: Analysing Comments in the Margins on Wattpad. DigitCult | Scientific Journal on Digital Cultures, 3.2, 19–36.
  • Rossdal, M. (2015). All of the Greek and Roman Classics. 32.
  • Thomas, B. (2011). What Is Fanfiction and Why Are People Saying Such Nice Things about It? Storyworlds: A Journal of Narrative Studies, 3.
  • Tosenberger, C. (2008). Homosexuality at the Online Hogwarts: Harry Potter Slash Fanfiction. Children’s Literature, 36(1), 185–207.
  • Van Steenhuyse, V. (2011). The Writing and Reading of Fan Fiction and Transformation Theory. CLCWeb: Comparative Literature and Culture, 13(4).
  • Vilares, D., & Gómez-Rodríguez, C. (2019). Harry Potter and the Action Prediction Challenge from Natural Language. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2124–2130.
  • Yatrakis, C. (n.d.). Fan fiction, fandoms, and literature: Or, why it’s time to pay attention to fan fiction. 72.
  • Yin, K., Aragon, C., Evans, S., & Davis, K. (2017). Where No One Has Gone Before: A Meta-Dataset of the World’s Largest Fanfiction Repository. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 6106–6110.
  • Zhang, W., Kit Cheung, J. C., & Oren, J. (2019). Generating Character Descriptions for Automatic Summarization of Fiction. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 7476–7483.