Cecilia Aragon

Winter 2021

Human Centered Natural Language Processing on Fanfiction Reviews (Ghosh, Aragon)

This research group will investigate the role played by shared or conflicting emotions in the process of relationship-building in online communities. We aim to understand that role through extensive qualitative coding of individual fanfiction reviews. This will be the final quarter of an ongoing research project on this topic, using subsets of a large fanfiction dataset collected by the Human-Centered Data Science Lab in previous years. The goal is to finish our qualitative coding using a novel collaborative coding and visualization tool, and to contribute to a research paper to be submitted to CSCW this academic year.

We are looking for up to five students with some prior experience in qualitative coding or data visualization, or a keen interest in the same. Participants are expected to commit 2 to 5 hours a week, in addition to a weekly group meeting on Thursdays from 3 to 4:30 p.m. PST. If interested, please email G Ghosh an application consisting of an unofficial transcript, your resume, and a paragraph explaining why you're interested in this research project.

Winter 2021

Human Centered Natural Language Processing and Text Visualization

Note: This DRG is full for Winter and no longer accepting applications.

This DRG will be conducted remotely over Zoom.

This research group will apply human-centered techniques to the field of natural language processing (NLP) to study very large text corpora, with an additional focus on text visualization. We’re looking for students with experience in either (a) programming and analysis of large text datasets or (b) machine learning and data science. No NLP experience is required, as we will be reading seminal papers in the field and applying the techniques they describe to a text dataset.

We plan to use a previously collected dataset of stories, reviews, and associated metadata from fanfiction sites, totaling over 61.5 billion words (the largest fiction dataset outside of the Google Books corpus), as a test dataset for human-centered NLP techniques.

The group will meet on Wednesday or Friday afternoons at a time mutually agreeable to all participants.

Dr. Aragon's Research Group archive