Cecilia Aragon

Winter - Spring 2020

Human-Centered Natural Language Processing

Note: This DRG is not accepting applications from new students.

This research group will apply human-centered techniques to the field of natural language processing (NLP) to study very large text corpora. We’re looking for students with experience in either (a) programming and analysis of large text datasets or (b) machine learning and data science. No NLP experience is required, as we will read seminal papers in the field and apply their techniques to a text dataset.

We plan to use a previously collected dataset of stories, reviews, and associated metadata from fanfiction sites, totaling over 61.5 billion words (the largest fiction dataset outside of the Google Books corpus), as a test dataset for human-centered NLP techniques.

Dr. Aragon's Research Group archive