Cecilia Aragon

Autumn 2020

Human-Centered Natural Language Processing and Text Visualization

This DRG will be conducted remotely over Zoom.

This research group will apply human-centered techniques to the field of natural language processing (NLP) to study very large text corpora, with an additional focus on text visualization. We’re looking for students with experience in either (a) programming and analysis of large text datasets or (b) machine learning and data science. No NLP experience is required, as we will read seminal papers in the field and apply their techniques to a text dataset.

We plan to use a previously collected dataset of stories, reviews, and associated metadata from fanfiction sites, totaling over 61.5 billion words (the largest fiction dataset outside of the Google Books corpus), as a test dataset for human-centered NLP techniques.

We are looking for a relatively small group of people who are each interested in between 2 and 5 credit hours of HCDE 496/596, graded credit/no credit. Interested undergraduate and graduate students may apply. To apply, email Cecilia Aragon explaining your interest in the project and your programming experience, and attach a resume and an unofficial transcript.

The group will meet on Wednesday or Friday afternoons at a time mutually agreeable to all participants.

