Can GPT detect harassment, bullying and hate speech?
Meena Muralikumar (HCDE PhD candidate) & David McDonald (HCDE Professor)
Supportive, informative, and civil interactions are often important to the growth and long-term viability of any online community. Unfortunately, there will be times when some individuals harass, bully, or target others. In those situations, moderating the community becomes an important task.
Content moderation is a moving target. Meeting this challenge requires continually monitoring and updating our understanding of hate speech, toxicity, and harassment. Re-training a dedicated, purpose-built machine learning model each time that understanding shifts costs additional labor, money, and time. A Generative Pre-trained Transformer (GPT) Large Language Model (LLM) shows promise here because its natural language capabilities allow it to adapt without re-training.
How might we leverage LLM capabilities to detect toxic or hateful speech? Could one customize an LLM with prompt engineering and few-shot examples to enforce a specific moderation policy or to conform to human judgements? How would its output compare to human judgements of toxic/hate speech? We will explore such questions in this DRG.
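To give a flavor of what "customizing an LLM with few-shot examples" means in practice, here is a minimal sketch in Python. The system instruction, example comments, and labels are illustrative placeholders, not an actual moderation policy; the helper function is hypothetical.

```python
# Sketch of few-shot prompt engineering for toxicity detection.
# The policy wording, examples, and labels below are placeholders.

FEW_SHOT_EXAMPLES = [
    ("You make a great point, thanks for sharing!", "NOT TOXIC"),
    ("Nobody wants you here, just leave.", "TOXIC"),
]

def build_messages(comment: str) -> list:
    """Build a chat-style message list: a system instruction stating
    the moderation policy, a few labeled examples, then the new
    comment to classify."""
    messages = [{
        "role": "system",
        "content": (
            "You are a content moderator. Label the user's comment "
            "as TOXIC or NOT TOXIC according to the community policy."
        ),
    }]
    # Each few-shot example becomes a user turn (the comment)
    # followed by an assistant turn (the desired label).
    for text, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # Finally, the comment we actually want the model to judge.
    messages.append({"role": "user", "content": comment})
    return messages
```

The resulting message list would be passed to a chat-completion call against a model like GPT-4; swapping in different examples or policy wording is how the prompt gets tailored to a community's norms.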
In this DRG, we will be working with OpenAI's GPT-4, exploring both the moderation endpoint and ways to customize GPT-4 for content moderation. The main objective is to compare our results with human judgements and/or other popular toxicity-detecting classifiers such as Perspective, primarily using quantitative analysis methods.
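As a preview of the moderation endpoint, the sketch below prepares (but does not send) a request to OpenAI's `/v1/moderations` endpoint using only the Python standard library. The helper name is hypothetical, and it assumes an `OPENAI_API_KEY` environment variable holds a valid API key.

```python
import json
import os
import urllib.request

# OpenAI's moderation endpoint accepts a JSON body like {"input": "..."}
# and returns per-category scores plus an overall "flagged" boolean.
MODERATION_URL = "https://api.openai.com/v1/moderations"

def build_moderation_request(text: str) -> urllib.request.Request:
    """Prepare a POST request asking the moderation endpoint to
    score a piece of text. Assumes OPENAI_API_KEY is set."""
    payload = json.dumps({"input": text}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    }
    return urllib.request.Request(MODERATION_URL, data=payload, headers=headers)

# To actually send the request (requires a valid API key):
# with urllib.request.urlopen(build_moderation_request("some comment")) as resp:
#     result = json.load(resp)
#     print(result["results"][0]["flagged"])
```

Collecting these flags across a dataset of comments is one way we could line GPT-based judgements up against human labels or Perspective scores for quantitative comparison.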
What students can learn from this DRG:
- Programming in Python, using Jupyter Lab Notebooks
- Introductory methods for quantitative data analysis
- Prompt engineering techniques for ChatGPT
Skills that would allow you to be successful in this DRG include:
- Prior coursework programming with Python
- A statistical methods course
This is a 2-credit DRG and will be conducted in person on Tuesdays from 3:30 - 5:00 pm.
Interested undergraduate and graduate students should fill out this form by December 1 for earliest consideration. We will begin informing students accepted to the DRG by December 6.