Internet Censorship Course / Book Workshop
Platforms often use some form of automation to perform content moderation. In the realm of copyright, one way of moderating content is to use a matching algorithm, such as matching a hash or fingerprint of the content against a known database of infringing content. There are different ways of performing these types of matches.
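As a simple illustration of exact-match lookup, the sketch below hashes an uploaded file and checks it against a small set of known hashes. The file name and the hash list are hypothetical, and note that real copyright-matching systems typically use perceptual fingerprints rather than exact cryptographic hashes, since a single changed bit defeats an exact match.

```python
import hashlib

# Hypothetical database of SHA-256 hashes of known infringing files.
KNOWN_INFRINGING_HASHES = {
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

def sha256_of_file(path: str) -> str:
    """Compute the SHA-256 hash of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known_infringing(path: str) -> bool:
    """Exact-match check: flags a file only if its bytes match a known copy."""
    return sha256_of_file(path) in KNOWN_INFRINGING_HASHES

if __name__ == "__main__":
    print(is_known_infringing("upload.mp3"))  # hypothetical uploaded file
```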
There are many tools and libraries that can be used to perform sentiment analysis. More recently, the advent of OpenAI’s GPT and various other large language models has opened up new possibilities for sentiment analysis. A large language model can be used to perform a variety of natural language processing tasks; models from OpenAI (e.g., ChatGPT) and Anthropic (e.g., Claude), for example, can perform a broad array of text generation and analysis tasks.
Try asking ChatGPT or Claude to perform sentiment analysis on the same set of phrases that you used with the Perspective API.
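If you would rather script this than paste phrases into a chat window, the sketch below sends a few example phrases to OpenAI’s chat completions API and asks for a one-word sentiment label. The model name and phrases are placeholders, and an equivalent request can be made against Anthropic’s API.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

phrases = [
    "I love this community.",
    "You are a terrible person.",
    "The meeting is at 3pm.",
]

for phrase in phrases:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text as "
                        "positive, negative, or neutral. Reply with one word."},
            {"role": "user", "content": phrase},
        ],
    )
    print(phrase, "->", response.choices[0].message.content)
```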
Follow Up: Experiment with moderating AI-generated content. For this follow-up, create or collect content that was generated by different AI models (ChatGPT, Claude, Bard/Gemini, etc.) and run it through various content moderation systems. Consider the following:
Follow Up: Experiment with local LLMs (e.g., UChicago Phoenix AI). For this follow-up, install and configure a local LLM like UChicago Phoenix AI or other open-source models such as Llama, Mistral, or Falcon. Then, consider some of the following questions or concerns:
Follow Up: Analyze the platform’s content moderation policy. As a follow-up activity, you could input a platform’s content moderation policy into an LLM, and subsequently input other types of content into the LLM to see how it might be classified.
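One way to set this up programmatically is to put the policy in the system prompt and the content to be classified in the user message. The sketch below assumes the OpenAI Python client; the policy text, example content, and model name are placeholders for whatever you choose to test.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder policy text; for the actual activity, paste in a real
# platform's content moderation policy.
policy = """Content that contains threats, harassment, or hate speech
directed at a person or group is not allowed."""

content_to_classify = "Nobody would miss you if you disappeared."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system",
         "content": "You are a content moderator. Apply the following policy "
                    "and answer ALLOWED or REMOVED, then give a one-sentence "
                    "reason.\n\nPolicy:\n" + policy},
        {"role": "user", "content": content_to_classify},
    ],
)
print(response.choices[0].message.content)
```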
The Perspective API aims to help online communities detect and filter out toxic content. It is a machine learning model that can be used to score the likelihood that a comment is toxic. The model is trained on a variety of data sources, including Wikipedia talk page comments, and is able to distinguish between different types of toxicity, such as threats, obscenity, and identity-based hate.
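A minimal sketch of calling the Perspective API from Python is shown below. It assumes you have a Google Cloud API key with the Comment Analyzer API enabled, and it requests only the TOXICITY attribute; other attributes (e.g., THREAT or IDENTITY_ATTACK) can be requested the same way.

```python
import requests

API_KEY = "YOUR_API_KEY"  # key for a project with the Comment Analyzer API enabled
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity_score(text: str) -> float:
    """Ask the Perspective API how likely it is that `text` is toxic (0.0-1.0)."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload)
    response.raise_for_status()
    scores = response.json()
    return scores["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    print(toxicity_score("You are a wonderful person."))
    print(toxicity_score("You are an idiot."))
```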
One approach used for audio is a so-called spectral or frequency-based match, which does not compare the content bit-by-bit, but rather matches how the audio “sounds”, by comparing frequencies and beats through spectral analysis.
In this part of the hands-on assignment, you can download or compile the Echoprint code and perform some spectral hashes on audio files.
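Echoprint itself implements a much more involved fingerprinting scheme; the sketch below is only a toy illustration of the underlying idea, assuming NumPy and SciPy are available and the input is a WAV file at least a few seconds long. It computes a spectrogram, records the dominant frequency in each time segment, and hashes that sequence, so two copies of the same recording produce the same fingerprint even though their raw bytes may differ.

```python
import hashlib

import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def spectral_fingerprint(path: str, num_segments: int = 32) -> str:
    """Toy spectral fingerprint: hash the dominant frequency of each time slice."""
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:          # mix stereo down to mono
        samples = samples.mean(axis=1)
    freqs, times, spec = spectrogram(samples, fs=rate)
    # Split the spectrogram into a fixed number of time segments and record
    # the strongest frequency in each, so small timing shifts matter less.
    # (Assumes the clip is long enough that no segment is empty.)
    segment_peaks = []
    for segment in np.array_split(spec, num_segments, axis=1):
        dominant_bin = segment.sum(axis=1).argmax()
        segment_peaks.append(int(freqs[dominant_bin]))
    return hashlib.sha256(str(segment_peaks).encode("utf-8")).hexdigest()

if __name__ == "__main__":
    print(spectral_fingerprint("song.wav"))  # hypothetical audio file
```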