Exploring Censorship in AI Systems

Background

AI systems like large language models (LLMs) are designed to provide helpful responses to a wide range of queries. However, these systems also include mechanisms to filter or censor responses to certain prompts. This censorship is intended to prevent harmful, offensive, or unethical outputs. While these safeguards are important, they raise questions about how such decisions are made, who decides what is censored, and the potential implications for free expression.

In this assignment, you will explore how LLMs handle potentially sensitive prompts, identify patterns in their responses, and critically reflect on the ethical and practical challenges of censorship in AI.

Learning Objectives

By the end of this assignment, you will:

- Understand how LLMs implement censorship or filtering of responses.
- Identify and analyze patterns in the topics or language that trigger censorship.
- Reflect on the ethical, social, and technical challenges of censorship in AI systems.
- Discuss the implications of AI censorship for free expression and accountability.

Instructions

Step 1: Experiment with Prompts

In your group, brainstorm a list of prompts to test on an LLM. Your prompts should fall into the following categories:
- Controversial political topics (e.g., opinions on policy or global events).
- Ethical dilemmas (e.g., scenarios involving moral conflict).
- Misinformation or conspiracy theories (e.g., questions about debunked claims).
- Sensitive societal issues (e.g., discussions of discrimination or inequality).
- Non-controversial queries (e.g., trivia, factual information).

Aim for some prompts that are sensitive enough to be subject to filtering, yet still within the bounds of ethics. (As a bonus, read the LLM’s terms of service to see whether any topics are explicitly allowed or disallowed.)

Test each prompt on at least two different LLMs (e.g., ChatGPT, Claude, or Phoenix AI). You are free to use any LLMs you like. For each prompt and each LLM, document:
- The input prompt.
- The LLM’s response.
- Whether the response was filtered, generalized, or flagged as inappropriate.
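
If your group wants to keep its records in a consistent format, the three items above can be captured in a small log. The following is only a minimal sketch in Python, not a required format: the `Trial` class, the `write_log` helper, the category labels, and the outcome labels (including the extra "answered" label for an ordinary response) are all assumptions made for illustration, and the sample row is a placeholder rather than a real result.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

# Assumed outcome labels: the three cases listed above, plus "answered"
# for a normal, unfiltered response. Adjust to suit your group.
OUTCOMES = {"answered", "filtered", "generalized", "flagged"}


@dataclass
class Trial:
    category: str  # hypothetical label, e.g. "political" or "trivia"
    model: str     # name of the LLM you tested
    prompt: str    # the input prompt, copied exactly
    response: str  # the LLM's response, copied exactly
    outcome: str   # your group's judgment, one of OUTCOMES

    def __post_init__(self):
        # Reject typos in the outcome label so later counts stay reliable.
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def write_log(trials, handle):
    """Write one CSV row per prompt/LLM pair to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Trial)])
    writer.writeheader()
    for trial in trials:
        writer.writerow(asdict(trial))


if __name__ == "__main__":
    # Placeholder row for demonstration only; not a real observation.
    sample = [
        Trial("trivia", "Model A", "What is the capital of France?",
              "Paris.", "answered"),
    ]
    buffer = io.StringIO()
    write_log(sample, buffer)
    print(buffer.getvalue())
```

To save to a file instead of printing, pass a handle opened with `open("trials.csv", "w", newline="")` to `write_log`. Recording the category alongside each prompt makes it easier to look for patterns later.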

Step 2: Discussion

What types of prompts were censored or filtered, if any?
Did the LLM explain why it declined to respond or why it provided a filtered response?
Were there any unexpected results, such as over-censorship or inconsistent handling of prompts?
Did you observe different results depending on the LLM that you tried?
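
To support your answers to these questions, it may help to count outcomes per LLM and per prompt category. The sketch below is one possible way to do that in Python, assuming you have reduced each test to a (model, category, outcome) triple. The function names, labels, and sample observations are hypothetical placeholders, not real results.

```python
from collections import Counter

# Placeholder observations as (model, category, outcome); not real results.
observations = [
    ("Model A", "political", "filtered"),
    ("Model A", "trivia", "answered"),
    ("Model B", "political", "answered"),
    ("Model B", "trivia", "answered"),
]


def tally(observations):
    """Count how often each (model, category, outcome) combination occurred."""
    counts = Counter()
    for model, category, outcome in observations:
        counts[(model, category, outcome)] += 1
    return counts


def disagreements(observations):
    """Return the categories in which the models' sets of outcomes differ."""
    by_category = {}
    for model, category, outcome in observations:
        by_category.setdefault(category, {}).setdefault(model, set()).add(outcome)
    return sorted(
        category
        for category, per_model in by_category.items()
        if len({frozenset(outcomes) for outcomes in per_model.values()}) > 1
    )


if __name__ == "__main__":
    for key, count in sorted(tally(observations).items()):
        print(key, count)
    print("Categories with differing outcomes:", disagreements(observations))
```

Counts like these only summarize your group's own labels, so keep the full prompts and responses at hand when you discuss why a given response was or was not filtered.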