Exploring Censorship in AI Systems
Background
AI systems like large language models (LLMs) are designed to provide helpful
responses to a wide range of queries. However, these systems also include
mechanisms to filter or censor responses to certain prompts. This censorship
is intended to prevent harmful, offensive, or unethical outputs. While these
safeguards are important, they raise questions about how such decisions are
made, who decides what is censored, and the potential implications for free
expression.
In this assignment, you will explore how LLMs handle potentially sensitive
prompts, identify patterns in their responses, and critically reflect on the
ethical and practical challenges of censorship in AI.
Learning Objectives
By the end of this assignment, you will:
- Understand how LLMs implement censorship or filtering of responses.
- Identify and analyze patterns in the topics or language that trigger censorship.
- Reflect on the ethical, social, and technical challenges of censorship in AI systems.
- Discuss the implications of AI censorship for free expression and accountability.
Instructions
Step 1: Experiment with Prompts
- In your group, brainstorm a list of prompts to test on an LLM. Your prompts
  should fall into the following categories:
  - Controversial political topics (e.g., opinions on policy or global events).
  - Ethical dilemmas (e.g., scenarios involving moral conflict).
  - Misinformation or conspiracy theories (e.g., questions about debunked claims).
  - Sensitive societal issues (e.g., discussions of discrimination or inequality).
  - Non-controversial queries (e.g., trivia, factual information).
  Aim for questions that are sensitive or likely to be filtered, yet still
  within ethical bounds. (As a bonus, you could read the LLM's terms of service
  and check whether any topics are explicitly allowed or disallowed.)
- Test each prompt on different LLMs (e.g., ChatGPT, Claude, Phoenix AI). You are
  free to try any LLMs you like, but try to test at least two. For each prompt, document:
  - The input prompt.
  - The LLM's response.
  - Whether the response was filtered, generalized, or flagged as inappropriate.
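If your group prefers to keep its log in code rather than a spreadsheet, the documentation step can be sketched as a small Python record plus a tally per model and category. This is a minimal sketch, not a required format: the class name PromptResult, the field names, the short category labels, and the extra "answered" outcome (for responses that were not filtered) are all hypothetical. Responses are pasted in by hand; the sketch does not call any LLM API.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical shorthand for the five prompt categories in Step 1.
CATEGORIES = {"political", "ethical", "misinformation", "societal", "neutral"}
# How a response was handled; "answered" is an assumed label for unfiltered replies.
OUTCOMES = {"answered", "filtered", "generalized", "flagged"}


@dataclass
class PromptResult:
    """One test: a prompt sent to one LLM and what came back (entered by hand)."""
    model: str
    category: str
    prompt: str
    response: str
    outcome: str

    def __post_init__(self) -> None:
        # Reject mistyped labels so the tallies stay consistent.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome}")


def tally(results: list[PromptResult]) -> Counter:
    """Count results per (model, category, outcome) to compare LLMs in Step 2."""
    return Counter((r.model, r.category, r.outcome) for r in results)


if __name__ == "__main__":
    # Placeholder entries only; replace them with your group's actual results.
    log = [
        PromptResult("Model A", "neutral", "What is the capital of France?",
                     "<pasted response>", "answered"),
        PromptResult("Model B", "neutral", "What is the capital of France?",
                     "<pasted response>", "answered"),
        PromptResult("Model A", "misinformation", "<your prompt here>",
                     "<pasted response>", "generalized"),
    ]
    for (model, category, outcome), n in sorted(tally(log).items()):
        print(f"{model:8} {category:15} {outcome:12} {n}")
```

Counting by (model, category, outcome) turns the Step 2 comparisons, such as whether two LLMs handled the same category differently, into a single table to read. A shared spreadsheet with the same columns works equally well.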
Step 2: Discussion
- What types of prompts were censored or filtered, if any?
- Did the LLM explain why it declined to respond or gave a filtered response?
- Were there any unexpected results, such as over-censorship or inconsistent handling of prompts?
- Did the results differ depending on which LLM you tried?