What Do LLMs Remember About You?
1. Overview
Large Language Models (LLMs) like ChatGPT, Claude, and institutional deployments such as Phoenix AI (UChicago’s internal LLM) have raised important questions about privacy. Models trained on massive datasets can memorize rare sequences and sometimes regurgitate sensitive information verbatim. In this activity, you’ll explore how inference and memorization risks play out in real-world systems, and investigate which settings are (or are not) available to users to protect their data.
We’ll look at real case studies and then test a few LLM interfaces to see what privacy tools are built in — and how transparent they really are.
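To make “memorization” concrete before we start, here is a minimal sketch of a prefix-completion probe. It assumes the Hugging Face transformers library and the small open GPT-2 model; the email address is purely hypothetical and chosen for illustration. The idea: if a model completes a rare prefix with the exact suffix that appeared in its training data, that points to memorization rather than inference.

```python
# Minimal sketch of a prefix-completion memorization probe.
# Assumes: `pip install transformers torch`; the candidate string is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Split a candidate string into a prefix (fed to the model) and a suffix
# (what we check for). A verbatim match to the suffix is evidence of
# memorization rather than inference.
prefix = "Contact the maintainer at"        # hypothetical prefix
expected_suffix = " jane.doe@example.com"   # hypothetical suffix

inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding: the model's single most likely continuation
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, not the prompt.
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])

print("Model continued with:", repr(completion))
print("Verbatim match:", completion.startswith(expected_suffix))
```

A tiny model like GPT-2 will almost never reproduce a given string, but the same probe structure is what researchers scale up against much larger models.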
2. Learning Objectives
By the end of this session, you should be able to:
- Explain the difference between memorization and inference in LLMs
- Identify the types of data that LLMs are most likely to “leak”
- Evaluate privacy risks when interacting with commercial or institutional LLMs
- Locate and assess the privacy settings in real-world LLM tools
3. Activity
Step 1: Case Study Discussion
Read the short summary of a real case study (provided by the instructor) where an LLM appeared to memorize or infer sensitive data. Examples may include:
- A user discovering leaked email addresses or API keys from training data
- An LLM inferring that a user is part of a specific organization or demographic
- Security researchers extracting phone numbers, passwords, or names via prompt engineering (a toy version of this approach is sketched below)
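The sketch below shows what such an extraction probe looks like in miniature, following the “sample and filter” pattern used in published extraction studies: generate unprompted text from an open model and scan it for strings shaped like personal data. The model choice (GPT-2) and the regex patterns are illustrative assumptions, not any particular research team’s setup.

```python
# Rough sketch of a "sample and filter" extraction probe.
# Assumes: `pip install transformers torch`; patterns and model are illustrative.
import re
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Patterns that loosely match email addresses and US-style phone numbers.
patterns = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Sample several continuations from a near-empty prompt and flag anything
# that looks like personal data. Real studies sample millions of sequences
# and then verify candidates against known training data.
samples = generator("The", max_new_tokens=50, num_return_sequences=5, do_sample=True)
for i, sample in enumerate(samples):
    text = sample["generated_text"]
    for label, pattern in patterns.items():
        for match in pattern.findall(text):
            print(f"sample {i}: possible {label} -> {match}")
```

Most hits from a probe like this are false positives or coincidences; the hard part of real extraction research is verifying which candidates actually came from the training data.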
In your group, discuss:
- What kind of training data might have led to this leak?
- Was this true memorization or just inference?
- Could this have been prevented? If so, how?
Step 2: Hands-On Exploration of Privacy Controls in LLMs (20–25 minutes)
Pick at least two LLM interfaces from the list below and try to answer the following questions by exploring the UI, settings, or documentation:
- What happens to your data after you use the model?
- Can you opt out of having your chats used to improve the model?
- Is there a clear way to delete past conversations or disable chat history?
- Is your data encrypted, and is any of it stored locally rather than on the provider’s servers?
- Does the system make any privacy guarantees?
LLM Interfaces to Explore:
- ChatGPT (https://chat.openai.com)
- Claude (https://claude.ai)
- Phoenix AI (UChicago internal system — access via UChicago credentials)
- Perplexity AI (https://www.perplexity.ai)
- GitHub Copilot (optional)
Each group should take notes on their findings and compare how different systems approach privacy and user control.
4. Discussion
As a class, we’ll talk about:
- Where are privacy protections strong? Where are they vague or absent?
- Did any of the systems surprise you — positively or negatively?
- What would it take to make these tools safe for sensitive tasks?
- Do users know what risks they’re taking when they use these tools?
We’ll wrap up by considering: How can developers, institutions, or regulators create stronger norms and expectations around LLM privacy?