A camera moves through a cloud of multi-colored cubes, each representing an email message. Three passing cubes are labeled “k****@enron.com”, “m***@enron.com” and “j*****@enron.com.” As the camera moves out, the cubes form clusters of similar colors.
Last month, I received an alarming email from someone I did not know: Rui Zhu, a Ph.D. candidate at Indiana University Bloomington. Mr. Zhu had my email address, he explained, because GPT-3.5 Turbo, one of the latest and most robust large language models (L.L.M.) from OpenAI, had delivered it to him.
My contact information was included in a list of business and personal email addresses for more than 30 New York Times employees that a research team, including Mr. Zhu, had managed to extract from GPT-3.5 Turbo in the fall of this year. With some work, the team had been able to “bypass the model’s restrictions on responding to privacy-related queries,” Mr. Zhu wrote.
My email address is not a secret. But the success of the researchers’ experiment should ring alarm bells because it shows that ChatGPT, and generative A.I. tools like it, could be made to reveal much more sensitive personal information with just a bit of tweaking.
When you ask ChatGPT a question, it does not simply search the web to find the answer. Instead, it draws on what it has “learned” from reams of information — training data that was used to feed and develop the model — to generate one. L.L.M.s train on vast amounts of text, which may include personal information pulled from the Internet and other sources. That training data informs how the A.I. tool works, but it is not supposed to be recalled verbatim.
In theory, the more data that is added to an L.L.M., the deeper the memories of the old information get buried in the recesses of the model. A process known as catastrophic forgetting can cause an L.L.M. to regard previously learned information as less relevant when new data is being added. That process can be beneficial when you want the model to “forget” things like personal information. However, Mr. Zhu and his colleagues — among others — have recently found that L.L.M.s’ memories, just like human ones, can be jogged.
In the case of the experiment that revealed my contact information, the Indiana University researchers gave GPT-3.5 Turbo a short list of verified names and email addresses of New York Times employees, which caused the model to return similar results it recalled from its training data.
Much like human memory, GPT-3.5 Turbo’s recall was not perfect. The output that the researchers were able to extract was still subject to hallucination — a tendency to produce false information. In the example output they provided for Times employees, many of the personal email addresses were either off by a few characters or entirely wrong. But 80 percent of the work addresses the model returned were correct.
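To make the distinction between "correct," "off by a few characters" and "entirely wrong" concrete, here is a minimal Python sketch of how recalled addresses could be scored against verified ones. It is not the researchers' method, which the article does not describe at this level of detail. The function names, the three-character cutoff for a near miss and the sample addresses (all invented, on reserved example domains) are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def classify(recalled: str, verified: str, near_miss_limit: int = 3) -> str:
    """Label one recalled address. near_miss_limit is an assumed cutoff
    for what counts as "off by a few characters"."""
    r, v = recalled.strip().lower(), verified.strip().lower()
    if r == v:
        return "correct"
    if edit_distance(r, v) <= near_miss_limit:
        return "near miss"
    return "wrong"


def score(pairs):
    """Count each label over (recalled, verified) address pairs."""
    counts = {"correct": 0, "near miss": 0, "wrong": 0}
    for recalled, verified in pairs:
        counts[classify(recalled, verified)] += 1
    return counts


# Invented sample data; none of these addresses belong to real people.
sample = [
    ("jane.doe@example.com", "jane.doe@example.com"),
    ("j.smith@example.com", "j.smyth@example.com"),
    ("alex.lee@example.com", "pat.morgan@example.org"),
]

if __name__ == "__main__":
    print(score(sample))
```

An accuracy rate like the 80 percent reported for work addresses would correspond to the share of pairs labeled "correct" under a scheme of this kind.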
Companies like OpenAI, Meta and Google use different techniques to prevent users from asking for personal information through chat prompts or other interfaces. One method involves teaching the tool how to deny requests for personal information or other privacy-related output. An average user who opens a conversation with ChatGPT by asking for personal information will be denied, but researchers have recently found ways to bypass these safeguards.
Mr. Zhu and his colleagues were not working directly with ChatGPT’s standard public interface, but rather with its application programming interface, or API, which outside programmers can use to interact with GPT-3.5 Turbo. The process they used, called fine-tuning, is intended to allow users to give an L.L.M. more knowledge about a specific area, such as medicine or finance. But as Mr. Zhu and his colleagues found, it can also be used to foil some of the defenses that are built into the tool. Requests that would typically be denied in the ChatGPT interface were accepted.
“They do not have the protections on the fine-tuned data,” Mr. Zhu said.
“It is very important to us that the fine-tuning of our models are safe,” an OpenAI spokesman said in response to a request for comment. “We train our models to reject requests for private or sensitive information about people, even if that information is available on the open internet.”
The vulnerability is particularly concerning because no one — apart from a limited number of OpenAI employees — really knows what lurks in ChatGPT’s training-data memory. According to OpenAI’s website, the company does not actively seek out personal information or use data from “sites that primarily aggregate personal information” to build its tools. OpenAI also points out that its L.L.M.s do not copy or store information in a database: “Much like a person who has read a book and sets it down, our models do not have access to training information after they have learned from it.”
Beyond its assurances about what training data it does not use, though, OpenAI is notoriously secretive about what information it does use, as well as information it has used in the past.
“To the best of my knowledge, no commercially available large language models have strong defenses to protect privacy,” said Dr. Prateek Mittal, a professor in the department of electrical and computer engineering at Princeton University.
Dr. Mittal said that A.I. companies were not able to guarantee that these models had not learned sensitive information. “I think that presents a huge risk,” he said.
L.L.M.s are designed to keep learning when new streams of data are introduced. Two of OpenAI’s L.L.M.s, GPT-3.5 Turbo and GPT-4, are some of the most powerful models that are publicly available today. The company uses natural language texts from many different public sources, including websites, but it also licenses input data from third parties.
Some datasets are common across many L.L.M.s. One is a corpus of about half a million emails, including thousands of names and email addresses, that were made public when Enron was being investigated by energy regulators in the early 2000s. The Enron emails are useful to A.I. developers because they contain hundreds of thousands of examples of the way real people communicate.
OpenAI released a fine-tuning interface last August for GPT-3.5, a model that researchers determined contained the Enron dataset. Using steps similar to those for extracting information about Times employees, Mr. Zhu said, he and his fellow researchers were able to extract more than 5,000 pairs of Enron names and email addresses, with an accuracy rate of around 70 percent, by providing only 10 known pairs.
Dr. Mittal said the problem with private information in commercial L.L.M.s is similar to the problem of training these models with biased or toxic content. “There is no reason to expect that the resulting model that comes out will be private or will somehow magically not do harm,” he said.