Getting math problems wrong isn’t the only way ChatGPT is becoming less smart — apparently, it is pretty easy to trick into sharing its secrets (including, potentially, yours).
What happened: Researchers from Google’s DeepMind and five universities discovered an “attack” prompt for ChatGPT that got the platform to share parts of its training data, revealing personal information of random people and copyrighted material.
- The researchers prompted ChatGPT to repeat certain words over and over again, which it would do, until it eventually deteriorated into what appeared to be random chunks of text, but were actually verbatim reproductions of training data.
- Some of that data was personally identifiable information, such as addresses, birthdays, phone numbers and crypto wallet addresses. One example showed getting ChatGPT to repeat the word “poem” eventually returned the info of a CEO that appeared to have been scraped from an email signature.
- Other prompts surfaced passages pulled from copyrighted works, like literature and research papers.
Why it matters: OpenAI and the many other companies developing their own chatbots likely aren’t thrilled to see that a closed-source, large language model can be tricked into revealing its training data.
- Also not thrilled: the people who may not even know that their work and personal information is part of ChatGPT’s knowledge base, let alone that AI could share it with a simple command.
Bottom line: The researchers spent only $200 to generate over 10,000 unique examples of ChatGPT sharing things it probably shouldn’t — and they point out that someone willing to spend more money could easily extract significantly more information from even the best language models.