AI Means Companies Are Leaking More Confidential Data Than Ever Before

The use of generative AI has raised valid concerns about a resurgence in phishing and a new generation of deepfake attacks. But there is another, more insidious problem—data leakage.

Once a matter of science fiction, artificial intelligence (AI) has now been seamlessly integrated into our lives. Open your phone with facial recognition? AI. Use Google search? AI. There are countless instances of everyday AI in the world today that we don’t even think about.

Generative AI

Unless you’ve been completely unplugged for the past few months, you know that the rise of generative AI has captured the attention of individuals and businesses worldwide in an entirely new way.

ChatGPT and Google Bard, two generative AI heavy-hitters, have seen explosive growth in just a few short months. © Waingro / Dreamstime

This attention reached fever pitch earlier this year: ChatGPT had already seen the most explosive user-base growth in the history of the tech industry, hitting 100 million users in just two months, and OpenAI then raised the stakes by releasing GPT-4. Not to be outdone, Google opened public access to its own generative AI agent, Bard, within days of GPT-4’s release.

The tech industry is home to innumerable stories of sensational growth, but it has never seen anything quite like the growth that generative AI has generated.

And what’s not to like about it? Used as an aid, it can be incredibly helpful.

Need a lengthy document summarized? AI can read it accurately in seconds and write a beautifully grammatical, highly engaging summary, saving hours of tedious, mistake-prone work. Want dynamic copy for an ad campaign? Use AI for the basic language and to generate contextually appropriate variations—in real time.

It’s no wonder that employees have quickly adopted generative AI in the workplace. A recent study of marketing professionals in the US showed that 73 percent of respondents now use AI tools. Grand View Research projects that the AI market will grow at an annual rate of 37.3 percent over the next seven years. And in our own industry conversations, we’re hearing that employee AI use is already pervasive.

There’s no doubt that generative AI is here to stay. But there’s a darker side to its success.

The Darker Side

As with any technology, especially one with such a meteoric rise, ChatGPT and its competitors raise legitimate concerns.

The ability to generate persuasive, highly individualized emails in real time and at massive scale has major implications for both phishing and spear-phishing attacks.

The rise of deepfake images and video raises the spectre of an entirely new kind of attack: one that resembles phishing but plays out on Zoom and Slack calls, with apparent co-workers who aren’t actually there.

But there is an even darker, hidden danger that the media doesn’t seem to focus on—one that haunts the entire modern economy:

Corporate and government data leakage.

AI Prompts are Data Sponges

When using generative AI systems, users enter a question, called a prompt, and the AI responds to this prompt with an answer. In some ways, this is like a conversational form of Google search, but with a twist: the “questions” can be complex, at times running to paragraphs or pages. The answers can be equally intricate, and they are direct responses to the question rather than just “matches” from the online universe.
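To see how much data a single prompt can carry, consider the sketch below of what an employee’s request looks like when it is sent programmatically rather than through a chat window. It is a minimal illustration using the pre-1.0 OpenAI Python client; the model name, API key, and “confidential” figures are purely hypothetical. The point is that the entire prompt, sensitive content included, is transmitted to the provider’s servers the moment the request is made.

    # Minimal sketch: a prompt containing confidential material is sent,
    # in full, to a third-party AI service. Pre-1.0 OpenAI Python client;
    # model name, key, and figures are illustrative only.
    import openai

    openai.api_key = "sk-..."  # an employee's personal or trial API key

    confidential_notes = """
    Q3 revenue (unreleased): $48.2M, down 6% quarter over quarter.
    Planned restructuring: 120 positions affected in the Austin office.
    """

    prompt = (
        "Turn the following internal notes into a polished earnings summary:\n"
        + confidential_notes
    )

    # The full prompt, unreleased financials and all, now leaves the corporate
    # network and is processed, and possibly retained, on the provider's side.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )

    print(response.choices[0].message.content)

Whether the request comes from a script like this or from a chat window in a browser, the mechanics are the same: the complete prompt travels to infrastructure the company neither controls nor audits, where it may be retained under the provider’s own policies.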

Finance data, human resources data, and intellectual property are among the kinds of data at risk as employees use generative AI. © Chormail / Dreamstime

As a result, all of the following AI uses are now not just plausible but likely:

  • A finance employee at a publicly traded company needs to generate an earnings report. They have the data, but it will take time and care to synthesize, and they’re not confident in their writing, organization, and narrative skills. Their “question” to the AI is the raw financial data, along with a request to turn it into an earnings report, which the AI quickly does.

  • An HR manager needs to write a termination letter. Her workload this week has her underwater, and writing, especially writing termination letters, is among her least favorite tasks. Her “question” to the AI is the reprimand and several poor performance reviews the employee has recently received, along with a request to write a termination letter to the employee, which the AI quickly does.

  • A software engineer is working overtime to hit a release date, but he is running behind and he’s unfamiliar with the particular language and set of libraries that a key component is written in. His “question” to the AI is several blocks of existing code, a description of the functionality that he is trying to implement, and a request for the AI to write the remaining code needed to complete the component, which the AI quickly does.

What do these cases have in common? In each case, sensitive data that absolutely must not be publicly exposed (unfiled financials, protected human resources data, proprietary intellectual property) is in fact willingly transmitted to a third party (the AI platform). The risks here are significant:

  • All three employees have likely committed serious policy violations

  • All three of the companies are now likely out of compliance with key standards

  • Confidential data in each case is now stored in a third-party system with which no formal relationship exists and for which no related compliance controls are in place

  • All of this data is now at the mercy of the AI platform(s) and their own practices, which are not transparent

All of these are bad, but the last may be the worst, as the practices in question may already include, or may come to include, learning from this data, storing it indefinitely, exposing it to the platform’s own employees, or even accidentally leaking chat data to the public, as recently occurred with ChatGPT.

In short, given the hundreds of millions of users already engaged with generative AI and what Plurilock and others in the industry are hearing unofficially, proprietary, confidential, and personal data from across the economy are already flooding into AI platforms, which for compliance purposes are public spaces, as you read this article.

In the Real World

Earlier this year, Samsung experienced three separate incidents in which employees transmitted sensitive company data to ChatGPT. Two involved source code; the third was a transcript of a smartphone recording of an internal meeting, from which the AI was asked to generate minutes.

These incidents led Samsung to ban the use of ChatGPT (and other generative AI platforms) on corporate devices and to warn employees that uploading “company related data or other information that could compromise the company’s intellectual property” to ChatGPT could result in termination.

Samsung has been very transparent about the issues they’ve had with AI use. Industry behavior suggests they’re not the only ones. © Michael Vi / Dreamstime

In recent weeks, Apple, Amazon, Verizon, JPMorgan Chase, Bank of America, Goldman Sachs, and others have all reportedly restricted ChatGPT use by their employees.

The Trouble With Bans

Banning employees from using ChatGPT or similar AI platforms and apps doesn’t eliminate the threat of data leakage. In fact, bans are likely to increase it.

Why? Because employees are human. If they can save hours of work they’d prefer not to do by having an AI do it instead, many will do just that. They’ll simply go offsite or use personal devices to enter their prompts and receive answers, which expands the footprint of the data leak to include personal devices and networks as well.

Meanwhile, if your employees aren’t using AI but your competitors’ employees are, your company is essentially overpaying for labor, getting far less productivity for the same expenditure.

In other words, AI bans threaten to make data leaks worse while rendering the companies that impose them uncompetitive as AI use continues to grow. For this reason, bans aren’t likely to remain the go-to solution for long.

Regulation and Response

Generative AI usage has grown so quickly that policy is likely years behind already.

Brookings recently asked four experts what regulation of generative AI could look like. Their suggestions include having governments enforce existing technology laws while developing new rules that specifically address AI and delineate both technical and behavioral standards.

While those are good suggestions, companies across the global economy are leaking data into AI platforms now, today; waiting for policy to catch up simply isn’t an option.

For this reason, as has been the case with so many other innovations over the last half-century, the most relevant solutions to an emerging set of problems are likely to be technology solutions—not regulatory ones.

What new technologies will emerge to address what threatens, if left unchecked, to become a global economic problem?

We’re about to find out. ■
