Data in the age of AI: Is yours safe?
With the explosive AI growth we’ve witnessed over the last two years, the potential problems and pitfalls of this rapidly expanding technology can outpace what the average online user is equipped to keep up with. While people remain highly focused on what they're getting out of AI tools, they're paying far less attention to what they're putting in.
It’s indisputable that AI can help companies work smarter and faster, analyzing data, automating tasks, and even helping make decisions, but AI requires data to work—and sometimes that data is personal or sensitive and needs more thoughtful consideration about its usage.
What's the big deal with AI at work?
Because of the complexities built into the tech platforms and applications we regularly use, we could be inadvertently sharing sensitive data. For example, when the excitement around gen AI hit corporate offices, countless people experimented with using it to write internal content; some even tried their hand at employee reviews. Few realized that inputting personal employee information, with no clear guidelines or policies (or training on limitations and biases), into a platform that collects data to feed its output landed them squarely in the middle of an ethical dilemma. Because these tools were so easy (and novel) to use, and came with few qualifiers, many people weren’t aware of these four key reasons to be cautious when using AI tools:
AI can sometimes remember and repeat sensitive info by accident.
Your data might be used to improve the AI, even if that wasn't your intention.
There's a risk of exposing company secrets or personal information.
Using certain data might break privacy laws, leading to big fines.
More recently, there have been growing concerns about how data—and what data—is being used to train AI models. Baked into the demand for continued improvements in AI outputs is a demand for more data, and that data has to come from somewhere. Sometimes it’s online content intentionally published and shared, and sometimes it’s content unintentionally shared by users, including you and your employees, when using gen AI engines like ChatGPT, Perplexity, or Claude.
Sometimes, however, it’s from the most unsuspecting applications.
The quiet act of data scraping
Early last year, LinkedIn came under fire for switching on a feature that allows the company to scrape user data for AI training. While the UK's Information Commissioner's Office required LinkedIn to stop, LinkedIn still scrapes US user data by default (side note: you can disable it by visiting Settings > Data Privacy > Data for Generative AI Improvement).
Because LinkedIn is Microsoft-owned, it’s not surprising that online watchdogs were quick to sound the alarm for potential data scraping when Microsoft recently implemented a feature called "Connected Experiences" in Office 365 applications, including Word and Excel.
This feature, which is enabled by default for US customers, allows Microsoft to collect and process document data to provide what it calls "connected experiences" that improve productivity.
While Microsoft explicitly denied using customer data from Microsoft 365 apps to train large language models and claims that the Connected Experiences setting only enables features requiring internet access, such as co-authoring documents, privacy advocates weren’t exactly put at ease.
Despite the explanation, Microsoft's Services Agreement includes a clause granting the company "a worldwide and royalty-free intellectual property license to use Your Content" for various purposes, including improving Microsoft products and services.
People took to social media to warn Microsoft Office users of the potential risk and encourage them to protect their data by switching to applications like Google Docs. Even if the concern is unfounded, as Microsoft maintains, it’s enough to put people on alert, especially anyone working with sensitive company content that needs more thoughtful handling.
Written by Rebecca Collins-Brown