By Brandon Yu | 8 min read

We are at the cusp of an incredibly exciting time in our technological history: the artificial intelligence (AI) revolution. But, along with this excitement comes a rightful and growing call for regulation and compliance.

You may have seen the open letter signed by Tesla CEO Elon Musk, Apple co-founder Steve Wozniak, and over 31,000 other tech leaders calling for a halt to the development of AI systems more powerful than GPT-4.

They fear that AI’s human-competitive intelligence may pose profound risks to our existence. That’s a valid concern, but a more distant one than what we’re discussing today.

What does my data privacy look like in the age of AI? How can I protect my digital privacy during the AI revolution?

Let’s dive in.

How AI works

By now, ChatGPT has become a household name. In April 2023 alone, the site reportedly received over 1.8 billion visits, with each user spending around 8 minutes and 32 seconds per visit.

But have we ever thought about the way that this AI model works?

The phrase “AI model” or “powered by AI” means that there’s some form of neural network transforming data. What does that actually look like?

Well, the basic function of an AI application is the transformation of inputs into outputs. For instance, you feed it a color (e.g., green), and the AI model will return a list of 5 things that are that color (e.g., grass, leaf, cabbage, emerald, jade).

A simplified model of an artificial neural network receiving a variety of inputs (left), transforming them through its pre-programmed algorithms (middle) to produce an output (right). Figure by Alex Castrounis.
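To make that concrete, here’s a minimal Python sketch of a single neural-network layer: numbers go in, learned weights transform them, and numbers come out. The weights here are random placeholders rather than a trained model, so the outputs are meaningless; the point is only to illustrate the shape of the transformation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def forward(inputs: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    # The "transformation": a weighted sum of the inputs followed by a non-linearity
    return np.tanh(inputs @ weights + bias)

x = np.array([0.2, 0.9, 0.4])   # input features (e.g., an encoded color)
w = rng.normal(size=(3, 5))     # weights (random placeholders, not trained)
b = np.zeros(5)

print(forward(x, w, b))         # 5 output values, one per candidate answer
```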

The current appeal of AI, especially in the modern economy, is that it can supercharge a business’s productivity by enabling repetitive tasks to be performed at speed and at scale.

But, in order for it to do that, it has to use your data.

The biggest hesitation with AI at the enterprise level is data privacy

Anyone who has used ChatGPT, Bard, or any other emerging AI chatbot knows that without good inputs (effectively, a good data set), the model won’t be able to generate a good output.

And that is precisely what’s hindering the eager enterprise adoption of AI technologies.

Major companies like Google and Facebook rely on buying and selling user data for a variety of purposes, including targeted ads. Google is known to collect the highest amount of user data, which drove 80.2% of its revenue in 2022.

Take Reddit, for instance. Reddit is cracking down on third-party apps because it costs tens of millions of dollars to support their API requests, sparking controversy but solidifying its stance on protecting its data from scraping by third-party apps.

This is likely to spark a wave of tools and platforms aiming to block these web scrapers. But will it work? New use cases and AI platforms are emerging so quickly that blocking every single tool is a futile effort.

68% of consumers globally are either somewhat or very concerned about their privacy online (IAPP).

In the context of AI, there is a clear incentive for AI companies to acquire more data. The more data they have, the better they can train their models to produce better outputs. The better the outputs, the more users are inclined to use the platform and continue paying for priority service.

ChatGPT is known to have contextual memory, meaning that it keeps track of all of the inputs a user feeds it within a conversation. Everything the user inputs can be retained in that contextual memory and used to refine future outputs.
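Mechanically, this “memory” is usually nothing more than the application resending the accumulated conversation to the model on every turn. Here’s a minimal sketch; `generate_reply` is a hypothetical stub standing in for a real model call:

```python
# Each turn, the FULL conversation history is resent to the model, which is
# why anything you type, including sensitive data, stays in play for the
# rest of the session.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def generate_reply(messages: list) -> str:
    # Stub: a real chatbot would send `messages` to a model here
    return f"(model reply, conditioned on all {len(messages)} messages so far)"

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = generate_reply(history)  # the model sees everything typed so far
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Our Q3 revenue was $4.2M"))    # this figure now lives in the history
print(chat("Summarize our conversation"))  # ...and is resent with this request
```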

There has been recent controversy around ChatGPT’s data collection process, including claims that it breaches the contextual integrity of information that was made publicly available for other purposes. In fact, Reddit is cited as one of ChatGPT’s primary training sources.

Companies need to communicate best practices on how to leverage generative AI and what should or should not be fed to AI models. Clear communication and policies are required. Ultimately, companies need to embrace the technology and allow employees to experiment with AI in order to tap into its potential (e.g., via generative AI hackathons), where they can share rules and guidelines and proactively identify use cases.

At the enterprise level, when the information fed to an AI model has the ability to alter capital markets or influence policy, how does one feel safe engaging with this resource?

Apple is a leader in privacy protection

It comes down to robust privacy protection and informed usage of AI models.

When you think of industry leaders fronting the fight for privacy, one name comes to mind: Apple.

Apple has been at the forefront of privacy protection since introducing its Intelligent Tracking Prevention feature in 2017. With the release of iOS 14 in 2020, it went further, requiring app developers to be upfront about what specific types of data they collect from users and what they do with it, and giving users the opportunity to refuse to have their data used in this manner.

With the release of iOS 15, Apple took it one step further. IP address hiding, Mail Privacy Protection, App Privacy Report, Hide My Email, and iCloud Private Relay are just a few of the features that give users back control of their data.

5 ways to enforce data privacy as an AI user

There’s a lot to learn here from Apple in relation to AI systems.

At the enterprise level, most people will be interacting with AI chatbots like ChatGPT or Bard. Learning from Apple and other privacy-first organizations, here are five ways that you can enforce data privacy as an AI user.

Remove all sensitive or confidential information from any AI inputs

Before feeding any data into the AI system, it is essential to anonymize or pseudonymize sensitive information. This involves removing or replacing personally identifiable information (PII) with random characters or placeholder data, which prevents the AI system from identifying individuals or companies from the data it processes.
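As a starting point, here’s a minimal sketch of regex-based redaction in Python. The patterns are illustrative; real deployments typically use dedicated PII-detection tooling, since regexes alone will miss names, addresses, and other identifiers.

```python
import re

# Replace obvious PII with placeholder tokens before sending text to an AI service
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[PHONE]": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "[SSN]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Contact Jane at jane.doe@acme.com or 555-867-5309."))
# -> "Contact Jane at [EMAIL] or [PHONE]."  (note: the name slips through)
```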

Delegate specific access to specific users

Limiting who has access to the AI system and the data it processes is crucial. Implement role-based access controls (RBAC) to ensure that only authorized personnel with the appropriate clearance can access or interact with the AI system. This minimizes the risk of data exposure to unauthorized users.
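Here’s a minimal sketch of what an RBAC check in front of an AI system might look like; the role names and permissions are illustrative, not a standard:

```python
# Map each role to the set of actions it is allowed to perform
ROLE_PERMISSIONS = {
    "analyst": {"query_model"},
    "admin":   {"query_model", "view_logs", "manage_training_data"},
}

def authorize(user_role: str, action: str) -> None:
    # Deny by default: unknown roles get an empty permission set
    if action not in ROLE_PERMISSIONS.get(user_role, set()):
        raise PermissionError(f"role '{user_role}' may not '{action}'")

authorize("admin", "manage_training_data")       # allowed, no exception
try:
    authorize("analyst", "manage_training_data")
except PermissionError as err:
    print(err)  # role 'analyst' may not 'manage_training_data'
```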

Ensure that your data is encrypted

Ensure that data is encrypted both in transit and at rest: as data is being sent to the AI system it should be encrypted, and when the AI system stores data it should also be encrypted. This makes it difficult for unauthorized individuals to access or make sense of the data, even if they somehow obtain it.

Encryption is used to protect data from being stolen, changed, or compromised and works by scrambling data into a secret code that can only be unlocked with a unique digital key (Google).
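For data at rest, here’s a minimal sketch using Python’s widely used `cryptography` package (`pip install cryptography`). Encryption in transit is normally handled by TLS/HTTPS at the connection layer rather than in application code.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production, store this in a key manager, never in code
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"quarterly forecast: confidential")
print(ciphertext)                   # unreadable without the key
print(fernet.decrypt(ciphertext))   # b'quarterly forecast: confidential'
```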

Maintain frequent audit trails and monitor diligently

Maintain detailed logs of all interactions with the AI system and continuously monitor these logs for unusual or suspicious activities. This helps in not only understanding who accessed the data and what they did with it but also aids in early detection and response in case of a data breach or unauthorized access.
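A minimal sketch of structured audit logging in Python: each AI interaction is written as a timestamped JSON line, so the records can later be searched and monitored. The field names are illustrative.

```python
import json
import logging
from datetime import datetime, timezone

# Write one JSON object per line to an append-only audit file
logging.basicConfig(filename="ai_audit.log", level=logging.INFO, format="%(message)s")

def audit(user: str, action: str, detail: str) -> None:
    logging.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "detail": detail,
    }))

audit("jdoe", "prompt_submitted", "summarize Q3 sales report")
audit("jdoe", "response_received", "412 tokens returned")
```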

Ensure regular reviews and compliance

Regularly review the AI system’s privacy policies and ensure they are in compliance with the latest data protection laws and regulations such as GDPR, CCPA, or any other applicable laws. This includes conducting Data Protection Impact Assessments (DPIAs) to identify and mitigate risks associated with data processing by the AI system.
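As a simple illustration, a recurring review can be as lightweight as a scripted checklist that flags overdue items. The items and review interval below are illustrative; your actual obligations depend on GDPR, CCPA, and whichever other regulations apply to you.

```python
from datetime import date

# When was each compliance task last completed? (illustrative dates)
CHECKLIST = {
    "privacy_policy_reviewed": date(2024, 1, 15),
    "dpia_completed":          date(2023, 11, 2),
    "vendor_dpa_signed":       date(2023, 6, 30),
}
REVIEW_INTERVAL_DAYS = 180  # flag anything not revisited within ~6 months

for item, last_done in CHECKLIST.items():
    overdue = (date.today() - last_done).days > REVIEW_INTERVAL_DAYS
    print(f"{item}: {'OVERDUE' if overdue else 'ok'} (last done {last_done})")
```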

10 ways to protect your company’s data from AI systems

In addition to being especially conscious when using AI systems, enterprise executives must be aware of the various AI systems that scrape their platforms to train models.

Here are some strategies your company can implement to protect its systems and maintain business integrity.

  1. Rate Limiting for External Tools: For companies using AI or other external tools, implementing rate limiting on API calls or usage can prevent over-reliance or excessive costs. This is more about managing resource usage than security, but it can be part of maintaining business integrity (a minimal sketch combining this with user-agent checks and a honeypot follows this list).
  2. User-Agent Analysis: Monitor and analyze the user-agent strings in HTTP requests to the server. Many scrapers don’t change their user-agent string, which may reveal that they are bots. Blocking or serving altered content to suspicious user-agents can be an effective countermeasure.
  3. CAPTCHAs: Implement CAPTCHAs to distinguish between human and automated access. Although some sophisticated bots can bypass simple CAPTCHAs, using more complex or interactive CAPTCHAs can be effective.
  4. JavaScript Challenges: Many scraping bots cannot process JavaScript. By using a JavaScript challenge, you can verify whether a client supports JavaScript and serve content only to those that do. This helps filter out many scraping bots.
  5. Web Application Firewalls (WAF): Use a Web Application Firewall to monitor HTTP requests and block suspicious traffic. WAFs can use pre-set or customizable rules to identify and block scraping bots. Companies can also set up paywalls, as Reddit is doing, to gate their proprietary data against third-party providers.
  6. IP Analysis and Blocking: Regularly analyze the source IP addresses of traffic. If an IP address is generating an unusually high number of requests, it might be a scraping bot. Implement IP blocking or require additional verification for suspicious IP addresses.
  7. Honeypots: Implement honeypots, which are hidden links or pages that are invisible to normal users but can be detected by web scraping bots. If a bot accesses a honeypot, that’s a strong indication of scraping activity, and the source can be blocked.
  8. Content and Behavior Analysis: Analyze the access patterns and behavior of users. Bots often exhibit non-human browsing behavior, such as accessing pages in a predictable pattern or without a Referer header. Utilize machine learning algorithms to detect anomalous behavior that may indicate scraping.
  9. Legal Measures: Include clauses in your website’s Terms of Service that explicitly forbid scraping. If you identify an entity that is scraping your data, you can pursue legal action against them.
  10. API Management: If your company provides data via APIs, having an API management solution in place is essential. Enforce API keys, set quotas, and implement throttling to control access to the data you provide.
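To make a few of these concrete, here’s a minimal Flask sketch combining rate limiting (1), user-agent analysis (2), and a honeypot (7). The thresholds, bot signatures, and honeypot path are all illustrative; production setups usually lean on a WAF or CDN rather than hand-rolled middleware.

```python
import time
from collections import defaultdict
from flask import Flask, abort, request

app = Flask(__name__)
hits = defaultdict(list)  # naive in-memory request timestamps per IP
blocked_ips = set()

@app.before_request
def screen_request():
    ip = request.remote_addr
    if ip in blocked_ips:
        abort(403)
    # 1. Rate limiting: allow at most 30 requests per IP per minute
    now = time.time()
    hits[ip] = [t for t in hits[ip] if now - t < 60] + [now]
    if len(hits[ip]) > 30:
        abort(429)
    # 2. User-agent analysis: reject empty or known-bot agents
    agent = (request.headers.get("User-Agent") or "").lower()
    if not agent or "scrapy" in agent or "python-requests" in agent:
        abort(403)

# 7. Honeypot: not linked anywhere visible; any client requesting it is a bot
@app.route("/internal-data-feed")
def honeypot():
    blocked_ips.add(request.remote_addr)
    abort(403)

if __name__ == "__main__":
    app.run()
```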

There will always be extremes at any technological inflection point

There’s a rightful fear around how data is used by AI models. Discussions of privacy issues in AI often point back to the limitations of the systems and their algorithms. Algorithmic biases, like the experimental Amazon AI hiring tool that replicated the company’s existing disproportionately male workforce, can be avoided through appropriate policies and data regulation as we continue to leverage AI.

We are standing in the middle of one of the biggest technological inflection points of our time. It brings a ton of promise for innovation, creativity, and future advancement, and millions around the world are already actively using AI to change the way we work and interact with each other.

It doesn’t make sense to ban the use of existing AI systems, as some schools are doing. Rather, we should encourage the ethical use of this technology, informed by sound policy.

AI is estimated to produce a 21% net increase in United States GDP by 2030 (Forbes).

Don’t stop the use of AI. There’s a ton of use-case innovation yet to be discovered: applications in personalized medicine, the future of work, and improving the way we learn. Additionally, hackathons are a great way to rapidly learn about generative AI.

But do ensure that the proper guidelines, policies, and boundaries are in place. Clearly set the parameters within which these AI models are allowed to operate. Be proactively transparent with users and customers, and give them control over where their data goes.

And do this all with the mindset of building this technology to support our future.

Interested in seeing how we can support you and your business in your innovation initiatives? Book an introductory call with Victor Li, Founder & CEO of Onova.