Generative AI platforms like ChatGPT have emerged as a new frontier for data breaches, especially with the rise of hybrid work. Built to generate all kinds of content and help troubleshoot software bugs, these applications can also leak training data and violate privacy.
In its Work From Anywhere research, Fortinet found that about 62% of organizations experienced data breaches after offering employees a remote work option. Some of those breaches might have been avoided if employees had worked in the office on on-premises devices and software. From another perspective, though, the problem calls for a different approach: a strengthened DLP framework and best practices that control how employees use chatbots.
In this guide, we walk through the steps you should take to protect your digital assets, starting with why DLP tools and policies matter and then moving on to common cases in which employees may leak sensitive data through chatbots. You'll learn how to protect sensitive data when using ChatGPT and how to keep data secure while working with AI tools.
So, how do you prevent data leaks in AI applications? Let's start with the basics of data loss prevention.
What is Data Loss Prevention?
A data loss prevention (DLP) framework is a set of tools, technologies, and practices that help organizations prevent sensitive or confidential data from being lost, stolen, or leaked. The main function of these solutions is to identify, protect, and monitor sensitive information across networks, storage, endpoints, and clouds. For maximum effectiveness, data is analyzed at rest, in motion, and in use.
There are four key things you need to do with your data:
- Know. Classify data and assign security levels across the company’s network.
- Govern. It’s necessary to delete, store, and retain information in a compliant manner.
- Preserve. Introduce policies and regulations to educate the staff to handle information responsibly, avoiding accidental sharing or unauthorized access.
- Protect. Implement tools and solutions that run regular analyses and monitor for and detect phishing, ransomware, insider risks, and unintentional information exposure.
Technology alone is not enough without DLP policies that define how data is handled and protected at your company so that it is not exposed to unauthorized users. Policies also help your business stay compliant with government regulations and industry standards covering intellectual property (IP), financial information, customer details, confidential records, and other sensitive data.

4 types of DLP software you should know
After establishing policies, you need to enforce them to keep data practices safe, including when AI tools are involved. Doing that manually is nearly impossible: an organization may have numerous devices, users, applications, and rules that vary by risk level and other factors, all of which require constant monitoring and analysis. That's why four main types of DLP software, depending on where the data is, enforce security policies:
- Network DLP. These solutions monitor how data moves within, into, and out of a network, including downloading, transferring, synchronizing, and sending data over Wi-Fi or mobile networks. Artificial intelligence (AI) and machine learning (ML) are often used to detect suspicious traffic.
- Endpoint DLP. These tools cover information that end users process, access, read, or erase on devices connected to the network, from data held in RAM or CPU cache to database applications and documents stored and edited in the cloud. Endpoint solutions must be installed directly on the device so they can stop the user when an activity is prohibited, and some can pause data transfers between devices.
- Cloud DLP. These solutions are designed for data stored and accessed in the cloud, which usually includes corporate files, backups, file archives, and databases. They provide data encryption, scanning, monitoring, and classification, and they enforce access policies for both users and cloud services.
- Email DLP. This type of software monitors the organization's email communication to prevent sensitive or damaging information from leaking, with the main goal of keeping unauthorized parties away from sensitive data. Data can be lost through email in three ways: accidentally, intentionally, or through a compromised mailbox.
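To make the email DLP idea more concrete, here is a minimal Python sketch of one common rule: quarantine an outbound message when it goes to an external recipient and contains something that looks like a payment card number or a US Social Security number. The domain list, the detectors, and the `evaluate` function are hypothetical illustrations, not the rules of any particular product; commercial email DLP tools use far richer detection.

```python
import re
from dataclasses import dataclass

# Hypothetical set of domains treated as internal; replace with your own.
INTERNAL_DOMAINS = {"example.com"}

# Simplified detectors for illustration only.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit counting from the right.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[str]:
    """Return labels for the sensitive data types found in the text."""
    findings = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            findings.append("payment_card")
            break
    if SSN_LIKE.search(text):
        findings.append("ssn")
    return findings


@dataclass
class Email:
    sender: str
    recipients: list[str]
    body: str


def evaluate(email: Email) -> str:
    """Return 'quarantine' for sensitive mail leaving the organization, else 'allow'."""
    external = [r for r in email.recipients
                if r.rsplit("@", 1)[-1].lower() not in INTERNAL_DOMAINS]
    if external and find_sensitive(email.body):
        return "quarantine"
    return "allow"
```

Checking card candidates with the Luhn checksum, rather than flagging any long digit string, is a simple way to cut down false positives on order numbers and similar identifiers.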

Another question that often comes up is whether DLP tools are necessary or whether Extended Detection and Response (XDR) is enough. Here is the essential difference between them:
- XDR combines multiple tools and technologies that form a comprehensive strategy to ensure monitoring, analyzing, detecting, and responding to security threats.
- DLP integrates into the security stack and focuses only on protecting sensitive information.
Simply put, DLP tools help organizations classify data, reduce the risk of a breach, and protect their reputation.
Remote work and the urgent need for DLP
Over the past few years, more companies have moved to fully remote or hybrid work. This shift drove massive adoption of cloud-based applications and made protecting sensitive information effectively a critical concern. Data breaches carry numerous consequences for businesses, including:
- Reputational damage
- Regulatory fines
- Loss of revenue
- Customer outflow
An MIT case study concluded that companies cannot place the responsibility for an attack on a single person. Whether an employee clicked a link in a suspicious email or forgot to turn on some software, that action is rarely the only cause of a successful cyberattack. An incident may start with a technical issue, but it usually extends to weak spots in organizational and management practices and to failures at several levels of control, including the Board of Directors and even government regulators.
Here are some examples of real-world incidents involving AI tool misuse:
📌 Samsung Data Leak (2023)
Samsung engineers reportedly pasted proprietary source code into ChatGPT while trying to debug software, inadvertently sharing confidential data with an external AI model. Samsung quickly banned the internal use of ChatGPT, citing uncontrolled data exposure.
📌 Apple’s Preventive Ban (2023)
Apple restricted internal use of ChatGPT and other generative AI tools after concerns about possible data leakage and employee mishandling of sensitive intellectual property.
📌 Amazon Internal Memo Leak
An Amazon internal memo warned employees not to share code or sensitive data with ChatGPT after discovering that some AI-generated outputs closely resembled internal Amazon data, raising concerns that inputs were being learned or remembered.

These cases show that protecting sensitive data is a complex task that requires a comprehensive approach and strategy. It helps to have an expert who can assess the current situation and recommend improvements at every level, technical or managerial, so that your company is prepared when an attack happens.
Top 5 DLP best practices to secure sensitive data
Because data protection is essential when your teams use AI tools, we have prepared a list of the most important practices that can help you protect data and prepare for an attack or other emergency:
1. Identify sensitive data assets and conduct ongoing audits
The first and most crucial task in DLP is knowing what information you need to protect and where it is located. Many tools can help with automatic classification and regular check-ups that discover newly created data. The most advanced solutions can scan both on-premises repositories and cloud storage for sensitive data. After scanning, you get a comprehensive report that can serve as the basis for access control rules.
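A very small version of such a discovery scan might look like the Python sketch below. It walks a directory, runs a few regular-expression detectors over text files, and reports which files contain which kinds of sensitive data. The detector names, patterns, and file-extension list are assumptions for illustration; real discovery tools also parse office documents and databases and cover cloud storage.

```python
import os
import re
import sys
from collections import Counter

# Hypothetical detectors; tune them to the data your organization actually holds.
DETECTORS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
}

# Only plain-text formats are scanned in this sketch.
TEXT_EXTENSIONS = {".txt", ".csv", ".md", ".json", ".log"}


def scan_file(path: str) -> Counter:
    """Count detector hits in one file."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        text = fh.read()
    counts = Counter()
    for label, pattern in DETECTORS.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[label] = hits
    return counts


def scan_tree(root: str) -> dict[str, Counter]:
    """Walk a directory and return {file path: findings} for files with hits."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            findings = scan_file(path)
            if findings:
                report[path] = findings
    return report


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, findings in sorted(scan_tree(root).items()):
        print(path, dict(findings))
```

The resulting report maps each file to the kinds of data it holds, which is the input you need before deciding who should be able to open it.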
Also, your level of protection depends heavily on third-party risk management (TPRM). If you hire third-party vendors, you need to assess what data they can access and ensure they enforce the appropriate level of protection. The vendor’s compliance with government regulations and industry standards may help minimize risks for your organization.
2. Update software
You need to protect your organization against zero-day vulnerabilities: unknown or unaddressed flaws in software or hardware. The name means the vendor has had zero days to fix the flaw, which malicious actors may already have exploited to steal data, cause damage, or install malware. Applying updates as soon as fixes are released closes these gaps and shortens the window in which attackers can use them.
Reviewing and analyzing your IT infrastructure comprehensively also lets you detect vulnerabilities, assess risks, and identify high-risk practices. Implementing the latest security recommendations and updating software help against emerging threats as well. It's also worth having an additional solution that shows when each update was installed, by whom, and which update it was.
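As a rough illustration of that kind of visibility, the sketch below compares a software inventory against the minimum version that contains a known fix and lists the hosts that are still behind. The package names and version numbers in `MIN_PATCHED` are hypothetical placeholders; in practice they would come from vendor advisories and your asset inventory.

```python
from dataclasses import dataclass

# Hypothetical minimum patched versions, keyed by package name.
MIN_PATCHED = {
    "example-agent": "2.4.1",
    "example-browser": "124.0",
}


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1); assumes purely numeric, dot-separated versions."""
    return tuple(int(part) for part in v.split("."))


@dataclass
class InstalledSoftware:
    host: str
    name: str
    version: str


def outdated(inventory: list[InstalledSoftware]) -> list[InstalledSoftware]:
    """Return entries whose version is below the minimum patched version."""
    result = []
    for item in inventory:
        minimum = MIN_PATCHED.get(item.name)
        if minimum and parse_version(item.version) < parse_version(minimum):
            result.append(item)
    return result
```

Versions are compared as tuples of integers so that 2.10.0 is correctly treated as newer than 2.4.1, which a plain string comparison would get wrong.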
3. Enforce zero-trust rules
The zero-trust (ZT) framework is designed to secure infrastructure and data by continuously monitoring every asset, user, and connection and verifying every access request before allowing it. Zero-trust rules and policies assign access attributes according to standards and recommendations defined by government or industry bodies, and they rely heavily on real-time visibility.
The Financial Data Risk Report published in 2021 by Varonis emphasized that over 64% of companies in the finance sector provide all employees with full access to over 1,000 sensitive files. Such an approach poses a high risk of data leakage and loss due to the absence of specific ZT rules and policies and indicates non-compliance with various government regulations.
Understandably, establishing ZT rules may cause delays and require changes to business processes, but only a correct hierarchy of access to specific files and applications can fully protect sensitive data in storage, applications, and private networks.
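The Python sketch below shows the core zero-trust idea: every request is evaluated on its own, and access is denied unless every check passes (verified MFA, a compliant device, a role cleared for the data's sensitivity label, and an explicit per-user grant for the most sensitive tier). The labels, roles, and clearance levels are hypothetical; a real deployment would pull them from your identity provider and data classification.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels and role clearances, for illustration only.
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}
ROLE_CLEARANCE = {"contractor": 1, "employee": 1, "finance_analyst": 2, "security_admin": 3}


@dataclass
class AccessRequest:
    user: str
    role: str
    mfa_verified: bool
    device_compliant: bool
    resource_label: str
    granted_users: frozenset = frozenset()  # explicit per-user grants on the resource


def decide(req: AccessRequest) -> tuple[bool, str]:
    """Evaluate a single request; anything not explicitly allowed is denied."""
    if not req.mfa_verified:
        return False, "MFA not verified"
    if not req.device_compliant:
        return False, "device not compliant"
    level = SENSITIVITY.get(req.resource_label)
    if level is None:
        return False, "unknown resource label"
    if ROLE_CLEARANCE.get(req.role, -1) < level:
        return False, "role lacks clearance"
    # The top tier also needs an explicit grant; a role alone is not enough.
    if level == SENSITIVITY["restricted"] and req.user not in req.granted_users:
        return False, "no explicit grant for restricted resource"
    return True, "allowed"
```

Because no role on its own unlocks the restricted tier, broad "everyone can open it" access of the kind Varonis reported cannot arise from role assignments alone.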
4. Use multi-factor authentication (MFA)
This additional layer of security should be implemented together with a zero-trust architecture (ZTA) to prevent unauthorized users from accessing accounts, even with stolen login credentials. MFA validates a user's identity while keeping access quick for authorized users through one-time passwords (OTPs), text messages, email codes, or fingerprints.
Alex Weinert, VP Director of Identity Security at Microsoft, said, “Based on our studies, your account is more than 99.9% less likely to be compromised if you use MFA.” Phishing-resistant MFA methods, such as FIDO2, are considered the strongest option because malicious actors can neither intercept them nor trick users into giving up access to their accounts.
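To show what happens behind the one-time passwords mentioned above, here is a minimal Python implementation of time-based one-time passwords (TOTP, RFC 6238), built on HOTP (RFC 4226) and using only the standard library. It is a teaching sketch, not a replacement for a vetted authentication service, and the drift window of one time step is an assumed default. TOTP codes can still be phished and relayed in real time, which is why phishing-resistant methods such as FIDO2 are preferred.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits select a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP where the counter is the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str, at: Optional[float] = None,
           window: int = 1, step: int = 30, digits: int = 6) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        expected = totp(secret, now + drift * step, step, digits)
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected, submitted):
            return True
    return False
```

With the RFC 6238 test secret, `totp(b"12345678901234567890", at=59, digits=8)` returns `"94287082"`, matching the published test vector.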
5. Conduct security awareness training
After limiting access to sensitive information and implementing MFA, you must regularly train all employees and third-party vendors to ensure security awareness. Some companies offer learning management tools that support several languages, align with compliance frameworks, can be deployed in minutes, and can be shared within your organization’s network.
Common topics for staff training include password management, regular software updates, data backups, VPN use, email safety recommendations, and antivirus programs and firewalls. Personnel usually need different types of training (basic, security management, compliance, or technical) depending on the specifics of their work.
Educating your staff about data protection best practices is essential, especially the rules for handling sensitive information and for recognizing and reporting potential threats. Awareness of security and the latest cybersecurity trends should become an integral part of your corporate culture and onboarding process.

We have described only the bare minimum of measures you can apply at your company. Consulting with cybersecurity professionals remains the best way to assess possible risks, choose the right policies, raise employee awareness, and minimize the risk of a data breach. Next, we will discuss the most common mistakes employees make that lead to data loss while using generative AI applications.
How to control data input in ChatGPT with UnderDefense
Data leakage can lead to severe consequences for an organization, such as legal liability, damage to brand reputation, and loss of intellectual property. In more severe cases, the results include years of legal proceedings and millions in regulatory penalties. Individuals whose private information is lost, as in the Amazon or Capital One cases, can suffer financial loss, reputational damage, or identity theft.
Let's look at the most common ways data leaks through ChatGPT, Bard, and other generative AI apps:
1. Unintentional Data Exposure
Employees may unknowingly paste proprietary code, customer data, or sensitive documents into ChatGPT prompts to get help, violating data protection policies.
2. Data Persistence in Prompts
Depending on how AI tools are configured, input data may be stored and used for future model improvement unless proper data-handling controls are in place.
3. Model Prompt Leakage
Attackers can use prompt injection or jailbreak techniques to extract prior user prompts or system instructions, potentially leaking private or confidential data.
4. Supply Chain Risk
Third-party integrations and plugins may expand the attack surface, increasing the chance of data being exposed to untrusted entities.
5. Poor Role-Based Access Control (RBAC)
Without granular controls, users may gain inappropriate access to sensitive discussions or internal use cases within organizational AI platforms.
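One way to reduce unintentional exposure is a check that runs before a prompt ever leaves your network. The Python sketch below blocks prompts that contain obvious secrets (a private key header or an AWS-style access key ID) and redacts email addresses and SSN-like numbers from everything else. The pattern lists and the `check_prompt` function are hypothetical examples, not part of any vendor's product; treat them as a starting point for your own rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules: secrets are blocked outright, personal data is redacted.
BLOCK_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
REDACT_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Verdict:
    action: str   # "block", "redact", or "allow"
    prompt: str   # text safe to forward ("" when blocked)
    findings: list = field(default_factory=list)


def check_prompt(prompt: str) -> Verdict:
    """Decide what to do with a prompt before it is sent to an external AI tool."""
    blocked = [name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(prompt)]
    if blocked:
        return Verdict("block", "", blocked)
    cleaned, findings = prompt, []
    for name, pattern in REDACT_PATTERNS.items():
        cleaned, count = pattern.subn(f"[REDACTED_{name.upper()}]", cleaned)
        if count:
            findings.append(name)
    return Verdict("redact" if findings else "allow", cleaned, findings)
```

Blocking rather than redacting secrets is a deliberate choice here: a prompt that contains a private key usually signals a workflow that should not involve an external AI tool at all.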

You need to remain vigilant when your staff use chatbots for work, and the best way to do so is to control what type of information is fed to the app. UnderDefense can help safeguard your sensitive data with DLP for ChatGPT and other solutions.
UnderDefense not only protects your organization against external risks, for example by detecting potential ChatGPT data loss while your employees use it, but also calculates potential losses and identifies new vulnerabilities. MDR services from UnderDefense provide real-time visibility into all users and their actions, define risk levels, and offer response options when a threat is detected. They ensure multi-layered protection and actively help mature your security posture.
Watch the demo below to see how UnderDefense MAXI receives an alert about possible data leakage from the Microsoft 365 connector:
To summarize the demo: the UnderDefense MAXI platform receives the relevant information from Microsoft Purview about an employee breaching a DLP policy on a certain device, creates an incident, and allows analysts to check the alert and respond accordingly.
The future of data loss prevention technologies
Recently, Check Point announced a “surge in cybercrime” in a press release and shared key insights from its mid-year statistics. It found that during Q1 2023, 48 ransomware groups breached 2,200 victims, and almost half of these targets were in the United States. Check Point also reported an 8% surge in global weekly cyberattacks, the highest number in the last two years. These numbers should push organizations to prepare efficient security strategies and incident response plans before an attack happens.
We have already mentioned that organizations suffer not only from the direct consequences of data breaches but also from penalties for non-compliance. Fines issued since 2019 suggest that regulators are getting more serious about organizations that don't adequately protect sensitive data. For example, in 2021, Amazon was fined $877 million for breaching GDPR requirements. The next year, the Irish DPC fined Instagram $403 million for violating specific articles of the GDPR.
Integrating DLP solutions into your security stack provides more comprehensive data protection. Modern tools aim not merely to follow changes and adapt to threats but to anticipate them. Here are the technologies that will shape the future of DLP tools and their efficiency:
- AI and ML: Designing advanced AI- and ML-based algorithms is becoming a prominent trend and an integral part of DLP solutions. Such an approach allows for detecting abnormal behavior more precisely, identifying previously unrecognized patterns, and automatically remediating vulnerabilities.
- Quantum computing: Offers unparalleled computing speed and, as a result, faster task performance. However, bad actors may use this technology too, so quantum-resistant encryption is becoming another data protection requirement.
- Enhanced cloud security: Cloud investments keep growing, and cloud-based DLP tools should become an important element of your security posture. Organizations should learn to enforce policies and data protection guidelines through a cloud and multicloud environment, providing consistency in compliance and controlling data spread across applications.
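Full ML-based detection is beyond a short example, but the underlying idea of flagging behavior that departs from a learned baseline can be shown with a simple statistical stand-in. The Python sketch below uses a per-user z-score on daily upload volume instead of a trained model; the threshold of 3 standard deviations, the 7-day minimum history, and the `min_sigma` floor are assumed defaults you would tune.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[float], today: float, threshold: float = 3.0,
                 min_history: int = 7, min_sigma: float = 1.0) -> bool:
    """Flag `today` if it is more than `threshold` standard deviations above
    the user's own historical mean (a z-score baseline, not a trained model)."""
    if len(history) < min_history:
        return False  # too little data to establish a baseline
    mu = mean(history)
    # Floor the deviation so a perfectly flat history does not divide by zero.
    sigma = max(pstdev(history), min_sigma)
    return (today - mu) / sigma > threshold
```

For a user whose last seven days were 100, 120, 90, 110, 105, 95, and 115 MB (mean 105, standard deviation 10), a 900 MB day is flagged, while 125 MB, two standard deviations above the mean, is not.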
DLP tools should be an essential part of your organization's security strategy. By implementing them, you can streamline the work of other solutions, improve data protection, and reduce the risk of data loss. You can also cut IT infrastructure costs and strengthen your ability to detect and respond to threats. Taking proactive measures to protect critical information at your company is vital to a successful strategy.

Final thoughts
As powerful as ChatGPT is, it must be used responsibly. Understanding the risks and reinforcing AI use with robust DLP controls can help your organization unlock AI’s potential without compromising data security.
If you need to orchestrate various DLP tools with policies and rules, we suggest looking into UnderDefense MAXI. The platform helps you protect your digital ecosystem 24/7, regularly monitor your external perimeter, and automate alert triage and incident response, and it provides all the necessary context within minutes in an emergency. Our experts can also help you build a reliable data loss prevention framework for your company. Contact us today to keep the future of your business safe.
1. Can I enter sensitive data into a generative AI tool like ChatGPT?
You should not enter sensitive, confidential, or personally identifiable information (PII) into ChatGPT or any generative AI tool unless you’re using an enterprise-secured version with strict data governance controls. Public or free-tier AI tools may store and process input data to improve performance, which could pose a risk.
2. Is ChatGPT confidential?
Not by default. While OpenAI does not use API or ChatGPT Team/Enterprise data to train its models, inputs may be stored temporarily for abuse monitoring or system improvement unless you opt out or use an enterprise version. Always review the tool’s data usage policies before entering confidential information.
3. Where is the best place in a prompt to put sensitive data the end-user should never see?
Nowhere. Sensitive data should not be included in prompts at all, especially in tools that lack enterprise-grade controls. Instead, use placeholders or anonymize the input, and handle sensitive data processing outside of the AI workflow.
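If you do need to work with real records, one option is to swap identifiers for placeholders before the prompt is sent and put them back locally afterwards. The Python sketch below does this for email addresses only; the `pseudonymize` and `restore` helpers are hypothetical, and a real deployment would cover more identifier types. The surrounding text can still reveal sensitive context, so placeholders reduce exposure rather than eliminate it.

```python
import re

# Hypothetical example: only email addresses are handled here.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Swap each distinct email address for a stable placeholder.

    Returns the masked text and a placeholder -> original mapping that
    stays on your side and is never sent to the AI tool.
    """
    forward: dict[str, str] = {}  # original -> placeholder

    def replace(match: re.Match) -> str:
        value = match.group()
        if value not in forward:
            forward[value] = f"<EMAIL_{len(forward) + 1}>"
        return forward[value]

    masked = EMAIL.sub(replace, text)
    return masked, {placeholder: original for original, placeholder in forward.items()}


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the AI tool's response, locally."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

Reusing the same placeholder for repeated values keeps the prompt coherent for the model, so it can still refer to "the same person" without ever seeing the address.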
4. What are the risks of entering company data into ChatGPT?
Risks include unintentional data leaks, regulatory violations (e.g., GDPR, HIPAA), model training exposure, and the potential for future prompt leakage. Data entered without oversight can compromise internal IP or customer trust.
5. How can I safely use ChatGPT in a business environment?
- Use enterprise versions with data retention controls.
- Establish internal AI usage policies.
- Educate teams on what not to input (e.g., passwords, financial data, client info).
- Consider implementing Data Loss Prevention (DLP) tools for added protection.