In the OWASP Top 10 for LLM Applications 2025, Sensitive Information Disclosure is ranked as LLM02, highlighting the growing risk of AI systems unintentionally exposing confidential information, including personally identifiable information (PII), financial records, health data, API keys, internal documents, and proprietary business data. From employees pasting confidential data into AI tools to poorly configured AI applications leaking sensitive information through responses, organizations are entering a new era of AI-related data exposure risks.
What is Sensitive Information Disclosure?
Large Language Models (LLMs) especially when embedded in applications, risk exposing sensitive data, proprietary algorithms, or confidential details through their output. This can result in unauthorized data access, privacy violations, and intellectual property breaches. Sensitive information can affect both the LLM and its application context. This includes personal identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.
Common Examples of Sensitive Information Disclosure Vulnerability
PII Leakage
Personal identifiable information (PII) may be disclosed during interactions with the LLM. Examples of PII include: Full names, Social Security Numbers (SSN), driver’s license numbers, bank accounts, email addresses, and biometric records like fingerprints.
Proprietary Algorithm Exposure
Poorly configured model outputs can reveal proprietary algorithms or data. Revealing training data can expose models to inversion attacks, where attackers extract sensitive information or reconstruct inputs.
Example: The ‘Proof Pudding’ attack (CVE-2019-20634), disclosed training data facilitated model extraction and inversion, allowing attackers to circumvent security controls in machine learning algorithms and bypass email filters.
Sensitive Business Data Disclosure
Generated responses might inadvertently include confidential business information.
Prevention and Mitigation Strategies for Sensitive Information Disclosure
Sanitization:
1. Integrate Data Sanitization Techniques
Implement data sanitization to prevent user data from entering the training model. This includes scrubbing or masking sensitive content before it is used in training.
2. Robust Input Validation
Apply strict input validation methods to detect and filter out potentially harmful or sensitive data inputs, ensuring they do not compromise the model.
Access Controls:
1. Enforce Strict Access Controls
Limit access to sensitive data based on the principle of least privilege. Only grant access to data that is necessary for the specific user or process.
2. Restrict Data Sources
Limit model access to external data sources, and ensure runtime data orchestration is securely managed to avoid unintended data leakage.
Federated Learning and Privacy Techniques:
1. Utilize Federated Learning
Train models using decentralized data stored across multiple servers or devices. This approach minimizes the need for centralized data collection and reduces exposure risks.
2. Incorporate Differential Privacy
Apply techniques that add noise to the data or outputs, making it difficult for attackers to reverse-engineer individual data points.
User Education and Transparency:
1. Educate Users on Safe LLM Usage
Provide guidance on avoiding the input of sensitive information. Offer training on best practices for securely interacting with LLMs.
2. Ensure Transparency in Data Usage
Maintain clear policies about data retention, usage, and deletion. Allow users to opt out of having their data included in training processes.
Secure System Configuration:
1. Conceal System Preamble
Limit the ability for users to override or access the system’s initial settings, reducing the risk of exposure to internal configurations.
2. Reference Security Misconfiguration Best Practices
Follow guidelines like “OWASP API8:2023 Security Misconfiguration” to prevent leaking sensitive information through error messages or configuration details.
Advanced Techniques:
1. Homomorphic Encryption
Use homomorphic encryption to enable secure data analysis and privacy-preserving machine learning. This ensures data remains confidential while being processed by the model.
2. Tokenization and Redaction
Implement tokenization to preprocess and sanitize sensitive information. Techniques like pattern matching can detect and redact confidential content before processing.
Attack Scenarios
Scenario #1: Unintentional Data Exposure
A user receives a response containing another user’s personal data due to inadequate data sanitization.
Scenario #2: Targeted Prompt Injection
An attacker bypasses input filters to extract sensitive information.
Scenario #3: Data Leak via Training Data
Negligent data inclusion in training leads to sensitive information disclosure.
Secure AI Starts with Governance
At Reputiva, we help organizations move beyond AI experimentation toward secure and responsible AI adoption. The OWASP LLM02:2025 risk highlights an important reality: AI systems are only as secure as the data governance, identity controls, and security architecture surrounding them.
Organizations need clear AI usage policies, strong access controls, employee awareness training, and continuous monitoring to reduce the risk of sensitive information exposure in AI environments.
Real-World Examples
In April 2023, Samsung reportedly restricted employee use of ChatGPT after sensitive internal information was accidentally uploaded to the platform. Samsung engineers were reported to have inadvertently leaked sensitive company data, including confidential source code, meeting notes, and internal debugging information.
The Samsung ChatGPT data leak underscores issues such as a lack of AI governance, limited employee awareness, and unclear policies on which data can safely be shared with AI systems.
The Sensitive Information disclosure is becoming very rampant, and organizations are issuing memos and warnings to employees not to share confidential company data: Amazon, while others are restricting the use of generative AI tools: Apple, JPMorgan Chase, and other Wall Street Banks.
Secure Your Organization for the AI Era
Generative AI can unlock productivity, automation, and business value — but without proper security controls, it can also introduce new risks around sensitive data exposure, access management, and compliance.
Reputiva helps organizations move from uncertainty to secure implementation through:
- AI security and governance assessments
- Cloud security reviews
- Identity and access management (IAM)
- Microsoft 365 and Google Workspace security
- Security awareness and AI literacy training
- Continuous monitoring and security best practices
If your organization is adopting AI, now is the time to ensure security and governance are built into the process from the beginning.
Start a conversation with Reputiva today.


