As organizations generate and share growing volumes of files, documents, communications, and logs across distributed environments, traditional security and governance models are increasingly strained.
The Thales/CSA report “Survey Report: The Rise in Unstructured Data and AI Security Risk” examines how organizations are responding to the scale, growth, and sensitivity of unstructured data. It also explores how foundational capabilities remain inconsistently implemented and difficult to operationalize at scale, and how these gaps shape current risk exposure and future security priorities.
Unstructured data is expanding rapidly and dispersing widely, while visibility, protection, and operational readiness lag behind. Although awareness of unstructured data risk is high, many organizations struggle to translate that awareness into scalable, effective security outcomes—particularly as cloud adoption, automation, and AI accelerate.
The report reveals concerning gaps:
- Only a minority of organizations have full visibility into where their unstructured data resides
- Classification and labeling practices remain inconsistent
- Real-time scanning and monitoring capabilities are still limited
In this environment, AI is effectively becoming a “new insider”—with access to vast amounts of sensitive data, often without the same level of governance applied to human users.
In this article, we uncover the key findings from the report and what they mean for organizations looking to secure their data, govern AI usage, and build a resilient foundation for the future.
Unstructured data has become a defining element of modern enterprise environments. As it grows in both scale and sensitivity, established security and governance practices are increasingly strained.
Unstructured data refers to digital content that does not follow a predefined data model and is commonly stored in files, documents, messages, logs, and other formats across organizational systems and environments.

Key Finding 1: Unstructured Data Is Accelerating and Driving Enterprise Data Growth
Organizations estimate that unstructured data accounts for roughly one-third of enterprise data, with unstructured and semi-structured data together perceived to represent more than half of the data estate. Industry research suggests the true proportion is significantly higher (closer to 80%).

Nearly one-third (29%) of organizations report that unstructured data accounts for more than half of their annual data growth. This growth is fueled primarily by documents, communications, and logs that frequently contain sensitive information and are distributed across environments. For a significant segment of enterprises, unstructured data is now the primary driver of data growth.
As data volumes increase and data continues to move across systems and usage contexts, maintaining consistent oversight and control becomes more difficult. Approaches that were effective when data growth was slower or more centralized are increasingly strained by these conditions.
The most common locations where organizations report storing sensitive unstructured data include cloud applications (58%), file servers (57%), public cloud environments (47%), on-premises databases (46%), and cloud collaboration tools (45%).
Key Finding 2: Foundational Gaps Persist in Visibility, Classification, and Sensitive Data Understanding
Only 35% of organizations report full visibility into where unstructured data resides, while classification and labeling practices remain inconsistent. Security, governance, privacy, and compliance are top concerns, yet many organizations struggle to execute foundational controls such as sensitive data protection, access monitoring, and classification scanning. These challenges are magnified in cloud and SaaS environments, where data discovery and leakage concerns are widespread.

Classification practices remain inconsistent across organizations. One in ten organizations report having no sensitivity labeling for unstructured data at all. Without consistent labeling, distinguishing sensitive content from lower-risk data becomes difficult, reducing the effectiveness of prioritization, access control, and monitoring as data moves across environments.

As unstructured data continues to spread, these gaps make it increasingly difficult to apply security controls reliably or scale them across organizational environments.
Security is the top concern for 74% of respondents when it comes to unstructured data.
Key Finding 3: Confidence Outpaces Security Posture, Leaving Gaps in Coverage and Investment
Despite persistent visibility and execution gaps, 75% of organizations express confidence in their ability to secure unstructured data. In contrast, more than two-thirds report that a significant portion of their unstructured data remains unprotected, and many are unsure of their actual coverage. Investment levels often lag behind data growth and sensitivity, while slow or absent scanning capabilities limit timely detection and response.

The combination of low reported coverage and uncertainty indicates that many organizations lack a clear understanding of how much unstructured data is actually secured. Even as awareness of unstructured data risk increases, limited visibility makes it difficult to assess where protections are effective and where gaps persist.
One in five organizations allocate less than 5% of their IT budget to unstructured data protection, and another 19% are unsure how much they spend.
As a result, security controls are often applied unevenly, leaving large volumes of sensitive unstructured data outside formal controls simply because they are not fully visible or understood. These unmanaged data pockets expand an organization’s attack surface, undermine compliance efforts, and create vulnerabilities that adversaries exploit first.

Twenty-three percent of organizations report that they cannot scan unstructured data for vulnerabilities or risk, while 36% require 25 hours or more to complete a scan, and only 9% report having real-time scanning capabilities.
As unstructured data expands alongside innovation initiatives, the lack of proportional investment in security capabilities can leave organizations exposed, especially as they adopt AI technologies.

Key Finding 4: Fragmented Tools, Manual Processes, and Shared Ownership Limit Scalability
Unstructured data security is commonly managed through a fragmented mix of tools and manual workflows. Nearly one-third of organizations use 11 or more tools, while responsibility for unstructured data security is distributed across governance, security, and business teams. This fragmentation reduces visibility, dilutes accountability, and makes it difficult to scale controls consistently as data volumes grow.

Security investments are concentrated in general-purpose technologies such as data encryption (62%), cloud security (60%), application security (59%), and identity and access management (56%). While these controls are foundational for protecting data and managing access, they are not designed to consistently support discovery, classification, risk detection, or data lifecycle management across unstructured data environments.

Key Finding 5: AI Increases Both Opportunity and Risk—Foundations Determine Outcomes
Organizations increasingly view AI as both a leading future threat and a core security capability for unstructured data. While many plan to rely on AI for detection, classification, and automation, foundational gaps remain. Limited visibility, incomplete scanning, and immature detection processes raise concerns that AI may be deployed on top of weak foundations, amplifying existing blind spots rather than improving security outcomes.
AI-generated data and AI access risks are cited as notable, emerging drivers of unstructured data growth.
The research indicates that AI is not only being used to defend against advanced threats, but is also contributing to a more complex threat landscape as attackers leverage AI to scale, automate, and adapt their techniques. This dynamic heightens the importance of governing how AI systems interact with sensitive unstructured data, including establishing clear policies for access, usage, and oversight.

Tool sprawl, fragmented ownership, and manual processes limit the ability to scale security controls consistently. While organizations are not lacking security technologies, many lack consistent, end-to-end controls for unstructured data.

Insights
- Unstructured data security challenges are increasingly systemic rather than isolated.
- Rapid growth, broad distribution, and rising sensitivity are colliding with incomplete visibility, fragmented tooling, and uneven execution.
- AI is emerging as both a leading source of future risk and a central component of planned security strategies for unstructured data.
- Foundational readiness, spanning visibility, classification, governance, and scalable operations, emerges as the critical factor in managing risk, enabling automation, and realizing value from advanced security capabilities.
Organizations that strengthen these foundations are better positioned to navigate growing complexity, while those that do not risk allowing innovation and scale to outpace control as technology and the threat landscape continue to evolve.
AI is only as secure as the data it operates on.
If your organization lacks full visibility, classification, and control over unstructured data, deploying AI may increase risk rather than reduce it.
At Reputiva, we help organizations secure their data and AI environments by strengthening data visibility, identity controls, and governance across AWS, Azure, and GCP.
Start with a Data & AI Security Assessment to identify gaps, reduce exposure, and build a secure foundation for AI adoption.


