Securing the Future: Why Knowledge-based Security is Essential for GenAI Applications
The rapid adoption of Generative AI (GenAI) in enterprise environments has exposed a critical vulnerability in how organizations secure their knowledge bases. Traditional document-level security models, originally designed for mainframe systems and adapted for web services, are proving inadequate against the sophisticated ways GenAI applications process and interpret information.
This security gap emerges from a fundamental mismatch: while organizations continue to manage permissions at the document level, GenAI applications like Microsoft Copilot and Amazon Q process information at a semantic level, understanding and correlating content across document boundaries. These AI systems can identify relationships between pieces of information that traditional security models never contemplated, potentially exposing sensitive data through indirect access patterns.
Consider a scenario where a financial document contains both public quarterly results and confidential future projections. While document-level permissions might restrict access appropriately, a GenAI system could inadvertently reveal sensitive projections through semantic connections with other accessible documents, creating an unintended data exposure vector.
This new class of security challenge requires a fundamentally different approach to data protection. Chunk-level security, which aligns security controls with how GenAI systems actually process information, has emerged as a critical solution to this growing enterprise risk.
What are Chunks?
Large Language Models (LLMs) process information by analyzing discrete segments of content—known as chunks—which form the foundational units of data processing in modern AI systems. These chunks represent coherent information units, ranging from contextual text passages to structured data elements like table rows or image-caption pairs. This granular approach to data segmentation underpins Retrieval-Augmented Generation (RAG) pipelines, enabling precise context retrieval and enhanced response accuracy in enterprise AI deployments.
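To make this concrete, here is a minimal sketch of fixed-size chunking with overlap. The window sizes and sample text are illustrative; production pipelines typically split on sentence, paragraph, or token boundaries rather than raw characters.

```python
# Minimal fixed-size chunker with overlap (illustrative only; real
# pipelines usually split on sentence, paragraph, or token boundaries).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = (
    "Q3 revenue grew 12% year over year. "
    "Confidential: FY25 projections assume 30% growth."
)
for i, chunk in enumerate(chunk_text(document, chunk_size=50, overlap=10)):
    print(i, repr(chunk))
```

Notice that a single sentence of confidential material can land in its own chunk, separated from the surrounding public content; this is what later makes chunk-level permissions matter.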
Each chunk is converted into a vector embedding and stored in a vector database, where it can be retrieved later.
RAG pipelines operate on chunks, not documents: an ingestion pipeline writes chunks into a vector database so they can be retrieved quickly at query time.
At query time, these chunks supply accurate, up-to-date context for LLM queries, significantly reducing hallucinations and misinformation. Chunks also feed into semantic knowledge graphs, enabling LLMs to connect ideas and comprehend relationships effectively.
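A minimal end-to-end sketch of that ingest-and-query flow, using a deterministic placeholder in place of a real embedding model (so the "similarity" here carries no real semantics) and a plain Python list standing in for the vector database:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder: a real pipeline would call an embedding model here.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

# Ingestion: embed each chunk and store it with its source document.
store = []  # stand-in for a vector database
for doc_id, text in [
    ("q3-report.docx", "Q3 revenue grew 12% year over year."),
    ("q3-report.docx", "Confidential: FY25 projections assume 30% growth."),
    ("wiki.md", "How to file an expense report."),
]:
    store.append({"doc": doc_id, "text": text, "vec": embed(text)})

# Query time: rank stored chunks by cosine similarity and hand the
# best matches to the LLM as context for its answer.
query = embed("What growth is projected for next year?")
ranked = sorted(store, key=lambda c: -float(c["vec"] @ query))
print([c["text"] for c in ranked[:2]])
```

The key point is that retrieval selects individual chunks, not whole documents, which is exactly where document-level permissions lose their grip.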
Microsoft Copilot, for example, uses RAG to find chunks of information in SharePoint; the chunks that match a query are folded into the response shown to the user.
Chunking data introduces a security risk. Not all chunks contain the same information, and some are more sensitive than others; a single document might have chunks spanning multiple topics. If the information in these chunks doesn't map correctly to the document's permissions, there is a security leak.
The Mismatch: Document-Level Security vs. Chunk-Level Knowledge
Documents are permissioned as a whole. When GenAI applications unwrap and process these documents into chunks, the nuanced security context often gets lost. For example:
A confidential financial report may contain a public-facing executive summary alongside sensitive projections.
An HR document might contain sensitive employee information alongside material related to company sales projections.
Traditional knowledge bases either restrict or allow access to the whole document, potentially exposing sensitive chunks.
Moreover, the authors of most workplace documents lack the business context to correctly classify the content they create, which leads to errors in the access levels assigned to the data inside a document.
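A hypothetical illustration of the problem: one document-level ACL, two chunks of very different sensitivity, and every chunk silently inheriting the same permission. The document, ACL group, and chunk contents below are invented for the example.

```python
# Hypothetical document: one ACL covers chunks of mixed sensitivity.
document = {
    "id": "q3-financials.docx",
    "acl": ["all-employees"],  # permissioned as a whole
    "chunks": [
        {"text": "Executive summary: Q3 revenue grew 12%.",
         "content_sensitivity": "public"},
        {"text": "Confidential: FY25 projections assume 30% growth.",
         "content_sensitivity": "restricted"},
    ],
}

for chunk in document["chunks"]:
    chunk["acl"] = document["acl"]  # every chunk inherits the document ACL
    print(f'{chunk["content_sensitivity"]:>10} chunk -> readable by {chunk["acl"]}')
```

The restricted projection ends up just as readable as the public summary, because nothing in the pipeline ever looks at what each chunk actually says.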
This mismatch between document-level and chunk-level permissions creates several vulnerabilities (a query-time enforcement sketch follows the list):
Data Leakage: If chunks are extracted without appropriate security controls, sensitive data may inadvertently become accessible.
Compliance Gaps: Misaligned permissions can result in non-compliance with regulations like GDPR or HIPAA, risking legal and reputational damage.
Inaccurate Security Models: Current tools often lack the semantic understanding to differentiate sensitive chunks from non-sensitive ones, leading to over-permissioning or under-permissioning.
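Closing these gaps means enforcing permissions where retrieval actually happens. Here is a minimal sketch of chunk-level filtering at query time; the chunk records and group names are illustrative, not a prescribed schema.

```python
# Hedged sketch: retrieval returns only chunks whose ACL overlaps the
# requesting user's groups, so permissions are enforced per chunk.
def authorized_chunks(chunks, user_groups):
    return [c for c in chunks if set(c["acl"]) & set(user_groups)]

chunks = [
    {"text": "Q3 revenue grew 12% year over year.", "acl": ["all-employees"]},
    {"text": "Confidential: FY25 projections assume 30% growth.",
     "acl": ["finance-leads"]},
]

# An analyst outside finance-leads never sees the projection chunk,
# even though both chunks came from the same document.
for c in authorized_chunks(chunks, user_groups=["all-employees"]):
    print(c["text"])
```

The hard part, of course, is assigning the right ACL to each chunk in the first place, which is where semantic analysis comes in.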
How SDS Tools Solve the Problem
Semantic Data Security (SDS) tools are purpose-built to address the challenges of chunk-level security. Unlike traditional security tools, SDS tools understand the semantic context of the data: they analyze the meaning within each chunk and identify sensitive information from context alone, without relying on predefined labels or document tags.
SDS tools ensure that each chunk aligns with the correct security and compliance requirements, even when it is part of a larger, less sensitive document. This allows SDS tools to map the security boundaries of an entire company with greater accuracy than traditional DSPM tools such as Microsoft Purview. If a chunk's content doesn't match its assigned permissions, SDS tools flag the discrepancy and recommend corrective actions.
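As an illustration of that check (not Redactive's actual implementation), the sketch below stands in a trivial keyword matcher for the semantic classifier and flags chunks whose inferred sensitivity contradicts their assigned permissions:

```python
# Toy stand-in for semantic classification: a real SDS tool infers
# sensitivity from meaning, not from keyword matching.
SENSITIVE_MARKERS = ("confidential", "projection", "salary")

def classify_sensitivity(text: str) -> str:
    lowered = text.lower()
    return "restricted" if any(m in lowered for m in SENSITIVE_MARKERS) else "public"

def flag_mismatches(chunks):
    """Report chunks whose content contradicts their assigned ACL."""
    return [
        {"chunk": c["text"], "issue": "restricted content exposed to all-employees"}
        for c in chunks
        if classify_sensitivity(c["text"]) == "restricted"
        and "all-employees" in c["acl"]
    ]

chunks = [
    {"text": "Q3 revenue grew 12% year over year.", "acl": ["all-employees"]},
    {"text": "Confidential: FY25 projections assume 30% growth.",
     "acl": ["all-employees"]},
]
for finding in flag_mismatches(chunks):
    print(finding)
```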
By focusing on content rather than static permissions, SDS tools adapt to evolving security needs without requiring constant manual updates. Because the content of a document, not metadata such as tags, defines its permissions, there is never a need to sync permissions inside the knowledge base. This semantic understanding gives SDS tools a significant edge in preventing data leakage while maintaining operational efficiency.
Redactive: Bridging the Gap in Chunk-Level Security
At Redactive, we’ve developed an industry-leading SDS tool designed to detect and remediate permission mismatches in organizational knowledge bases. Redactive integrates seamlessly into your AI and security workflows, enabling:
Granular Security Controls: Apply fine-grained permissions at the chunk level to ensure only the right people access the right information.
AI-Ready Compliance: Prepare your organization’s data for AI applications like Amazon Q, Glean, or Bedrock without compromising on security.
Proactive Risk Management: Identify and resolve potential vulnerabilities before they become a problem.
The rise of GenAI is reshaping how organizations access and use knowledge. But without proper safeguards, the same advancements that drive innovation can also expose vulnerabilities. Chunk-level security isn’t just a best practice; it’s a necessity.
For business leaders, the time to act is now. Implementing tools like Redactive ensures that your organization stays secure, compliant, and ready to harness the full potential of generative AI.
Secure your knowledge, one chunk at a time. Contact us to learn how Redactive can transform your security strategy today.