Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing

To minimize the exposure of sensitive data and documents, an organization needs to prevent cleartext sensitive data from appearing in storage or in network communication. We have developed a novel data-leak screening tool that can be deployed to scan computer file systems and server storage and to inspect outbound network traffic for exposed sensitive data. The tool searches the content for occurrences of plaintext sensitive data and alerts users and administrators to the data exposure vulnerabilities it identifies. For example, an organization's mail server can inspect outbound email messages for sensitive data appearing in unencrypted messages.

Data leak detection imposes new security requirements and two algorithmic challenges: data transformation and scalability. The exposed data in the content may have been transformed or modified by users or applications, so it may no longer be identical to the original sensitive data. Examples of such transformations include insertion of metadata or formatting tags, substitution of characters, and data truncation. The detection therefore needs to recognize variations of sensitive data patterns. It also needs to efficiently process long sensitive data (e.g., megabytes) and large amounts of content (e.g., gigabytes to terabytes).

In the automata-based matching used in anti-virus and IDS scans, the patterns to search for are known and static, and automata are not designed to support unpredictable and arbitrary pattern variations. In comparison, our solution handles arbitrary variations of patterns efficiently. In addition, our technology produces a very low false-positive rate, a marked improvement over the state-of-the-art intersection method. Our prototype achieves an analysis throughput of 400 Mbps, which can support the security needs of a sizeable organization.
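To illustrate why transformed leaks call for matching that tolerates insertions, substitutions, and truncation, the following is a minimal sketch of one generic technique, Smith-Waterman local alignment. This listing does not name the algorithm used by the tool, so the choice of technique, the function names `local_alignment_score` and `leak_ratio`, and the scoring weights are all assumptions made for illustration. The sketch runs in time proportional to the product of the two lengths and does not reflect the efficiency or parallel-computing aspects of the actual technology.

```python
# Illustrative only: generic Smith-Waterman local alignment, not the
# inventors' implementation. Names and weights are hypothetical.

MATCH = 2      # reward for two identical characters
MISMATCH = -1  # penalty for a substituted character
GAP = -1       # penalty for an inserted or deleted character


def local_alignment_score(sensitive: str, content: str) -> int:
    """Return the best local alignment score between the two strings."""
    prev = [0] * (len(content) + 1)  # previous row of the scoring matrix
    best = 0
    for s_char in sensitive:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, c_char in enumerate(content, start=1):
            diag = prev[j - 1] + (MATCH if s_char == c_char else MISMATCH)
            # A score never drops below 0, so an alignment can start anywhere.
            score = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def leak_ratio(sensitive: str, content: str) -> float:
    """Normalize the score so that a verbatim copy of `sensitive` gives 1.0."""
    if not sensitive:
        return 0.0
    return local_alignment_score(sensitive, content) / (MATCH * len(sensitive))


if __name__ == "__main__":
    secret = "4111-1111-1111-1111"
    print(leak_ratio(secret, "card: 4111-1111-1111-1111 thanks"))   # verbatim leak
    print(leak_ratio(secret, "<b>4111</b> 1111 1111 1111"))         # tags and substitutions
    print(leak_ratio(secret, "hello world nothing here"))           # unrelated content
```

In this sketch a verbatim leak scores 1.0, a copy altered by formatting tags and character substitutions still scores well above unrelated content, and a detector would raise an alert when the ratio exceeds a chosen threshold.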
Patent Information:
For Information, Contact:
Li Chen
Licensing Associate
Virginia Tech Intellectual Properties, Inc.
(540) 443-9217
Danfeng (Daphne) Yao
Xiaokui Shu
Fang Liu