Lushu: Obfuscating Sensitive Data via Language Recognition

Alexander T. M. Holmquist

The synthesis of grammars that recognize sentences from examples is a problem with several practical applications, including the identification and encryption of sensitive information in computer systems. Existing techniques tend to create very large grammars, with a number of terminal symbols proportional to the number of words in the example sentences. This work proposes a technique to merge grammar terminals into regular expressions. The technique uses a lattice built from a partial ordering of regular expressions. This lattice, and the language identification algorithm it entails, were used to build Lushu, a data protection tool that encrypts sensitive information produced by the Java virtual machine. A comparison between Lushu and Zhefuscator, a tool of similar purpose, demonstrates that the technique proposed in this work is efficient not only in time but also in space, producing grammars up to 10 times smaller than the current state of the art.
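The idea of merging terminals through a lattice can be illustrated with a minimal sketch. It is not Lushu's actual algorithm: the character classes, their ordering, and the names `CLASSES`, `least_class`, and `merge_tokens` are all assumptions made for illustration. The classes are ordered by set inclusion, and several example tokens are merged into the smallest class that covers every character seen, so one regular expression replaces many terminals.

```python
import re
import string

# Hypothetical character classes, ordered by set inclusion.
# Every class appears after all of its subclasses, so the first
# class that covers a set of characters is its least upper bound.
CLASSES = [
    ("[0-9]", set(string.digits)),
    ("[a-z]", set(string.ascii_lowercase)),
    ("[A-Z]", set(string.ascii_uppercase)),
    ("[a-zA-Z]", set(string.ascii_letters)),
    ("[a-zA-Z0-9]", set(string.ascii_letters + string.digits)),
]
TOP = "."  # top element: matches any character


def least_class(chars):
    """Return the smallest class in CLASSES that contains all of chars."""
    for pattern, members in CLASSES:
        if chars <= members:
            return pattern
    return TOP


def merge_tokens(tokens):
    """Merge example tokens into a single regular expression (illustrative)."""
    chars = set().union(*tokens)  # every character seen in the examples
    return least_class(chars) + "+"


numbers = merge_tokens(["1234", "987"])    # two terminals become one regex
mixed = merge_tokens(["user42", "7"])      # digits joined with letters
unseen_ok = re.fullmatch(numbers, "55") is not None
```

Under these assumptions, the grammar stores one terminal per merged class instead of one per example word, which is the kind of size reduction the abstract describes. The merged expression also accepts tokens that never appeared in the examples, such as `55` above.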


2022/2 - POC1

Advisor: Fernando Magno Quintão Pereira

Keywords: Obfuscation, parser, synthesis, compiler

Video link

PDF available