Jotai: a Methodology for the Generation of Executable C Benchmarks

This paper introduces a methodology to generate well-defined executable benchmarks in the C programming language. The generation process is fully automatic: C files are extracted from open-source repositories and split into compilation units. A type reconstructor infers all the types and declarations required to ensure that functions compile. Input generation is guided by constraints specified in a domain-specific language (DSL). This DSL refines the types of functions, for instance by relating integer arguments to the lengths of buffers. Off-the-shelf tools such as AddressSanitizer and Kcc filter out programs with undefined behavior. To demonstrate applicability, this paper analyzes the dynamic behavior of different collections of benchmarks, some with up to 30 thousand samples, to support several observations: (i) the speedup of optimizations does not follow a normal distribution, a property assumed by statistical tests such as the t-test and the z-test; (ii) there is a strong correlation between the number of instructions fetched and running time on x86 and ARM processors; hence the former, a non-varying quantity, can be used as a proxy for the latter, a varying quantity, in the autotuning of compilation tasks. The apparatus to generate benchmarks, plus a collection of 30K programs thus produced, is publicly available.
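As a rough illustration of the kind of constraint mentioned above (an integer argument tied to the length of a buffer), the following C sketch pairs a function with a driver whose input satisfies that relation. All names here (`sum_buffer`, `make_input`, `run_benchmark`) are hypothetical and chosen for this example only; the sketch does not reproduce the paper's DSL syntax or the code its generator emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical function of the kind that might be mined from a repository.
   It silently assumes that buf holds at least n elements. */
long long sum_buffer(const int *buf, int n) {
    assert(buf != NULL || n <= 0);
    long long total = 0;
    for (int i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}

/* Hypothetical input generator: allocates a buffer whose length is exactly n,
   so the relation "n == length of buf" holds by construction.
   Returns NULL when n is not positive or when allocation fails. */
int *make_input(int n) {
    if (n <= 0) {
        return NULL;
    }
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        buf[i] = i % 7; /* small, fully initialized values */
    }
    return buf;
}

/* Hypothetical driver: builds a constrained input, calls the function,
   and releases the buffer. Returns 0 if no input could be built. */
long long run_benchmark(int n) {
    int *buf = make_input(n);
    if (buf == NULL) {
        return 0;
    }
    long long result = sum_buffer(buf, n);
    free(buf);
    return result;
}
```

Calling `run_benchmark(7)` from a `main` function exercises the function on a buffer of exactly seven initialized elements. Without the length relation, a driver could pass an `n` larger than the buffer, which is the sort of out-of-bounds read that a tool such as AddressSanitizer would report when the program is built with `-fsanitize=address`.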
