The project develops a technique for approximate schema discovery for noisy
data, for normalizing the data according to this schema, and for improving query processing. The input consists of a single, large relation, which may be noisy, inconsistent, incomplete, and the system discovers automatically a few candidate schemas that can represent the data with minimal loss and with high utility for downstream tasks.