DNA information is rapidly growing in importance, and blowing up in volumes. Most data compressors for DNA have extreme encoding complexities, which is prohibitive for low-cost and portable sequencers. We develop a compression scheme with minimal encoding complexity, taking advantage of the availability of DNA references and computation resources in the cloud.
When a distributed storage system is used by decentralized applications (for example: blockchains), accessing individual shards of large data units, new features are needed that are not offered by existing distributed storage systems. In particular, coding the data with standard erasure codes does not allow adequate access performance. We develop erasure codes specifically addressing efficient recovery and access in decentralized applications.
The common use of AI today is that data is provided to some central computing facility (in the cloud), where the learning tasks (training and inference) are performed. The main issues with this practice are high communication cost and compromised data privacy. Moving part of the learning tasks to the edges mitigates these issues. The key question is how to aggregate multiple unreliable outputs from the edge to one reliable learning output, where unreliability is manifested in: missing inputs (stragglers), wrong inputs, and malicious inputs.