This project will enable unreliable edge computing nodes to jointly provide a reliable storage service for unpredictable user workloads. Edge systems consist of small-scale servers (nodes) at the edge of the network, organized in a hierarchy whose root is a cloud-based datacenter. Their premise is to bring data and computation closer to time-critical applications running on, e.g., cellphones and autonomous vehicles. We combine storage redundancy schemes with scalable algorithms for object mapping and request scheduling, as illustrated in the sketch below.
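
As a concrete illustration only, and not this project's actual design, the following sketch shows one common way to map objects to a dynamic set of nodes: consistent hashing with r-way replication. The node names, the SHA-1 hash choice, the number of virtual nodes, and the replication factor are all illustrative assumptions.

    # Sketch: consistent hashing with r-way replication for object mapping.
    # All parameters here are illustrative assumptions, not the project's design.
    import bisect
    import hashlib

    class ConsistentHashRing:
        def __init__(self, nodes, vnodes=64):
            # Place several virtual points per node to smooth the load.
            self.ring = sorted(
                (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
            )
            self.keys = [h for h, _ in self.ring]

        @staticmethod
        def _hash(s):
            return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

        def replicas(self, obj_key, r=3):
            # Walk clockwise from the object's hash, collecting r distinct nodes.
            start = bisect.bisect(self.keys, self._hash(obj_key)) % len(self.ring)
            chosen, i = [], start
            while len(chosen) < r:
                node = self.ring[i % len(self.ring)][1]
                if node not in chosen:
                    chosen.append(node)
                i += 1
            return chosen

    ring = ConsistentHashRing([f"edge-node-{k}" for k in range(8)])
    print(ring.replicas("sensor-frame-001"))   # three distinct replica nodes

A scheme of this kind lets nodes join or leave with only local remapping, which is one reason such mappings scale well; the redundancy and scheduling policies layered on top are where the research questions lie.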
Data deduplication is one of the most effective ways to reduce data size in large-scale systems. In a nutshell, duplicate copies of data chunks in different files are replaced with pointers to a single copy of each unique chunk. Optimized deduplication mechanisms have facilitated its adoption in online primary storage, introducing new complexities to which traditional solutions do not directly apply. Our objective is to optimize capacity planning, management, and load balancing in such systems. The sketch below illustrates the basic mechanism.
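
To make the "pointers to a single copy" idea concrete, here is a minimal sketch of chunk-level deduplication. It assumes fixed-size chunks and SHA-256 fingerprints for simplicity; production systems typically use content-defined chunking and far more compact indexes.

    # Sketch: chunk-level deduplication with fixed-size chunks (an
    # illustrative simplification; real systems differ in chunking and indexing).
    import hashlib

    CHUNK_SIZE = 4096
    chunk_store = {}       # fingerprint -> the single stored copy of a chunk
    file_recipes = {}      # file name  -> list of fingerprints (the "pointers")

    def write_file(name, data):
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if no identical copy exists yet.
            chunk_store.setdefault(fp, chunk)
            recipe.append(fp)
        file_recipes[name] = recipe

    def read_file(name):
        return b"".join(chunk_store[fp] for fp in file_recipes[name])

    write_file("a.bin", b"x" * 8192)
    write_file("b.bin", b"x" * 8192)          # fully duplicate content
    assert read_file("b.bin") == b"x" * 8192
    print(len(chunk_store))                    # 1: a single unique chunk stored

Note how the logical capacity (two files) diverges from the physical capacity (one chunk); this gap between logical and physical usage is precisely what makes capacity planning and load balancing harder in deduplicated systems.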
The infrastructure for the “big data revolution” is built of systems that support storing, processing, and delivering large amounts of data efficiently. Flash-based solid-state drives (SSDs) are a key component in such systems, thanks to their ability to support parallel I/O at sub-millisecond latency and consistently high throughput. We develop theoretically optimal algorithms for the SSD firmware, which is responsible for the internal management of data and resources within the storage device; a simplified sketch of this layer follows.
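
As background only, the sketch below shows the core structure such firmware typically manages: a page-mapping flash translation layer (FTL) that hides flash's no-overwrite constraint by redirecting updates to fresh pages. The sizes and the round-robin page allocator are illustrative assumptions, not the algorithms this project develops.

    # Sketch: a toy page-mapping FTL (illustrative assumptions throughout).
    PAGES_PER_BLOCK = 4
    NUM_BLOCKS = 8

    l2p = {}                                            # logical page -> physical page
    state = ["FREE"] * (PAGES_PER_BLOCK * NUM_BLOCKS)   # FREE / VALID / INVALID
    next_free = 0

    def write(lpn, _data):
        global next_free
        # Flash pages cannot be overwritten in place: write to a fresh page
        # and invalidate the old one (garbage collection reclaims it later).
        if lpn in l2p:
            state[l2p[lpn]] = "INVALID"
        while state[next_free] != "FREE":
            next_free = (next_free + 1) % len(state)
        l2p[lpn] = next_free
        state[next_free] = "VALID"

    write(7, b"v1")
    write(7, b"v2")                          # the update lands on a new page
    print(l2p[7], state.count("INVALID"))    # remapped page, 1 invalid page

Decisions such as which block to reclaim, when, and where to place incoming writes determine the device's latency and lifetime, and these are the internal-management decisions for which we seek theoretically optimal algorithms.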