Prof. Soheil Mohajer has been named McKnight Land-Grant Professor for his work addressing “Challenges in Distributed Systems for Big Data Analysis.” He is one of ten awardees of the McKnight Land-Grant Professorship Program for 2019-21. His areas of research expertise include communications, signal processing, and networking. He works on developing technologies for storage, transmission, and processing of massive amounts of data in a distributed fashion, using mathematical analysis to design algorithms that are practical and reliable for distributed architectures.
Soheil leads a research team that addresses some of the challenges arising from the rapid growth in the volume and use of digital data. With the explosion in data, its processing, storage, maintenance, transmission, and security have turned into a multibillion-dollar industry. Two key challenges he and his team are working on are storage solutions that can cope with this explosion and the transmission of such massive amounts of data over networks. As data-driven applications shift from centralized to decentralized architectures, he and his team are addressing these challenges with theoretical and algorithmic solutions.
Distributed Data Storage Systems and Innovative Coding Techniques
For researchers and practitioners such as data storage firms, the central question is how to build a cost-efficient storage system that can cope with the explosion in data. For Soheil and his team, the answer lies in harnessing the strength of distributed storage systems (DSS). Such a system, created by networking a large number of inexpensive devices, could be the solution to storing the massive amounts of data that are constantly being generated. Because individual devices in such a network are unreliable and can fail, a typical strategy is to build in redundancy. However, the conventional form of redundancy, which replicates data across multiple storage nodes, is expensive once the hardware and maintenance costs of the additional storage units are taken into account. Soheil’s team has instead turned to coding techniques, which, when combined with hardware redundancy, can offer greater reliability at the same hardware cost.
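A back-of-the-envelope comparison shows why coding can beat replication on cost. The sketch below assumes an idealized maximum distance separable (MDS) code, such as a Reed-Solomon code, in which k data blocks are encoded into n stored blocks and any k of them suffice to recover the data. The (14, 10) parameters and the function names are illustrative choices, not details of the team's systems.

```python
from fractions import Fraction


def replication(copies):
    """Storage overhead and failures tolerated when every block is stored `copies` times."""
    # Each block lives in full on `copies` nodes, so losing copies - 1 of them is survivable.
    return Fraction(copies), copies - 1


def mds_code(n, k):
    """Storage overhead and failures tolerated by an idealized (n, k) MDS code (assumption)."""
    # k data blocks become n coded blocks; any k of the n are enough to rebuild the data.
    return Fraction(n, k), n - k


rep_overhead, rep_tolerance = replication(3)
mds_overhead, mds_tolerance = mds_code(14, 10)

print(f"3-way replication: {float(rep_overhead):.1f}x storage, tolerates {rep_tolerance} failures")
print(f"(14, 10) MDS code: {float(mds_overhead):.1f}x storage, tolerates {mds_tolerance} failures")
# -> 3-way replication: 3.0x storage, tolerates 2 failures
# -> (14, 10) MDS code: 1.4x storage, tolerates 4 failures
```

Under these assumptions, the coded layout uses less than half the storage of three-way replication yet survives twice as many simultaneous node failures. The trade-off is repair: rebuilding a single lost block of a plain MDS code requires downloading k blocks from other nodes.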
Innovative Coding Techniques
Coding techniques add structured redundancy to stored data so that it can be reconstructed when parts of it are lost. When a storage node fails, the missing data is recovered by downloading repair data from several other nodes and performing operations on it. Among the barriers to the widespread adoption of coding techniques for repair, however, are the computation and communication costs involved. Soheil’s team is addressing these barriers by developing high-performance codes for storage systems and efficient algorithms for data recovery. Their algorithms are also scalable, so they remain effective as the size of the storage network changes. With sensitive data such as personal, financial, and health records increasingly moving to cloud storage, the team is also designing secure cloud storage systems that ensure data confidentiality and integrity. The algorithms they have developed strike a balance among security, fault tolerance, system maintenance, and performance. (Soheil’s research on cloud storage systems is supported by the NSF CAREER Award he received in 2018.)
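To make the repair step concrete, here is a toy example using the simplest erasure code: one XOR parity block computed over k data blocks, which tolerates a single node failure. It is a stand-in chosen for readability, not one of the high-performance codes the team designs, and the helper names and block contents are hypothetical. Rebuilding the lost block requires downloading every surviving block, which is exactly the kind of communication cost that efficient repair algorithms aim to reduce.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Store the data blocks plus one parity block (a toy single-parity code)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def repair(nodes, failed):
    """Rebuild the block held by node `failed` from all surviving nodes."""
    survivors = [block for i, block in enumerate(nodes) if i != failed]
    downloaded = sum(len(block) for block in survivors)  # repair traffic in bytes
    # XOR of every other block cancels out everything except the missing one.
    return xor_blocks(survivors), downloaded


nodes = encode([b"ABCD", b"EFGH", b"IJKL"])  # 3 data nodes + 1 parity node
rebuilt, cost = repair(nodes, failed=1)
print(rebuilt, cost)  # -> b'EFGH' 12
```

Here a 4-byte lost block costs 12 bytes of downloads; codes designed for efficient repair try to shrink that ratio.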
Innovative Data Delivery
As access to and use of the Internet have evolved, demand for broadband data has grown exponentially. The nature of the data travelling across networks has also changed: large files such as videos are now typical, whereas short messages were once the norm. These requests also follow recognizable patterns: the same files are requested repeatedly, requests cluster at certain times, and some files are requested far more often than others. Streaming services such as Netflix are an excellent example. Most people watch movies at night, many users may request the same movie, and some content (newly released movies, for instance) is requested more frequently than older releases.
Caching, which stores content predicted to be popular in users’ local memory ahead of time, is one way to reduce network traffic. However, its effectiveness is limited by the size of each user’s local memory. Another critical drawback is that it offers no benefit when users request files that were not cached. Soheil and his research team are taking the concept of caching one step further by exploiting the ubiquity of mobile devices. Drawing on the considerable combined storage of these devices, they propose coded caching as a novel approach to data delivery.
Coded caching operates as if each user had access to the caches of the other users in the network. The strategy offers two key advantages: reduced network traffic and better Internet access in rural and semi-rural areas with poor infrastructure. Currently, Soheil is developing algorithms that will make coded caching practical and implementable, including algorithms that determine what data should be cached and delivered to minimize network load during peak hours.
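The gain behind coded caching can be seen in a classic example from the coded-caching literature (due to Maddah-Ali and Niesen), sketched below with two users, two equal-size files, and caches that each hold one file's worth of data. It illustrates the general idea under those simplifying assumptions and is not the team's own algorithm; the function names are hypothetical.

```python
def xor(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


def place(files):
    """Placement phase (off-peak): user 1 caches the first half of every file,
    user 2 caches the second half. Assumes equal, even-length files."""
    cache1 = {name: f[: len(f) // 2] for name, f in files.items()}
    cache2 = {name: f[len(f) // 2 :] for name, f in files.items()}
    return cache1, cache2


def deliver(files, cache1, cache2, want1, want2):
    """Delivery phase (peak): one coded half-file serves both users."""
    # User 1 lacks the second half of want1; user 2 lacks the first half of want2.
    missing1 = files[want1][len(files[want1]) // 2 :]
    missing2 = files[want2][: len(files[want2]) // 2]
    broadcast = xor(missing1, missing2)
    # Each user XORs out the half it already holds in its own cache.
    got1 = cache1[want1] + xor(broadcast, cache1[want2])
    got2 = xor(broadcast, cache2[want1]) + cache2[want2]
    # Return both reconstructions, the coded load, and the uncoded load.
    return got1, got2, len(broadcast), len(missing1) + len(missing2)


files = {"A": b"AAAAaaaa", "B": b"BBBBbbbb"}
cache1, cache2 = place(files)
got1, got2, coded, uncoded = deliver(files, cache1, cache2, "A", "B")
print(got1, got2, coded, uncoded)  # -> b'AAAAaaaa' b'BBBBbbbb' 4 8
```

A single broadcast of half a file serves both requests, whereas sending each user its missing half separately would take a full file's worth of traffic. The same placement works for any pair of requests, including both users asking for the same file.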
Prof. Soheil Mohajer’s research is of critical value to individuals as well as corporations. His work has been supported by the National Science Foundation; he received the NSF Faculty Early Career Development (CAREER) Award in 2018 for the study of distributed computing networks and the development of models for data transfer that can reduce time delay in communication.
The McKnight Land-Grant Professorship Program was set up to advance the careers of assistant professors at a critical juncture in their professional lives. Award recipients hold the title of McKnight Land-Grant Professor for two years. Administered by the Office of the Executive Vice-President and Provost, the awards are made possible by generous donations from the McKnight Foundation.