Scalable Low-Latency Analytics

An integral part of many data-intensive applications is the need to collect and analyze enormous data sets, such as social network data, server log data, scientific data, and big bio data. Concurrently, new programming models and architectures have been developed for large-scale cluster computing, exemplified by recent MapReduce systems. In these big data systems, however, data must be loaded into the cluster before any query can run, which delays the start of query processing. Moreover, the answer to a long-running query is returned only when the entire job completes, which delays the delivery of query answers.

Mass Scalla (SCAlable Low-Latency Analytics at Massachusetts) is an ongoing research project by the University of Massachusetts Amherst database group. In this project, we design, develop, and evaluate a scalable, low-latency analytics platform that fundamentally transforms the existing cluster computing paradigm into an incremental parallel processing paradigm and extends it to near real-time analytics. We also develop applications for social network data analysis and big bio data analysis on the Scalla platform.
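To make the contrast between batch and incremental processing concrete, here is a toy, single-process sketch of incremental word counting. It is not Scalla's implementation or API. The function name `incremental_word_count`, the chunked input, and the crc32-based routing of keys to reducers are all hypothetical choices made for illustration. Each reducer folds incoming records into its in-memory state as soon as a chunk arrives, and a merged partial answer is emitted after every chunk instead of only once the whole job has finished.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List


def incremental_word_count(chunks: Iterable[List[str]],
                           num_reducers: int = 4) -> Iterator[Counter]:
    """Yield a running snapshot of word counts after each input chunk.

    Hypothetical sketch: each "reducer" is an in-memory Counter that is
    updated as data arrives, so partial answers are available early.
    """
    states = [Counter() for _ in range(num_reducers)]
    for chunk in chunks:
        # Map: split lines into words; shuffle: route each key to a
        # reducer with a deterministic hash (crc32) of the key.
        for line in chunk:
            for word in line.split():
                r = zlib.crc32(word.encode("utf-8")) % num_reducers
                # Reduce: fold the record into the reducer state right away.
                states[r][word] += 1
        # Early answer: merge all reducer states into a fresh snapshot.
        snapshot = Counter()
        for state in states:
            snapshot.update(state)  # Counter.update adds counts
        yield snapshot


if __name__ == "__main__":
    chunks = [["a b a"], ["b c"], ["a"]]
    for i, partial in enumerate(incremental_word_count(chunks), 1):
        print(i, dict(sorted(partial.items())))
```

A batch job would report only the final snapshot. In this sketch the first partial answer is already available after the first chunk, and every later snapshot refines it.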

News

"Supporting Scalable Analytics with Latency Constraints" is published in PVLDB, 8(11): 1166-1177, 2015.
SCALLA 0.1 is released.
"SCALLA: A Platform for Scalable One-pass Analytics using MapReduce" is published in ACM TODS, 37(4), 2012.
"Massive Genomic Data Processing and Deep Analysis" is published in PVLDB, 5(12): 1906-1909, August 2012.