Science Unscrambled features "Frontiers in Massive Data Analysis" with Director Scott Weidman
The Board on Mathematical Sciences and Its Applications (BMSA) and its Committee on Applied and Theoretical Statistics released "Frontiers in Massive Data Analysis." The report examines the frontier of analyzing massive amounts of data, whether stored in a static database or streaming through a system. Data at that scale (terabytes and petabytes) is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. But tools that infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are needed; the report identifies many of them, along with promising research directions to explore. It discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, the report illustrates the cross-disciplinary knowledge, drawn from computer science, statistics, machine learning, and application disciplines, that must be brought to bear to make useful inferences from massive data.