A new institute at the University of Wisconsin-Madison will try to advance one of the most influential technological trends of the modern day: big data science.
“Big data” is the catchphrase for the parts of computer science that deal with enormous troves of information. Over the past two decades, massive datasets have helped change almost every academic field and industry imaginable. In astronomy, researchers can use big data tools to analyze data from telescopes to gather new insights about the stars. In politics, party operatives have used big data analysis to more effectively target voters during campaigns.
The Institute for Foundations in Data Science, which will be part of the Wisconsin Institutes for Discovery, will re-examine the core mathematics, statistics and computer science that make big data science possible. The ultimate mission will be to come up with new ways to more efficiently and effectively use big sets of data.
Stephen Wright, a professor of computer science at the university, is leading the 14-member faculty team behind the new initiative. He said that big data faces some looming challenges that make revisiting the fundamentals necessary.
For one thing, big data is getting bigger.
“The size of datasets keep exploding,” said Wright. “It’s hard to develop algorithms and computational frameworks to keep up with that scale.”
As it is, there are many aspects of large datasets that current methodologies struggle with, said Wright: “A lot of the data is really noisy. A lot of it is really fuzzy. A lot of it is missing.”
There are also limits on how powerful computers themselves can become. Just within the last five years, advancements in computer processor development have slowed.
“Data processing and data analysis is a very intensive task,” said Wright. “The fact that computers have stopped improving so rapidly is something we need to consider.”
In addition to Wright, the institute’s roster will include UW faculty who have records of innovation through data science.
Robert Nowak, an engineering professor, has used big data to create an app that maps out flavors of beer to accurately predict what kinds of brew a person will like — sort of like Pandora, but for IPAs and lagers instead of music. Sebastian Roch, a mathematics professor, has made a splash in “computational biology,” mapping out complex patterns of evolution among species. Rebecca Willett, an associate professor of electrical and computer engineering, has worked at the forefront of medical imaging.
Wright said that the institute would focus on three areas of research in the first phase of its work: expressing big data problems as mathematical statements, graphing big data problems and figuring out better ways of collecting data.
“We don’t try to look at the entire dataset,” explained Wright, regarding the focus on data acquisition. “We try to pick out small amounts of it, and just look at that. Instead of picking out randomly, you can get clever about it.”
The group has received a small grant from the National Science Foundation to begin its work. If the institute makes inroads, said Wright, the hope is that it will get more funding from the NSF to grow into a larger enterprise that could work with other groups around the world on data science innovation.