University of Pittsburgh

April 4, 2013

BIG DATA: Pitt joins consortium

The University is part of a new consortium of universities, businesses, economic development groups and state and local governments that aims to put the Pittsburgh region on the map as a hub for big data education and jobs.

Data are ubiquitous, with mountains of data being generated in everyday life, from individuals engaging in such simple acts as swiping an affinity card at the grocery store or viewing a Web page to researchers engaging in deep scientific inquiry.

Big data, as an applied field, deals with large-scale information that, once collected, needs to be stored, organized, managed and analyzed in order to be useful.

Pittsburgh DataWorks, which was launched last month, will encourage investment in big data and plans to host training sessions, executive education programs and curriculum development initiatives to boost local workers’ skills in the field.

A March 21 Pittsburgh DataWorks kickoff event at the University Club drew some 250 people, said Pitt computer science faculty member Alexandros Labrinidis, who, along with colleague Panos K. Chrysanthis, serves on the Pittsburgh DataWorks advisory board. The two co-direct the University’s Advanced Data Management Technologies Laboratory, which is involved in several big-data projects.

Among them is “AstroShelf,” a collaboration with faculty in physics and astronomy that aims to help astronomers manage and annotate data from astronomical surveys; another is a simulation project on modeling turbulent combustion in conjunction with engineering faculty.

Pittsburgh DataWorks plans to take baby steps in its first year, establishing itself as a nonprofit entity and sponsoring events every month or so, said Chrysanthis. By the end of its second year, the group’s impact should be visible, whether through students’ increased interest in the field, collaborations or venture capital for big-data initiatives, he said.

Along with Pitt, other founding members of Pittsburgh DataWorks are the Allegheny Conference, Carnegie Mellon University, Draper Triangle Ventures, Google, IBM, Innovation Works, Leech Tishman, Management Science Associates, NetApp, Pittsburgh Supercomputing Center, Pittsburgh Technology Council, UPMC and the Urban Redevelopment Authority.

In a prepared statement, Pittsburgh DataWorks board member Jerome Pesenti, chief scientist for big data at IBM, said: “Pittsburgh is a growing market with leading providers of big-data solutions and organizations such as banks, retail enterprises, logistics firms and energy companies that are using the data science to optimize their businesses. We have hospitals, universities, nonprofits and a thriving entrepreneurial community. Pittsburgh DataWorks can bring these thought leaders together to solve real world business and civic problems.”

Labrinidis noted that the Pittsburgh area has both great talent and great needs when it comes to big data. In part, the brainpower at the universities and local IT companies contributes to the need: “Lots of university projects need this technology,” he said, adding that banks, health-care systems, manufacturers and others do, too.

Chrysanthis noted that the health-care industry is among the major users of big data, in areas as broad as electronic health records or as fine-grained as DNA-level analysis in the practice of personalized medicine.

Pitt has been involved in managing large amounts of data since well before the term “big data” came into fashion, in what for decades has been known as VLDB (very large database) analysis, Labrinidis said.

What has changed, Chrysanthis said, is the granularity in analysis. Improvements in computing power and data storage have made it possible to keep and use more detailed data than ever before.

When it comes to defining big data, it’s not all about size. A common rule of thumb has been “If you can’t do it in Excel, it’s big data,” quipped Labrinidis. However, said Chrysanthis, it’s more helpful to consider additional factors to better understand exactly what is meant by the term. The six V’s (volume, velocity, variety, veracity, variability and voracity) all factor into big data, with size, speed and the variety of data the major factors.

All that information, which is being generated in larger amounts and at higher speeds all the time, has to be received, stored and analyzed in order to be put to use.

For instance, water quality data could be analyzed in real time to rapidly alert authorities to environmental quality issues or even homeland security threats. Analysis of retail data showing an unusual uptick in prescriptions for antibiotics or even thermometers and tissues could point to an emerging outbreak of illness that could impact public health.

Social factors also have an effect: While earlier generations may have guarded their privacy, Chrysanthis said, “Now the notion of privacy has been almost completely eliminated.” Social norms make individuals more willing to share information, through tweeting or Foursquare check-ins, for instance.

Consciously or not, everyone these days is a generator of big data. Chrysanthis said, “We have become human sensors: We generate a huge volume of data,” ranging from lab results when we visit the doctor, to advertisers tracking the sites we visit when we surf the Web, to location data that result from our cell phones’ communication with nearby cell towers.

“Data is treasure,” Chrysanthis said. And companies have a huge appetite for it, recognizing the business value in it.

Pittsburgh is a prime spot for developing big-data initiatives. “The infrastructure exists,” he said, pointing to the Pittsburgh Supercomputing Center, local universities and IT companies. What will be required is investment, especially in training people to work in the industry. The need exists for programmers who build the systems, those who build apps, experts in data mining and individuals who can train others in data analysis.

Opportunities in big data are expanding. Industry analysts predict the sector will grow from $23 billion in 2011 to about $44 billion in 2016, and as many as 190,000 new analytical professionals are expected to be needed in the United States in the coming decade.

The Pittsburgh DataWorks initiative stands to benefit the University community in a variety of ways: by connecting students and companies for internship and job opportunities, and by helping faculty from different institutions network with each other and with businesses to foster collaboration. In the bigger picture, the group aims to raise the region’s profile as a player in big data, alongside Silicon Valley and Boston, among others.

Some supporters envision that the big-data industry could become as vital to the region as steel was a century ago, Labrinidis noted.

“Pittsburgh is on the map now,” he said, adding that the University’s role among the founding members of the consortium stands to raise Pitt’s profile as a leader in the field, which could be a magnet for the University’s graduate and undergraduate programs.

At Pitt, information technology coursework in computing goes beyond the computer science department to encompass the areas of information science, computer engineering, bioinformatics, business information management systems, intelligent systems programming, public health and engineering.

However, people outside those fields also will need training in data analysis, Chrysanthis said. “Everything you do from this point on can’t be without data,” he said. “It will affect everybody now.”

—Kimberly K. Barlow
