University of Pittsburgh

November 6, 2014

Plenary session tackles big data

Faculty experts from across the University took on the timely topic of big data and its implications in the University Senate’s fall plenary session, “Managing Research Data: Challenges and Opportunities at the University.”

Liz Lyon, former associate director of the UK Digital Curation Centre and now a visiting faculty member in the School of Information Sciences, presented the keynote at the Oct. 23 event, held at the William Pitt Union.

Liz Lyon delivers the keynote address at the Senate’s fall plenary session on managing research data.

“This is not an easy thing to do, to manage the research data of an institution effectively,” said Liz Lyon, visiting professor in the School of Information Sciences, in her University Senate plenary session keynote, “Gearing up for Data? Institutional Drivers, Challenges and Opportunities.”

“We shouldn’t underestimate the explosion, the deluge, the tsunami … of data,” she said, citing the sheer scale among the reasons why research faculty should care about managing data.

Data can be lost, she said. In 2005 a fire destroyed the University of Southampton’s electronics and computer science departments. “A lot of optics and optoelectronics materials were lost, data was lost, PCs were lost, research was lost.”

Lyon’s laptop recently bounced off a conveyor in an airport security line. “Because it was in a case, it was okay and my data is in three different places, so it was okay. But accidents do happen. Laptops get left on buses or trains or in cabs, USB sticks get dropped in glasses of beer. … You need to think quite heavily about where your data is.”

Reputations on the line

Growing concerns about research quality, and the implications for the reputation of a researcher or of his or her department or institution, are making news.

Lyon cited lessons to be learned from the trouble that ensued after other researchers couldn’t replicate cancer research results published by Anil Potti and colleagues from Duke University. Duke’s subsequent inquiry uncovered additional difficulties, leading to the article’s retraction from Nature Medicine.

She pointed to a “gold standard” for authors on the continuum of reproducibility promoted by Johns Hopkins biostatistician Roger Peng. “You publish your code, your data, everything that allows the reproducibility of the claims that you’re making in your article,” she said.

Lyon called attention to The Science Exchange, which provides reproducibility as a service to universities. “You can get your data reworked and checked by this service,” she said. The organization has received funding to take 50 of the most impactful cancer studies and rework them: “They’re going to go back to the data and rework those studies to make sure the results are truly valid.”

Managing research data in open form can facilitate partnerships and collaboration. An unusual agreement among academics and pharma companies to share data has produced results in advancing Alzheimer’s disease research, Lyon said.

Funding

Money is another issue. Many funding agencies now require data management plans or data sharing plans as part of the submission process, she said. “Data is squarely on their agenda too.”

Institutions must pay attention as well. In the United Kingdom, government research councils have placed responsibility for data management on the institution, not the principal investigator.

Among the requirements set out in letters to vice chancellors was development of an institutional data roadmap.

“They required an institutional data policy, they required processes to be in place, they required adequate data storage, they required any datasets that were generated from EPSRC (Engineering and Physical Sciences Research Council) grants to be fully described, documented, structured metadata, digital object identifiers assigned to them and, best of all, securely preserved for a minimum of 10 years.

“That caused quite a lot of waves in the UK but universities have responded. Many have roadmaps … many are developing infrastructure, services and institutional policies on data, so there’s been activity to address the challenge,” she said.

Disciplinary diversity

Managing a variety of data is a particular challenge for large, complex universities like Pitt. “For an institution of this scale and size, coordination is absolutely essential,” Lyon said. The University needs “some sort of common strategy or roadmap, if you will, to take us through effectively into the future.”

Although processes for collecting and working with data may vary across disciplines, there are some commonalities that can be thought of as a data lifecycle, she said. “We think about how we create, collect, generate data; how it is processed; what software tools are used; how to visualize and analyze it and think about infrastructure for preserving it.

“We can think about how to manage access to it. It’s about intelligent access to data: We’re not just saying ‘Here it is, have it all.’ I think we have to be quite canny about it and how best to manage release of data,” she said, adding, “The final part of this life cycle is how to facilitate others to reuse our data.”

New infrastructure, new services, new tools and platforms are needed, although the cost of creating data management infrastructure remains a gray area, she said.

But there are benefits: “If we have the right services and infrastructure in place, it will help you as researchers to do your research more effectively and more efficiently,” she said.

And infrastructure creates new jobs that require new skills. “There is a shortage of people who are savvy with data,” she said, adding that she hopes Pitt’s iSchool can generate graduates who can fill those skills gaps.

Champions needed

Open-data culture must be championed, she said. “To me this is the biggest challenge of all, because you don’t change culture overnight, you don’t change people’s behaviors overnight.

“We must think about how our data is cited and how we can get credit for our data and how that can be built into career development,” she said. Noting that studies have shown a citation advantage from open data, she added, “There are now tools to track and monitor who’s using your data, who’s citing your data, who’s tweeting your data. We can start to realize the ability to be able to include this information in your CV and submit it to a tenure committee.”

“For me, if we have our CV and we have our publications and our other outputs, perhaps we should be thinking about putting our data in there: our datasets that we’ve used and reused, our software, electronic lab notebooks.”

The institution can benefit too. “We live in a world of rankings,” Lyon said, noting that many of them reference research activity. “Maybe in the future, among the formulas for how they develop these rankings, might be research data and data outputs,” she said.

—Kimberly K. Barlow

The 2014 fall plenary session was streamed live and is posted online. The video can be accessed using the link and password posted under the “plenary” tab at univsenate.pitt.edu.

Filed under: Feature, Volume 47 Issue 6