University of Pittsburgh

November 6, 2014

Big data: Faculty panel weighs in

A panel of faculty members from across the University responded to Lyon’s keynote.

Senate plenary session panelists, from left: Michael Becich, Kelly Dornin-Koss, Barbara Epstein, Jay Graham, Alison Langmead, Jennifer Woodward and Liz Lyon.

Michael J. Becich, chair of the medical school’s Department of Biomedical Informatics, discussed the department’s recent $11 million National Institutes of Health (NIH) award to lead a Big Data to Knowledge Center of Excellence. “It’s only a starting point. It’s just a grant,” he said.

“The focus of that grant is really to make intelligent analysis of data greatly impactful to health care and science. It’s not just about health care and it’s not just about the science,” he said.

“If I had to say what’s the DNA of the success to come out of this opportunity, it’s to promote wholesale sharing of data, software, of processes that people put together, to attach hardware to very hard problems that face us as scientists.

“The beautiful part about this grant is it will provide an opportunity for us to dig deeply both within the University of Pittsburgh and in our region with Carnegie Mellon as a partner, to start training data scientists on the use of the tools we’re going to produce. It’s my role in the grant to reach out to the 11 other funded sites across the country and bring their tools to life in this environment for our scientists in the region.”

Kelly Dornin-Koss, director of the Education and Compliance Office for Human Subject Research, commented on the compliance aspects of data sharing and touched upon Lyon’s point that while there is peer review of publications, rarely is there peer review of data. “That’s an area I think we need to develop here at the University,” she said.

Not only does her office present seminars on good research practices, it also monitors and audits data. “We see the small datasets and what they look like. And throughout the years we have seen how data are maintained throughout the University. They’re kept in a variety of different ways: Some that we would promote, others we wouldn’t,” said Dornin-Koss.

“What we would like to see is that research is collected in a systematic manner and that there are sophisticated data collection tools used,” she said.

Perhaps surprising in the digital age, she noted, data often are maintained in hard-copy rather than digital form. “One concern with that is, if we’re working with investigators that are working as sponsors of an investigational new drug, there’s a whole set of FDA regulations surrounding electronic records and signatures,” she said.

Dornin-Koss noted that FDA regulations go further than those for research supported by the Department of Health and Human Services. “There are additional requirements for products such as drugs, devices and biologics,” she said. “As an institution we have to start striving toward compliance with that set of regulations.”

“I think the one difference that’s important to point out is that sponsors monitor the ongoing conduct of the study,” she said. “So, if it’s a study sponsored by a pharmaceutical company, they will send somebody out every six to eight weeks to look to ensure that the protocol is being followed and that there is good data being collected.” In contrast, only a few NIH institutes come to look at source data, she said.

Barbara Epstein, director of the Health Sciences Library System (HSLS), discussed what librarians bring to the table.

“The research librarian community in health sciences and general academia has been exploring how to meet the new needs of faculty and students and how to develop library-based research data services — and what services to offer and what we can realistically deliver,” she said.

“There’s also a new push in the curricula of information schools and library and information science programs on developing a new breed of librarians and enhancing the skills of the librarians that are there,” she said, noting that big data creates a need for data librarians, research services managers and data curators.

Most academic libraries — including Pitt — “are still exploring the demand for services, the types of services to offer, and the professional skills that are needed,” Epstein said.

These kinds of new roles will require funding, training and staffing.

She sees a hierarchy of potential services librarians could provide. The easier services include reference support and consultation on finding and citing data and datasets, identifying data repositories, and helping faculty with data management planning and with data and metadata standards.

A higher level of service is technical and hands-on, “embedding librarians in research projects to provide support for data repositories, for discovery, for creating metadata,” she said.

HSLS reference services traditionally have emphasized support for faculty research activities, she said, adding that HSLS has a data management working group. A survey it conducted last year showed “considerable lack of knowledge and a demand for education among faculty, postdocs, graduate students and other researchers about data management, about conventions, file naming and just general knowledge about what people want to know,” Epstein said.

HSLS offers a workshop series on data management topics such as data management planning, data sharing and discovery, and regulations on data management, in collaboration with the Clinical and Translational Science Institute’s responsible conduct of research training center.

The University Library System, which has a coordinator of digital scholarship services, offers some long-term storage for certain datasets through the University’s data repository, d-Scholarship@Pitt.

ULS also provides consultation on such concerns as funding, publisher requirements, data sharing, data management planning, describing and citing data, locating appropriate disciplinary repositories and identifying sustainable formats for data.

Jay Graham, enterprise architect with Computing Services and Systems Development (CSSD), echoed Lyon’s concern about where researchers’ data are.

“Fifteen or 20 years ago, a lot of people had data stored on a server under their desk within the departments,” where it wasn’t being backed up and was at risk of loss.

Data now are stored centrally through CSSD’s Network Operations Center. “We’re right now just completing a $5 million upgrade to the center so we can quadruple the capacity for power and cooling,” Graham said.

In addition to protecting against loss, better security protects the University’s reputation.

“You want to make sure that the collection of the data is valid, but you also want to make sure that the data stays secure and intact so that somebody doesn’t hack into a server and actually change the data. That’s probably one of my biggest fears, not really stealing the data … but the changing or modification of that data,” Graham said.

In addition, CSSD is looking into electronic lab notebooks in order to provide a centralized service “so that the data is centralized in one place and we can apply controls systemically,” he said.

CSSD also received a National Science Foundation grant to increase the network capacity between campus and the data center and with the Pittsburgh Supercomputing Center, “which gives us access to a lot of the nationwide research networks,” Graham said. It also will begin to overlay the campus with a framework for high-speed network connectivity, beginning with Old Engineering Hall. However, “There’s still a lot of infrastructure that needs to be acquired,” he cautioned.

“It was easy to secure data when it was on a USB key or a floppy disk using the old technology. Now we’re talking about data that’s going to be distributed, data that’s coming from laboratory equipment that’s housed all over campus, stored locally and transferred centrally for analysis and analytics,” Graham said. “I think it’s a really exciting time … I’m glad we’ll all be part of that.”

Alison Langmead, director of the Visual Media Workshop in the Department of History of Art and Architecture and a faculty member in the School of Information Sciences, discussed data requirements in the humanities.

“As humanists and social scientists go out into the field, the value of our data does not always come from its massiveness or the sensor-based collection of data out in the world. It comes from the amount of human time it takes to read the historical record and draw out pieces of our past. It isn’t perhaps exabytes of data, but it is hard-won data from the history of humanity that takes trained experts to go into the field and understand,” she said.

“The value of that data is high because of its value to us as human beings, the same as scientific data. It may have the appearance of being something different, but honestly it is the same.”

The humanities have fewer regulatory requirements than the sciences, but data management in highly image-based disciplines such as history of art and architecture presents its own challenges.

“Computers deal with those things relatively poorly so we spend a lot of time managing our data — oftentimes to plan for the computer that will come after this computer. We know we can only do so much with what we have now and we look forward to the day when we could actually reap greater benefits out of this stuff that we pull from the archives.”

Jennifer Woodward, associate vice provost for Research Operations, spoke to “the gorilla on our back” — how to pay for all of this.

There’s not a lot of history available to forecast what the cost will be, or how it will affect budgeting, staffing and available resources. Another unknown is which needs will prove relevant as data management plans and services are developed.

“It cannot end up being just an add-on” to the services we’re already providing, Woodward said. “In order for it to be effective and efficient we really have to develop a plan for how to move this forward in an organized way.”

And, she said, “Once we’ve developed and implemented services and plans, how are we going to sustain it long-term when we really don’t know what we’re going to be projecting down the road?”

She offered some possible sources of funding:

  • Budgeting data management and data sharing plans into individual research grants.
  • Internal institutional support. “Priorities will need to be thought about and adjusted since we have constraints and limited financial resources,” she said.
  • Recouping some costs through fee-for-services infrastructure such as coordinated cores or technical support.
  • Full-service research management services that would charge costs back to investigators’ grants.
  • Cost-cutting through collaborations with other institutions or entities such as hospitals, research institutes or companies, to share expenses and resources.

“Probably all of these in some capacity will need to be used,” Woodward said.

—Kimberly K. Barlow

The 2014 fall plenary session was streamed live and is posted online. The video can be accessed using the link and password posted under the “plenary” tab at

Filed under: Feature, Volume 47 Issue 6