Introducing the Pitt Data Catalog: A Dataset Discovery and Sharing Tool
“I’d like to share my research data, but I don’t have time to process and format it properly for a data repository.”
“My research is too sensitive to be put out in the open. I’m happy to share my data with other researchers who ask, but no one knows what I’m doing!”
“I put one of my datasets in a subject repository, and it’s been downloaded 87 times. I have other datasets, but it’s important that I keep control over who can access them, so I haven’t made them public.”
Are any of those sentiments familiar? Making research data openly available provides many benefits to data creators and the public: It furthers transparency and reproducible research, encourages collaboration among researchers and aligns with publisher and funder interests in data management and access. However, many researchers who would otherwise be interested in sharing their data may find that the one-size-fits-all approach of data repositories does not meet their needs or is too burdensome. For this reason, the Health Sciences Library System (HSLS) recently launched the Pitt Data Catalog to support dataset sharing and discovery, with an initial focus on health sciences research.
The Pitt Data Catalog is a curated source of information about datasets created by University of Pittsburgh health sciences researchers. It is not a data repository, and it does not host any datasets on the website itself. Instead, it provides information about each dataset and instructions for accessing it, just as a typical library catalog provides information and access instructions for books in a collection. For some datasets, those instructions might include a link to a public repository where the data are freely available. For others, they might include a corresponding author’s name, contact information and specific instructions to request one-on-one access to the data. Still others may require visitors to sign in with a University of Pittsburgh username and password. This flexible approach means that researchers who keep their data on private or University storage, or who want to limit access to approved collaborators only, can still take advantage of the data catalog’s listings to boost the visibility of their datasets.
What’s in a Data Catalog Record?
The image at right portrays an example Pitt Data Catalog entry created specifically for this article. Since the data catalog supports researchers from across the health sciences, the metadata fields can be adapted to use terms common to specific specialties; our goal is for bench researchers, surgeons and public health policy makers to find the data catalog equally helpful.
The blue text in the record pictured at right marks links that take a user to resources related to this entry, such as other datasets created by the same author or tagged with the keyword “caffeine.” In the right column, the record links to known or associated publications that use this dataset. The data services team at HSLS reviews catalog entries on a regular basis to update or add related publications as they appear, and can correct or expand entries at an author’s request.
How to Get Involved in the Data Catalog
Every entry in the Pitt Data Catalog is created in collaboration with the researcher, and our goal is to make the process as quick and easy as possible. In short, it works like this:
- If you have datasets you would like to have described in the Pitt Data Catalog, contact the HSLS data services team by email at firstname.lastname@example.org or through the contact page on the Pitt Data Catalog website and tell us a little about your work.
- We will schedule a phone or in-person consultation to learn more about your datasets and discuss the most appropriate terminology to describe your data. If possible, we will create a draft data catalog entry before our meeting for your approval. (If your datasets are already freely available online, this may be a very short meeting!)
- After we create the data catalog entry to describe your dataset, we will send it to you for corrections, approval and acknowledgement of our legal notice and disclaimer. The entry will appear online to the public after your approval. Each entry will have a unique and stable URL that can be used to advertise accessible datasets.
- If you would like to request changes to your dataset record after it appears online, you can contact us at any time. We may also reach out to you periodically to make sure the record’s contact information and other metadata are still correct.
HSLS Data Services offers support, consultations and customized trainings to help you manage your research data throughout the data lifecycle. We are happy to visit individual health sciences researchers, departments or labs to discuss your research data needs and help you determine whether the Pitt Data Catalog would be a good match for your datasets.
The University of Pittsburgh Health Sciences Library System is a member of the Data Catalog Collaboration Project and has customized this data discovery tool in part with federal funds from the National Library of Medicine, National Institutes of Health, Department of Health and Human Services under cooperative agreement number UG4LM012342 with the University of Pittsburgh Health Sciences Library System. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Helenmary Sheridan is the data services librarian at the Health Sciences Library System.