Senate plenary session: The Role of Research Metrics in Faculty Evaluation
“There’s no way we can exist as a serious educational institution if we’re not focused always on appropriate ways of evaluating all we do and all we do to others,” said University Senate President Frank Wilson in opening remarks at the Senate’s spring plenary session, “The Role of Research Metrics in Faculty Evaluation.”
“Evaluation in and of itself is not enough. It’s do we evaluate using appropriate tools, with appropriate frames of reference?”
Chancellor Patrick Gallagher agreed. “We need to understand the limitations, both good and bad, for these new tools and enlighten ourselves so we can use them appropriately,” he said in welcoming remarks to the audience of more than 100 at the March 29 event.
“With the advent of interconnected computers, internet-based and web-based portals, we have gone from, in many cases, being in data-poor environments to being surrounded by readily available data and information. There are ways that we can use this information to improve what we do.”
However, he cautioned, “You can be dazzled by those things that are readily measurable and take your eye off the things that are important but may be more difficult to measure: things that are in fact best measured with human judgment, and through experience, and through expertise, and in areas where we have not reduced them into quantifiable, distributable tools. …
“With all the power that comes from data and information, it often comes with the potential for misuse and even abuse,” Gallagher said.
“We are going to be increasingly in a world of data ubiquity. The wise course of action is to embrace that potential and to use it to advance what we do: to make our mission more effective, our research more compelling, our teaching better,” he said.
“The upside is real, but getting this right matters,” Gallagher said, acknowledging that there are valid concerns.
“We should not retreat into the nostalgic call to go back to the days before the data was there, but to an enlightened call to understand the limitations and the proper use of these new tools, to make sure that we use them effectively.”
One response, the Leiden Manifesto for research metrics, underpinned the keynote addresses.
Diana Hicks, the manifesto’s first author, provided an overview.
Hicks, a faculty member in Georgia Institute of Technology’s School of Public Policy, said some or all of the principles have been adopted by four universities, only one of which is in the United States: Ghent University (Belgium), Indiana University-Bloomington (United States), Loughborough University and University of Bath (United Kingdom).
“I think the challenge for us going forward is to make the manifesto a living document so that it can evolve and accommodate the learning that happens through experience over the years,” said Hicks.
The second keynote speaker, Cassidy Sugimoto, was instrumental in incorporating the Leiden Manifesto into policy at Indiana University-Bloomington, where she is a faculty member in informatics.
IU worked to solidify its policy after learning that a Rutgers University faculty member had discovered inconsistencies in his productivity data provided by Academic Analytics, a company whose services IU uses. After much debate, the faculty council approved a policy last year (www.indiana.edu/~bfc/docs/circulars/15-16/B47-2016Amended.pdf).
During her presentation, Sugimoto pointed out some shortfalls regarding research metrics. Gender differences can be observed in scientific research, she noted.
“About 30 percent of production is coming from women, and women tend to produce lower, each individually, as well. This has to be taken into account when we’re evaluating individual scholars at the institutional setting,” she said.
Men are more likely to be cited in journals with the highest impact factor and to have senior author roles in studies.
“Women are significantly more likely to be associated with performing experiments, a middle author role, whereas men are given authorship for designing the study or contributing reagents for the study,” said Sugimoto. “Simply put, women are the hands of science while men are choosing which questions to ask.”
Sugimoto warned faculty about the pitfalls of altmetrics, which measures tweets and Facebook posts, among other things. Many tweets about research come from scientists in the same discipline as a paper. About 48 percent of tweets about research are sent by social scientists. Geography also may skew results. Most researchers on Twitter are from North America and Europe. They tend to promote North American and European research, causing research from other continents to be underrepresented.
Attention isn’t the same as impact. She illustrated the difference with a story about a paper that had an altmetrics score of 3,668. “Can apparent superluminal neutrino speeds be explained as a quantum weak measurement?” had 4,464 retweets not for its findings but for its blunt abstract: “Probably not.”
“Those things that have the highest altmetrics scores are humorous. They’re topical. They were written by Obama. Right, these are the kinds of things that make something highly tweeted: not quality, not social impact. And so, we have to be more discerning, more critical; take the tools of our science to evaluating scientists themselves; and make sure that we remain critical,” she said.
Pitt faculty panelists Gordon Mitchell, Sanjeev Shroff and Stephen Wisniewski responded to the keynote talks with prepared remarks.
Mitchell, assistant dean of the University Honors College and a faculty member in the Department of Communication in the Dietrich School of Arts and Sciences, called on Pitt to follow in IU’s footsteps and develop its own policy on the use of research metrics.
“It’s my long-term view that the University Senate should strive to hone its own version of Indiana’s policy on faculty scholarly activity systems, creating a shared governance framework that will position Pitt at the forefront of an intensifying national and global discussion of responsible use of scholarly metrics in academia,” he said.
He suggested several initial revisions to the IU document, advocating first for closing “a huge loophole” in the principle that addresses the role of quantitative indicators in evaluating faculty research.
The IU policy’s section on complementary methods states: “Indicators are inherently reductionist and should be used to supplement, rather than replace, other forms of review (such as peer review) that more fully contextualize the varied nature of academic performance.”
Calling attention to the concept of supplementing rather than replacing, Mitchell offered alternative language: “Methods of qualitative, expert assessment (such as peer review) should anchor judgments about the quality and impact of academic research. Quantitative indicators should be used to supplement, not overshadow, these qualitative assessments.”
The term “‘supplement, not replace’ would seem to speak most clearly in instances where only a quantitative indicator is employed to assess research,” he said, citing examples in the Dietrich school’s 2017 strategic plan. There, he said, some 14 outcomes related to making progress toward the University’s strategic goal of “engaging in research of impact” are measured solely using Academic Analytics. “It functionally supplants other forms of assessment,” he said.
In other parts of the Dietrich school plan, although a quantitative indicator is among multiple metrics for assessment, he said there still is cause for concern. He likened the case to an assembly line worker being reduced from a 40-hour workweek to a two-hour week after the line is automated. “Good news,” says the manager. “You’re not going to be replaced because the new machines will only be supplementing, not replacing, your labor.”
“Technically, one can satisfy ‘supplement, not replace’ with an approach that shrinks the thing being supplemented down to the level of an inconsequential token,” Mitchell said.
He also advocated for a revision to the IU policy that would allow faculty to recommend changes on the use of faculty activity systems, rather than only to allow for a vote on whether to recommend discontinuing the system.
Shroff, Distinguished Professor of Bioengineering and Gerald McGinnis Chair in Bioengineering, outlined some of the Swanson School of Engineering’s methods of evaluation.
Transparency and inclusion are key when it comes to determining the school’s strategic plan goals as well as the protocols for evaluating individual faculty and departments, he said, noting that those are decided collectively.
Regarding individual evaluations, Shroff said faculty aren’t expected to excel in all aspects. Instead, the department as a whole must collectively deliver on the goals of the strategic plan — in much the same way that a basketball team needs some ball handlers but must have guards, centers and forwards, he said.
“We really, really believe it’s futile to ask every faculty member to dot every ‘i’ and cross every ‘t’ as far as department goals are concerned,” he said. “It’s much better to have a faculty member being passionate about and outstanding in selected areas.”
Said Shroff, “In our view, trying to dot every ‘i’ and cross ‘t’s potentially leads to mediocrity, not excellence.”
The engineering school also considers the strategic goal of impact. Measuring real impact — does your work matter? — as opposed to traditional bean counting is by definition a long-term measurement, he said. “In contrast, the decisions you’re trying to make in terms of evaluation are on a different time scale.
“So, while it makes sense conceptually that impact analysis and evaluation is a good thing to do, operationalizing that may be a problem,” Shroff admitted. “The question is: Can we have surrogate measures … early predictors of impact? And how that can be incorporated into evaluation to supplement other metrics of evaluation?”
Wisniewski, vice provost for data and information and professor of epidemiology in the Graduate School of Public Health, said that the exponential growth in access to bibliometric tools is magnifying the importance of the first commandment of data analysis: Know thy data.
“You need to understand how the data were collected, any inherent biases or confounds associated with the collection process. You need to understand how your data are coded so when you analyze it, you can analyze it appropriately. Also, if you’re using a tool you have to understand that tool and what the process is,” he said.
“If you don’t follow this commandment, it’s going to lead to errors. … You’re going to make some false conclusions, which can be problematic in the long run,” he said.
“It’s important that all of our end users — individual faculty, administrators — understand these tools and what they mean, what the numbers mean, how the data were collected, how they were analyzed,” Wisniewski said.
For example, an H-index — a measurement of an author’s scholarly publications and citation impact — can be calculated using many different online tools, but each may yield different results due to the way each tool retrieves the citations, Wisniewski said.
“It’s not clear that a lot of people are aware of that. We need to make this information available.”
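Wisniewski’s point about diverging results can be made concrete: the h-index is a simple function of whatever citation counts a tool retrieves, so two tools that index different citation sources will report different values for the same author. A minimal Python sketch, using hypothetical citation counts for illustration:

```python
def h_index(citations):
    """Return the h-index: the largest h such that the author has
    at least h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # this paper still clears the threshold
        else:
            break  # all later papers have fewer citations
    return h

# Hypothetical per-paper citation counts for one author, as two
# different tools might retrieve them. The formula is identical;
# only the retrieved counts differ, yet the h-index changes.
tool_a = [10, 8, 5, 4, 3]
tool_b = [9, 7, 4, 3, 1]
print(h_index(tool_a))  # 4
print(h_index(tool_b))  # 3
```

The discrepancy here comes entirely from the input data, which is exactly the “know thy data” concern: the calculation is standardized, but the citation retrieval behind it is not.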
Wisniewski said the University Library System (ULS) bibliometrics page (library.pitt.edu/bibliometric-services) provides in-depth details on using and interpreting the output of bibliometric tools.
In addition, he advocated for urging bibliometric tools’ creators “to not just produce numbers but produce numbers in context. …
“Don’t just provide an H-index, but provide that H-index and explain what it means,” Wisniewski said. “Any of these bibliometric tools can easily do that, but it requires a little bit of effort. But it should be done in order to help others and everyone understand what the tool’s doing and that we are all in agreement on the interpretation of results.”
Speakers Hicks and Sugimoto joined the panel and took questions from the audience in a wide-ranging hourlong conversation.
Wisniewski asked Sugimoto to elaborate on why bibliometrics shouldn’t be used on an individual level.
The tools were created for aggregate decision-making, collection development and retrieval mechanisms, she said. “They weren’t really intended for the individual level, and I don’t think they’re well-suited for it.”
The issue boils down to basic principles of statistics. “The individual N is too small,” with too little data, and too much volatility and variability in it, she said. “You’d have to account for such different nuances in one’s production that the artifact left behind would be almost meaningless. Once you account for subdisciplinary areas, topics, age, gender, background, institution, what’s left? There’s very little signal that’s not noise at that point.”
“What are the compelling stories to you when you’re evaluating somebody?” Lauren Collister, a ULS scholarly communications librarian, asked the panel. Collister, who assists University users with altmetrics, said they’re often overwhelmed with the results. “‘What do I do with this stuff?’” they ask. “I always encourage them to use it to tell a story about their own research — whether it’s in a grant proposal, a CV, a job application or tenure portfolio. But they struggle with what kind of stories to tell, and they’re so used to citations and impact factor,” she said.
Sugimoto said information on research that’s been downloaded, rather than research that’s been cited, can be especially useful in telling the story of a scholar’s impact in practice, particularly in such fields as nursing, social work or education.
Hicks encouraged searching online for “REF impact case studies” posted by universities in the United Kingdom as part of a 2014 Research Excellence Framework exercise aimed at demonstrating the economic and societal impact of academic research. “There’s this amazing wealth of stories across every field of scholarship about research and the evidence for it changing some aspect of society,” she said.
Several audience questions focused on the use of, as well as the limitations of, bibliometric evaluation tools.
Wisniewski said tools are available online and through ULS, adding that, “as far as we know, no department, no school is using any type of bibliometric tool as an evaluation for individual faculty.”
The University primarily uses Academic Analytics at the unit level, he said, noting that individual departments can decide whether it’s useful for benchmarking themselves in comparison with other institutions.
He acknowledged weaknesses, citing, for instance, problems that may arise in benchmarking the physics and astronomy department against peers with departments devoted solely to physics or solely to astronomy.
Wisniewski said the provider has indicated willingness to work in such cases to develop appropriate peer groups. Department chairs also can adjust weighting for various factors, such as awards or grants, to best represent what’s valued in their field, he said.
Communication department chair Lester Olson raised concerns about the use of Academic Analytics in his area of scholarship.
“For me it’s virtually useless,” he said, citing multiple issues. “If we’re genuinely concerned about impact, I would hope that we would be interested in the long-term impact as well as short-term impact,” he said.
“In our organizational culture, we value single-authored books. That’s true across a lot of the humanities,” he said. “If you have written a book that is used today but was written 20 or 30 years ago, I would be in admiration of the abiding impact of that scholarship, but the book doesn’t exist in Academic Analytics” after 10 years, he said.
Also problematic in his field is that edited collections are counted as equivalent to single-authored books, he said.
Olson said using Academic Analytics presents a difficult situation for a department chair. “To use this, I have to agree not to show my colleagues the data — they cannot check. I cannot let them see how they’re represented, or I’m in violation of the contract,” he said.
He said last year’s departmental activity report found zero citations for seven of 12 faculty members, although in actuality most of them had hundreds of citations — easily found, he said, using such standard resources as Google Scholar.
“Is this really the way to go about measuring scholarly output?” he asked.
In her closing remarks, Provost Patricia E. Beeson thanked the speakers for helping inform what will be an ongoing discussion at Pitt, adding that committees are being formed to examine data governance and data analytics governance here.
“Data are with us, we’re not going to be able to avoid them, so we should embrace and use them appropriately,” Beeson said.
She reiterated principles that are important to remember:
• Numbers shouldn’t substitute for judgment.
• We should be transparent in what we’re doing.
• We should recognize the importance of developing a full story, rather than a limited one.
• We should develop accurate data of our own, “which is something we’ve been trying to do here through our online faculty evaluation system, for those who use it,” Beeson said.
• We must be knowledgeable about the tools, and use them appropriately.
“This is important not just for administrators,” Beeson said. “I think it’s important for all of us when we go into tenure and promotion cases. Those cases are put together by the individual faculty. They’re evaluated by the faculty of the department. And they come up through the structure as something that has been developed and evaluated by the faculty.”
Beeson said she is seeing increasing — but still limited — use of data in tenure and promotion cases. “The judgment in a tenure and promotion case is still overwhelmingly a peer evaluation, where, as we’ve discussed, some of these metrics may come in as part of the supporting evidence.
“But we really have to understand what sort of evidence it is that’s coming in,” she said. “And we know that’s not just for the administration but for all the faculty.”
The provost noted that Wisniewski already has led conversations at Council of Deans meetings on the meaning and use of evaluation metrics, adding that he brought in a data ethicist to talk about the proper use of data. “Not just what they mean, but when should you be using them, in an ethical sense,” Beeson said.
Wisniewski’s data analytics governance committee will include faculty government representatives and administrators in discussion, she said.
“I think what we’re talking about now are really the same sort of guidelines and principles that have been set out in the discussions today, and that’s in addition to what we’ve already begun working on.
“I think this conversation is really setting us up well for that ongoing conversation about developing the principles for data analytics governance here at the University of Pittsburgh,” she said.
The plenary session can be viewed in its entirety via a link at www.univsenate.pitt.edu.
—Kimberly K. Barlow and Katie Fike
Photos by Aimee Obidzinski/Photographic Services