Science has been extraordinarily successful at taking the measure of the world, but paradoxically the world finds it extraordinarily difficult to take the measure of science — or any type of scholarship for that matter.
That is not for want of trying, as any researcher who has submitted a PhD thesis for examination, a manuscript for publication or an application for funding will know. Assessment is part of what we do, and not just at the level of the individual. The UK research community has just prostrated itself for the sake of the Research Excellence Framework (REF), an administrative exercise run by the Higher Education Funding Council for England (HEFCE) that has washed a tsunami of paperwork from every university department in the country into the hands of the panels of reviewers now charged with measuring the quality and impact of their research output.
The REF has convulsed the whole university sector — driving the transfer market in star researchers who might score extra performance points and the hiring of additional administrative staff to manage the process — because the judgements it delivers will have a huge effect on funding allocations by HEFCE for at least the next 5 years. The process has many detractors, though most might grudgingly admit that the willingness to submit to this periodic rite of assessment accounts at least in part for the high performance of UK research on the global stage. That said, the sheer scale of the exercise is reason enough to search for ways to make it less burdensome. So before the current REF has even reached its conclusion (the results will be posted on 18th December), HEFCE has already started to think about how the evaluation exercise might itself be evaluated.
The metrics review
As part of that effort and at the instigation of the minister for science and universities, David Willetts, HEFCE has set up an independent review of the role of metrics in research assessment. The review is being chaired by Prof James Wilsdon and, along with eleven others from the worlds of academia, publishing and science policy, I am a member of its steering group (see this Word document for the full membership list and terms of reference).
Metrics remain a tricky issue. In 2009 a pilot of a proposal to supplant the REF (or rather, its predecessor, the RAE) with an assessment based largely on bibliometric indicators concluded that citation counts were an insufficiently reliable measure of quality. How much has changed since then is a question that will be considered closely by the steering group, although this time around the focus is on determining whether there are metrics that might be used meaningfully in conjunction with other forms of assessment — including peer review — to lighten the administrative load of the assessment process. There is no appetite for a wholesale switch to a metrics-based assessment process. To get an overview of current thinking on metrics from a variety of perspectives, I would recommend this round-up of recent posts curated by the LSE Impact blog.
One thing that has changed of course is the rise of alternative metrics — or altmetrics — which are typically based on the interest generated by publications on various forms of social media, including Twitter, blogs and reference management sites such as Mendeley. The emergence of altmetrics is very much part of the internet zeitgeist. They have the advantage of focusing minds at the level of the individual article, which avoids the well known problems of judging research quality on the basis of journal-level metrics such as the impact factor.
Social media may be useful for capturing the buzz around particular papers and thus something of their reach beyond the research community. There is potential value in being able to measure and exploit these signals, not least to help researchers discover papers that they might not otherwise come across — to provide "more efficient filters", as the authors of the altmetrics manifesto would have it. But it would be quite a leap from where we are now to feed these alternative measures of interest or usage into the process of research evaluation. Part of the difficulty lies in the fact that most of the value of the research literature is still extracted within the confines of the research community. That may be slowly changing with the rise of open access, which is undoubtedly a positive move that needs to be closely monitored, but at the same time — and it hurts me to say it — we should not get over-excited by tweets and blogs.
That said, I think it’s still OK to be excited by altmetrics; it’s just that the proponents of these new forms of data capture need to get down to the serious work of determining how much useful information can be extracted. That has already begun, as reported in a recent issue of Research Trends, and I look forward to finding out more through the work of the steering group. Though I have already written a fair amount about impact factors and assessment, I don’t feel that I have yet come close to considering all the angles on metrics, and I claim no particular expertise at this juncture.
That’s why I would encourage people to respond to HEFCE’s call for evidence, which is open until noon on Monday 30th June. The review may have been set up at the behest of the minister but it remains very much independent — as I can attest from the deliberations at our first two meetings — and will take a hard look at all the submissions. So please make the most of the opportunity to contribute.
Beyond the review
Although the remit of the review is strictly limited to consideration of the possible role of metrics in future assessment exercises, I can’t help wondering about the wider ramifications of the REF.
The motivation behind the process is undoubtedly healthy. The validation of the quality of UK research, and the reward of those institutions where it is done best, instils a competitive element that drives ambition and achievement. But however the evaluation is performed, the focus remains on research outputs, primary among which are published papers, and that narrow focus is, I think, problematic. I hope you will indulge me as I try to pick apart that statement; my thinking on this topic has by no means fully matured but I would like to start a conversation.
I know from my time making the arguments for science funding as part of Science is Vital that it is hard to measure the value of public spending on research. As shown in classic studies like those of Salter and Martin or the Royal Society’s report The Scientific Century, this is in large part because the benefits are multi-dimensional and hard to locate with precision. They include the discovery of new knowledge, realised in published papers, but within the university sector there are many other activities associated with the production of those outputs, such as the training of skilled graduates and postgraduates, the development of new instruments and methods, the fostering of academic and industrial networks, increasing the capacity for scientific problem-solving and the creation of new companies.
There is a whole mesh — or is it mess? — of outputs. The latest incarnation of the REF has made a determined effort to capture some of this added value or impact of UK research but has wisely taken a pragmatic route. Realising that a metrics-led approach to measuring impact presents too many difficulties, not least for comparisons between disciplines, HEFCE instead asked departments to produce a set of impact case studies, which give a narrative account of how published research has affected the world beyond academia. Although there has been much carping about the introduction of the impact agenda, which many see as boiling the research enterprise down to overly utilitarian essentials, the retrospective survey of the wider influences of UK research output embodied by the REF has been a surprisingly positive experience, not least because it has unearthed benefits of which many university departments were previously unaware. Collectively, the case studies might even provide a rich resource with which to argue for continued and increased investment in the research base.
Even so there are other problematic aspects to the REF. In the past year, as well as generating all the paperwork needed for our REF submission, our department has undergone an external review of its undergraduate teaching. As the current Director of UG Studies (DUGS) I was required to take a leading role in preparing the voluminous documentation for this further assessment exercise — a 58-page report with no fewer than forty appendices — and organising a site visit by our assessors involving many different staff and students. As with the REF, the process is administratively onerous but the exercise nevertheless has significant value: it provides an opportunity to take stock and serves as a bulwark against complacency.
But the question that now looms in my mind is: why are these assessment exercises separated? The division appears arbitrary, even if it makes some kind of logistical sense given the strains that the exercises place on university departments. From that perspective it might be difficult to argue for any kind of unification, but there is a fundamental issue to be addressed: is it sensible to isolate research performance from other valuable academic activities?
These other activities include not just UG teaching but also postgraduate training, mentoring of young postdoctoral scientists, peer review of research papers, grant and promotion applications, institutional administration, the promotion of diversity, and involvement in public discourse. Arguably the separation (which in reality means the elevation) of research from these other activities is damaging to the research enterprise as a whole. It creates tensions within universities where staff are juggling their time, more often than not to the detriment of teaching, and is responsible for a culture that has become too dependent on publication. This distortion of the academic mission has been worsened by the reification of journal impact factors as the primary measure of scientific achievement.
Evidence published last Tuesday shows that, in biomedicine at least, the most important predictor of ’success’ for an early career researcher is the number of first-author papers published in journals with high impact factors. The measure of success here is defined narrowly as achieving the independent status of principal investigator running your own lab (usually by securing a lectureship in a university or a long-term post at a research institute). It should come as no surprise that the well-known rules of the game — publish or perish — produce such an outcome. But what is missing here is consideration of the negative impacts of the artificial separation of research from other facets of the job of an academic.
In recent months I have spoken to more than one young researcher who has abandoned the dream of leading their own research group because of their perception of the extreme intensity of the competition and the sure knowledge that without a high-impact paper they are unlikely to make it in such a world. A ‘pissing contest’ is how one memorably described it. Is anyone counting the cost of those broken dreams? Should not these losses be counted in our research assessment processes?
It has often struck me that an academic career is a tremendous privilege; it offers the chance to follow your curiosity into uncharted territory and to share your love of your discipline with the next generation. There are still plenty of people who derive great satisfaction from their work in the academy — even I have my good days — but I detect increased levels of stress and weariness, particularly since becoming DUGS. The responsibilities of that position have had some impact on my own research output but I was willing to take it on because I believe in the multifaceted role of ’the academic’ and in the broader value of the institution known as the university*. However, it has not been an easy task trying to promote the value of both research and teaching in a culture — promoted in part by the REF — that places such a supreme value on research output. In such an environment, research cannot help but conflict with teaching, and that is ultimately to the detriment of both. And to the student experience. And to the quality of life for staff.
These issues are not new, and have been addressed previously by the likes of Peter Lawrence and Ron Vale. The San Francisco Declaration on Research Assessment, which has just celebrated its first anniversary (please sign up), is the latest attempt to rein in the mis-measurement of research achievements. But while there may be local efforts to hire and promote staff based on performance across the whole range of academic activities, research remains an international business involving the exchange of people and ideas across national boundaries, so a coordinated effort is required to solve these problems, or at the very least to identify and promote instances of best practice.
To that end, what are the chances that the REF might take a lead — perhaps even by using metrics? If we are going to take some account of citations or downloads in discussions of research quality, why not consider adding other measures designed to capture the student learning experience, or staff satisfaction, or good academic citizenship, to create a basket of measures that might rebalance the incentives for universities and their staff? There are huge and obvious problems with such an approach that need careful consideration; I am not proposing that we submit thoughtlessly to the whims of student satisfaction surveys, but I am intrigued by how measures of workplace quality might play a role.
There are no easy answers. I anticipate some will argue that switching the tight focus of the REF away from research risks undermining the power of the UK research base. But to those tempted to follow that line, please evaluate the cost of not doing so and report back.
Footnote
*Though I cannot deny that my motivation for applying for a lectureship back in 1995 was to secure a permanent foothold that would enable me to start a career as a PI. At the outset I was prepared to pay my dues in the teaching hours demanded, but was advised not to get over-enthusiastic about teaching if I wanted to get promoted.