The harms of hidden research

A killer whale labelled 'killer facts' - the first thing I found when searching Google Images for 'Killer Facts'...

It’s all about the ‘killer facts’. If you want to get social science into policy, then – as Alex Stevens’ wonderful covert ethnography of high-level policymaking shows – killer facts are the name of the game. And we try hard on the blog to get these across to you, as often and clearly as we can.

But sometimes it’s necessary to take a step back, and think about whether these killer facts are, well, ‘facts’ at all. These issues come up repeatedly on the blog, not least when debating the effects of inequality on society; indeed, the very first post on the blog was about the Spirit Level debates, and we’ve come back to this since.

In this pair of posts, I want to challenge every researcher (and every user of research) to demand another bit of credibility in fact-creation: full transparency.

This is all prompted by a friend pointing out that the Center for Global Development now has a policy of publicly sharing ALL the data and computer code that underlies all the numbers they create in their publications (where they are allowed to). I think everyone should do this – and in fact, I would go as far as saying that it is morally dubious not to do this. In this post I’ll explain why this is necessary, and then next week I’ll show why the opponents of transparency are misguided.

In favour of transparency

My favourite argument in favour of transparency comes from Jeremy Freese, to which I’ve tacked on my own thoughts, but his paper is great reading (see also the whole SMR special issue on this from 2007, including papers by King, Firebaugh and Abbott and a response by Freese; Jeremy’s paper is freely available, and I’m happy to send on any of the other papers to anyone who wants them – transparency cuts in many ways). This Fraser Institute 2009 report looks good too.

The essential reasons for transparency are to overcome three problems.  Firstly – and as anyone who has done any quantitative social science will know – there are a whole load of arbitrary decisions in doing any analysis.  Which part of the sample do I use, and do I weight them?  How do I turn the raw questions into the variables in my model?  Which exact outcome do I look at?  Which variables do I include in the model?  What sort of statistics do I run?  Etc. And sometimes people just make plain mistakes.  In fact, the number one rule I was told as a new researcher was, ‘if you get a really exciting result, you’ve probably done something wrong’ – and they were right.

Secondly, in an ideal world, the errors created by these arbitrary decisions would be random, and – on average – research results would still be the truth.  Ah, to live in such an ideal world. In reality, researchers are biased towards a particular answer. This does not mean that they set out to fib; it’s mainly a matter of unconscious biases, and the desire to create a ‘statistically significant’ result that will get you published in a good journal, and thereon to fame and fortune.
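A toy simulation makes the point concrete. This is a minimal, stdlib-only Python sketch; the function names (`t_stat`, `spurious_hits`) and all the numbers are illustrative assumptions of mine, not anyone’s actual analysis. Analyse pure noise under twenty arbitrary ‘specifications’ and, more often than not, at least one will cross the conventional significance threshold:

```python
import random
import statistics

def t_stat(a, b):
    """Welch-style t statistic for the difference in means of two samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / na + vb / nb) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def spurious_hits(n_specs=20, n=200, seed=0):
    """Run n_specs analyses of pure noise (no real group difference)
    and count how many cross the conventional |t| > 1.96 threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_specs):
        # Both 'groups' are drawn from the same distribution,
        # so any 'significant' difference is a false positive.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if abs(t_stat(a, b)) > 1.96:
            hits += 1
    return hits
```

Each ‘specification’ here is just a fresh draw of noise; in real research the analogues are the sample restrictions, variable codings and model choices listed above, which are correlated rather than independent draws, but the qualitative point survives: with a 5% false-positive rate per test, the chance of at least one ‘significant’ result in twenty tries is roughly 64%.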

Finally, data collection usually happens with public money, or (to a lesser degree) using money from charities. This money is often being spent to create knowledge and promote the public good. If someone has gone to the trouble of spending all this money and bothering people, then it seems best to share this data for other people to use, helping to further scientific understanding at the lowest possible economic and ethical cost.

Does all this matter?

In practice, getting hold of other people’s data (let alone the code that sets out their analyses) is often a challenge. Freese cites Wicherts et al (2006), who found that only 27% of authors in the American Psychological Association’s top journals complied with repeated requests for data for verification purposes. Compliance was historically even worse in economics journals. Moreover, Wicherts et al’s latest (2011) paper finds that authors of weaker papers (weaker evidence, more apparent errors) were less likely to agree to share their data – which sounds a lot like covering up to me.

And this matters. There are many examples of authors trying to replicate published findings that are being bandied about in public debate and finding themselves completely unable to recreate the results (see the examples cited in Freese and particularly McCullough et al) – including debates around abortion & crime, school choice, and the impact of policing.

More widely, I love Lehrer’s piece in the New Yorker on the ‘decline effect’ – someone finds a new ‘truth’, there’s a delay before people get the data to replicate it, and then it turns out it wasn’t true to begin with. But only after everyone has got excited about the original finding. Again, this is not deliberate fraud, but rather the result of chance findings and data mining, as described by Ioannidis (2005) in a paper titled ‘Why most published research findings are false’.
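The decline effect can itself be reproduced in a toy simulation. This is a minimal Python sketch under assumed parameters (`simulate_studies` and all the numbers are my own illustrative choices, not from any of the cited papers): when only ‘significant’ small studies get reported, the reported estimates systematically overshoot the true effect, so later, unfiltered replications drift back down.

```python
import random
import statistics

def simulate_studies(true_effect=0.2, n=30, n_studies=4000, seed=42):
    """Simulate many small studies of the same modest true effect, then
    apply a 'report only if significant' filter. Returns the mean estimate
    across ALL studies and the mean among the 'significant' ones."""
    rng = random.Random(seed)
    all_estimates, reported = [], []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1) for _ in range(n)]
        est = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        all_estimates.append(est)
        if est / se > 1.96:  # one-sided significance filter
            reported.append(est)
    return statistics.mean(all_estimates), statistics.mean(reported)
```

With a true effect of 0.2 and 30 observations per study, a study only clears the significance bar when noise happens to inflate the estimate, so the mean reported effect comes out at roughly double the truth – exactly the pattern that then ‘declines’ once replications with no such filter arrive.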


That said, there are several strong arguments against transparency, which are the subject of much debate between Freese, King, Abbott and Firebaugh (and others). In the second part of this post, I’ll look at these arguments against transparency, and see if they stand up to scrutiny. And finally I’ll conclude with a message about how all of us interested in inequalities – researchers, policymakers, and people who just want to know the truth – should change the way we do things as a result.

12 responses to “The harms of hidden research”

    • Thanks Mel – just had a skim, and look forward to reading it properly! For those who can’t access it via UCL, the paper is Mel’s ‘Do we need a strong programme in medical sociology?’. The ‘strong programme’ in the Sociology of Science is the idea that we cannot justify what scientists believe merely by invoking ‘truth’; instead, we need to sociologically understand why some knowledge claims are accepted as truthful and others aren’t, and this is influenced by the messy real world of people. Mel’s paper looks at the way knowledge around unemployment is constructed, and Katherine Smith has done some similar, more recent work on the health inequalities field.

  1. Hi Ben,

    A V interesting post and, as usual, I agree with much of what you say. However, whilst I agree with the sentiments you’re expressing in relation to transparency, I do think there are some major issues that you haven’t touched on here (though perhaps you will come to them in the next instalment).

    First, it is a bit unclear whether you are referring solely to quantitative data or whether you also think that it’s ‘morally dubious’ not to share qualitative data. This is important as, whilst it’s often relatively easy to anonymise quantitative data, it can be a far harder task for qualitative data (in fact, almost impossible for some kinds of ethnographic data or interview data in which the community of participants is small).

    For example, whilst I feel I can sufficiently anonymise extracts from my own interview data (with health inequalities researchers, policymakers, funders, etc), if the whole transcripts of conversations were to be made public, a lot of people within these small communities would be able to guess the identity of other speakers, merely by linking different sections of the interview and/or recognising common phrases used by people they know. As I have given each interviewee a commitment of anonymity and confidentiality, and as the interviews include some contentious information (e.g. personal attacks on colleagues), I feel it would be ethically wrong to release the full transcripts (especially as some interviewees specifically sought reassurance that I would not do this).

    I could potentially go back and try to further anonymise each of the 100+ transcripts I have now collated and try to contact each interviewee individually to ask if they would agree to the more anonymised version of the transcripts being made public. However, this would be extremely time consuming for me and the interviewees (it would take months). I would also be unclear how to treat the transcripts of the people who have sadly passed away since the interview was conducted. Overall, I guess I see it as morally dubious to break confidentiality agreements on the grounds of transparency.

    A second ethical tension with transparency is highlighted by Philip Morris’ attempts to access the University of Stirling’s data relating to young people’s attitudes to smoking and packaging. If the young people (and/or their parents) consented to participating in this research on the basis that it was being undertaken as public health research, is it ethically justifiable to subsequently share this data with a major tobacco manufacturer (where the data could potentially inform marketing efforts targeted at young people)? I know that if I’d provided information to university-based researchers with a track record in public health, I wouldn’t want them to then share this with vested commercial interests.

    The above two points highlight the potentially significant tensions between research ethics and transparency. It is possible that by approaching future research projects differently from the start (e.g. making it clear to participants that interview data will be made publicly available and what the consequences of this might be), these ethical tensions will be reduced. Yet there is also a chance this will limit participation in various ways.

    Finally, whilst I agree with the ethos underlying your point that public funding seems to aid the case for public access to data, do we really want a scenario in which all publicly funded research is available for anyone to re-analyse, whilst privately funded data are not? I’m particularly cautious here because major tobacco companies have lobbied for precisely such a scenario: it means they can re-analyse any publicly funded data relating to claims about the harms of their products, whilst preventing access to data relating to research they have funded themselves. Perhaps unsurprisingly, businesses are one of the biggest users of FoI requests in the US. This is a massive power imbalance – do we really want private interests, including those creating harmful products, to be able to re-analyse (and potentially ‘torture’) publicly funded data, whilst publicly funded researchers are unable to do the same with privately funded data? If we’re going to have a commitment to transparency in research that really works to benefit public health and social well-being, then I think it needs to be balanced. Perhaps, for example, journals should consider requesting that authors make the data sets on which they draw in articles available as part of the publication process. This would then apply equally to privately funded and publicly funded data (although this would still be difficult to monitor and it would only relate to research published in those journals).

    This is too long as a response, I realise! Though if you’re interested, I’ve written about these tensions in more detail in relation to the Philip Morris case on OpenDemocracy. I don’t think there are any easy answers to these issues: I’m definitely not against transparency in research but I do think there needs to be more of a discussion about managing some of the tensions involved. So I’m looking forward to your next instalment…

    Best wishes,

    • This is exactly the reason I love blogging – you write something, then get an incredibly interesting response that helps you develop your thinking… I’ll think about this for next week’s follow-up post anyway Kat!

    • Thanks much for bringing up questions about qualitative (I am largely thinking interviews & ethnographic) data — it’d be great to see something on that in the future!

  2. As a note to myself and anyone else who stumbles onto this page in future – a great post on reproducibility by Simply Statistics. They link to a couple of useful resources:

    1. The Reproducibility Project – a collaborative effort to replicate key findings from 2008 papers in Journal of Personality and Social Psychology, Psychological Science, and Journal of Experimental Psychology: Learning, Memory, and Cognition.

    2. – a place where you can post results from replications of existing research, and as discussed in the journal Science.

    There are lots of enthusiasts for replicability and reproducibility besides me – let’s see if it becomes a mainstream movement…
