I feel the need to bring up an issue in neuroimaging research that has affected me directly, and I fear may apply to others as well.
While in the process of analyzing a large fMRI (functional magnetic resonance imaging) data-set, I made an error when setting up the contrasts. This was the first large independent imaging analysis I had attempted, and I was still learning my way around the software, programming language, and standard imaging parameters. My mistake was not a large one (I switched a 1 and a -1 when entering the contrasts), but it resulted in an entirely different, and, most importantly, still plausible output, and no one noticed any problems in my results.
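To see why such a small slip can go unnoticed, here is a minimal, purely illustrative sketch in Python/NumPy (synthetic data and a made-up two-condition design, not the actual software or analysis I used): swapping the signs in a contrast vector reverses the direction of the estimated effect but leaves its magnitude untouched, so the resulting map looks just as clean and just as plausible.

```python
# Toy illustration (synthetic data, hypothetical design): swapping the 1 and -1
# in a contrast vector flips the sign of the estimated effect but not its size,
# so the "blobs" survive -- only their interpretation is reversed.
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200

# Simple two-condition design: task volumes vs. baseline volumes
task = rng.binomial(1, 0.5, n_scans).astype(float)
baseline = 1.0 - task
X = np.column_stack([task, baseline])

# Simulated voxel that genuinely responds more to the task than to baseline
y = 2.0 * task + 1.0 * baseline + rng.normal(0, 1, n_scans)

# Ordinary least-squares estimates of the regression weights (betas)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

intended = np.array([1, -1])   # task > baseline, as planned
flipped = np.array([-1, 1])    # the accidental sign swap

print("intended contrast (task > baseline):", intended @ beta)  # positive
print("flipped contrast:", flipped @ beta)                      # same size, opposite sign
```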
Thankfully, the mistake was identified before the work was published, and we have since corrected and checked the analysis (numerous times!) to ensure no other errors were committed. However, it was an alarming experience for a graduate student like me, just embarking on an exploration of the brain – an incredibly powerful machine that we barely understand, with revolutionary high-powered technology that I barely understand – that such a mistake could be so easily made and the resulting data so thoroughly justified. The areas identified in the analysis were all correct; there was nothing outlandish or even particularly unexpected in my results. But they were wrong.
Functional MRI is a game of location and magnitude. The anatomical analysis – looking for blobs in the brain that light up where we think they should – can be confirmed with pre-clinical animal models, as well as neuropsychology research in patients who have suffered localized brain damage and related loss of function. Areas involved in motor control and memory have been identified in such a manner, and these findings have been validated through imaging studies identifying activation in these same regions during performance of relevant tasks.
The question then remains as to the direction of this activation. Do individuals “over-activate” or “under-activate” this region? Are patients hyper- or hypo-responding compared to controls? FMRI studies typically compare activation during the target task with a baseline state to assess this directionality. Ideally, you should subtract neural activity levels during a similar but simpler process from the activation that occurs during your target cognitive function, and the resulting difference in activity presumably reflects the neurocognitive demand of the task.
An increase in activation compared to the baseline state, or compared to another group of participants (i.e., patients vs. controls), is interpreted as greater effort being exerted. This is typically seen as a good thing on cognitive tasks, indicating that the individual is working hard and activating the relevant regions to remember the word or exert self-control. However, if you become an expert at these processes you typically exhibit a relative decrease in activation, as the task becomes less demanding and requires less cognitive effort to perform. Therefore, if you are hypo-active, it could be because you are not exerting enough effort and consequently under-performing on the task compared to those with greater activation. Or, conversely, you could be superior to others in performance, responding more efficiently and not requiring superfluous neural activity.
Essentially, directionality can be justified to validate either hypothesis of relative impairment. Patients are over-active compared to controls? They’re trying too hard, over-compensating for aberrant executive functioning or decreased activation elsewhere. Alternatively, if patients display less activity on a task they must be impaired in this region and under-performing accordingly.
Concerns about the over-interpretation of imaging results are nothing new, and Dr. Daniel Bor, along with a legion of other researchers in the neuroscience community, has tackled this issue far more eloquently and expertly than I have. My own experience, though, has taught me that we need greater accountability for the claims made from imaging studies. Even with an initially incorrect finding that resulted from a technical error, I was able to construct a reasonable rationale for our results that was accepted as a plausible finding. FMRI is an invaluable and powerful tool that has opened up the brain like never before. However, there are a lot of mistakes that can be made and a lot of justifications of results that are over-stretched, making claims that cannot be validated from the data. And this is assuming there are no errors in the analysis or original research design parameters!
I am particularly concerned that other papers exist where students and researchers have made mistakes similar to my own, but where the results seem plausible and so are accepted, despite being incorrect. I would argue that learning by doing is the best way to truly master a technique, and I can guarantee that I will never make this same mistake again, but there does need to be better oversight, whether internal or external, in the reporting of methods, as well as in the claims made while rationalizing results. Our window into the brain is a limited one, and subtle differences in task parameters, subject eligibility, and researcher bias can greatly influence study results, particularly when using tools sensitive to human error. Providing greater detail in online supplements on the exact methods, parameters, settings, and button presses used to generate an analysis could be one way to ensure greater accountability. Going one step further, opening up data-sets to a public forum after a certain grace period has passed, similar to practices in the physics and mathematics disciplines, could bring greater oversight to these processes.
As for the directionality issue, the need to create a “story” with scientific data is a compelling, and I believe very important, aspect of reporting and explaining results. However, I think more of the fMRI literature needs to be based on actual behavioral impairment, rather than just differences in neural activity. Instead of basing papers around aberrant differences in activation, which may be due to statistical (or researcher) error, and developing rationalizing hypotheses to fit these data, analyses and discussions should be centered on differences in behavior and clinical evidence. For example, the search for biomarkers (biological differences in groups at risk for a disorder, often present before they display symptoms) is an important one that could help shed light on pre-clinical pathology. However, you will almost always find subtle differences between groups if you are looking for them, even when there is no overt dysfunction, and so these searches need to be directed by known impairments in the target patient groups. A similar issue has been raised in the medical literature, with high-tech scans revealing abnormalities in the body that do not cause any tangible impairments, but whose treatment causes more harm than good. Instead of searching for differences in activation levels in the brain, we should be led by the dysfunction that results from these changes. Just as psychiatric diagnoses from the DSM-IV are supposed to be made only when symptoms cause significant harm or distress in the individual, speculations about the results of imaging studies should be guided by associated impairments in behavior and function, rather than by red or blue blobs on the brain.
(Thanks to Dr. Jon Simons for his advice on this post.)
Great stuff Dana! Can’t believe all the kerfuffle caused by such a seemingly simple thing. As for “opening up data-sets to a public forum after a certain grace period has passed, similar to practices in physics and mathematics disciplines”… I know someone who will be a big fan of this comment 😉
Thanks Alison! And you’re right, I was inspired by RJ’s ideals for scientific integrity in that last bit!
Yeah, it’s pretty scary. I bet there are plenty of published papers with hidden problems in the data. Too many inconsistencies in the literature. A friend of mine, a Professor of Psychology, used to make this joke: “What’s the difference between theory and data? Nobody believes your theory except you, but everybody believes your data except you.” It’s funny, and I used to think it was also true. I don’t think that anymore. I know what’s going on in my lab, but I have no idea what’s going on in other labs. We have seen phenomena replicated over and over again in different experiments. Some labs do not report those things. How is that possible? It makes you incredulous, which is a serious problem, because if you don’t believe what you read, what’s the point of reading the literature? But the literature is what makes communication in science possible. So, it’s a serious problem.
I agree, you have to hope that new research builds on prior results, confirming previously established findings and giving newer discoveries some validity in regards to methods, etc. I suppose in part we all just need to be more rigorous with supervision and the review process, first and foremost within our own labs, as well as when peer reviewing other researchers’ work. Unfortunately, you can never be entirely sure when reading the literature, which is where the replication of studies comes in and is most important. The “replication renaissance” psychology is currently going through, spearheaded by the Reproducibility Project (http://openscienceframework.org/project/EZcUj/wiki/home), is a great first step in this direction, and hopefully it will be recognized and supported by funding bodies and journals.
(There’s a good article summarizing the project here: http://chronicle.com/blogs/percolator/is-psychology-about-to-come-undone/29045?sid=at&utm_source=at&utm_medium=en)
Good point. I do know the project and article (I even tweeted about it!). The bias to publish only positive results and new findings is certainly contributing to the messy state of brain imaging. The other problem in brain imaging is the recent trend for data-driven, exploratory projects with no clear a priori hypothesis. I don’t mind exploratory projects, but they are clearly more prone to bogus results. I wonder how many of the resting-state findings are due to motion differences between groups, as clearly hinted at by the recent paper on 1000 scans.
The trend toward network science doesn’t make it any better. While I think network science is pretty cool, we don’t really know what those metrics mean, which makes studies even more prone to results determined by some mistake in the pipeline or undetected ‘artifacts’.
Great post Dana – and I agree, fMRI can be a tricky and difficult mistress at times! I once saw a fairly high-profile professor who-shall-remain-nameless give a talk which included a long justification about why they’d got an activation in pre-frontal cortex, with a corresponding de-activation in the occipital lobe for a particular task. Unfortunately it was completely clear to a lot of people in the audience by the shape and position of the blobs that what was being discussed so earnestly was an obvious front-to-back head-motion artefact. Oops.
I tend to perform basic ‘sanity-checks’ of my data fairly religiously – things like checking that I get sensible activations in visual cortex to visual stimuli, motor-cortex blobs with button presses, etc. This often helps reveal if there’s something wrong with the design/analysis, but not always. I’ve also made many, many mistakes with analyses – hopefully I caught most of them before writing them up, but you never know… I tend to think that if an effect is genuine, then it should show up reasonably clearly no matter what tweaks are made to the analysis parameters – I’m always a bit suspicious of effects which appear suddenly when I change something minor.
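(As a purely illustrative aside: this kind of sanity check can even be scripted. The sketch below uses synthetic NumPy arrays and made-up thresholds standing in for a real z-map and a rough anatomical mask; it is not the output of any particular package, just the general idea of flagging an analysis when the expected region shows nothing.)

```python
# Toy sanity check (illustrative only): given a z-map for a "visual > rest"
# contrast and a rough occipital mask, flag the analysis if the expected
# region shows no credible activation. Arrays and thresholds are hypothetical.
import numpy as np

def sanity_check(z_map: np.ndarray, region_mask: np.ndarray,
                 z_thresh: float = 3.1, min_fraction: float = 0.05) -> bool:
    """Return True if at least min_fraction of voxels in the expected
    region exceed the z threshold."""
    region_z = z_map[region_mask]
    return np.mean(region_z > z_thresh) >= min_fraction

# Synthetic data standing in for a real analysis output
rng = np.random.default_rng(1)
z_map = rng.normal(0, 1, size=(64, 64, 40))
occipital_mask = np.zeros_like(z_map, dtype=bool)
occipital_mask[:, 50:, :] = True       # crude stand-in for occipital cortex
z_map[occipital_mask] += 2.5           # pretend the visual contrast worked

if sanity_check(z_map, occipital_mask):
    print("Visual-cortex check passed.")
else:
    print("Warning: no visual-cortex activation for the visual contrast -- "
          "check the design matrix, contrasts, and preprocessing.")
```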
Thanks Matt! That’s a great point, you would hope that the biggest and most ‘authentic’ findings will be pretty robust and resistant to slight modifications of task or analysis parameters.
I knew another PhD student (who shall also remain nameless) who saw hippocampal activation during a decision-making task and got excited that he’d discovered a new locale for affective decisions. Turns out his task was so complicated the participants were just struggling to remember it! Fortunately this was pointed out to him before he went too far with the idea, but it just goes to show that you need to be suspicious of those random blobs throughout the brain. I think clinical/pre-clinical based a priori hypotheses, and checks like you mentioned, are the way to go to avoid fishing expeditions for activation.
Ah yes, well… good task design is a whole other minefield. No matter how high-tech the approach, if your design is bogus, you’re going nowhere. We’ll always need cognitive psychologists – well, that’s what cognitive psychologists like me like to believe anyway.
I’m generally a fan of using basic localiser tasks and functional ROIs for my experiments – that does help to constrain the ‘random blob’ issue. That’s a whole other discussion though…
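(Again purely as an illustration of the localiser / functional-ROI idea, using synthetic arrays rather than any real data: the ROI is defined from an independent localiser contrast, and the main experiment’s effect is then read out only within that ROI, which constrains the hunt for blobs across the whole brain.)

```python
# Toy sketch of the functional-ROI approach (illustrative only): define the ROI
# from an *independent* localiser contrast, then summarize the main experiment's
# effect within that ROI. All arrays are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(2)
shape = (64, 64, 40)

localiser_z = rng.normal(0, 1, size=shape)   # z-map from the localiser run
main_effect = rng.normal(0, 1, size=shape)   # effect-size map from the main task

# Functional ROI: voxels passing a threshold in the independent localiser data
roi = localiser_z > 3.1

# Read out the main-experiment effect only within that ROI
roi_effect = main_effect[roi].mean()
print(f"Mean effect in functional ROI ({roi.sum()} voxels): {roi_effect:.3f}")
```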