I saw the (negative) sign: Problems with fMRI research

I feel the need to bring up an issue in neuroimaging research that has affected me directly and that, I fear, may apply to others as well.

While analyzing a large fMRI (functional magnetic resonance imaging) data-set, I made an error when setting up the contrasts. This was the first large independent imaging analysis I had attempted, and I was still learning my way around the software, the programming language, and standard imaging parameters. My mistake was not a large one (I switched a 1 and a -1 when entering the contrasts); however, it produced an entirely different but, most importantly, still plausible output, and no one noticed any problems with my results.

Thankfully, the mistake was identified before the work was published, and we have since corrected and checked the analysis (numerous times!) to ensure no other errors were committed. Still, for a graduate student like myself, just embarking on an exploration of the brain (an incredibly powerful machine that we barely understand, probed with revolutionary high-powered technology that I barely understand), it was alarming to find that such a mistake could be made so easily and the resulting data justified so thoroughly. The areas identified in the analysis all looked right; there was nothing outlandish or even particularly unexpected in my results. But they were wrong.

Functional MRI is a game of location and magnitude. The anatomical analysis (looking for blobs in the brain that light up where we think they should) can be confirmed with pre-clinical animal models, as well as with neuropsychological research in patients who have suffered localized brain damage and a related loss of function. Areas involved in motor control and memory have been identified in this way, and these findings have been validated by imaging studies showing activation in the same regions during performance of relevant tasks.

That leaves the question of the direction of this activation. Do individuals “over-activate” or “under-activate” this region? Are patients hyper- or hypo-responding compared to controls? FMRI studies typically assess this directionality by comparing activation during the target task with activation during a baseline state. Ideally, you subtract the neural activity measured during a similar but simpler process from the activity measured during your target cognitive function, and the resulting difference is presumed to reflect the neurocognitive demand of the task.
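To make the subtraction concrete, here is a minimal sketch in Python (NumPy only) of how a contrast vector is applied to a general linear model for a single simulated voxel. Everything in it is an illustrative assumption: the block design, the effect sizes, the noise level, and the helper name `contrast_t` are hypothetical, and real packages such as SPM or FSL also convolve regressors with a haemodynamic response function and model temporal autocorrelation, which this toy omits. The contrast [1, -1] computes task minus baseline; swapping the signs to [-1, 1], the kind of slip described above, silently turns it into baseline minus task.

```python
import numpy as np


def contrast_t(y, X, c):
    """Hypothetical helper: fit a GLM by ordinary least squares and
    return (contrast estimate, t-statistic) for contrast vector c."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (resid @ resid) / dof                       # residual variance
    effect = c @ beta                                    # weighted sum of betas
    var = sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c)     # variance of the contrast
    return effect, effect / np.sqrt(var)


# Assumed toy block design: 10 task scans, 10 baseline scans, repeated twice.
task = np.tile(np.r_[np.ones(10), np.zeros(10)], 2)
baseline = 1.0 - task
X = np.column_stack([task, baseline])                    # one column per condition

# Simulated voxel: higher signal during the task than during baseline.
rng = np.random.default_rng(0)
y = 2.0 * task + 1.0 * baseline + rng.normal(scale=0.5, size=task.size)

# Intended contrast: task minus baseline.
effect, t = contrast_t(y, X, np.array([1.0, -1.0]))
# Same data, signs swapped: baseline minus task.
effect_flip, t_flip = contrast_t(y, X, np.array([-1.0, 1.0]))

print(f"task - baseline: effect={effect:.2f}, t={t:.2f}")
print(f"baseline - task: effect={effect_flip:.2f}, t={t_flip:.2f}")
```

Both contrasts yield a statistic of the same magnitude with the opposite sign, so nothing about the output looks broken. A one-sided test for “task > baseline” has simply become a test for “baseline > task”, which across a whole brain would highlight a different, but still plausible-looking, set of regions.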

An increase in activation compared to the baseline state, or compared to another group of participants (e.g., patients vs. controls), is interpreted as greater effort being exerted. On cognitive tasks this is typically seen as a good thing, indicating that the individual is working hard and engaging the relevant regions to remember a word or exert self-control. However, as you become expert at these processes you typically show a relative decrease in activation, because the task becomes less demanding and requires less cognitive effort to perform. So if you are hypo-active, it could be because you are not exerting enough effort and are consequently under-performing compared to those with greater activation. Or, conversely, you could be outperforming others, responding more efficiently and not requiring superfluous neural activity.

In other words, either direction can be used to justify a hypothesis of relative impairment. Patients are over-active compared to controls? They’re trying too hard, over-compensating for aberrant executive functioning or decreased activation elsewhere. Patients show less activity on a task? They must be impaired in this region and under-performing accordingly.

Concerns about the over-interpretation of imaging results are nothing new, and Dr. Daniel Bor, along with a legion of other researchers in the neuroscience community, has tackled this issue far more eloquently and expertly than I can. My own experience, though, has taught me that we need greater accountability for the claims made from imaging studies. Even with an initially incorrect finding that resulted from a technical error, I was able to construct a reasonable rationale for our results that was accepted as plausible. FMRI is an invaluable and powerful tool that has opened up the brain like never before. However, there are many ways to make mistakes, and many justifications of results are over-stretched, making claims that cannot be validated by the data. And this is assuming there are no errors in the analysis or in the original research design parameters!

I am particularly concerned that there are other papers in which students and researchers have made mistakes similar to mine, but where the results seem plausible and so have been accepted despite being incorrect. I would argue that learning by doing is the best way to truly master a technique, and I can guarantee that I will never make this same mistake again. But there does need to be better oversight, whether internal or external, of how methods are reported and of the claims made when rationalizing results. Our window into the brain is a limited one, and subtle differences in task parameters, subject eligibility, and researcher bias can greatly influence study results, particularly when using tools sensitive to human error. Providing greater detail in online supplements on the exact methods, parameters, settings, and button presses used to generate an analysis could be one way to ensure greater accountability. Going one step further, opening up data-sets to the public after a certain grace period has passed, similar to practices in the physics and mathematics disciplines, could bring greater oversight to these processes.

As for the directionality issue, the need to create a “story” with scientific data is a compelling, and I believe very important, aspect of reporting and explaining results. However, I think more of the fMRI literature needs to be based on actual behavioral impairment, rather than just on differences in neural activity. Instead of building papers around aberrant differences in activation, which may be due to statistical (or researcher) error, and then developing hypotheses to rationalize these data, analyses and discussions should be centered on differences in behavior and clinical evidence. For example, the search for biomarkers (biological differences in groups at risk for a disorder, often present before they display symptoms) is an important one that could help shed light on pre-clinical pathology. However, you will almost always find subtle differences between groups if you look for them, even when there is no overt dysfunction, so these searches need to be directed by known impairments in the target patient groups. A similar issue has been raised in the medical literature, where high-tech scans reveal abnormalities in the body that cause no tangible impairment but whose treatment does more harm than good. Rather than searching for differences in brain activation levels, we should be led by the dysfunction that results from these changes. Just as psychiatric diagnoses under the DSM-IV are supposed to be based on symptoms that count as pathological only if they cause significant harm or distress to the individual, speculations about the results of imaging studies should be guided by associated impairments in behavior and function, rather than by red or blue blobs on the brain.

(Thanks to Dr. Jon Simons for his advice on this post.)