Brain Complexity Predicts TMS Response Inconsistently, Limiting Clinical Personalization

Researchers found that individual responses to brain stimulation varied unpredictably between sessions, despite stable baseline brain activity.

For years, the promise of transcranial magnetic stimulation has rested on an unanswered question: why does it help some patients and leave others unchanged? A team of researchers turned to machine learning to find the answer in the brain's own electrical rhythms — and discovered that the question itself may be harder than the tools we have to ask it. What looked like a solvable prediction problem turned out to be a window into something more unsettling: the brain's response to stimulation is not a stable fact waiting to be measured, but a shifting relationship that resists capture.

  • Machine learning models trained on brainwave complexity achieved 75–84% accuracy in predicting stimulation response — results promising enough to fuel real clinical optimism.
  • When those same models were tested on an independent group under nearly identical conditions, accuracy collapsed to near-chance levels, exposing a fundamental gap between internal validation and real-world generalizability.
  • The culprit is not flawed algorithms but unstable biology: individual brains responded differently to stimulation across sessions even when their baseline electrical activity remained consistent — a phenomenon researchers call concept drift.
  • This instability suggests that a single stimulation session may be too brief and too variable to anchor any reliable biomarker, undermining the premise of personalized prediction from baseline EEG recordings alone.
  • The field is now being pushed toward a harder path: multi-session, individualized protocols that adapt to how each brain actually responds over time, rather than how it appears at rest.

Transcranial magnetic stimulation has earned FDA approval for depression, OCD, and smoking cessation, and researchers are testing it across a widening range of conditions. Yet the treatment carries a persistent frustration — some patients improve, others do not, and no one can reliably predict which outcome awaits before treatment begins.

A research team attempted to close that gap using machine learning. They trained models on EEG recordings taken before stimulation, searching for patterns in brain complexity, the intricate electrical rhythms of a resting mind, that might forecast who would respond. Within their original dataset, the approach looked compelling, reaching 75 to 84 percent accuracy. Then they tested the models on an entirely different group of participants, collected under nearly identical conditions. Accuracy fell to 48 to 54 percent, no better than a coin flip.

To understand why, the researchers examined the stability of the baseline brain features themselves. Those held steady: a person's resting brainwave patterns looked essentially the same weeks apart. The proportion of people who responded to stimulation also remained consistent across groups. But when the researchers asked whether the same individuals responded the same way on two separate visits, the picture broke apart. Only about half did. The brain's baseline state was reliable; its reaction to stimulation was not.

This is what researchers call concept drift: the relationship between a predictor and an outcome shifts in ways the model cannot follow. The baseline EEG features were capturing something real, but not the mechanism that actually determines whether stimulation takes hold.

The clinical implications are difficult to absorb. A single stimulation session may simply be too transient to produce effects stable enough to predict or measure reliably. The researchers suggest that meaningful personalization will require not just better biomarkers but a different kind of protocol altogether — multiple sessions, parameters adjusted in response to how each brain actually behaves, rather than how it looks before the first pulse is delivered.

Transcranial magnetic stimulation has become one of neuroscience's most promising tools. The FDA has approved it for depression, obsessive-compulsive disorder, and smoking cessation. Researchers are testing it for anxiety, bipolar disorder, and stroke recovery. Yet the treatment remains stubbornly unreliable. Some patients improve. Others don't. The variability is so pronounced that clinicians cannot predict in advance who will benefit.

A team of researchers set out to solve this problem using machine learning. If they could identify which brain features predicted a good response to stimulation, they reasoned, they could personalize treatment before patients spent weeks in the clinic. They built models trained on EEG recordings (measurements of electrical activity across the scalp) taken before stimulation began. The models looked for patterns in brain complexity, the intricate rhythms and oscillations that characterize a resting brain. When the models were validated internally, within the same group of people, the results looked promising. Accuracy reached 75 to 84 percent. The models seemed to work.

Then the researchers did something crucial: they tested the models on a completely different group of people, collected under nearly identical conditions. The performance collapsed. Accuracy dropped to near chance levels—48 to 54 percent. A model that could predict responses within one group could not predict them in another. This was not a failure of the machine learning itself. It was a failure of the underlying biology to cooperate.
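To make "near chance" concrete, here is a minimal sketch of how one might check whether an accuracy figure is distinguishable from guessing. It uses an exact one-sided binomial test. The test-set size of 50 is a hypothetical number chosen for illustration (the article does not report sample sizes), and the function name `p_value_vs_chance` is likewise invented here, not taken from the study.

```python
from math import comb


def p_value_vs_chance(n_correct, n_total, chance=0.5):
    """One-sided exact binomial p-value: the probability of getting at least
    n_correct predictions right if the model were purely guessing."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical test-set size; the article does not report one.
N_TEST = 50

internal_p = p_value_vs_chance(40, N_TEST)  # 80% correct, inside the reported 75-84% range
external_p = p_value_vs_chance(27, N_TEST)  # 54% correct, top of the reported 48-54% range

print(f"80% of {N_TEST} correct: p = {internal_p:.2g}")
print(f"54% of {N_TEST} correct: p = {external_p:.2g}")
```

Under these assumed numbers, the internal result is very unlikely to arise from guessing, while the external one is entirely compatible with it. The real conclusion depends on the study's actual sample sizes.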

The researchers dug deeper to understand why. They checked whether the baseline brain features—the EEG measurements taken before stimulation—had changed between the initial session and a retest session weeks later. They hadn't. The brain's resting electrical activity remained stable. They checked whether the proportion of people who responded to stimulation had shifted. It hadn't. Yet when they looked at whether individual people showed the same response on both visits, the picture fractured. Only about half of participants responded the same way twice. The brain's reaction to stimulation was inconsistent, even though the brain's baseline state was not.
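The pattern described above, a stable share of responders alongside poor individual consistency, can be illustrated with a small sketch. The labels below are made up for illustration and are not the study's data. Cohen's kappa is used here as one common agreement statistic; the article does not say which measure the researchers used.

```python
def percent_agreement(a, b):
    """Share of participants given the same label on both visits."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the amount expected by chance alone."""
    n = len(a)
    observed = percent_agreement(a, b)
    labels = set(a) | set(b)
    expected = sum((a.count(label) / n) * (b.count(label) / n) for label in labels)
    if expected == 1.0:
        return float("nan")  # undefined when both visits use a single label
    return (observed - expected) / (1 - expected)


# Made-up labels for 20 hypothetical participants: 1 = responder, 0 = non-responder.
visit_1 = [1] * 10 + [0] * 10
visit_2 = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5

print("Responder rate, visit 1:", sum(visit_1) / len(visit_1))
print("Responder rate, visit 2:", sum(visit_2) / len(visit_2))
print("Same label on both visits:", percent_agreement(visit_1, visit_2))
print("Cohen's kappa:", cohens_kappa(visit_1, visit_2))
```

In this toy case half the group responds on each visit, yet only half of the individuals keep the same label, which is exactly what chance alone would produce. That is why a stable group-level responder rate says little about whether any one person's response is repeatable.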

This inconsistency points to what researchers call concept drift, and it appears to be the core problem. The relationship between what the brain looks like at rest and how it responds to stimulation is not fixed. It changes from person to person and from session to session in ways that current science cannot predict or control. The baseline features that seemed so promising in the training data were not actually capturing the mechanisms that determine whether stimulation will work. They were capturing something else, something that did not generalize.
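A toy simulation can show how concept drift defeats a model even when its inputs never change. Everything in this sketch is an assumption made for illustration: one baseline feature per simulated person, a threshold rule that drives the response in session 1, and a deliberately extreme session 2 in which response is unrelated to baseline. It is not the study's model or data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

N = 2000
# One stable baseline feature per simulated person, identical in both sessions.
baseline = [rng.random() for _ in range(N)]


def noisy_rule(x, flip=0.1):
    """Session-1 rule: respond when x > 0.5, with 10% of labels flipped."""
    label = 1 if x > 0.5 else 0
    return 1 - label if rng.random() < flip else label


session_1 = [noisy_rule(x) for x in baseline]
# Session 2 (extreme drift): response no longer depends on baseline at all.
session_2 = [1 if rng.random() < 0.5 else 0 for _ in baseline]


def accuracy(cut, xs, ys):
    """Accuracy of the rule 'predict responder when x > cut'."""
    return sum((x > cut) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def fit_threshold(xs, ys):
    """Pick the cut-point that maximizes accuracy on the training data."""
    return max(sorted(set(xs)), key=lambda cut: accuracy(cut, xs, ys))


half = N // 2
cut = fit_threshold(baseline[:half], session_1[:half])
internal_acc = accuracy(cut, baseline[half:], session_1[half:])  # same relationship
drift_acc = accuracy(cut, baseline[half:], session_2[half:])     # shifted relationship

print(f"Held-out accuracy, session 1: {internal_acc:.2f}")
print(f"Held-out accuracy, session 2: {drift_acc:.2f}")
```

The inputs are the same people with the same baseline values in both sessions, so nothing about the features signals a problem. Only the link between feature and outcome has moved, and the fitted threshold has no way to follow it.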

The implications are sobering for clinical practice. A single session of stimulation may simply be too brief, too transient, to produce reliable neurophysiological effects. The brain's response to one pulse or one burst of pulses appears to be highly variable—influenced by factors that remain unmeasured and perhaps unmeasurable in the moment. Clinical treatments, by contrast, typically involve multiple sessions spread over weeks, allowing for cumulative effects and sustained changes in neurotransmitter systems and gene expression. The researchers suggest that personalized approaches will require not just better biomarkers but fundamentally different protocols: multiple sessions tailored to each individual's unique neurophysiology, with parameters adjusted based on how the brain actually responds rather than how it looks at baseline.

Single session of rTMS isn't effective and multiple sessions with personalized rTMS parameters are needed to show reliable neurophysiological effects
— Study authors, concluding remarks

The Hearth Conversation: Another perspective on the story

Inventor

So the models worked beautifully on the training data but fell apart on new people. What does that tell us?

Model

It tells us the models weren't actually learning the biology. They were learning noise, or at least patterns that don't repeat. The baseline brain features were stable—we confirmed that statistically. But whether a person's brain responded to stimulation was all over the place.

Inventor

Even in the same person, tested twice?

Model

Exactly. Only about half the time did someone respond the same way on a retest. That's the real problem. The brain's resting state isn't what determines the response. Something else is, and we're not measuring it.

Inventor

Could it be that the stimulation itself is just inherently noisy? That there's no signal to find?

Model

That's one possibility. But it might also be that a single session is too weak to produce reliable effects. Clinical treatments use dozens of sessions over weeks. We're looking at one pulse, one burst. The plasticity might be too fleeting to predict.

Inventor

So you're saying the biology is real, but it's hidden?

Model

Or it requires time to stabilize. Multiple sessions might create cumulative changes—in neurotransmitters, gene expression, network organization—that are more robust and more predictable than what happens in the first hour after stimulation.

Inventor

What would it take to actually solve this?

Model

Probably personalized protocols. Not one-size-fits-all stimulation, but parameters adjusted based on how each brain actually responds, measured across multiple sessions. And better outcome measures—we might be looking at the wrong neurophysiological markers entirely.
