The Tilted Mirror, the Invisible Reader
What George J. Borjas and Nate Breznau’s reanalysis reveals about the invisible path that turns data into evidence
Imagine the data as a statue in the center of a public square: the same object on the same pedestal, day after day. Now imagine dozens of teams receiving the same assignment: to photograph it in order to answer the same question. The statue, of course, doesn’t change. The photograph does: lens, framing, exposure, filter; above all, what the image admits and what it leaves outside the frame.
In empirical science, these choices are referred to by other names, such as variables, samples, specifications, and models, but they perform the same function. They decide how the world becomes legible as evidence.
“Ideological bias in the production of research findings,” a paper published in Science Advances, argues that in politically charged subjects, preferences (what researchers, in more technical language, call priors and expectations) can latch onto decisions that, from the outside, look like mere matters of craft: how a concept is operationalized, which cases make it into the sample, which controls are deemed necessary, and which family of models comes to feel “appropriate.”
And through that path, different empirical results can emerge from the same dataset, without fraud, without invented numbers, without anyone stepping outside what the discipline itself would call a defensible method.
This friction is not necessarily a matter of scientific integrity. It is a matter of process: the sequence of legitimate choices that turns a dataset into “evidence” and sometimes turns evidence into a mirror.
The study reanalyzes a collaborative experiment known as BRW, in which independent teams were given the same dataset and the same question. Its conclusion is unsettling because it shifts attention away from the result and onto the path.
Ideology doesn’t appear only as an interpretive gloss at the end of a paper; it can enter earlier, in the engineering of the analysis: in what is measured and named as a variable, in what is controlled (and why), in what is excluded as “noise,” in what comes to count as the right model.
It is there, in that chain of defensible choices, that a common base begins to yield divergent findings.
So what does it mean to “produce evidence” in a world saturated with biases, algorithms, beliefs, and narrative disputes that compete for attention as if oxygen were scarce?
If every study entails choices, the urgent question may not be whether bias exists, but how it organizes itself; what methodological, institutional, and rhetorical filters make some results seem more robust, more publishable, more citable, and therefore more real than others. In other words: not only what the statue “shows,” but which photographs of it we learn to call evidence.
Before the First Regression
In photography, the difference between two images often says less about the statue than about the photographer’s hand and eye on the camera. One lens pulls you closer and flattens the background; another widens the field and bends the edges. One frame closes in on a detail of the subject; another lets the context dissolve into the periphery, and the rest falls into shadow. None of this implies deceit; it implies choice. And choices, even when defensible, have consequences.
In science, the gesture is similar. A dataset, like the statue, doesn’t offer a single angle; it opens a constellation of defensible paths: how to measure a concept, which units make it into the sample, which controls count as pertinent, which family of models comes to feel most appropriate.
What Borjas and Breznau do is make that path visible under rare, quasi-laboratory conditions: a large collaborative experiment in which independent teams answer the same question using the same data.
In the reanalysis published on January 1, 2026, George J. Borjas and Nate Breznau return to BRW, which placed 71 teams (158 researchers) before the same dataset and the same question: Does immigration affect public support for social programs and the welfare state?
The teams worked independently, without coordinating with one another. After reproducing an earlier result, they were invited to extend the analysis “in whatever way they thought best,” with broad latitude to define variables, samples, operationalizations, and models.
The raw material was the same: five waves of the ISSP (International Social Survey Programme), from 1985 to 2016, with updated measures of immigration. But the path to evidence was left open like a square where the statue is singular, and the viewpoints are, inevitably, plural.
Before the first regression, before the spreadsheet could even begin to “answer,” participants were asked to state their starting point: in the country where they lived, should immigration laws be tightened or relaxed?
The metascientific question then ceases to be just a curiosity and becomes a mechanism: do these prior positions align with the results each team produces? And if they do, through which intermediate choices, through which samples, controls, and operationalizations does that alignment become evidence?
The study suggests that this association doesn’t appear as “opinion” pasted onto the footnote of interpretation. It appears as a regularity in the architecture of design: teams with different priors tend to adopt different combinations of methodological decisions, each defensible in isolation, that, taken together, bias the estimate in one direction or another.
It isn’t that “the data change.” It’s that the path deemed acceptable for reading them changes, and with it, the kind of result that reaches the reader.
A second metaphor clarifies why this matters. Think of map projections: the planet is the same, but each projection (Mercator, Gall–Peters, others) redistributes it, distorting areas, compressing shapes, deciding what looks “large” and what looks “peripheral.”
If someone insists there is “no distortion” because they statistically controlled for the projection, they may be making precisely the error Borjas and Breznau identify in the BRW debate: treating as mere noise what is, in fact, part of the mechanism.
Specification choices are not an external “adjustment” to the result. They are endogenous to the process. They compose the path through which evidence is produced.
This discussion doesn’t call for cynicism; it calls for light. When the path is hidden, a single “click” (a modeling choice, a sampling decision, a package of controls) can acquire the aura of universal evidence.
When the path is shown (the alternatives considered, the specification curves, the reanalyses by different teams), science moves closer to what it promises to be: not a definitive portrait of the world, but a public method for turning disagreement into measurement and reducing uncertainty step by step.
When Seventy-One Teams Answer the Same Question—and Disagree
BRW was built to expose that distance: what happens when many competent hands take the same data and follow different paths through it. In BRW, the “statue in the square” is almost literal: the same dataset, handed to dozens of teams, with the same analytical brief.
The experiment was designed to shed light on what the published paper typically leaves in the shadows: the distance between data and conclusion, the zone where routine choices begin to determine what will be called a result.
In science, that zone goes by technical names—operationalization, sampling, specification, and model—but its function is old and human: to turn the world into evidence.
This is the material George J. Borjas and Nate Breznau return to in their January 2026 paper in Science Advances. The question—thorny, and unmistakably contemporary—is easy to phrase and hard to face: can researchers’ preferences and priors be associated with the kind of estimate that reaches the reader even when there is no fraud, no gross error, no visible hand bending the number?
To observe that mechanism from the inside, they rely on a rare opportunity: BRW, in which 71 teams (158 researchers) received the same data and the same hypothesis to test whether immigration reduces public support for the policies that constitute the welfare state.
The teams worked independently, with wide latitude to decide how the world would be represented in the model: how to measure concepts, which countries and survey waves to include, which controls to adopt, and what statistical structure to use. In short, how to choose the “design of the photo.”
Before any modeling, participants recorded their stance on immigration policy in their country of residence: whether immigration laws should be tightened or relaxed.
At first glance, it looks like a side variable, almost a demographic detail. In Borjas and Breznau’s design, it becomes a key for mapping a pattern more unsettling than the psychology of any single team: not what someone “wanted” to prove, but how prior beliefs can line up with final specification choices, the point at which the technical path tilts the estimate before it becomes a public conclusion.
The result is an analytical multiverse. As the teams extended the analysis, they estimated 1,253 regression models, which BRW translated into a common metric: the AME (Average Marginal Effect), the change in the probability of supporting social policies associated with a one-percentage-point increase in the share of immigrants.
Instead of a single number, a distribution emerged. The AMEs cluster around zero, but they don’t end there; the spread includes numerically large and statistically significant values.
At the extremes, the paper itself gives a sense of scale: the 10th percentile is −0.071 and the 90th is 0.052—which, in the language of public debate, means that a shift from 10% to 11% in the share of immigrants could be associated with something like a seven-point drop or a five-point rise in support for social programs, depending on the analytical path chosen.
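The conversion from a tail AME to the "points of support" language of public debate is simple arithmetic, and worth making explicit. A minimal sketch (the two percentile values are the paper's; the function and its linear reading are illustrative):

```python
# Translate a tail AME into percentage points of support. The AME is the
# change in the probability of supporting social policies per
# one-percentage-point increase in the immigrant share.
def support_change(ame, pp_increase=1.0):
    """Change in support, in percentage points, under a linear reading."""
    return ame * pp_increase * 100  # probability change -> percentage points

p10, p90 = -0.071, 0.052  # 10th and 90th percentile AMEs reported in the paper

print(round(support_change(p10), 1))  # -7.1: the "drop" reading of 10% -> 11%
print(round(support_change(p90), 1))  # 5.2: the "rise" reading of the same shift
```

The same one-point shift in the immigrant share, run through two defensible analytical paths, yields headline numbers with opposite signs.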
But the study’s most provocative point isn’t merely that the estimates vary. It’s the shape of that variation, the pattern that emerges when you look at the whole constellation rather than a single, isolated star.
Where the Results Tilt
Variation alone isn’t the story. The question is whether the variation has a signature: whether it aligns, in a repeatable way, with something the researchers brought with them before they opened the spreadsheet.
The central finding is straightforward: researchers’ ideological positions were systematically associated with the direction of the estimates. Teams composed of more pro-immigration participants tended to report more positive effects of immigration on support for social programs, whereas anti-immigration teams reported more negative effects.
What makes this association intellectually unsettling and journalistically combustible is its manner of occurrence. The authors argue that the difference arises because teams adopt different specifications and follow different paths within what the discipline still recognizes as a defensible method.
The most important and intellectually honest note is negative: the paper does not need to posit bad faith to explain the phenomenon. What it describes is a subtler mechanism, and therefore a harder one to police. Ideology does not appear only in the final paragraph, when a coefficient is “interpreted”; it can slip in earlier, in the design of the analysis itself: how immigration is measured, which countries and survey waves are included, and what statistical structure is adopted.
“In sum: research design is endogenous,” the authors write: design is part of the process, not external noise. It is through that channel, through choices that look merely technical, that prior beliefs can align with the kind of result that reaches the reader.
There is a decisive limit, one that the authors themselves emphasize: the experiment captures the final specification, not the full path that led to it.
The data do not allow us to observe how many alternatives each team tested, when it abandoned one path for another, or why: the garden of forking paths of empirical work, where hypotheses branch and some versions of the world die quietly. Nor can we determine whether researchers, consciously or unconsciously, gravitated toward models more compatible with their preferred conclusions, a question the paper deems crucial and one the BRW design cannot address.
What they can show, given the available data, is an association: final specification decisions correlate with ideological priors measured before any analysis, in the first wave of the questionnaire. That supports the thesis of bias-by-path, but it does not license strong causal inference (ideology was not randomized as a treatment).
To make this path less abstract, Borjas and Breznau perform a surgical cut. From a universe of 103 specification decisions recorded in BRW, they single out five crossroads: choices that, in combination, explain much of how the same dataset can generate estimates that contradict one another without anyone “forcing” anything.
1. How to scale the outcome: Do responses about different government responsibilities (health care, housing, unemployment, etc.) become a composite index (by averaging or factor analysis), or remain as separate items?
2. How to measure immigration: as stock (% foreign-born/foreign nationals) or as flow (net migration)?
3. What statistical structure to use: whether or not to use multilevel modeling to capture variation in country–year units.
4. Which countries to include: all available countries in the dataset, or a subset.
5. Which survey waves to include: whether to include the 2016 wave—beyond the waves anchoring the original analysis—and thereby shift the historical window being observed.
In combination, these five decisions generate 58 “non-empty” alternative specifications. An atlas of possible paths to the same question.
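The combinatorics behind that atlas can be sketched in a few lines. The labels below follow the five crossroads, but the two-level grid is hypothetical: the paper's 58 specifications are the “non-empty” cells of a richer decision space, the combinations at least one team actually chose.

```python
# Toy version of the five-decision grid (hypothetical two-level counts; the
# real decision space is richer, and only cells some team used are counted).
from itertools import product

decisions = {
    "outcome":     ["composite index", "separate items"],
    "immigration": ["stock (% foreign-born)", "flow (net migration)"],
    "structure":   ["multilevel", "single-level"],
    "countries":   ["all available", "subset"],
    "waves":       ["include 2016", "original waves only"],
}

full_grid = list(product(*decisions.values()))
print(len(full_grid))  # 2**5 = 32 cells even in this minimal toy grid
```

Even with only two options per decision, the grid already holds 32 distinct paths to the same question; allow three or four options at some crossroads and the multiverse grows quickly.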
The point is not that any one choice, in isolation, “determines” the result. The authors insist on the plural: what matters are combinations, second-order interactions, and paths that make sense only once decisions lock into one another.
When Borjas and Breznau compress this space into a comparable set of recurring specifications and rank the 58 “non-empty” paths by expected AME (the mean of the AMEs within each specification), the pattern appears. Anti-immigration teams were the only ones to adopt the combinations that produce the smallest expected effects; pro-immigration teams, the only ones to use those that produce the largest.
And this five-decision package, taken together, explains a substantive share of the distance between the extremes: about 68% of the average gap between pro- and anti-immigration teams in the experiment, according to their decomposition between “observed” AME and “expected” AME.
The Numbers That Travel
Not every estimate has the same afterlife. In public debate, the center of the distribution rarely gets the microphone; the edges do. Estimates cluster near zero. But they don’t end there. Borjas and Breznau describe a landscape in which the histogram clings to the null and still yields numerically large, statistically significant results at the extremes.
In percentile terms, the contrast is almost pedagogical: the 10th percentile is −0.071, and the 90th is 0.052. In the language of public debate, this suggests that a shift from 10% to 11% in the share of immigrants could be associated with a seven-point drop or a five-point rise in support for social programs, depending on the analytical path taken.
This is where the tails matter. Not because they represent the majority, but because they are legible, publishable, politically reusable. Two antagonistic readings can be drawn from the same dataset, and both can sit, with some comfort, inside the universe of plausible choices the experiment makes visible.
The invitation, here, is to treat the tails as a social phenomenon, not merely a statistical one. Extreme results travel better: they become headlines, slogans, policy “evidence” with a speed the null effect, shy, conditional, thick with footnotes, rarely achieves.
The paper does not measure headlines. But it measures something that feeds them: the production of extremes is not randomly distributed across teams. In models that ask who ends up in the tails with large, significant effects, anti-immigration teams are less likely to appear in the positive tail, while pro-immigration teams are less likely to appear in the negative tail; the odds of landing in a tail aligned with one’s orientation differ substantively across groups.
Hence, a sentence that should unsettle any hurried reader: given that all teams began by replicating the same null reference result, the most ideological teams tend to move away from it, adopting specifications that push the estimate toward one extreme or the other. Toward the kind of number that crosses, more easily, the border between academia and public argument.
The implication is double, and uncomfortable at both ends.
First: even a science conducted through technically defensible choices can inadvertently become a vector of polarization when different analytical paths generate “ammunition” in opposite directions, each stamped with methodological plausibility.
Second, the study underscores the need for infrastructures of transparency that can reveal the full distribution, not only the most photogenic spikes, the easiest results to circulate, so that public debate does not mistake the tail for the rule.
And if this is what happens under quasi-laboratory conditions, it’s worth asking what the same mechanism does in the open air, where incentives, institutions, and moral stakes are louder.
A Stress Test Outside the Lab
From here, the piece shifts terrain. We are no longer within Borjas and Breznau’s study, nor within its immediate object: immigration and the welfare state.
What follows is an editorial extrapolation: applying the mechanism they describe—the idea that legitimate design choices, compounded by institutional filters, can steer what circulates as “evidence”—to an analogous, highly politicized domain. This is not a substitute for reporting. Any specific claim here would require its own legwork: independent sources and point-by-point verification.
If metascience insists that data do not speak for themselves, what remains is a political question in the broad sense: who is authorized to speak for them, and under what rules does that speech become valid?
Tobacco control, science, public health, and industry meet on ground where the dispute is rarely limited to toxicology or epidemiology. It also intersects with moral grammars and political and economic ends: abstinence as the sole regulatory ideal, or harm reduction as a pragmatic strategy for those who cannot—or do not wish to—quit entirely.
In this environment, the methodological and the moral often intersect. Not because one side invents science, but because science is always filtered through the rules of passage.
Evidentiary standards, publication norms, outcome priorities, and the language of guidelines themselves can operate as gates: they determine which questions enter the agenda, which comparisons seem acceptable, which risks become tolerable, and which are treated as politically unassimilable.
The literature on non-combustible nicotine products puts this tension in sharp relief.
On one side, studies suggest that for smokers who switch from cigarettes to vaping devices, exposure to certain toxic substances tends to be lower than under tobacco combustion, and that for some users these products can function as tools for cessation or substitution.
On the other side, none of this amounts to innocence. These products are not risk-free, and use by non-smokers, especially very young people, is treated by many policymakers as a public-health crisis: a reopened gateway and a normalization of nicotine.
Some countries—the United Kingdom is often cited as the leading example—have taken a relatively more pragmatic stance, incorporating such devices into cessation repertoires, with caveats and a focus on adult smokers.
In some multilateral forums, the tone tends to be more cautious. Alongside calls for draconian regulation, these bodies warn of a risk less chemical than rhetorical: narrative capture, when the language of “harm reduction” becomes an instrument of commercial strategy, with possible side effects such as the initiation of new users and the erosion of controls.
In some regulatory debates, concern has also drifted toward non-combustible products that are less conspicuous than vaping: nicotine pouches—small sachets of nicotine—are frequently raised as a point of worry, particularly because of high concentrations and their perceived appeal among adolescents.
The case of snus—and, by extension, oral nicotine products—functions as a laboratory of controversy. Sweden is often cited for historically low smoking prevalence in certain strata and for enviable indicators linked to the burden of mortality and disease attributed to cigarettes.
Part of the literature reads this picture as, at least in part, an effect of substitution: less combustion, more nicotine in oral forms. At the same time, the ground is contested. Critics point to the risk of simplifying narratives, conflicts of interest, and hasty extrapolations from one national context to others.
And the clinical literature is not a single block: for many researchers, some outcomes, especially cardiovascular and population-level ones, remain surrounded by uncertainty, methodological heterogeneity, and interpretive dispute.
The point here is not to arbitrate a winner. It is to watch the mechanism.
Before it becomes consensus, evidence passes through filters: editorial rules, guideline committees, norms about what counts as a “good question,” reputational barriers, legitimate fears of regulatory capture, real conflicts of interest, and, beneath it all, societal values about what a society decides to tolerate.
In fields like this, the mirror doesn’t merely distort. At times, it legislates: it decides which reflections are allowed to exist.
When Controls Swallow the Mechanism
If this is what the mechanism looks like in the open air, what happens when the debate returns to the lab, where reassurance can come in the form of a regression, and where “controlling for everything” may end up controlling away the very channel at issue?
Borjas and Breznau’s paper doesn’t merely reexamine a rare trove of evidence; it also challenges BRW’s more soothing reading of its own experiment. In the original study, the organizers concluded that “researcher characteristics do not explain outcome variance”; that is, researchers’ characteristics did not account for the variation in results.
Borjas and Breznau argue that this peace is illusory. It may be purchased at the price of a classic statistical move—controlling for almost everything—which, in this setting, risks controlling away the very mechanism producing the divergence.
Their reanalysis identifies a substantive reason for the disagreement.
BRW privileged a kitchen-sink strategy: it regressed each team’s final estimate (the effect of immigration on an indicator of “social cohesion,” operationalized as support for social policies) on ideology, along with a broad vector of controls that included, among other things, descriptors of the team’s own specification (for instance, logit versus OLS; the inclusion of country and year fixed effects). In that expanded setup, the coefficient for ideology was not statistically significant.
Here is the critique that changes the game: if modeling choices are endogenous, if they are part of the mechanism by which beliefs and priors become estimates, then “controlling” for those choices may mean controlling the very channel through which ideology operates.
This is not a technical quibble; it is a causal inversion. That is why, they write, their result conflicts with BRW’s: the reanalysis “takes into account the possibility that variables that indicate design choices are endogenous to the process.” Put differently, when a kitchen-sink regression throws specification descriptors into the bundle, it risks neutralizing what it set out to test because “the variables that indicate aspects of the specification are endogenous and are the mechanism by which ideology influences the estimates.”
Intuitively, a kitchen-sink regression can be useful for describing how much variation is associated with a broad bundle of observable features. But when some of those “controls” are not noise—when they are the mechanism itself—the model can erase the effect it is meant to detect.
In the language of causal inference (offered here as an editorial translation, not as the paper’s own label), this is the classic mistake of controlling for a mediator: you neutralize the channel, and then conclude the channel never existed.
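The point can be made with a ten-line simulation, offered as a sketch rather than a reconstruction of the paper: here ideology shifts a design choice, the design choice shifts the estimate, and “controlling” for the choice erases the association under test.

```python
# Hypothetical data-generating process: ideology -> design choice -> estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

ideology = rng.normal(size=n)                    # prior belief (standardized)
design = 0.8 * ideology + rng.normal(size=n)     # endogenous design choice
estimate = 0.5 * design + rng.normal(size=n)     # estimate driven by design

def slope(y, x):
    """OLS slope of y on a single regressor x."""
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

def partial_slope(y, x, z):
    """Slope of y on x after residualizing both on the control z."""
    resid = lambda v: (v - v.mean()) - slope(v, z) * (z - z.mean())
    return slope(resid(y), resid(x))

total = slope(estimate, ideology)                       # close to 0.8 * 0.5 = 0.4
controlled = partial_slope(estimate, ideology, design)  # close to zero
```

In the kitchen-sink reading, the near-zero `controlled` coefficient looks like reassurance; in the endogenous-design reading, it is exactly what appears when the channel is real and has been absorbed by its own mediator.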
The implication is epistemological.
If bias in the weak sense—inclinations, intuitions, priorities—can be inscribed in the decisions between data and model, then “who controls what” stops being a technical detail and becomes an architecture of knowledge.
At the same time, the authors underscore a limitation inherent in what BRW can show: the experiment records the final specification, not the full workflow. We do not know how many alternatives were tried, when and why certain routes were abandoned, or whether researchers, “consciously or unconsciously,” gravitated toward models that supported preferred conclusions.
What remains observable, with some confidence, is a temporally ordered association: final specification choices correlate with ideological priors measured before any analysis, in the first wave of the questionnaire, and it is this fit (not proof of intention) that sustains the bias-by-path hypothesis.
Between Method and World
Once the possibility of “controlling away” the channel is on the table, the argument ceases to be a dispute over models and becomes a dispute over what evidence is and what we ask it to do.
There is something deeply unsettling about Borjas and Breznau’s reanalysis, not because it uncovers scientific misconduct, but precisely because it doesn’t need to. What it puts on stage is more disturbing: the possibility that legitimate, technical, defensible decisions can produce distinct analytical worlds, and that part of this divergence aligns, systematically, with prior beliefs about the subject.
The paper itself, however, taps the brakes. Ideology was not a randomized “treatment”; it was a measured characteristic across a moderate number of teams, which limits the strength of causal inference.
In other words, the study does not claim, with the rigidity of a classic experiment, that “ideology causes X.” It shows a persistent association between priors and the choices that culminate in estimates, and it acknowledges uncertainty, including wide confidence intervals.
Still, the finding returns empirical science—especially in politically sensitive topics—to a stubborn tension: data do not speak for themselves. Not because “anything goes,” but because there is a labyrinth of plausible choices between the spreadsheet and the result. And if research design is endogenous, that labyrinth is not neutral: the final specification can correlate with beliefs that predate the analysis.
Here, metascience ceases to be a debate about robustness and becomes a debate about process. The authors themselves acknowledge a decisive gap: the experiment does not record the full path. It does not indicate how many alternatives were tried and discarded, when they were tried, or why. Hence the agenda they gesture toward: observing and documenting workflows, tracking the garden of forking paths that still remains, largely, outside the published paper.
There is another uncomfortable detail in any discussion of what counts as “valid evidence.” Within the experiment, each modeling strategy underwent randomized, double-blind peer review, yielding a referee score for each model.
Under that filter, moderate teams earned the highest mean score (0.35), above anti-immigration teams (0.03) and pro-immigration teams (−0.33). This doesn’t decide who is “right.” The authors themselves warn that the conventional may be merely the most comfortable, not necessarily the closest to the truth. But it does suggest something structural: filters of quality—or of conventionality—also participate in selecting what rises to the status of an “acceptable” result, especially when “non-traditional” choices produce outliers and are, for that reason, penalized in peer judgment.
None of this has to curdle into cynicism. The authors insist on an exit that is, at once, more laborious and more honest: treating robustness as aggregated evidence, not as an anointed coefficient.
To that end, they run an “agnostic” robustness test: they estimate the effect of ideology across 883 models, covering all combinations of specifications used in the main tables and supplementary material, and display a specification curve.
The result is eloquent in scale: ideology appears with a statistically significant effect (P < 0.10) in 88.2% of models, a share that rises to 92.4% when models potentially affected by omitted-variable bias are excluded (those that do not include disciplinary fixed effects).
In other words, the association between ideological position and the estimate produced does not hinge on a single way of modeling, nor on a particular sampling choice that could be dismissed as a special case.
As the authors traverse the multiverse of specifications—swapping, combining, recombining defensible decisions—the signal returns along the overwhelming majority of paths: you change the lens, you change the frame, you change the bundle of controls, and ideology still shows up as a factor associated with the result.
And when a subset of models more vulnerable to distortion by omission is removed, the pattern becomes more frequent, suggesting, in plain terms, that this is not a fragile artifact of specification, but a feature that persists precisely when the test becomes more demanding.
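The bookkeeping behind such a summary is simple to sketch. The numbers below are toys; the paper's actual figures are 88.2% of 883 models significant at P < 0.10, rising to 92.4% once models without disciplinary fixed effects are dropped.

```python
# Share of models in a (toy) specification multiverse where the ideology
# coefficient clears the significance threshold; each model is a pair of
# (p_value, has_disciplinary_fixed_effects).
def significant_share(models, alpha=0.10, require_fe=False):
    """Fraction of kept models with p below alpha; require_fe drops models
    lacking disciplinary fixed effects before computing the share."""
    kept = [p for p, fe in models if fe or not require_fe]
    return sum(p < alpha for p in kept) / len(kept)

models = [(0.01, True), (0.04, True), (0.20, False), (0.08, True),
          (0.03, False), (0.50, False), (0.06, True), (0.02, True)]

overall = significant_share(models)                   # 6 of 8 -> 0.75
fe_only = significant_share(models, require_fe=True)  # 5 of 5 -> 1.0
```

A specification curve is, in the end, this kind of tally drawn as a picture: the full distribution of estimates across defensible paths, not one anointed coefficient.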
After hundreds of models and a curve that refuses to settle into a single story, the argument widens. It’s no longer only about robustness; it’s about who gets to see the path, and who benefits when it stays hidden.
The Mirror, the Angle, and the Invisible Reader
Ultimately, Borjas and Breznau’s reanalysis may say less about immigration and social policy than about the nature of scientific evidence in charged domains: those in which empirical reality arrives at the lab already accompanied by dispute.
It suggests that “statistical truth,” like an image in a curved mirror, shifts with the angle of the viewer—or, more precisely, with the sequence of choices that determine what will be measured, compared, controlled, and published.
And there is a third vertex, almost always outside the frame: the invisible reader (the institution, the newsroom, the court of public opinion) for whom certain images circulate more easily than others.
There are no heretics in this story.
Each team adhered to established protocols, applied methods taught in graduate programs, and used tools accepted by disciplinary tradition. The problem is not bad faith; it is the invisibility of the route: the decisions that don’t fit in the final paper, the models tried and abandoned, the alternatives that disappear without leaving a trace, and that, taken together, determine what the reader will receive as evidence.
Like a newsroom in which everyone is handed the same raw material but only one story makes the cover, scientific production is also a process of selection and disappearance. Some forks close without a sound; others, through repetition and reputation, become the norm. What remains, the final coefficient, the “clean” figure, the number that becomes an argument, is only one among many possible versions of the same world.
So the challenge ahead may not be to insulate science from bias—an impossible task—but to build institutions capable of living with it without naturalizing it.
That calls for less fetishizing of neutrality and more transparency about choices; less idolatry of a single result and more attention to distributions; less worship of the closed paper and more openness of process.
If there is an antidote—one I learned from Borjas and Breznau—it doesn’t look like censorship, but like its opposite: pluralism.
More teams, more angles, more models, more cross-examination, not to turn every conclusion into opinion, but to make the route visible, light up the forks, and reduce the quiet power of a single tilted mirror.
Borjas, G. J., & Breznau, N. (2026). Ideological bias in the production of research findings. Science Advances, 12(1), eadz7173. https://doi.org/10.1126/sciadv.adz7173
About the Researchers:
George J. Borjas is affiliated with the Harvard Kennedy School and the National Bureau of Economic Research (NBER).
Nate Breznau is affiliated with the German Institute for Adult Education—the Leibniz Centre for Lifelong Learning in Bonn (Department of Organization and Program Planning).
An Annotated Interview / Backstage Notebook
In the Multiverse’s Editing Room, with Borjas and Breznau
On the screen, science looks like a calm object. A specification curve, a row of models, arranges chaos into something like a landscape. From a distance, it’s easy to believe the debate is about numbers. But in my exchange with Borjas and Breznau on February 6, the subject kept returning to the same place: before the number, there is a path. And that path is where evidence acquires direction.
I. Endogeneity of Design: Unconscious or Deliberate?
I began with the phrase that, in their paper, works like a hinge: endogeneity of design. Is this process mostly unconscious—the researcher’s quiet drift—or is there room for more deliberate (yet still legitimate) alignments between theoretical expectations and modeling choices?
They didn’t answer with psychology. They answered with architecture:
“The process of conducting empirical research has many steps: How to frame the question? Which data to analyze? How to measure the variables to be used? Which methodological technique to use? And so on.
Each of these decisions opens a fork in the road, and the combination of all the different decisions leads to very specific results that differ from what would have been obtained with another set of decisions.
Everyone who does empirical research knows this and ‘sees’ the specific impact of a set of decisions during the research process.
Unfortunately, the decisions can be manipulated by someone who wishes to reach a particular endpoint. That is why it is extremely important for the researcher to be totally transparent in what research design choices were made.”
Here, transparency doesn’t read as a decorative virtue; it reads as an institutional counterweight to a banal, explosive fact: there are many plausible routes, and some routes make the world look like something else.
II. Learning What “Pushes” Results
The next question presses on a more delicate point: in contexts of high analytical flexibility, do researchers inevitably learn which decisions “push” results in one direction or another?
Their reply arrives without padding:
“If, during the research process, a researcher does not learn that certain research design choices tend to push results in a particular direction, that researcher is not very competent to begin with.”
It’s a sentence that invites slow reading. It doesn’t merely say that learning is inevitable; it defines competence as the ability to see, in real time, what choices do to the estimate.
The problem, then, is not discovering that choices tilt outcomes. It’s what one does with that discovery, and how much of it stays outside the final paper, invisible to the reader who receives the coefficient already cleaned up, already edited, already converted into a conclusion.
III. Ideological Asymmetry: A Dominant Bias in the Social Sciences?
BRW’s ideological asymmetry—few anti-immigration teams—raises a question that is hard to keep purely technical. Does it suggest a dominant ideological bias in certain fields?
They answer with a conjecture anchored in an American backdrop:
“Given the widely publicized data in the US of how nearly all the money donated to political parties by faculty in universities goes in a single direction (left), it is difficult to dispute the conjecture that ‘social sciences today operate under a dominant ideological bias’.”
Editorial cut: they call it a conjecture, not a result of the experiment. But the line carries an infrastructural implication: if a field is asymmetric, then the plurality of routes (and of questions) may be filtered from the start, not only by method, but by the sociology of who gets into the lab.
IV. An Epistemological Problem—or a Sociological One?
If the asymmetry is real, I ask, what is it: an epistemological problem in itself, or a reflection of the scientific community’s composition?
They step back into terrain where metascience still lacks a firm footing:
“We don’t know.”
Then a hypothesis:
“There’s probably a lot of ‘ideological bias’ in faculty hiring in universities.”
And from there, a research agenda:
“We need more research into how ideology shapes the entire research process from epistemology through methods onto results.”
The arc is clear: from recruitment to method, from method to result, a whole chain of selection in which what we call “evidence” is also an institutional photograph of what was permitted to be asked.
V. Tails, Extremes, and Polarization
When the conversation returns to the tails (the extreme, statistically significant results that travel most easily into policy and public argument), I ask what this does to polarization.
The answer comes as an avowed prior:
“Our prior is always to be skeptical of evidence in highly contentious and politicized fields.”
Then the critique shifts from the dataset to the ecosystem:
“Unfortunately, the hiring process in universities and the peer review process are also contaminated by ideological bias, so what one gets to read is already heavily filtered.”
And then comes the line that lands with the force of an accusation, even if it’s offered as a diagnosis:
“So it would not be too far-fetched to say that a lot of the distrust in science has probably been ‘earned’ through the years of scientists selling results that might have been manipulated to reach specific conclusions.”
Editorial cut: the paper measures associations and patterns; here, the conversation brushes up against public trust and the way a “result” becomes a political commodity. The line is not a statistic. It is a moral reading of an ecosystem of filters.
VI. Mitigating Selective Circulation—Without Censorship
If extremes travel, can anything be done about selective circulation without sliding into censorship or neutralizing dissent?
They don’t offer a simple solution, but they point in a direction:
“Although we don’t see a simple solution, we would say that the answer (if it exists) is exactly the opposite of censorship. Let a question be analyzed by many researchers from many different angles.”
Pluralism as an antidote, not to say that “anything goes,” but to insist that when there are many plausible routes, intellectual honesty is not choosing one and pretending the others never existed; it is making the forks visible.
VII. If Many Results Are Possible, Does Replicability Collapse?
If “the” empirical result is only one realization among many, does the classical ideal of replicability fall apart?
They resist the clean answer:
“We don’t know.”
But an ethos appears:
“But the scientific method involves checking and re-checking results, and paying attention to robustness.”
And with it, a criterion for discomfort:
“If there are many results in many different directions, then we either have no clear effect of something, or we need to do more work to identify the correct test.”
Replicability, here, isn’t a stamp. It’s repeated labor, and a diagnosis: either the effect isn’t there, or the test has not yet found its proper form.
VIII. Controlled Pluralism: Multiverses, Curves, Radical Transparency?
Instead of replication in the strict sense, should the field move toward controlled pluralism, multiverse analyses, specification curves, and radical transparency?
They don’t pick a side. They pick both:
“Both. We should perform replication and reanalyses—where we adjust previous models.”
This isn’t methodological fashion; it’s discipline: replicate and reanalyze and, in reanalysis, adjust earlier models, making choices and consequences explicit.
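To make the idea concrete, here is a minimal sketch of a specification-curve analysis in Python. The data, variable names, and modeling choices are all hypothetical (synthetic data, not BRW’s dataset): the point is only to show the mechanics of fitting every combination of a few defensible choices and lining the resulting estimates up into a curve.

```python
# A toy specification-curve analysis: one synthetic dataset,
# every combination of a few "defensible" modeling choices.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data: an outcome, a focal predictor, and two optional controls.
x = rng.normal(size=n)               # focal variable (e.g., a policy exposure)
c1 = 0.5 * x + rng.normal(size=n)    # control correlated with x
c2 = rng.normal(size=n)              # control unrelated to x
y = 0.3 * x + 0.4 * c1 + rng.normal(size=n)

def ols_coef(y, cols):
    """OLS coefficient on the first regressor in cols (intercept added)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # index 0 is the intercept

# The garden of forking paths: include each control or not,
# and optionally trim the sample to |x| < 2 (an outlier rule).
specs = []
for use_c1, use_c2, trim in itertools.product([False, True], repeat=3):
    mask = np.abs(x) < 2 if trim else np.ones(n, dtype=bool)
    cols = [x[mask]]
    if use_c1:
        cols.append(c1[mask])
    if use_c2:
        cols.append(c2[mask])
    specs.append(((use_c1, use_c2, trim), ols_coef(y[mask], cols)))

# The specification curve: all eight estimates, sorted smallest to largest.
for choices, coef in sorted(specs, key=lambda s: s[1]):
    print(choices, round(coef, 3))
```

With these synthetic parameters, specifications that omit the correlated control land near one value and those that include it land near another, so the curve itself displays the fork: same data, a spread of “results,” each produced by a sequence of choices that would look defensible on its own.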
IX. Does This Apply to Public Health—Nicotine, Tobacco, Drugs?
I push the mechanism outside the lab and into domains where method and moral judgment knot together: public health, tobacco control, drug policy, and so on. Is this a special case?
They refuse the comfort of the exception:
“Show me a field of science where evidence and moral judgments do not intertwine. This is not a problem unique to any one area.”
If the entanglement is structural, then the question isn’t where it happens, but under what rules it is governed.
X. When the Risk Is Epistemic Hegemony—Without Falling into Relativism
Finally, I ask whether, in areas of strong institutional consensus, the main risk may not be individual bias but the epistemic hegemony of certain frameworks, and how science can address that without sliding into relativism.
The answer ends at a limit:
“No idea.”
That last sentence functions like a window. It suggests that when the conversation reaches its hardest point—not the ideology of individuals but the hegemony of structures—the language we have remains insufficient.
The paper measures what it can measure. The interview, at times, brushes up against what we do not yet even know how to ask.
And that is where this notebook finds its purpose: to show that behind every coefficient there is a system of choices; behind every choice, a set of filters; and behind the filters, an invisible reader: the institution, the editor, the guideline, public policy, us; waiting for the number that travels best.