Polygenic scores across ancestries: why the accuracy gap is a methods problem, not a biology problem

If you're not of European descent and you've looked into polygenic screening, you've probably hit the same wall: most genetic risk scores were built on data from white people, and their accuracy drops (sometimes by more than half) when applied to anyone else. That's a real problem. Cross-ancestry accuracy, meaning how well genetic predictions perform across different ancestral backgrounds, has been the field's biggest equity failure. A 2019 study by Martin et al. found that polygenic risk scores were several times more accurate in Europeans than in anyone else, with accuracy losses exceeding 75% in African populations. For years, it looked like polygenic screening might be fundamentally limited to one slice of humanity.

It wasn't. The accuracy gap turned out to be a methods problem, not a biology problem. The genetic variants that cause type 2 diabetes, breast cancer, and coronary artery disease are the same in a person of West African descent as in a person of Finnish descent. The causal biology is shared. The models weren't failing because disease genetics differs across ancestries. They were failing because 88% of the training data came from Europeans (GWAS Diversity Monitor), and the statistical shortcuts those models relied on broke when applied to populations whose genomes have a different correlation structure.

Most genetic scores fail non-European families

Let's be precise about what went wrong.

Traditional polygenic scores don't directly measure which genetic variants cause disease. Instead, they use a shortcut: they find variants that are correlated with disease in large datasets, then add up those correlations into a score. The correlations come from patterns called linkage disequilibrium (LD): the tendency for nearby stretches of DNA to be inherited together as a block. If a variant near a disease-causing mutation is common in the study population, the model learns to use that nearby variant as a stand-in for the real one.
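To make "add up those correlations into a score" concrete, here is a minimal sketch in Python. The weights and genotypes are invented for illustration and not taken from any published score:

```python
import numpy as np

# A polygenic score is a weighted sum of allele counts.
# All numbers here are hypothetical, for illustration only.

# Per-variant effect weights estimated from a GWAS (e.g. log odds ratios)
weights = np.array([0.12, -0.05, 0.30, 0.08])

# One person's genotype: count of risk alleles (0, 1, or 2) at each variant
genotypes = np.array([2, 1, 0, 1])

# The score is simply the dot product of weights and allele counts
prs = float(np.dot(weights, genotypes))
print(round(prs, 2))  # 0.27
```

The fragility discussed next comes from where those weights point: if a weight sits on a proxy variant rather than the causal one, the sum only works in populations where proxy and causal variant travel together.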

This works fine when everyone in the study has the same LD structure. But they don't. African populations have the shortest LD blocks: modern humans have lived in Africa longest, so recombination has had the most generations to break blocks apart, and African populations carry the greatest genetic diversity. European populations have longer blocks because of bottlenecks during the migration out of Africa. So a score trained on European data picks up proxy correlations specific to European LD structure. Apply that score to someone of West African ancestry, and the proxy-to-causal link breaks. The model is pointing at the wrong spot in the genome.
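A toy simulation shows the proxy breaking. In the sketch below, `link` is a crude stand-in for LD strength: the tag variant copies the causal one most of the time under tight LD, and rarely when LD is broken. Every parameter is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # simulated individuals

def simulate(link):
    # Allele count at the true disease-causing variant
    causal = rng.binomial(2, 0.3, n)
    # Tag variant: copies the causal variant with probability `link`,
    # otherwise an independent draw -- a crude stand-in for LD strength
    independent = rng.binomial(2, 0.3, n)
    keep = rng.random(n) < link
    tag = np.where(keep, causal, independent)
    # Phenotype depends only on the causal variant (plus noise)
    pheno = 0.5 * causal + rng.normal(0, 1, n)
    # Predictive value of a score built on the tag variant alone
    return np.corrcoef(tag, pheno)[0, 1]

print(f"tight LD  r = {simulate(0.95):.2f}")
print(f"broken LD r = {simulate(0.30):.2f}")
```

The phenotype model never changes; only the correlation between tag and causal variant does. That is the cross-ancestry accuracy drop in miniature.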

That's why accuracy losses of 40 to 75% show up in African, South Asian, and East Asian cohorts. Capalbo et al. (2024) documented reductions exceeding 75% in African populations. The ACMG's 2024 statement attributed poor score portability to bias toward European variant discovery. These are real findings, and they point to the same root cause: old methods couldn't find the causal variants when the surrounding DNA landmarks looked different.

But notice what the problem actually is. It's not that the biology of disease differs across ancestries. What failed was the method's ability to find causal variants when the surrounding LD structure changed.

Why the gap is solvable

If the causal variants are shared but the LD patterns around them differ, the fix is conceptually straightforward: stop relying on LD proxies and get closer to the causal variants themselves.

That's what SBayesRC does. Instead of blindly tagging statistical correlations and hoping the right variant is nearby, it layers biological knowledge on top of the statistical signal. Think of it this way: rather than saying "this region of the genome is correlated with diabetes in Europeans," SBayesRC asks "does this variant actually sit in a gene regulatory region? Does it alter a protein?" Those functional annotations (drawn from more than 7 million variants) help the model identify which variants are genuinely causal rather than just correlated bystanders.

Why does this matter for cross-ancestry prediction? A variant that disrupts a protein in a European person disrupts the same protein in an African person. The biology doesn't care about ancestry. So a score that weights toward functionally annotated, likely-causal variants transfers much better across populations than one built on LD-dependent shortcuts.
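The intuition can be sketched as annotation-dependent shrinkage: effect estimates at functionally annotated variants are pulled toward zero less aggressively than estimates at unannotated ones. This cartoon is not the actual SBayesRC algorithm, which fits a full Bayesian mixture model over millions of variants; the effect sizes, annotations, and shrinkage factors below are all hypothetical:

```python
import numpy as np

# Cartoon of annotation-informed shrinkage. Not the real SBayesRC
# algorithm -- just the intuition. All numbers are made up.

marginal_effects = np.array([0.20, 0.18, 0.02, 0.15])   # raw GWAS estimates
is_functional = np.array([True, False, False, True])    # e.g. coding/regulatory

# Hypothetical shrinkage: trust functionally annotated variants more,
# shrink likely-bystander variants harder toward zero
shrink = np.where(is_functional, 0.9, 0.3)
posterior_effects = marginal_effects * shrink

print(posterior_effects)
```

The score that comes out leans on variants with a biological reason to matter, which is exactly the property that survives a change in LD structure.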

The improvement is measurable. Zheng et al. reported that SBayesRC improved trans-ancestry prediction by up to 33% compared to the previous generation of methods, and outperformed LDpred-funct, PolyPred-S, and PRS-CSx by 12 to 15%. Weissbrod et al. (2022) showed their PolyPred method improved prediction accuracy by up to 32% in African populations. BridgePRS (2023) and X-Wing (2023) demonstrated similar gains. Martin et al.'s 2019 call to diversify the field was heard, and researchers responded with real solutions.

If you're considering polygenic screening and want to understand how these improvements apply to your ancestry, our counselors can walk through the ancestry-specific data for your background.

What the data shows now

Herasight's disease polygenic scores, described in Moore et al. (2025), put this approach into practice. The scores use SBayesRC with functional annotations across 7+ million variants, cover 17 diseases, and were calibrated across 8+ ancestry groups using UK Biobank and All of Us data. And they were validated on sibling pairs, the only validation context that directly mirrors embryo screening. Cross-ancestry performance improved substantially over previous methods, particularly in African and East Asian populations.

For type 1 diabetes, Herasight published a dedicated cross-ancestry evaluation using the NIH's All of Us cohort. The HLA-ARC method achieved an AUROC exceeding 0.91 in European individuals and exceeding 0.89 in non-European groups, outperforming all existing methods across every ancestry group tested. The European/non-European gap still exists, but it's roughly a 2-point AUROC spread, not the 40-to-75-percent relative accuracy chasm the field was dealing with five years ago.
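For readers unfamiliar with the metric: AUROC is the probability that a randomly chosen person with the disease receives a higher score than a randomly chosen person without it. A tiny worked example with invented scores:

```python
import numpy as np

# AUROC = fraction of case/control pairs the score ranks correctly
# (ties count half). Scores below are made up for illustration.

case_scores = np.array([0.9, 0.7, 0.6])         # people with the disease
control_scores = np.array([0.8, 0.4, 0.3, 0.2]) # people without

wins = sum((c > ctrl) + 0.5 * (c == ctrl)
           for c in case_scores for ctrl in control_scores)
auroc = wins / (len(case_scores) * len(control_scores))
print(round(auroc, 3))  # 10 of 12 pairs ranked correctly -> 0.833
```

An AUROC of 0.5 is coin-flipping; 1.0 is perfect ranking. That is the scale on which the 0.91 vs 0.89 comparison sits.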

The most telling result from Moore et al.: when screening among 10 embryos in families where both parents have type 2 diabetes, the risk difference between the highest and lowest risk embryos can exceed 23%. That holds across ancestry groups. It's a direct, data-backed answer to the question "does this work for people like me?" For the conditions and ancestry groups tested, screening produces comparable benefit regardless of your background.
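The shape of that result is easy to see in a toy simulation: score ten hypothetical sibling embryos, convert each score to an absolute risk with a logistic function, and compare the extremes. Every parameter below (baseline risk, score spread) is invented for illustration and is not drawn from Moore et al.:

```python
import numpy as np

rng = np.random.default_rng(1)

def risk(prs, baseline_logit=-1.0):
    # Toy logistic mapping from polygenic score to absolute risk.
    # The baseline is hypothetical, not a published disease prevalence.
    return 1 / (1 + np.exp(-(baseline_logit + prs)))

# Ten sibling embryos: scores scattered around the parental midpoint
embryo_scores = rng.normal(loc=0.0, scale=0.5, size=10)
risks = risk(embryo_scores)

print(f"highest-risk embryo: {risks.max():.1%}")
print(f"lowest-risk embryo:  {risks.min():.1%}")
print(f"risk difference:     {risks.max() - risks.min():.1%}")
```

Because siblings draw different combinations of the same parental variants, their scores spread out, and the highest-to-lowest risk gap can be large even within one family. The published 23% figure comes from the real calibrated scores, not from this sketch.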

These aren't theoretical projections. They're measured results from within-family validated scores tested on real data from diverse populations.

What this means for your family

We don't claim the cross-ancestry problem is completely solved. Some ancestry groups still have less data than others. Some conditions have stronger cross-ancestry evidence than others. And Herasight reports ancestry-specific performance metrics precisely so you can see what the data supports for your situation rather than getting a one-size-fits-all number.

But the field has moved from "this technology works for Europeans and basically no one else" to "this technology works across populations with measurable, calibrated, reported performance." That distance was covered by better science: not by lowering the bar, but by building methods that target what all humans share, the same causal biology underlying disease.

We calibrate across 8+ ancestry groups for 17 diseases because accurate genetic screening shouldn't depend on where your ancestors lived. The science to make that real now exists. The data shows it works.

If you're considering polygenic screening and want to know how the predictions perform for your ancestry, reach out to our team.