
Investigating missing country information in OpenAlex — and a method to recover some of it

How author affiliation history can significantly close the country-attribution gap

Journal Trends · 25 May 2026

If you build a tool that visualises which countries publish in a given academic journal, and you source your data from OpenAlex, you'll eventually open a journal page and find a country-stacked bar chart whose bars add up to far less than the journal's actual paper count. The missing papers haven't disappeared: they show up in the "papers per year" line above the stack, but they contribute nothing to the country trends because OpenAlex's institutional graph has no country recorded for them yet.

For one particular journal we audited while building Journal Trends, 1,156 of its 1,456 OpenAlex-indexed works (≈ 79 %) had no country attribution, leaving the country chart showing only about a fifth of the actual catalogue. We don't yet know how typical this is across publishing as a whole (we plan to find out; see the end of the post), but the size of the gap on a single journal was striking enough to investigate properly.

This post lays out:

  1. Which OpenAlex fields we audited, and why at most one of them offers even a marginal fallback
  2. The author-affiliation-history trick that can recover an additional ~12 percentage points of country coverage (~24 % of the missing papers that list authors)
  3. The trade-offs and what we're shipping in Journal Trends
  4. A planned journal-wide audit

What we audited

For that one journal, we pulled the full OpenAlex record (not the slim projection we usually fetch) for every paper and asked: is there ANY field, anywhere, that gives us a country signal we're not already using?

Source | Papers covered | Useful as a fallback?
--- | --- | ---
authorships[].institutions[].country_code | 300 / 1,456 (21 %) | Baseline
authorships[].countries[] | 300 / 1,456 (21 %) | No: exactly the same papers, derived from the same data
Top-level countries_distinct_count | 300 / 1,456 (21 %) | No: same papers
authorships[].raw_affiliation_strings[] | 349 / 1,456 (24 %) | Marginal: +49 papers, via brittle text-parsing for country names
Top-level institutions_distinct_count > 0 | 1,026 / 1,456 (70 %) | No: shows that institutions are attached but says nothing about their countries

So the standard fields are a dead end. Any approach that stays within a single OpenAlex Work record is capped at the 21 % baseline for this journal, or about 24 % if you accept brittle text parsing of raw affiliation strings.
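The per-field audit behind the table can be sketched in a few lines. This is a minimal illustration rather than the code we ran: it assumes each work is a full OpenAlex Work record already parsed into a Python dict, and the signal labels and the tiny COUNTRY_NAMES lookup are hypothetical stand-ins. The naive substring match is deliberately crude, which is exactly why the raw-affiliation route is brittle.

```python
from collections import Counter
from typing import Any

Work = dict[str, Any]

# Hypothetical, deliberately tiny lookup. A real parser would need a full
# gazetteer plus aliases ("UK", "U.S.A.", historical names), and substring
# matching still misfires (e.g. "japan" inside "japanese").
COUNTRY_NAMES = {"united kingdom": "GB", "germany": "DE", "japan": "JP"}


def audit_work(work: Work) -> set[str]:
    """Return the labels of the country-related signals present on one work."""
    found: set[str] = set()
    auths = work.get("authorships") or []
    if any(inst.get("country_code") for a in auths for inst in a.get("institutions") or []):
        found.add("institution_country_code")
    if any(a.get("countries") for a in auths):
        found.add("authorship_countries")
    if (work.get("countries_distinct_count") or 0) > 0:
        found.add("countries_distinct_count")
    # Join every raw affiliation string and look for a known country name.
    raw = " ".join(s.lower() for a in auths for s in a.get("raw_affiliation_strings") or [])
    if any(name in raw for name in COUNTRY_NAMES):
        found.add("raw_affiliation_country_name")
    if (work.get("institutions_distinct_count") or 0) > 0:
        found.add("has_institutions")
    return found


def audit(works: list[Work]) -> Counter:
    """Count how many works carry each signal."""
    counts: Counter = Counter()
    for work in works:
        counts.update(audit_work(work))
    return counts
```

Dividing each count by the number of works gives the "Papers covered" percentages; comparing which works carry which labels shows whether a field adds papers or just repeats the baseline.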

The author trick

OpenAlex's /authors/<id> endpoint returns each author's affiliation history: a list of {institution, years[]} entries showing where they have been affiliated and when. So even if Paper X (2010, by Author A) has no institution attached, Paper Y (also 2010, in a different journal, by Author A) might, and A's profile records that affiliation together with the years it covers.

For each paper with authors but no country attribution, we can:

  1. Pull the author IDs from authorships[].author.id
  2. Fetch each author from /authors/<id>
  3. Walk their affiliations[] list looking for an entry whose years[] list contains the paper's publication_year
  4. If found, take the institution.country_code and credit it to the paper
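The four steps translate fairly directly into code. The sketch below is one possible shape, assuming work and author records are parsed OpenAlex JSON; the helper names (author_api_url, fetch_author_http, countries_for_year, recover_countries) are hypothetical. The author fetcher is passed in as a function so that caching and rate limiting can be layered on separately, and so the logic can be exercised without network access. When several authors match, it credits every matching country to the paper, which is only one of several reasonable counting choices.

```python
import json
import urllib.request
from typing import Any, Callable

Record = dict[str, Any]
FetchAuthor = Callable[[str], Record]

API_BASE = "https://api.openalex.org/authors/"


def author_api_url(author_id: str) -> str:
    """Map an id such as 'https://openalex.org/A123' to its /authors/ API URL."""
    return API_BASE + author_id.rsplit("/", 1)[-1]


def fetch_author_http(url: str) -> Record:
    """Plain, uncached HTTP fetch of one author record."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def countries_for_year(author: Record, year: int) -> set[str]:
    """Country codes of affiliations whose years[] list contains `year`."""
    codes: set[str] = set()
    for aff in author.get("affiliations") or []:
        code = (aff.get("institution") or {}).get("country_code")
        if code and year in (aff.get("years") or []):
            codes.add(code)
    return codes


def recover_countries(work: Record, fetch_author: FetchAuthor = fetch_author_http) -> set[str]:
    """Author-history fallback for a work that has authors but no country."""
    year = work.get("publication_year")
    if year is None:
        return set()
    recovered: set[str] = set()
    for authorship in work.get("authorships") or []:
        author_id = (authorship.get("author") or {}).get("id")
        if not author_id:
            continue  # authorship with no linked author record
        recovered |= countries_for_year(fetch_author(author_api_url(author_id)), year)
    return recovered
```

An empty result means the paper stays in the "unknown country" bucket; a non-empty one is the set of countries to credit.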

Recovery rate on a 50-paper sample

We ran this on a 50-paper sample drawn from the journal's 726 "papers-with-authors-but-no-country" subset (the 430 papers with no authors at all are unrecoverable through this method).

Outcome | Share | Note
--- | --- | ---
Country recovered via year-matched author affiliation | 24 % (12 papers) | Reliable: the year-matched institutional history is OpenAlex's own data
Still no country after fallback | 76 % (38 papers) | None of the authors has a populated affiliations[] history, or none has an entry covering the paper's year

Of the 96 distinct authors fetched, only 45 (47 %) had any populated affiliations[] history at all; the rest are "ghost" author records with just an ID and a name. Whether a given paper is recoverable therefore depends heavily on whether its co-authors happen to be actively profiled researchers.
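Counting distinct authors matters here, because the same co-author can turn up on several sampled papers. A hypothetical helper that de-duplicates fetched author records by their OpenAlex id and counts how many have a non-empty affiliations[] list might look like this, assuming each record is a parsed author object with an "id" key:

```python
from typing import Any, Iterable

Record = dict[str, Any]


def profile_coverage(authors: Iterable[Record]) -> tuple[int, int]:
    """Return (distinct authors, authors with a populated affiliations[] list)."""
    distinct: dict[str, Record] = {}
    for author in authors:
        # Key on the OpenAlex id so a co-author fetched for several papers counts once.
        distinct[author["id"]] = author
    profiled = sum(1 for a in distinct.values() if a.get("affiliations"))
    return len(distinct), profiled
```

A "ghost" record in this sense is simply one whose affiliations[] is missing or empty.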

Extrapolated from the sample to the whole journal (24 % of the 726 candidate papers is about 174 more papers), coverage would move from 21 % to roughly 33 %: a 12-percentage-point lift, every point of which is grounded in OpenAlex's own institutional graph.
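The extrapolation is simple arithmetic: apply the sample's recovery rate to the 726 candidate papers and add the result to the 300-paper baseline. The sketch below reproduces that calculation. It also adds a 95 % Wilson score interval for the recovery rate, which is our own illustrative addition rather than part of the original analysis; it assumes the 50 papers were a simple random sample and ignores the finite-population correction, so treat it only as a rough sense of how much a 50-paper sample can move the headline figure.

```python
import math


def extrapolate(total: int, baseline: int, pool: int, recovered: int, sample: int) -> tuple[float, float, float]:
    """Return (sample recovery rate, lift, new coverage), all as fractions of 1."""
    rate = recovered / sample
    extra_papers = rate * pool  # expected recoveries across the whole candidate pool
    return rate, extra_papers / total, (baseline + extra_papers) / total


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 for ~95 %)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


rate, lift, coverage = extrapolate(total=1456, baseline=300, pool=726, recovered=12, sample=50)
low, high = wilson_interval(12, 50)
# Push the interval's endpoints through the same extrapolation.
coverage_range = tuple((300 + r * 726) / 1456 for r in (low, high))
```

With this journal's figures the rate is 0.24, the lift about 0.12 and the coverage about 0.33. The interval on the rate runs from roughly 0.14 to 0.37, which would put journal-wide coverage somewhere around 28 % to 39 %: wide enough that the journal-wide audit is worth running.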

Recovered countries from the sample looked plausible — researchers showed institutional affiliations consistent with their field and era. Unrecovered cases skewed heavily toward older single-author papers where the author has no OpenAlex profile beyond a name.

Trade-offs to be honest about

What we ship in Journal Trends

For the immediate release we're doing the honest minimum:

For a follow-on release:

What's next: a journal-wide audit

A single-journal observation tells us the problem exists; it doesn't tell us how widespread or severe it is across academic publishing as a whole, nor how much of it the author-fallback method actually fixes at scale.

We plan to run this audit across every journal currently in Journal Trends to measure:

Results in a future post.

The bigger picture

For research-integrity work — which is why Journal Trends exists — the rule should be: prefer transparent partial coverage over invisible completeness. A chart that says "we don't know about this slice" is more useful than a chart that quietly counts only the slice it knows about and pretends that's the whole picture.

The author-affiliation trick won't close the gap for every journal — a 1990s practitioner journal might stay only partially recoverable forever — but it's an honest lift, free of new infrastructure beyond a cache table. And it scales gracefully: as OpenAlex back-fills more author profiles over time, the recoverable fraction grows on its own.
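The cache table can be very small. Below is a minimal sketch using SQLite from Python's standard library, assuming author records are stored as raw JSON keyed by their API URL; the table name author_cache and the 90-day refresh window are hypothetical choices. The refresh window is what lets the recoverable fraction grow on its own: stale entries are re-fetched, so author profiles that OpenAlex back-fills later are eventually picked up.

```python
import json
import sqlite3
import time
from typing import Any, Callable

Record = dict[str, Any]
FetchAuthor = Callable[[str], Record]


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical author_cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS author_cache ("
        " url TEXT PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " fetched_at REAL NOT NULL)"
    )
    return conn


def cached_fetch(conn: sqlite3.Connection, fetch: FetchAuthor, max_age_days: float = 90.0) -> FetchAuthor:
    """Wrap `fetch` so each author URL hits the network at most once per window."""
    max_age = max_age_days * 86400

    def fetch_author(url: str) -> Record:
        row = conn.execute(
            "SELECT body, fetched_at FROM author_cache WHERE url = ?", (url,)
        ).fetchone()
        if row is not None and time.time() - row[1] < max_age:
            return json.loads(row[0])
        record = fetch(url)  # cache miss or stale entry: refresh from the source
        with conn:  # commits the upsert
            conn.execute(
                "INSERT OR REPLACE INTO author_cache (url, body, fetched_at) VALUES (?, ?, ?)",
                (url, json.dumps(record), time.time()),
            )
        return record

    return fetch_author
```

In use, you would wrap whatever function actually fetches an author record and pass the result wherever an author fetcher is expected; a file path instead of ":memory:" makes the cache persist between runs.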
