The coevolution of languages and genes: tracking down matches and mismatches


The coevolution of languages and genes represents the ultimate Darwinian paradigm for tracking population dynamics in time and space, and is one of the most frequently evoked parallels between cultural and biological diversity (Cavalli-Sforza, 1991). In recent years, scholars have analyzed this congruence to shed light on population origins, diversification and contact. Popular case studies include the diffusion of major language families, such as Indo-European (Haak et al., 2015) or Austronesian (Gray et al., 2009), as well as smaller regional cases of contact vs. cultural barriers between groups (Pakendorf, 2014).

Genealogical Tree of Dead and Living Languages, by Félix Gallet (c. 1800).

Mismatches between linguistic and genetic variation are usually dismissed as exceptions to the general pattern. But how often do these events occur? Can we estimate the incidence of language shift and reconstruct more realistic models of cultural evolution? And which circumstances drive these discontinuities in cultural transmission?

To answer these questions at a worldwide as well as at a regional scale, we first need a robust panel of genetic diversity that can be matched with relevant linguistic and cultural information on the sampled populations.

Numerous standardized linguistic databases are being built at the Max Planck Institute for the Science of Human History in Jena: for example, glottobank (which includes grambank, lexibank, parabank, phonobank and numeralbank), CoBL and soundcomparisons. Other relevant linguistic databases are WALS, Tsammalex, WOLD, AFBO and AUTOTYP. Finally, D-PLACE (Database of Places, Language, Culture and Environment) and Pulotu (Database of Pacific Religions) are great resources for quantitative cultural comparisons.


We aim to assemble a new standardized database of genetic diversity that can be matched with these existing databases and used not only by geneticists but also by linguists, cultural anthropologists, and other experts interested in the study of human history and diversity. We will screen the genetic literature for samples with clear geographic and ethnolinguistic characterization, and will prefer fast-evolving genetic markers that could harbor signatures of events that occurred in the past millennia.
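As an illustration only, the matching step could be sketched as follows. The sketch assumes that each genetic sample carries a language identifier (here a Glottocode-style key) that also indexes the linguistic records. The field names, the `match_samples` helper and all values are hypothetical placeholders, not the project's actual schema or data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class GeneticSample:
    """One published sample with its ethnolinguistic characterization (hypothetical schema)."""
    sample_id: str
    population: str
    language_id: Optional[str]  # assumed shared key, e.g. a Glottocode; None if unreported
    latitude: float
    longitude: float


def match_samples(
    samples: List[GeneticSample],
    language_table: Dict[str, dict],
) -> Tuple[List[Tuple[GeneticSample, dict]], List[GeneticSample]]:
    """Split samples into those that link to a linguistic record and those that do not."""
    matched, unmatched = [], []
    for sample in samples:
        record = language_table.get(sample.language_id) if sample.language_id else None
        if record is None:
            unmatched.append(sample)  # no clear ethnolinguistic characterization
        else:
            matched.append((sample, record))
    return matched, unmatched


# Invented placeholder records, for illustration only.
language_table = {
    "aaaa0001": {"name": "Language A", "family": "Family X"},
    "bbbb0002": {"name": "Language B", "family": "Family Y"},
}
samples = [
    GeneticSample("S1", "Population A", "aaaa0001", 10.0, 20.0),
    GeneticSample("S2", "Population B", "bbbb0002", 11.0, 21.0),
    GeneticSample("S3", "Population C", None, 12.0, 22.0),
]
matched, unmatched = match_samples(samples, language_table)
```

Samples that end up in `unmatched` are the ones the literature screening would have to exclude or characterize further before any comparison of genetic and linguistic diversity.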

Scientific relevance

  • Push geneticists to properly characterize the human history behind the molecular data
  • Provide a reference tool for geneticists
  • Extract information on genealogical relatedness and demography useful for non-geneticists
  • Frame questions of major relevance for human history and diversity in a multidisciplinary perspective

Our final aim is to develop a more realistic understanding of the complex mechanisms behind cultural transmission. The change of cultural features through time not only affects our ability to trace human prehistory, but also influences the definition of “population” as the unit of research.

This project is being developed in collaboration with Damián Blasi and Prof. Balthasar Bickel at the University of Zurich and Robert Forkel at MPI SHH, Jena.


  • Cavalli-Sforza LL. 1991. Genes, peoples and languages. Sci Am 265:104–110.
  • Gray RD, Drummond AJ, Greenhill SJ. 2009. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323:479–483.
  • Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, et al. 2015. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522:207–211.
  • Pakendorf B. 2014. Coevolution of languages and genes. Curr Opin Genet Dev 29:39–44.