Surnames and ancestry in Brazil

PLoS One. 2017 May 8;12(5):e0176890. doi: 10.1371/journal.pone.0176890. eCollection 2017.

Abstract

This paper presents a method for classifying the ancestry of Brazilian surnames based on historical sources. The information obtained forms the basis for applying fuzzy matching and machine learning classification algorithms to more than 46 million workers, assigning their surnames to five categories: Iberian, Italian, Japanese, German, and East European. The vast majority (96.7%) of the single surnames were identified by fuzzy matching, and the rest by a method proposed by Cavnar and Trenkle (1994). A comparison of the results with data on foreigners in the 1920 Census and with the geographic distribution of non-Iberian surnames underscores the accuracy of the procedure. The study shows that surname ancestry is associated with significant differences in wages and schooling.
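
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The reference lists below are tiny hypothetical samples, not the paper's historical sources. The fuzzy step uses the similarity ratio from Python's standard difflib module as a stand-in, since the abstract does not say which fuzzy matching algorithm was used, and the 0.85 threshold is an assumption. The fallback follows the general scheme of Cavnar and Trenkle (1994): build a ranked profile of character n-grams for each category and pick the category whose profile has the smallest "out-of-place" distance to the surname's profile.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy reference lists, for illustration only.
REFERENCE = {
    "Iberian": ["silva", "santos", "oliveira", "souza", "pereira", "ferreira"],
    "Italian": ["rossi", "bianchi", "romano", "colombo", "ricci", "martinelli"],
    "Japanese": ["tanaka", "suzuki", "yamamoto", "nakamura", "watanabe", "kobayashi"],
    "German": ["schmidt", "schneider", "mueller", "fischer", "weber", "hoffmann"],
    "East European": ["kowalski", "nowak", "wisniewski", "ivanov", "petrov", "kaminski"],
}


def fuzzy_match(surname, threshold=0.85):
    """Return (category, reference_name, score) for the best match at or
    above the threshold, or None. The threshold value is an assumption."""
    best = None
    for category, names in REFERENCE.items():
        for name in names:
            score = SequenceMatcher(None, surname.lower(), name).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (category, name, score)
    return best


def ngram_profile(words, n_max=3, top=300):
    """Map each of the most frequent character n-grams (n = 1..n_max)
    to its frequency rank (0 = most frequent)."""
    counts = Counter()
    for word in words:
        padded = f"_{word.lower()}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category
    profile receive a fixed maximum penalty."""
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


PROFILES = {cat: ngram_profile(names) for cat, names in REFERENCE.items()}


def classify(surname):
    """Try fuzzy matching first; fall back to n-gram profile distance."""
    hit = fuzzy_match(surname)
    if hit is not None:
        return hit[0], "fuzzy"
    doc = ngram_profile([surname])
    best_cat = min(PROFILES, key=lambda cat: out_of_place(doc, PROFILES[cat]))
    return best_cat, "ngram"


print(classify("Olivera"))   # close spelling variant of a listed name
print(classify("Takahashi")) # not listed; handled by the n-gram fallback
```

With reference lists this small the n-gram fallback is unreliable; the sketch only shows how the two steps fit together, with fuzzy matching resolving close spelling variants and the profile distance covering surnames that have no near match.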

MeSH terms

  • Brazil
  • Humans
  • Names*

Grants and funding

This work was supported by CAPES/BEX Grant 2549/15-8, https://www.capes.gov.br/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.