ESCOLEX: a grade-level lexical database from European Portuguese elementary to middle school textbooks

Behav Res Methods. 2014 Mar;46(1):240-53. doi: 10.3758/s13428-013-0350-1.

Abstract

In this article, we introduce ESCOLEX, the first European Portuguese children's lexical database with grade-level-adjusted word frequency statistics. Computed from a 3.2-million-word corpus, ESCOLEX provides 48,381 word forms extracted from 171 elementary and middle school textbooks for 6- to 11-year-old children attending the first six grades in the Portuguese educational system. Like other children's grade-level databases (e.g., Carroll, Davies, & Richman, 1971; Corral, Ferrero, & Goikoetxea, Behavior Research Methods, 41, 1009-1017, 2009; Lété, Sprenger-Charolles, & Colé, Behavior Research Methods, Instruments, & Computers, 36, 156-166, 2004; Zeno, Ivens, Millard, & Duvvuri, 1995), ESCOLEX provides four frequency indices for each grade: overall word frequency (F), index of dispersion across the selected textbooks (D), estimated frequency per million words (U), and standard frequency index (SFI). It also provides a new measure, contextual diversity (CD). In addition, ESCOLEX provides, for each word, the number of letters, part(s) of speech, number of syllables, syllable structure, and adult frequencies taken from P-PAL (a European Portuguese corpus-based lexical database; Soares, Comesaña, Iriarte, Almeida, Simões, Costa, …, Machado, 2010; Soares, Iriarte, Almeida, Simões, Costa, França, …, Comesaña, in press). ESCOLEX will be a useful tool both for researchers interested in language processing and development and for professionals in need of verbal materials adjusted to children's developmental stages. ESCOLEX can be downloaded along with this article or from http://p-pal.di.uminho.pt/about/databases.
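
The abstract names the five indices (F, D, U, SFI, CD) without giving formulas. The Python sketch below shows one way they might be computed for a single word within one grade. It assumes the definitions commonly attributed to Carroll, Davies, and Richman (1971): D as a normalized entropy of the word's relative frequencies across textbooks, U as a dispersion-adjusted frequency per million, and SFI = 10(log10 U + 4). It also assumes that CD is simply the number of textbooks containing the word. The function name, its arguments, and the toy counts are hypothetical, and the computation actually used for ESCOLEX may differ in detail.

```python
import math

def frequency_indices(counts, sizes):
    """Illustrative F, D, U, SFI and CD for one word in one grade.

    counts[i]: occurrences of the word in textbook i (hypothetical input)
    sizes[i]:  total number of word tokens in textbook i (hypothetical input)
    """
    n = len(counts)
    N = sum(sizes)   # total tokens across the textbooks
    F = sum(counts)  # overall word frequency
    if F == 0:
        raise ValueError("the word must occur at least once")

    # D: normalized entropy of the word's relative frequencies
    # (assumed form of Carroll's dispersion index; 0 = one book, 1 = even).
    rel = [c / s for c, s in zip(counts, sizes)]
    total = sum(rel)
    p = [r / total for r in rel]
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    D = entropy / math.log(n) if n > 1 else 0.0

    # U: frequency per million, adjusted for dispersion (assumed formula).
    f_min = sum(c * s for c, s in zip(counts, sizes)) / N
    U = (1_000_000 / N) * (F * D + (1 - D) * f_min)

    # SFI: logarithmic rescaling of U; U = 1 per million gives SFI = 40.
    SFI = 10 * (math.log10(U) + 4)

    # CD: number of textbooks in which the word appears (assumed definition).
    CD = sum(1 for c in counts if c > 0)

    return {"F": F, "D": D, "U": U, "SFI": SFI, "CD": CD}

# Toy data: the same 40 occurrences, spread evenly vs. confined to one textbook.
even = frequency_indices([10, 10, 10, 10], [1000, 1000, 1000, 1000])
clumped = frequency_indices([40, 0, 0, 0], [1000, 1000, 1000, 1000])
print(even)
print(clumped)
```

In this toy case both words have the same F, but the word confined to a single textbook has lower D, U, SFI, and CD. This shows why a grade-level database reports dispersion-sensitive indices alongside raw frequency.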

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Computer-Assisted Instruction / statistics & numerical data*
  • Databases, Factual* / statistics & numerical data
  • Humans
  • Information Literacy
  • Language
  • Portugal
  • Reading*
  • Schools
  • Textbooks as Topic*
  • Vocabulary*