Lexique 3 -- Expanded
Download
Lexique 3 -- Expanded tab-delimited text version [~5 MB]
Lexique 3 -- Expanded Excel version [~33 MB]
Lexique 3 -- Original unedited tab-delimited text version [~5 MB]
Description
The expanded Lexique 3 database adds a small number of columns to the original Lexique database reported by New (2006). The additional columns contain the following information. Note that accented characters (e.g., é) were treated as distinct letters in all of the computations.
- freq_films_SUBTL_ANALOG: the summed film frequency (freq_films) for each word collapsed across grammatical categories
- posBi: summed length-specific type positional bigram frequency, calculated for (and relative to) all single-word entries with a summed film frequency greater than 1.
- legalBi: flag indicating whether the string contains only legal bigrams.
- posUni: same as for posBi, above, only for unigrams / individual letters.
- legalUni: same as for legalBi, above, only for unigrams / individual letters.
- coltNOrth: the number of orthographic neighbours (Coltheart's N) calculated for (and relative to) all single-word entries with a summed film frequency greater than 1.
- OrthNeighbours: pipe ("|") delimited list of all of the orthographic neighbours.
- coltNPhon: same as coltNOrth, only calculated for the phonological representation of the word.
- PhonNeighbours: same as OrthNeighbours, above, only for phonology.
- hasHomographs: flag indicating whether the word has homographs. See the note under hasHomophones for a related consideration when using this flag as a screen.
- hasHomophones: same as hasHomographs, only for homophones. Note that most words in French have homophones because of plural forms in which the "s" is silent.
- isSingleWord: flag indicating if the "word" in Lexique was a single word or multiple words with an intervening space (e.g., "a capella").
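The neighbourhood and bigram columns above can be approximated with a short script. The following is a minimal Python sketch, not the code used to build the database. It assumes Coltheart's N in its usual sense (same-length words that differ by exactly one substituted letter) and assumes that the positional bigram counts include the word itself; the description above does not say either way. All function names and the toy lexicon are hypothetical. In practice the lexicon would be the single-word entries with a summed film frequency greater than 1, and the same functions could be applied to the phonological strings.

```python
from collections import Counter, defaultdict


def neighbour_index(lexicon):
    """Index each word type under every pattern formed by blanking one letter."""
    index = defaultdict(set)
    for word in set(lexicon):
        for i in range(len(word)):
            index[(word[:i], word[i + 1:])].add(word)
    return index


def orth_neighbours(word, index):
    """Return the sorted same-length words that differ from `word` by one letter."""
    found = set()
    for i in range(len(word)):
        found |= index.get((word[:i], word[i + 1:]), set())
    found.discard(word)  # a word is not its own neighbour
    return sorted(found)


def positional_bigram_counts(lexicon):
    """Count word types per (word length, position, bigram)."""
    counts = Counter()
    for word in set(lexicon):
        for i in range(len(word) - 1):
            counts[(len(word), i, word[i:i + 2])] += 1
    return counts


def pos_bi(word, counts):
    """Sum the length-specific type counts of the word's positional bigrams."""
    return sum(counts[(len(word), i, word[i:i + 2])]
               for i in range(len(word) - 1))


def legal_bi(word, counts):
    """True if every positional bigram occurs in at least one reference word."""
    return all(counts[(len(word), i, word[i:i + 2])] > 0
               for i in range(len(word) - 1))


# Toy reference lexicon (hypothetical). Accented letters are distinct
# characters, provided the strings use precomposed (NFC) code points.
lexicon = ["mer", "fer", "ver", "mur", "mes", "été", "ôté"]
index = neighbour_index(lexicon)
counts = positional_bigram_counts(lexicon)

neighbours = orth_neighbours("mer", index)
colt_n = len(neighbours)                 # Coltheart's N
neighbour_field = "|".join(neighbours)   # pipe-delimited list
```

Indexing words by their "one letter blanked" patterns avoids comparing every pair of words, which matters for a lexicon of this size.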
Additionally, an unedited version of the original Lexique 3 database is made available here for archival purposes.
NOTE: The database contains a separate entry (line) for each word in each of its grammatical classes, in addition to the collapsed information added in the expanded version, as noted above. Additional processing is therefore required to extract one entry per word. A simple way of doing so is to sort the database by descending freq_films_SUBTL_ANALOG and then remove duplicate entries as they are encountered, thus preserving the additional grammatical, lemma, and other information only for the most frequently encountered form of the word.
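The sort-and-deduplicate procedure can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not an official script: the word column is assumed to be named "ortho" and the grammatical class column "cgram" (check the header of the downloaded file), and the sample rows are invented. Because freq_films_SUBTL_ANALOG is a summed value, it is presumably identical across all entries of a word, so the sketch also uses the per-class freq_films as a secondary sort key to make sure the retained row is the most frequent form; that tie-break is an assumption.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def keep_most_frequent(rows, word_col="ortho",
                       total_col="freq_films_SUBTL_ANALOG",
                       form_col="freq_films"):
    """Sort by descending frequency and keep the first row seen for each word."""
    ordered = sorted(
        rows,
        key=lambda r: (float(r[total_col]), float(r[form_col])),
        reverse=True,
    )
    seen = set()
    kept = []
    for row in ordered:
        if row[word_col] not in seen:
            seen.add(row[word_col])
            kept.append(row)
    return kept


# Invented sample rows; the column names other than freq_films and
# freq_films_SUBTL_ANALOG are assumptions.
SAMPLE = (
    "ortho\tcgram\tfreq_films\tfreq_films_SUBTL_ANALOG\n"
    "porte\tNOM\t200.0\t260.0\n"
    "porte\tVER\t60.0\t260.0\n"
    "son\tNOM\t50.0\t1050.0\n"
    "son\tADJ:pos\t1000.0\t1050.0\n"
)

unique_rows = keep_most_frequent(read_tab_delimited(SAMPLE))
```

For the real file, the text could be read with open(path, encoding=...) and passed to read_tab_delimited; the correct encoding depends on the downloaded file and is not stated here.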
The official version of the Lexique 3 database may be obtained from:
The original database is associated with the following references:
New, B., Brysbaert, M., Veronis, J., & Pallier, C. (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 28(4), 661-677.
New, B. (2006). Lexique 3: Une nouvelle base de données lexicales. Actes de la Conférence Traitement Automatique des Langues Naturelles (TALN 2006), Avril 2006, Louvain, Belgique.
Copyright Notice:
The information provided here is intended to ensure the timely dissemination of the Lexique 3 data in an alternative format that may be useful for non-commercial academic research. Copyright of all of this material is maintained by the original authors or other copyright holders, and it is assumed that all users of these data will adhere to these copyrights.