Linguistics

This guide provides information and selected linguistics resources available to you through UCalgary Library and on the Internet.

Sources for Linguistic Data

Data is the foundation of research in linguistics, whether you collect it yourself or use what others have gathered. There are a number of places where you can find existing datasets; below are some examples:

General Sources
  • The Language Archive: https://archive.mpi.nl/tla/
    • Created by the Max Planck Institute for Psycholinguistics, TLA has audio and video language corpus data from around the world, records of speech in everyday interactions, information about linguistic phenomena and documentation, and more.
  • Endangered Languages Archive: https://www.elararchive.org/
    • A digital repository of multimedia collections from endangered languages across the world, hosted by the Berlin-Brandenburg Academy of Sciences and Humanities.
  • Archive of the Indigenous Languages of Latin America: https://www.ailla.utexas.org/
    • A digital archive of material in and about indigenous languages of Latin America, hosted by the University of Texas at Austin.
  • The Rosetta Project: https://archive.org/details/rosettaproject
    • A global collaboration of language specialists and native speakers that has gathered nearly 10,000 pages of material from 2,500 languages. Created by the Long Now Foundation and hosted by the Internet Archive.
  • The TalkBank System: https://talkbank.org/
    • Hosted at Carnegie Mellon University, TalkBank is a system of data repositories supporting 14 fundamental linguistic research areas, focusing on spoken communication. It is particularly known for the Child Language Data Exchange System (CHILDES) and PhonBank.
Language-Specific Sources
  • British National Corpus: http://www.natcorp.ox.ac.uk/
    • A 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a broad cross-section of British English from the late twentieth century.
  • English Corpora: https://www.english-corpora.org/corpora.asp
    • A collection of multiple English corpora from different time periods, with different sources, and from different countries.
  • The ARTFL Project
    • A collaboration between the French government and the University of Chicago which provides its members with access to North America's largest collection of digitized French resources. Available through the library catalogue.
  • El Corpus del Español: https://www.corpusdelespanol.org/x.asp
    • Multiple Spanish corpora, including both historical and modern language.

For help with finding other resources, contact the Linguistics librarian.

Analyzing Your Data

Managing Your Research Data

It's important to consider how you'll manage the data you're collecting, both during the active phases of your research project and after it's complete. Here are some resources that can help you in this area.

  • The Open Handbook of Linguistic Data Management
    • Written by linguists for linguists, this is a guide to principles and methods for the management, archiving, sharing, and citing of linguistic research data, especially digital data.
  • UCalgary's Research Data Management LibGuide
    • A general guide to all things research data management, including links to key tools and resources.
  • PRISM Data
    • UCalgary's institutional data repository, where you can share data collected as part of a research project.
  • DMP Assistant
    • A free, Canadian, bilingual tool to help you create data management plans (DMPs) before you start a research project.