APIs for Scholarly Resources
A list of commonly used APIs for scholarly resources
Text and Data Mining
What is Text and Data Mining?
Text mining is defined as "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources", while data mining is an activity that attempts to discover interesting patterns from structured databases of facts (Hearst, 2003).
A common example of data mining is analyzing sales records (structured data) to determine the best time to launch a sales campaign. A common example of text mining is running sentiment analysis on tweets to learn how people reacted to a particular event.
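The contrast between the two examples above can be made concrete with a small sketch. It is a toy illustration only: the sales figures, the sample posts, the word lists, and the function names `best_month` and `sentiment_score` are all made up for this guide, and real sentiment analysis would normally use a trained model or an established lexicon rather than a handful of keywords.

```python
import re
from collections import Counter
from statistics import mean

# Data mining: the input is already structured (named fields, numeric values).
# These records are invented sample data.
sales_records = [
    {"month": "Nov", "units": 120},
    {"month": "Nov", "units": 150},
    {"month": "Dec", "units": 310},
    {"month": "Dec", "units": 290},
    {"month": "Jan", "units": 90},
]


def best_month(records):
    """Return the month with the highest average units sold."""
    by_month = {}
    for row in records:
        by_month.setdefault(row["month"], []).append(row["units"])
    return max(by_month, key=lambda m: mean(by_month[m]))


# Text mining: the input is free text, so structure has to be extracted first.
# These word lists are a hypothetical, deliberately tiny sentiment lexicon.
POSITIVE = {"great", "love", "amazing"}
NEGATIVE = {"terrible", "hate", "boring"}


def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    positive = sum(counts[w] for w in POSITIVE)
    negative = sum(counts[w] for w in NEGATIVE)
    return positive - negative


# Invented sample posts standing in for tweets about an event.
posts = [
    "Love the opening ceremony, amazing show!",
    "That was terrible and boring.",
]

print(best_month(sales_records))
print([sentiment_score(p) for p in posts])
```

In the data mining half, the pattern comes straight from the fields of each record. In the text mining half, the text must first be tokenized and matched against a vocabulary before anything can be counted.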
Here's a fascinating article on the current state of TDM from a library perspective: McCracken, P. & Raub, E., (2023) “Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?”, Journal of Librarianship and Scholarly Communication 11(1). doi: https://doi.org/10.31274/jlsc.15530
Text Sources
- Caselaw Access Project (CAP): CAP includes over 6 million official, book-published state and federal United States cases, free for the public to access and use. Bulk downloads and an API are available for text mining and analysis. You can also apply for Researcher Access for unlimited access.
- CORE: Process and analyse the largest structured collection of open research with full texts, manage your research papers, make them more discoverable, and comply with funder mandates.
- English-Corpora.org: English-Corpora.org is the most widely used collection of corpora (highly searchable collections of texts). It contains 17 corpora, including the Wikipedia corpus and the TIME Magazine corpus. A guided tour (PDF) gives a quick overview of the built-in text analysis features.
- HKBU Corpus of Political Speeches: A parallel corpus containing political speeches of US presidents, Hong Kong Governors (1984-1996), Hong Kong Chief Executives (1997-2015), Taiwan Presidents (1978-2015), and Premiers of the People’s Republic of China (1984-2015).
- JSTOR Constellate: Constellate is the text analytics service from the not-for-profit ITHAKA, the same people who brought you JSTOR and Portico. It is a platform for teaching, learning, and performing text analysis using the world’s leading archival repositories of scholarly and primary source content.
- Policy Commons: Policy Commons is a one-stop platform where researchers, academics, librarians, and students discover, access, and share millions of publications from the world’s leading policy experts, think tanks, IGOs, and NGOs.
- Chinese/English Political Interpreting Corpus (CEPIC): A parallel corpus of transcripts of speeches delivered by top political figures from Hong Kong, Beijing, Washington DC, and London, together with their translated or interpreted texts.
- Chinese-English Parallel Corpora: A parallel corpus of financial and legal texts in Hong Kong, extracted from the Hong Kong Stock Exchange, the Securities and Futures Commission of Hong Kong, and Hong Kong government websites.
- Accessible Archives: Accessible Archives' databases contain rich, comprehensive material from leading historic periodicals and books in a user-friendly online environment.
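Several of the sources above offer an API for retrieving text in bulk. The general shape of that workflow (build a query URL, then pull the text fields out of a JSON response) can be sketched as follows. Everything specific here is an assumption for illustration: the `api.example.org` endpoint, the `q` and `limit` parameters, and the `results`, `title`, and `fullText` field names are hypothetical and do not describe any particular provider. Check each provider's own API documentation for the real endpoint, parameters, authentication, and rate limits.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; replace it with the
# URL given in the provider's API documentation.
BASE_URL = "https://api.example.org/v1/search"


def build_search_url(query, page_size=10, base_url=BASE_URL):
    """Build a search URL; the parameter names 'q' and 'limit' are assumed."""
    params = {"q": query, "limit": page_size}
    return f"{base_url}?{urlencode(params)}"


def extract_texts(response_body):
    """Return (title, full text) pairs from an assumed JSON response shape."""
    payload = json.loads(response_body)
    return [
        (item.get("title", ""), item.get("fullText", ""))
        for item in payload.get("results", [])
    ]


# A made-up response body standing in for what a server might return.
sample_response = json.dumps(
    {
        "results": [
            {"title": "Paper A", "fullText": "Open access text."},
            {"title": "Paper B"},  # full text is not always available
        ]
    }
)

print(build_search_url("text mining"))
for title, text in extract_texts(sample_response):
    print(title, "|", text)
```

The sketch makes no network request, so it runs offline. In practice the URL would be fetched with an HTTP client, usually with an API key, and the extracted texts would then feed into whatever text analysis follows.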
- Last Updated: Oct 12, 2023 11:01 AM
- URL: https://libguides.ucalgary.ca/apis
Subjects: Biological Sciences, Computer Science, Library & Information Science, Research, Statistics
Tags: API, scholarship