Text mining is defined as "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources", while data mining is an activity that attempts to discover interesting patterns from structured databases of facts (Hearst, 2003).
A common example of data mining is analyzing sales records (structured data) to determine the best time to launch a sales campaign. A common example of text mining, by contrast, is running sentiment analysis on tweets to learn how people are reacting to a particular event.
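To make the text mining example more concrete, here is a minimal sketch of lexicon-based sentiment scoring in Python. It is only an illustration: the word lists and sample posts are invented for this example, and real projects typically rely on a curated sentiment lexicon or a trained model rather than a handful of hand-picked words.

```python
import re
from collections import Counter

# Toy word lists, invented for illustration only (an assumption, not a real lexicon).
POSITIVE = {"great", "love", "amazing", "happy", "good"}
NEGATIVE = {"terrible", "hate", "awful", "sad", "bad"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score: int) -> str:
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical posts about an event (unstructured text, unlike a sales table).
posts = [
    "Love the opening ceremony, amazing show!",
    "Terrible traffic around the stadium, awful planning.",
    "The event starts at 8pm.",
]

# Count how many posts fall under each label.
summary = Counter(label(sentiment_score(p)) for p in posts)
print(summary)
```

The point of the sketch is the contrast with data mining: the input is free text with no predefined fields, so the structure (here, a sentiment label per post) has to be derived before any counting or pattern-finding can happen.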
Here's a fascinating article on the current state of TDM from a library perspective: McCracken, P. & Raub, E., (2023) “Licensing Challenges Associated With Text and Data Mining: How Do We Get Our Patrons What They Need?”, Journal of Librarianship and Scholarly Communication 11(1). doi: https://doi.org/10.31274/jlsc.15530
CAP includes over 6 million official, book-published state and federal United States cases, free for the public to access and use. Bulk downloads and an API are available for text mining and analysis. You can also apply for Researcher Access for unlimited access.
English-Corpora.org is the most widely used collection of corpora (highly searchable collections of texts). It contains 17 corpora, including the Wikipedia corpus and the TIME Magazine corpus. See this guided tour (PDF) for a quick overview of the built-in text analysis features.
A parallel corpus containing political speeches of US presidents, Hong Kong Governors (1984-1996), Hong Kong Chief Executives (1997-2015), Taiwan Presidents (1978-2015), and Premiers of the People’s Republic of China (1984-2015).
Constellate is the text analytics service from the not-for-profit ITHAKA - the same people who brought you JSTOR and Portico. It is a platform for teaching, learning, and performing text analysis using the world’s leading archival repositories of scholarly and primary source content.
Policy Commons is a one-stop platform where researchers, academics, librarians, and students discover, access and share millions of publications from the world’s leading policy experts, think tanks, IGOs, and NGOs.