Answered By: James Adams. Last Updated: Dec 04, 2025.
Several of Harvard Library's databases allow text and data mining for research purposes under certain conditions:
ProQuest
A large amount of content that Harvard subscribes to through ProQuest is available through their Text Data Mining platform (contact a librarian to learn more). Some highlights include:
- ProQuest American Periodicals
- ProQuest Congressional:
  - Congressional Record, 1789-1997 (For additional date ranges of the Congressional Record, you can also try the API developed by the University of North Texas.)
  - House and Senate Published Hearings, 1824-2003
  - House Unpublished Hearings, 1973-1980
  - Senate Unpublished Hearings, 1985-1990
  - NOTE: Congressional materials are not available by default, but may be made available upon request.
- ProQuest Historical Annual Reports, 1844-2008
- ProQuest Historical Newspapers, including:
  - Atlanta Constitution, 1868-1930
  - New York Times, 1851-1933
  - New York Times Index, 1851-1993
  - Wall Street Journal, 1889-1932
  - For more information on the newspapers available for TDM via ProQuest, see Baker Library at HBS.
Gale NewsVault
- British Library Newspapers, parts I-V, 1800-1950
- Daily Mail Historical Archive
- Economist Historical Archive, 1843-2012
- Times Digital Archive, 1785-2006
JSTOR
JSTOR offers text analysis support for downloading metadata and requesting full-text datasets. JSTOR's former text analysis platform, Constellate, was sunset in July 2025, but its Jupyter notebooks are available through GitHub, with lessons about how to use Python for text mining in addition to other topics.
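As a rough illustration of the kind of analysis those Python lessons cover, the sketch below counts term frequencies across a small collection of texts. It is not taken from the Constellate notebooks: the sample documents, the stopword list, and the function names `tokenize` and `term_frequencies` are all hypothetical, and a real JSTOR dataset would first need to be loaded from whatever format it is delivered in.

```python
import re
from collections import Counter

# Hypothetical sample texts standing in for documents from a full-text dataset.
documents = [
    "The committee heard testimony on railway regulation.",
    "Railway rates and regulation dominated the hearing.",
]

# A tiny illustrative stopword list; real projects use a much longer one.
STOPWORDS = {"the", "on", "and", "a", "of"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


def term_frequencies(docs):
    """Count how often each term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


frequencies = term_frequencies(documents)
print(frequencies.most_common(2))
```

Counting raw term frequencies is usually only a first step; the same loop structure can feed later steps such as per-document counts or comparisons across date ranges.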
HathiTrust
HathiTrust allows text mining, subject to certain conditions, via the HathiTrust Research Center (HTRC). The HathiTrust policies are posted at http://www.hathitrust.org/datasets.
ScienceDirect
Elsevier allows some text mining of content in its ScienceDirect database that Harvard subscribes to. For details on Elsevier's policy, as well as instructions for accessing and using its API, please consult the Elsevier website.
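For orientation, a request to the API might be prepared roughly as follows. This is a minimal sketch, assuming the article retrieval endpoint and the `X-ELS-APIKey` header described in Elsevier's developer documentation; confirm both there before relying on them. The DOI and API key shown are placeholders, the function name `build_article_request` is hypothetical, and the code only builds the request without sending it.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL for retrieving an article by DOI; verify against
# Elsevier's current developer documentation.
ELSEVIER_ARTICLE_URL = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi, api_key):
    """Prepare (but do not send) a plain-text retrieval request for one DOI."""
    url = ELSEVIER_ARTICLE_URL + quote(doi, safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # assumed header name for the API key
        "Accept": "text/plain",   # ask for plain text rather than XML or JSON
    }
    return Request(url, headers=headers)


# Placeholder DOI and key, for illustration only.
request = build_article_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
print(request.full_url)
```

To actually fetch the article, the prepared request would be passed to `urllib.request.urlopen`, subject to the rate limits and usage terms in Elsevier's text mining policy.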
Please contact an HKS Librarian for more information about text-mining the above resources.