Answered By: James Adams
Last Updated: Dec 04, 2025

Several of Harvard Library's databases allow text and data mining for research purposes under certain conditions: 

ProQuest

A large amount of content that Harvard subscribes to through ProQuest is available through their Text Data Mining platform (contact a librarian to learn more). Some highlights include:

Gale NewsVault 

  • British Library Newspapers, parts I-V, 1800-1950
  • Daily Mail Historical Archive 
  • Economist Historical Archive, 1843-2012 
  • Times Digital Archive, 1785-2006

JSTOR

JSTOR offers text analysis support for downloading metadata and requesting full-text datasets. JSTOR's former text analysis platform, Constellate, was sunset in July 2025, but its Jupyter notebooks remain available on GitHub; they include lessons on using Python for text mining, among other topics.
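As an illustration of the kind of Python text mining those notebooks teach, here is a minimal sketch that totals word counts across a downloaded full-text dataset. It assumes the dataset is a JSON Lines file (one JSON object per line) with a per-document word-count field; the default field name `unigramCount` and the function `top_terms` are hypothetical, so check the documentation delivered with your JSTOR dataset for its actual format.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def top_terms(lines: Iterable[str], field: str = "unigramCount", n: int = 10) -> list[tuple[str, int]]:
    """Sum per-document word counts across a JSON Lines dataset.

    ASSUMPTION (hypothetical format): each non-blank line is one JSON
    object whose `field` maps words to counts.
    """
    totals: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Counter.update adds counts from a word -> count mapping
        totals.update(record.get(field, {}))
    return totals.most_common(n)


if __name__ == "__main__":
    # Two made-up records standing in for lines of a downloaded dataset.
    sample = [
        '{"title": "A", "unigramCount": {"trade": 3, "tariff": 1}}',
        '{"title": "B", "unigramCount": {"trade": 2, "policy": 4}}',
    ]
    print(top_terms(sample, n=2))  # [('trade', 5), ('policy', 4)]
```

Because `top_terms` accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open("dataset.jsonl", encoding="utf-8") as f: top_terms(f)` (the filename is a placeholder), which streams the file instead of loading it all into memory.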

HathiTrust

HathiTrust allows text mining, subject to certain conditions, via the HathiTrust Research Center (HTRC). The HathiTrust policies are posted at http://www.hathitrust.org/datasets.

ScienceDirect 

Elsevier allows some text mining of ScienceDirect content that Harvard subscribes to. For details on Elsevier's policy, as well as instructions for accessing and using its API, please consult the Elsevier website.
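A request to Elsevier's Article Retrieval API might be prepared as in the following minimal Python sketch. It assumes the endpoint `https://api.elsevier.com/content/article/doi/{doi}` and an API key sent in the `X-ELS-APIKey` header, as described in Elsevier's developer documentation at the time of writing; confirm both, along with the terms of Elsevier's text mining policy, before use. The helper name `build_article_request`, the DOI, and the key are placeholders.

```python
from urllib.parse import quote
from urllib.request import Request

# ASSUMPTION: endpoint and header name taken from Elsevier's developer docs;
# verify them there before relying on this sketch.
API_BASE = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi: str, api_key: str, accept: str = "text/xml") -> Request:
    """Build (but do not send) a full-text request for one DOI (hypothetical helper)."""
    # Percent-encode characters such as parentheses but keep the DOI's slash.
    url = API_BASE + quote(doi, safe="/")
    return Request(url, headers={"X-ELS-APIKey": api_key, "Accept": accept})


if __name__ == "__main__":
    # Placeholder DOI and key, not real values.
    req = build_article_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
    print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` is likely to return full text only with a registered API key and from a network covered by Harvard's ScienceDirect subscription. Building the request separately keeps the URL and headers easy to inspect before any call is made.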

Please contact an HKS Librarian for more information about text mining the resources above.