gutenbergsite/site/ebooks/offline_catalogs.md

layout: default
title: Offline Catalogs | Project Gutenberg
permalink: /ebooks/offline_catalogs.html

Offline Catalogs and Feeds

This page tells you how to find and get Project Gutenberg eBooks if:

  • you want notifications as new books become available, or
  • you don't want to use a browser to download eBooks but prefer other software like an ftp-client or wget, or
  • you are on a slow or limited internet connection, or
  • you'd rather have a handy book catalog to consult offline, or
  • you would like to make your own listings or derivatives from this information.

Feeds of new books

RSS

Find our RSS feed in the cache/feeds location. It is updated daily after 2 a.m. U.S. Eastern time.
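If you would rather read the feed with a script than with a feed reader, a minimal Python sketch might look like the following. It assumes the feed is ordinary RSS 2.0 with `<item>` elements that carry `<title>` and `<link>` children. The function name and the sample data are made up for illustration and are not real feed content.

```python
import xml.etree.ElementTree as ET


def list_feed_items(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing.
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


# Hypothetical sample in the RSS 2.0 shape (assumption, not real feed data).
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Example Book One</title><link>https://example.org/ebooks/1</link></item>
<item><title>Example Book Two</title><link>https://example.org/ebooks/2</link></item>
</channel></rss>"""

for title, link in list_feed_items(SAMPLE):
    print(f"{title}  {link}")
```

To use it on the real feed, download the feed file once a day (it changes only daily) and pass its text to `list_feed_items`.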

Email

The "posted" list is where every new eBook is announced as it is being uploaded to the Project Gutenberg servers. New books are then available for download, typically within 2 hours. The list offers a once-daily digest option and has public online archives.

Social media

List of Sites Hosting Project Gutenberg EBooks

The Project Gutenberg collection is available from dozens of sites offering access via http/https, ftp, rsync, and a few other methods. See our listing of mirror sites to choose the location, access method, or speed. Mirrors generally do not have a friendly Web-based front end, but do have the collection. See the mirroring how-to for details.

The GUTINDEX Listings of EBooks

Updated at least monthly. These plain text files provide the basic information about each eBook, and are good for searching from your own system (for example, use control-F in a Web browser or word processor). They are the accession lists for Project Gutenberg. Note that these files are not recommended for automation (that is, to use as input to generate a computerized database). Instead, use one of the catalog files mentioned below.
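As an alternative to control-F, a downloaded listing can be searched with a few lines of Python. This is only a sketch for interactive lookups, not a parser for the file: it assumes the listing is a plain text file that decodes as UTF-8 (undecodable bytes are replaced), and the function name is made up for illustration.

```python
def search_listing(path, term):
    """Return (line_number, line) pairs whose text contains term, ignoring case."""
    needle = term.casefold()
    matches = []
    # errors="replace" keeps the search going if the encoding guess is wrong.
    with open(path, encoding="utf-8", errors="replace") as listing:
        for number, line in enumerate(listing, start=1):
            if needle in line.casefold():
                matches.append((number, line.rstrip("\n")))
    return matches
```

For example, `search_listing("GUTINDEX.ALL", "dickens")` would return every line of a local copy of GUTINDEX.ALL that mentions "dickens", with its line number.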

GUTINDEX Listings by Year

If GUTINDEX.ALL is too big for you or you prefer separate annual lists, you can download GUTINDEX files by year.

Affiliate sites

These sites are not part of Project Gutenberg. Check the laws of the country where you are before accessing or redistributing any eBooks.

Directory/Folder Listings

You can navigate the directory/folder contents starting at /dirs, although this is not very user-friendly.

The Project Gutenberg Catalog Metadata in Machine-Readable Format

XML/RDF

All Project Gutenberg metadata are available digitally in the XML/RDF format. This is updated daily (other than the legacy format mentioned below). Please use one of these files as input to a database or other tools you may be developing, instead of crawling or roboting the website.

Note that the exact same metadata is available as a per-eBook .rdf file. These are found in the cache/epub (i.e., cache/generated) directory, accessible by mirroring or by the directory/folder listings above. The large XML/RDF file is simply a concatenation of all the per-eBook metadata.
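A per-eBook .rdf file can be read with Python's standard XML parser. The sketch below is illustrative only. The namespace URIs and the element names (`pgterms:ebook`, `dcterms:title`) are assumptions about the file layout, so check them against a real file before relying on them. The function name and sample record are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against an actual .rdf file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}


def read_ebook_record(rdf_xml: str):
    """Return (about, title) for the first pgterms:ebook element, or None."""
    root = ET.fromstring(rdf_xml)
    ebook = root.find(".//pgterms:ebook", NS)
    if ebook is None:
        return None
    # Attributes in a namespace are addressed as "{uri}name".
    about = ebook.get(f"{{{NS['rdf']}}}about")
    title = ebook.findtext("dcterms:title", default="", namespaces=NS)
    return about, title.strip()


# Hypothetical record for illustration, not real catalog data.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/0">
    <dcterms:title>Example Title</dcterms:title>
  </pgterms:ebook>
</rdf:RDF>"""

print(read_ebook_record(SAMPLE_RDF))
```

Reading the local files this way, rather than fetching pages from the website, follows the request above to avoid crawling.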

MARC Records (MAchine Readable Cataloging)

MARC is a common metadata format utilized by library card catalog databases. Steve Thomas of the University of Adelaide provided a Perl script to generate MARC records from the XML/RDF catalog files. Find it here: pgrdf2marc.pl. You will need to rename it, and make any necessary changes to run on your own system. This is unsupported software, provided without warranty or guarantee.

These instructions were provided to Project Gutenberg, and are listed here in the hopes they may be useful.

  • Download the XML/RDF file (i.e., /cache/epub/feeds/rdf-files.tar.zip).
  • Unzip the download, then untar the resulting archive.
  • Run your modified copy of the Adelaide script above, pgrdf2marc.pl, against the untarred/unzipped RDF files to generate MARC records (there may be a few RDF records that do not convert, perhaps as many as 100)
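The unzip and untar steps above can also be scripted. The following Python sketch assumes the download is a zip archive that contains one or more .tar files, which in turn hold the .rdf files. The function name is made up for illustration. Running pgrdf2marc.pl is left out, since its invocation depends on your modified copy.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_tar_zip(archive, dest):
    """Unzip archive into dest, untar each .tar inside, return the .rdf paths."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        tar_names = [name for name in zf.namelist() if name.endswith(".tar")]
    for name in tar_names:
        with tarfile.open(dest / name) as tf:
            if hasattr(tarfile, "data_filter"):
                # Safer extraction on Python versions that support filters.
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    return sorted(dest.rglob("*.rdf"))
```

For example, `extract_tar_zip("rdf-files.tar.zip", "rdf")` would leave the extracted files under a local `rdf` directory and return their paths, ready to be passed to the conversion script.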

A Local, Browsable Copy on your own Computer or Mobile Device

Kiwix is an application that lets you download a large collection and use it locally. A copy of the Project Gutenberg content was made available in November 2018, and may be updated periodically.