Various updates and redactions.

bookshelf
Greg Newby 2019-11-27 13:31:24 -05:00 committed by GitHub
parent 29d1ad2101
commit a2bf4c476f
1 changed files with 23 additions and 398 deletions

@ -6,7 +6,7 @@ permalink: /how_to/readers_faq.html
# Readers' FAQ
Most of this page is no longer actively maintained. Some content may be inaccurate or outdated - especially external links. The advice tends to focus on solo producers of eBooks, emphasizing plain text. To see current submission requirements, see the Project Gutenberg upload portal at [https://upload.pglaf.org](https://upload.pglaf.org)
Are you interested in submitting an eBook to Project Gutenberg? Instead, please see current submission requirements via the Project Gutenberg upload portal at [https://upload.pglaf.org](https://upload.pglaf.org)
<div class="contents">
<ol>
@ -81,224 +81,31 @@ Most of this page is no longer actively maintained. Some content may be inaccura
# About Finding eBooks
## How can I find an eBook I'm looking for?
For PG books, the simplest way is to go to the [Online Search page](https://dev.gutenberg.org/ebooks/), type the Author or Title into the search form, press the "Search" button, and follow the choices. There also is a full-text search available.
For Project Gutenberg (PG) eBooks, the simplest way is to go to the [Online Search page](https://www.gutenberg.org/ebooks/), type the Author or Title into the search form, press the "Search" button, and follow the choices. You can specify fields to search, such as "a.Twain" for authors named "Twain." There also is a full-text search available.
## Can I get a complete list of Project Gutenberg eBooks?
Yes. [GUTINDEX.ALL](https://www.gutenberg.org/dirs/GUTINDEX.ALL) is the raw list of files posted.
## How can I download a PG text without using the web catalog?
We have to divide this question into two answers, for books up to 10,000, and books after 10,000, or reposted since we moved past 10,000.
Books posted after 10,000 go into a new, simpler, naming scheme. Books REposted after we passed 10,000 (around November 2003) also use this scheme. We are reposting many older books, with improvements and corrections, all the time, and older books may also be reposted into the new scheme.
You can see clearly from the line in GUTINDEX.ALL whether the book is in the old naming scheme or the new naming scheme. Where the line starts with a Month and Year, and contains a file-name template in square brackets, the book is still in the old scheme, for example:
<pre>
Feb 2005 Mike, by P. G. Wodehouse [mikewxxx.xxx] 7423
</pre>
The line for the same book, in the new naming scheme, would omit the Month and Year, and the filename base, and look like:
<pre>
Mike, by P. G. Wodehouse 7423
</pre>
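The distinction between the two line shapes can be checked mechanically. The following is a minimal sketch, assuming that old-scheme lines always begin with a three-letter English month abbreviation and a four-digit year, and that the eBook number is always the last thing on the line, as in the examples above. The function names are hypothetical and not part of any Project Gutenberg tool.

```python
import re

# Assumption: month abbreviations look like the ones in the examples (Feb, Mar, Dec...).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Old scheme: "Mon YYYY <title> [template] <number>"
OLD_SCHEME = re.compile(rf"^(?:{MONTHS})\s+\d{{4}}\s+.*\[[^\]]+\]\s*\d+\s*$")


def naming_scheme(line):
    """Return "old" if the GUTINDEX line carries a date and a filename template."""
    return "old" if OLD_SCHEME.match(line.strip()) else "new"


def ebook_number(line):
    """Return the eBook number, taken as the trailing run of digits on the line."""
    match = re.search(r"(\d+)\s*$", line)
    if match is None:
        raise ValueError(f"no eBook number found in: {line!r}")
    return int(match.group(1))


old_line = "Feb 2005 Mike, by P. G. Wodehouse [mikewxxx.xxx] 7423"
new_line = "Mike, by P. G. Wodehouse 7423"
print(naming_scheme(old_line), ebook_number(old_line))
print(naming_scheme(new_line), ebook_number(new_line))
```

A real GUTINDEX.ALL file may contain header text and continuation lines that this sketch does not attempt to handle.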
### Books after 10,000 — the new naming scheme
To find a text with a number over 10,000, or one that has been reposted since we passed 10,000, you must know the eBook number. You can get this from [GUTINDEX.ALL](https://www.gutenberg.org/dirs/GUTINDEX.ALL)
Once you know the number, you can find the directory containing all formats of it. Formally, the directory for the eBook will be contained in a hierarchy of directories, each one a single digit, being all the digits of the etext number except the last, in order. The name of the directory for the eBook itself will be the number of the eBook. But it's easier to see by example.
The files for eBook number 10214 will be found in the directory 1/0/2/1/10214 on the download site you choose. So, for example, if you are downloading eBook 10214 from our main site by HTTP from [http://www.gutenberg.org/dirs/](http://www.gutenberg.org/dirs/), you can just go to [http://www.gutenberg.org/dirs/1/0/2/1/10214/](http://www.gutenberg.org/dirs/1/0/2/1/10214/) and download whichever of the formats you want.
Or, instead of typing in the whole address, for numbers beginning with the digit "1", you can just go to [http://www.gutenberg.org/dirs/1/](http://www.gutenberg.org/dirs/1/) and navigate down the list of directories.
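The directory rule described above (every digit except the last, one per directory level, followed by the full number) can be sketched in a few lines. The function name `ebook_directory` is hypothetical. The description does not say where single-digit eBook numbers live, so this sketch refuses them rather than guess.

```python
def ebook_directory(number):
    """Build the new-scheme directory path for an eBook number.

    All digits except the last become single-digit directories, and the
    final directory is the eBook number itself.
    """
    if number < 10:
        # Assumption: the rule as described needs at least two digits.
        raise ValueError("the rule above does not cover single-digit numbers")
    digits = str(number)
    return "/".join(digits[:-1]) + "/" + digits


print(ebook_directory(10214))  # 1/0/2/1/10214
```

Appending the result to a base address such as http://www.gutenberg.org/dirs/ gives the directory URL shown in the example above.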
### Books before 10,000 — the old naming scheme
In short, just browse to:
[http://www.ibiblio.org/pub/docs/books/gutenberg/](http://www.ibiblio.org/pub/docs/books/gutenberg/)
choose the schedule year of the text (newly-posted texts will usually be in the latest year) and look down the list to find the filename you're looking for.
In general, you need to know:
1. the address of an FTP site
2. the schedule year of the text you want
3. the basename of the text you want.
The fastest and safest FTP site to use for this is [ftp://ftp.ibiblio.org](ftp://ftp.ibiblio.org), which is the first of our two primary posting sites (the other being [ftp://ftp.archive.org](ftp://ftp.archive.org) ). We post to these two sites, and then other sites copy from them at intervals, so with any FTP sites other than these two, the file may not be available immediately.
You can get the schedule year and basename of the text from its line in GUTINDEX.ALL. Let's take an example. The file
<pre>
Mar 2004 The Herd Boy and His Hermit, by C. M. Yonge [#32][hrdbhxxx.xxx]5313
</pre>
has been posted just a few hours ago as I write this. From the GUTINDEX entry, the schedule year is 2004, and the basename of the text is hrdbh.
We divide our texts into directories (folders) based on the schedule year, so this eBook will be in the directory for 2004, which will be named something ending in /etext04. All the directories are named etext plus the last two digits of the year. (Somebody's going to have to change that convention in about 87 years from now! :-) We currently have directories starting at 90, running through the 90s and then 00, 01, 02, 03, 04. All eBooks produced before 1991 are in the /etext90 directory, so if you're looking for
<pre>
Dec 1971 Declaration of Independence [whenxxxx.xxx] 1
</pre>
or
<pre>
Aug 1989 The Bible, Both Testaments, King James Version [kjv10xxx.xxx] 10
</pre>
you should look in /etext90.
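The lookup described above can be automated. The sketch below is illustrative only: it assumes the basename is the part of the bracketed template before the dot with the trailing "x" padding removed (which matches the hrdbh example), and it would over-strip a basename that genuinely ends in "x". The function names are hypothetical.

```python
import re

# Old-scheme GUTINDEX line: "Mon YYYY <title> [template] <number>"
OLD_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<year>\d{4})\s+.*"
    r"\[(?P<template>[^\]\s]+)\]\s*(?P<number>\d+)\s*$"
)


def etext_directory(year):
    """Map a schedule year to its directory: etext plus the last two digits."""
    if year < 1991:
        # Everything produced before 1991 lives in etext90.
        return "etext90"
    return f"etext{year % 100:02d}"


def locate_old_scheme(line):
    """Return (directory, basename) for an old-scheme GUTINDEX line."""
    match = OLD_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an old-scheme line: {line!r}")
    year = int(match["year"])
    # Assumption: trailing x characters in the template are padding.
    basename = match["template"].split(".")[0].rstrip("x")
    return etext_directory(year), basename


line = "Mar 2004 The Herd Boy and His Hermit, by C. M. Yonge [#32][hrdbhxxx.xxx]5313"
print(locate_old_scheme(line))  # ('etext04', 'hrdbh')
```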
As it happens, ibiblio supports both HTTP (web) and FTP access to the text, so we can just browse to [http://www.ibiblio.org/pub/docs/books/gutenberg/](http://www.ibiblio.org/pub/docs/books/gutenberg/) and choose the 2004 directory from there.
If you want to automate this, you could also use the more direct address [ftp://www.ibiblio.org/pub/docs/books/gutenberg/etext04/](ftp://www.ibiblio.org/pub/docs/books/gutenberg/etext04/)
The equivalent address for ftp.archive.org is [ftp://ftp.archive.org/pub/etext/etext04/](ftp://ftp.archive.org/pub/etext/etext04/)
Either way, we see a long page of files, in alphabetical order. Scroll down to the "H"s and look for hrdbh. We see four files with this basename:
<pre>
hrdbh10.txt
hrdbh10.zip
hrdbh10h.htm
hrdbh10h.zip
</pre>
Yes. Visit the "offline catalogs" page. The [GUTINDEX.ALL](https://www.gutenberg.org/dirs/GUTINDEX.ALL) file is the raw list of all books posted, and is updated approximately monthly.
## You don't have the eBook I'm looking for. Can you help me find it?
Sorry, no. We can suggest (see below) some other places to look for publicly accessible books on the Net, but we can't do the search for you.
Sorry, no. Please check with your local librarian.
## Where else can I go to get eBooks?
The [Online Books Page](https://onlinebooks.library.upeen.edu/) specializes in creating a list of all books on-line from any source. Searching there is a good place to start.
## Where else can I go to find eBooks?
The [Online Books Page](https://onlinebooks.library.upenn.edu/) specializes in creating a list of free books on-line from any source. Searching there is a good place to start.
If you're looking for commercial books, like current textbooks or bestsellers, you're not likely to find them here, since recent books are not in the public domain. For these, you should look for commercial booksellers on the Net — any search engine will direct you to some if you enter search terms like "shop ebook".
## I see some eBooks in several places on the Net. Do different people really re-create the same eBooks?
It does happen, but mostly by accident. Anyone experienced in eBook creation will first search the usual places to see whether anyone else has already transcribed the book they're interested in. If it has been transcribed, they will not duplicate the effort.
Etexts that are in the public domain very often float around the Net for years — stored in a gopher server here, posted to Usenet there, held on someone's local computer for a year or two and then reformatted as HTML and uploaded to a web site somewhere else. And this is good, because we want texts to be copied as widely as possible.
Public domain eBooks are fair game for anyone to copy, correct, mark up, package and post: that's what being in the public domain means.
Project Gutenberg eBooks are often quickly copied and reformatted, and posted on other sites like Blackmask Online and Steve Sakoman's site at [http://www.sakoman.net/](http://www.sakoman.net/). Unfortunately the Blackmask Online website is down due to a lawsuit about the copyright of the Doc Savage books.
If you find an eBook in many different places, the odds are good that it came from one original source, and was copied around.
It does sometimes happen that people duplicate the transcription of books already made into text. Sometimes it's because they didn't find the version already made. Sometimes they have a different edition, and want to transcribe that. Mostly, though, we all try not to do more work than we have to.
## About Using the Web Site
### Why couldn't I reach your site? (or: Why is your site slow?)
There may be a bottleneck somewhere else between you and the site. If at first you don't succeed, don't tell us, just try, try again. The correct address is:
http://www.gutenberg.org/
### I get an error when I try to download a book.
Many FTP sites throughout the world hold the whole Project Gutenberg archive of texts. An FTP site is just a computer on the Internet that specializes in holding files for download and sending them to people on request. You can find a list of FTP sites that hold Gutenberg texts at [http://www.gutenberg.org/MIRRORS.ALL](http://www.gutenberg.org/MIRRORS.ALL).
When you're searching or browsing for titles and authors, you're on this Project Gutenberg site, but if you choose one of the mirrors, or another method of downloading, when you click on the book to download it, you are connected to an FTP (or HTTP) site. At the time you click on the filename, your browser contacts an FTP site and tries to download the file from there. If you get an error, it could be because the FTP site is busy, or because there's a network traffic bottleneck between you and that FTP site, or because the text you're looking for is missing from that FTP site.
Usually, the easiest solution is to choose another FTP site to download your text from. Go to the Search page, choose a different FTP site, and search again for your text.
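If you download with a script rather than a browser, the same "try another site" advice can be expressed as a loop. This is a rough sketch, not an official tool: the function name `fetch_from_mirrors` is hypothetical, the list of mirror base addresses is something you would supply yourself, and the `fetch` parameter exists only so the logic can be exercised without a network connection.

```python
from urllib.request import urlopen


def fetch_from_mirrors(mirror_bases, relative_path, fetch=None):
    """Try each mirror in order and return (url, data) from the first that works."""
    if fetch is None:
        def fetch(url):
            # Default: a plain download with a timeout.
            with urlopen(url, timeout=30) as response:
                return response.read()

    failures = []
    for base in mirror_bases:
        url = base.rstrip("/") + "/" + relative_path.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError are subclasses of OSError.
            failures.append(f"{url}: {exc}")
    raise OSError("all mirrors failed:\n" + "\n".join(failures))
```

Listing the nearest mirror first follows the tip about choosing a site close to you.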
Tip: You should always try to choose the FTP site closest to you. Not only are you helping to minimize Net traffic by choosing a nearby site, but your file will download faster!
If all else fails, note the year and the filename of the book you want, if it's below number 10,000, or its number, if above 10,000, choose an FTP site from this list and click on one of them. Then browse your way through the listings to the file you want.
For example, if you find Lady Susan by Jane Austen, you will see that it was published by Gutenberg in 1997, and its filename is lsusn10.txt, so browse to one of the FTP sites, choose the directory called /etext97 and click (or right-click and Save, depending on your browser) on the file lsusn10.txt. Or, in the case of Clarissa, Volume 6 by Richardson, which is #11364, you will find it in the directory /1/1/3/6/11364
### I searched for a book I know is in Project Gutenberg, but got no results.
First go to the Advanced Search page. Sometimes you may miss in searching because of alternative spellings, so try searching separately using just one word in Author or Title. Read the Search Tips.
If that fails, you can Browse through the site catalog. Let's say you're looking for **The Wandering Jew** by Eugene Sue.
Go to the [Online Catalog](https://www.gutenberg.org/catalog/) page.
Once on this page, click on: "S" in "Authors:"
You should now see a list of all of the Authors whose last name starts with "S". Scroll down till you find the direct links to the Sue, Eugene works.
Click on the work you are interested in, then click on the file link on the page you are taken to (Etext 3350, if you selected the work as described immediately above).
On this page, above the excerpt, there are download links:
Click on the link of your choice - plain text or zipped, and from ibiblio.org or other.
If you choose one of the mirrors, you are then brought to a new page, asking you to select a "Download site". Further details on how and why to choose an "FTP Site" are available on this page.
Select a site, and the file will be downloaded, or offered for download, depending on which format you selected and which browser you use.
If you can't find your text either way, the book has not been cataloged. If you know that the book has been posted recently, and maybe hasn't made it into the catalog yet, read: [How can I download a PG text without using the web catalog?](how-can-i-download-a-pg-text-without-using-the-web-catalog)
If even this doesn't help, don't despair! We don't have it, but it may be elsewhere on the Web. Go to the major search engines and try there. You can also try looking in the Book Search section of [The Online Books Page](https://onlinebooks.library.upenn.edu), and if you have no luck with that, you might be able to find it listed as being In Progress somewhere on their [Books In Progress and Requested](https://onlinebooks.library.upenn.edu/in-progress.html) page.
### Can I copy your website, or your website materials?
No.
See the Permissions page. Basically, the content (eBooks) is available for unlimited free redistribution. There are limitations on commercial (for-fee) redistribution, derivative works, and our copyrighted works.
Keeping the PG site updated with the latest e-text releases is an ongoing job, and our experience is that people, however well-intentioned, do not keep copies up to date. We want there to be one clear source for people seeking the latest Project Gutenberg information, and we think that having a lot of out-of-date copies and partial copies scattered around the net would be a Bad Thing.
The website is not currently set up for copying, and automated anti-abuse tools might prevent mass downloads. The offline catalogs,
mentioned above, include all metadata.
We welcome mirrors and copies of our e-texts, in new FTP sites (see: Can I become an FTP mirror?), but the main web site itself is copyrighted and may not be copied.
Also see the Mirroring How-To, if you want to make a copy of large portions of the collection. The Roboting How-To has directions
for subsets of the collection.
### Your site doesn't look right in my browser. I clicked on a button, and nothing happened.
We take a lot of trouble to ensure that our website uses only valid, standard HTML, and we're not even slightly tempted to use glitzy features that look good in one browser but don't work in another, so we can promise you that our site is not the problem.
If you actually clicked on a button, like the Search button, and nothing happened, you might be behind a proxy or web filter that doesn't like you making POST requests. If you have a web filter switched on, turn it off, reload the page and try again.
### What does that thing about "mirror sites" mean?
Our texts are not actually held on the website. The website just holds an index; the files themselves are held on many sites throughout the world, called FTP sites. When you have found the book you're looking for, and you make that final click to get it, you're not actually talking to our website any more — you are transferred to the FTP site you selected. Some FTP sites are near you; some are far away. Some may be faster than others, even if they are about the same distance; some may have temporary technical problems.
You should usually select the FTP site nearest you. If you find you're having problems with that one, you can select another.
### What exactly is an FTP site anyway?
FTP stands for File Transfer Protocol, one of the oldest and most reliable protocols of the internet. This is the method by which a file can be copied from one computer to another.
We now have some HTTP (web) sites containing eBooks as well, including our main site at http://www.gutenberg.org/. You can use either HTTP or FTP.
An FTP site, or FTP server, is a computer that holds files that people can upload and download. In the case of PG, the Posting Team uploads our texts, when they're ready, to two main FTP servers, ftp://ftp.ibiblio.org and ftp://ftp.archive.org, which serve as our master copies.
Other FTP sites around the world automatically download the files from these master sites, so they have a full set of PG publications for you to download. Because they only check for updates and new files at intervals, some FTP sites may be a day or two behind. Some FTP sites don't have space available for everything, so they may hold only the zipped versions of the files. But most FTP sites will have the entire PG collection. These are called FTP "mirrors", since they are a copy of the original.
Many FTP sites exist that offer a full PG mirror but are not on our FTP sites list. Commonly, these are in schools, where they serve the local students, but don't have enough bandwidth to offer downloads to worldwide users.
### Can I become an FTP mirror?
Yes! We're always looking for more FTP mirrors.
If you manage an FTP site with 100 GB or so of free space, please check our [Contact Information page](/about/contact_information.html) and contact the appropriate person, who will make the arrangements for you.
### Can I make a private FTP mirror for my school, library or organization?
Yes.
We like all FTP mirrors to be open to as many people as possible, but we know that not all schools have the resources to be a public mirror, so we welcome all mirrors.
And anyway, you don't even have to ask, because we don't control what happens to our texts once we post them!
### When I clicked on the file I want, nothing happened.
When you select a file for download, your request goes to the FTP site you selected, not to our website. If the FTP site you selected is having problems, or if there is the Net version of a traffic jam between you and it, you may have problems downloading.
Select a different FTP site (see: [What does that thing about "mirror sites" mean?](What does that thing about "mirror sites" mean?)) and try again.
### How many texts are downloaded through the web site?
Go to the [Top 100 Page]().
### What are the most popular books?
### What are the most popular books, and how many times are they downloaded?
Go to the [Top 100 Page]().
These numbers vary a lot. When a movie based on a classic is released, downloads of that eBook go through the roof!
## About Downloading and Using Project Gutenberg eBooks
### Should I download a ZIP or a TXT file?
If you know how to unzip a file, then downloading the zip is faster. For some non-text eBooks that contain multiple files, like HTML with included images, only a zip file may be available. For some other formats, like MP3 or MPEG, there may not be a zipped version available because the native format of the file is already compressed enough that zipping it doesn't save much.
### I've got a ZIP file. What do I do with it?
Unzip it.
If you want a free program, you could try the open source Info-Zip software available at [http://www.ctan.org/tex-archive/tools/zip/info-zip/](http://www.ctan.org/tex-archive/tools/zip/info-zip/) for Mac, MS-DOS, Unix, Windows and just about everything else you might have.
If you want a commercial program, PKZIP from [http://www.pkware.com](http://www.pkware.com) and WinZip from [http://www.winzip.com](http://www.winzip.com) are among many popular shareware utilities that allow you to unzip files.
Mac-users using Stuffit Expander may like to set a preference (File / Preferences / Cross Platform) to "Convert text files to Macintosh format … When a file is known to contain text". This gets rid of strange characters (linefeeds), which are not wanted on a Mac, at the beginnings of lines. MacZip is another free program for Macs. Mac users can also try ZipIt or other shareware programs available from the Info-Mac archives, e.g. from [ftp://mirrors.aol.com/pub/info-mac/_Compress_&_Translate/](ftp://mirrors.aol.com/pub/info-mac/_Compress_&_Translate/).
### I tried to unzip my file, but it said the file was corrupt, or damaged.
The chances are that it didn't download correctly. Try downloading it again. If you don't succeed the second time, try downloading the unzipped version.
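If you would rather check a download before trying to open it, Python's standard `zipfile` module can do a basic integrity test. This is a minimal sketch. The helper name `zip_is_intact` and the sample archive built in memory are only for illustration.

```python
import io
import zipfile


def zip_is_intact(source):
    """Return True if source (a path or file object) is a readable zip archive."""
    if not zipfile.is_zipfile(source):
        return False
    try:
        with zipfile.ZipFile(source) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Build a small archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hrdbh10.txt", "sample text\n")
good_bytes = buffer.getvalue()

print(zip_is_intact(io.BytesIO(good_bytes)))       # True
print(zip_is_intact(io.BytesIO(good_bytes[:20])))  # False: the download was cut short
```

A failed check usually means the advice above applies: download the file again, or fetch the unzipped version.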
### I see gibberish onscreen when I click on a book.
To save download time, our etexts are stored in zipped form as well as text form. Zipped files are smaller, and take less time to transfer to your computer, but you need a program to unzip them. If you try to view a zipped file directly, it looks like gibberish.
You can recognize zipped files easily because their filenames end in .zip.
If this happens, either make sure you're asking your browser to Save the file rather than display it (often, you right-click the file and choose Save) or else click on the version of the file that ends in .txt instead of .zip. You don't need a zip program to view .txt files.
Looking at a zip rather than a text file is by far the most common reason for this problem, but there are some others. If you're quite sure that you're not looking at a zip file, then it could be that the file you downloaded is in a character set that your viewer doesn't recognize, like Big-5 [V.78] for Chinese texts, or Unicode [V.77]. If this is the case, you will have to find a viewer that works on your computer for the specified character set. We may also have an ASCII version of the same text available for you — we do try to have ASCII versions for everything [G.17], but some languages, like Chinese, just cannot be sensibly expressed in ASCII.
If you can see most of the characters, enough to be able to make out the text, but there are regular gibberish characters, black squares, empty boxes or obviously missing characters scattered about through words, then you are probably looking at an "8-bit" text [V.79], with accented characters, and your viewer doesn't handle the character set. See the FAQ "I can read the text file, but a few characters appear as black squares, or gibberish" [R.31].
If there are a very few gibberish characters, black squares or obviously missing characters in the text, then it's likely that this was intended to be a 7-bit text, but a few 8-bit characters like the British pound symbol or accented letters slipped through.
### Can I download and read your books?
Yes. That's what Project Gutenberg is all about — making texts available free to everyone!
@ -307,21 +114,22 @@ Most Project Gutenberg e-texts are in the public domain. You can do anything you
Some Project Gutenberg e-texts have copyright restrictions. You can still download and read these, but you may not be allowed to reproduce, modify or distribute them. When browsing or searching on the site, you will see these copyright-restricted texts indicated in the listings. For fuller information about them, download the e-text and read the header or footer of the file, which will spell out the conditions in detail.
See the Permissions How-To for details, especially for commercial (non-free) redistribution and derivatives. For people who want
to read, enjoy, and share these eBooks, the only limitation is that not all content is in the public domain around the world. The
"Terms of Use" (on the website and within every eBook) remind you to check the laws of your country, before accessing a PG eBook.
### Does Project Gutenberg know who downloads their books?
No, and we don't want to!
Like any Internet transfer, our sites have to know the IP addresses that contact them; without that, no communication is possible. But we do not trace, hold or examine them beyond what is necessary to deal with any problems or maintain logs or statistics. We never identify IP addresses with people.
Like any Internet transfer, our sites have to know the IP addresses that contact them; without that, no communication is possible. But we do not trace, hold or examine them beyond what is necessary to deal with any problems or maintain logs or statistics. We never identify IP addresses with people. We never ask who you are, or ask you to register, to obtain eBooks.
Further, we encourage people, sites, schools around the world to mirror, or copy, our texts to their sites. Once that happens, we have no control over them, and we never have any idea who or even how many people access them after that.
Even further, we encourage people to distribute the texts on disks, CDs, paper, and any other storage format they can find. We encourage them to convert the texts to other formats, and share them.
Even further, we encourage people to distribute the texts by email, file sharing, disks, CDs, paper, and any other storage format they can find. We encourage them to convert the texts to other formats, and share them.
For most people reading this, anonymity is probably not an issue, but you may live in a place or time where reading Paine, or Voltaire, or the Bible, or the Koran, is considered suspicious or even subversive. We don't know who you are, and what we don't know, we can't tell.
Currently (2004), by means of DRM (Digital Rights/Restrictions Management) many commercial publishers can make a list of exactly who is reading which of their eBooks. We don't know, and we don't want to know.
### I've found some obvious typos in a Project Gutenberg text. How should I report them?
(This section was updated in October 2019)
Errata reports, typos, etc. are welcome and appreciated. Please visit the contact information page for the current correct email address to send reports to.
@ -398,7 +206,6 @@ For example:
him ==> him.
</pre>
may take 10 minutes or more of the editor's time searching for "to him" if there are 100 instances of "to him" in the eBook.
Experienced Project Gutenberg contributors may find that when there are a great many suspected errors it may be easiest all round just to submit a corrected version of the text and html files.
@ -425,136 +232,15 @@ Sometimes, even though you've noticed only one or two small typos, one of the Po
If the text needs a lot of changes, we may post a new EDITION [R.35] of it, with a new filename: e.g. abcde10.txt may become abcde11.txt. In this case, you will receive a copy of the e-mail sent to the posted list announcing the new file. Our current rule of thumb is that we create a new edition when we make twelve significant changes, but we judge each on a case-by-case basis, and especially will usually not make a new edition if the original was posted recently.
### I've got the text file, and I can read it, but it seems to be double-spaced or it has control characters like ^J or ^M at the end of every line.
This is most often seen on Mac or Linux. If you want to dig into why this effect happens, see the FAQ "Why use a CR/LF at end of line?" [V.85].
Perhaps viewing it in a different editor or viewer will help, but it's usually easiest just to globally replace all of the control characters (if you see them) with nothing, or to replace all double line-ends with single line-ends.
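For readers comfortable with a little scripting, the global replace can be done in Python. This is a small sketch, assuming the ^M characters are carriage returns from CR/LF line ends. It converts every line end to a single line feed, which is slightly more cautious than deleting the control characters outright. The function name is hypothetical.

```python
def normalize_line_endings(text):
    """Convert CR/LF pairs, and any stray CRs, to single line feeds."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "Chapter I\r\nIt was a dark night.\r\n"
print(repr(normalize_line_endings(sample)))
```

When reading the file in Python, open it with `newline=""` so that the original line ends reach this function unchanged.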
### When I print out the text file, each line runs over the edge of the page and looks bad.
If you have a file ending in .txt from Project Gutenberg, it is usually formatted with about 70 characters per line, and with a Carriage Return/Line Feed pair (also known as a "Hard Return" or a "Paragraph Mark") at the end of every line.
This is the most widely accepted format for text files, but it's not ideal on all computers and all programs. 70 characters per line means that if you are using an unusually large or small font to print it, lines may wrap around or not reach across the page. The hard return means that on some systems, the lines may appear double-spaced.
Unfortunately, we can't advise you how best to format texts on all systems, mostly because we don't know every system! Here are a couple of tips you might try:
If your font is too big or too small, try setting the font to Courier size 10 or Times size 12. It may not be ideal, but it mostly works.
In a word processor, you may be able to remove the Hard Returns, but beware! if you remove too many, the whole text will become one paragraph. One common formula for removing the HRs goes like this:
First, all paragraphs and separate lines should be separated by two HRs, so that you can see one blank line between them. Where they aren't, as in the case of a table of contents or lines of verse, add the extra HRs to make them so.
Replace All occurrences of two HRs with some nonsense character or string that doesn't exist in the text, like ~$~.
Replace All remaining HRs with a space.
Replace your inserted string ~$~ with one HR.
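The three replace steps above translate directly into code. The sketch below assumes the first step (making sure every paragraph and separate line is already followed by a blank line) has been done by hand. It uses the same ~$~ placeholder and refuses to run if that string already occurs in the text. The function name is hypothetical.

```python
def unwrap_hard_returns(text, marker="~$~"):
    """Join hard-wrapped lines into paragraphs using the placeholder trick."""
    text = text.replace("\r\n", "\n")  # treat a CR/LF pair as one hard return
    if marker in text:
        raise ValueError("placeholder already occurs in the text; pick another")
    text = text.replace("\n\n", marker)  # protect paragraph breaks
    text = text.replace("\n", " ")       # remaining hard returns become spaces
    return text.replace(marker, "\n")    # restore one hard return per paragraph


wrapped = "line one\nline two\n\nnext para\nmore"
print(unwrap_hard_returns(wrapped))
```

Each paragraph ends up on a single line, so the word processor can wrap it to whatever page width you choose.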
### I can read the text file, but a few characters appear as black squares, or gibberish.
The text is using some character set that your editor or viewer isn't. For example, the text is using ISO-8859-1, and your viewer is using Codepage 850 — or vice versa. You can see the plain ASCII characters, but non-ASCII characters like accented letters display as nonsense.
Look at the top of the file for a clue to the character set encoding: if it's there, it may help you to find which editor, or font, or viewer you should be using.
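The mismatch is easy to reproduce. The short sketch below encodes a sample phrase as ISO-8859-1 and then decodes the same bytes two ways. The sample phrase is arbitrary, and the codec names are Python's standard names for the two character sets mentioned above.

```python
original = "café déjà vu"

# What is actually stored in the file: ISO-8859-1 bytes.
stored = original.encode("iso-8859-1")

# A viewer that assumes Codepage 850 misreads the accented letters.
wrong = stored.decode("cp850")

# A viewer that assumes the correct character set recovers the text.
right = stored.decode("iso-8859-1")

print(wrong)
print(right)
```

The plain ASCII letters come out the same either way; only the accented characters differ, which is the pattern described above.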
### Can I get a handheld device for reading PG texts? Which device should I get?
To read eBooks on a handheld, you need three things: the eBook content itself (which you can get from PG and other sites), a device (which I will sometimes call a PDA, even though technically, the RocketBook isn't a PDA) and the reader software that runs on the PDA.
In mid-2002, there are three main families of handheld devices people use for reading eBooks: Palms, Pocket PCs and RocketBooks (or their successor, REB1100s). In general, it is possible to use any of these in combination with any common type of personal computer.
Palms are very common, especially when you count not just the Palm [http://www.palmone.com/us/](http://www.palmone.com/us/) itself, but PalmOS-based devices from other manufacturers, like:
the Franklin eBookMan [http://www.franklin.com/ebookman/](http://www.franklin.com/ebookman/)
the Handspring Visor [http://www.handspring.com](http://www.handspring.com)
the Sony Clié [http://www.sony.com](http://www.sony.com)
Because of the number of makers of PalmOS-based devices, you can buy them with lots of combinations of features — color screen, audio, different memory sizes. Of course, Palms have other applications besides eBook reading. Palms are the smallest and most portable of the three classes, and tend to have the best battery life for travelling, but they also have the smallest screen. Just about all reader software will run on Palms, except the Microsoft Reader, which runs only on Pocket PCs, but you don't need the Microsoft Reader for Project Gutenberg eBooks.
In Pocket PCs, the Compaq iPaq [http://www.hp.com](http://www.hp.com) and the Dell Axim [http://www.dell.com](http://www.dell.com) are by far the most common at the end of 2003. More expensive and bulkier than a Palm, they have a bigger screen. Like the Palms, they can perform many functions besides reading eBooks. Only Pocket PCs can support the Microsoft Reader, but this is not necessary for reading Project Gutenberg eBooks.
The RocketBook, and its successor the Gemstar REB1100, are quite different from the others. These were built specifically for reading eBooks, and do not have additional functions. They are not, technically, PDAs. Their screens are bigger, and excellent for reading, but do not offer color. They also don't offer a choice of readers — the dedicated reader is built-in to the device. Both of them require the eBooks you load to be formatted for their reader, and files made for them usually have the extension .rb for RocketBook. The REB1100 did not come with the RocketLibrarian, which is the program you run on your PC to turn an etext into a RocketBook file, but people are still making .rb files, and the RocketLibrarian is still available and popular among an enthusiastic group of Rocket users. (The REB1200 is entirely different from the REB1100, and, as far as we know, PG etexts cannot easily be transferred to it.)
In late 2003, Gemstar discontinued their eBook reader range, but there are many still around.
In summary, the Rocket/REB1100 is a dedicated reader, with a good screen, but limited in what it does.
Palms are relatively cheap and common, with a wide range of options, and the capacity to function as PDAs as well. They can run all common readers except the Microsoft one.
The iPaq [http://www.hp.com](http://www.hp.com) has a good color screen, but is bulkier than a Palm, and can run lots of readers, including the Microsoft one, but not all Palm readers are available for Pocket PC. Like Palms, the iPaq can do other jobs besides displaying eBooks.
Different people make different choices among these for reading their eBooks, and they all work well; it's a matter of personal taste.
### How can I read a PG eBook on my Palm?
These steps work for all devices running the Palm OS.
1. Install the free [Plucker Viewer](http://www.plkr.org/dl)
2. Download the eBook in the "plucker" format to your desktop
3. Sync the plucker file to the Palm using your favorite desktop application
### How can I read a PG eBook on my PDA (not Palm)?
To read a book on your PDA, you need to get the file into a format that your reader software understands. Each PDA reader program will work only with a specific format of file. Some will read several formats, but, in general, it's a jungle of competing options.
Unless you use a Rocket or REB1100, you will need to install at least one reader program, and many veteran readers install two or three to deal with different formats. There are many of them available. One of the most used is the [Mobipocket Reader](http://www.mobipocket.com).
Further, the process may be different depending on which reader software you're using. Each format that a reader understands has one or more converter programs that run on your PC, and turn the plain text file into that format. So in general, you have to:
1. Download the PG text
2. Edit the text for the layout the converter wants (often HTML).
3. Use the converter to create a file of the format the reader wants.
4. Transfer the converted file to your PDA.
If all this sounds too complicated, remember that many people take and convert PG texts into many formats, and offer them for download from their sites. Of course, there is no guarantee that someone will have converted the particular eBook you want, but there are lots of options. Try [Blackmask](http://www.blackmask.com), which lists thousands of texts already converted for Mobipocket, iSilo, RocketBook and the Microsoft Reader.
There are many other sites that serve pre-converted PG texts.
[MemoWare](http://www.memoware.com) is also a useful resource for converted eBooks, and has lots of information, including an excellent [map of the readers and formats jungle](http://www.memoware.com/mw.cgi/?screen=help_format)
Steve Sakoman's site at [http://www.sakoman.net/](http://www.sakoman.net/) takes plain texts from PG and produces automated conversions to HTML and PalmDOC PDB.
If you're "rolling your own", you'll probably need to convert our plain texts to HTML at some point, because a lot of converters require HTML as input, and this is a common theme in readers' explanations of how they get texts onto their PDAs. Don't panic! You don't have to be a HTML wizard to do this — in fact, you don't need to know anything about HTML at all! Usually, it's just a matter of removing some line ends and Saving As HTML. You won't get a lot of fancy markup, or images out of thin air, but you will get the book.
One of the main things you usually have to do in making HTML is unwrap the lines. If you're making your HTML manually, this is usually done by replacing two paragraph marks with some nonsense marker like @@Z@@, replacing all single paragraph marks with a space, and replacing the nonsense marker with a paragraph mark. After unwrapping, the text can just be Saved As HTML.
This has the drawback that lines that should keep their own line breaks, like poetry, tables or letter headings, will be run together with their neighbors. You may have to go through the text and add extra line breaks for these.
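The unwrapping recipe above can also be scripted. The following is only a minimal sketch in Python, not a PG tool: the function names `unwrap_paragraphs` and `to_html` are made up for illustration, the `@@Z@@` marker is assumed not to occur in the book, and any run of two or more line ends is treated as a paragraph break. It has the same drawback as the manual method: poetry and tables get run together.

```python
import html
import re

MARKER = "@@Z@@"  # nonsense marker; assumed not to occur in the book itself


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph becomes one long line."""
    # Normalise Windows and old Mac line ends so the steps below only see "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip("\n")
    if MARKER in text:
        raise ValueError("marker already occurs in the text; choose another one")
    # Step 1: protect real paragraph ends (two or more line ends in a row).
    text = re.sub(r"\n{2,}", MARKER, text)
    # Step 2: every remaining line end is just a wrap; replace it with a space.
    text = text.replace("\n", " ")
    # Step 3: bring the paragraph ends back.
    return text.replace(MARKER, "\n\n")


def to_html(text: str) -> str:
    """Wrap each unwrapped paragraph in <p> tags; no other markup is added."""
    paragraphs = unwrap_paragraphs(text).split("\n\n")
    body = "\n".join(f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs)
    return f"<html><body>\n{body}\n</body></html>\n"
```

For real books, a dedicated converter such as GutenMark is likely to give better results, since it knows about PG conventions that this sketch ignores.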
There are some applications that specifically assist with auto-converting text into HTML:
- GutenMark [http://www.sandroid.org/GutenMark](http://www.sandroid.org/GutenMark) was specifically written for the purpose, and knows enough about PG conventions to do a very good job.
- InterParse [http://www.interparse.com](http://www.interparse.com) is a Windows-based generic text parser that is very easy and intuitive to use.
- The World Wide Web Consortium lists some other options at [http://www.w3.org/Tools/Misc_filters.html](http://www.w3.org/Tools/Misc_filters.html)
If you're using a RocketBook or REB1100, you don't have either the choices or the confusion to deal with. One of our volunteers who uses a RocketBook offered this recipe for getting a PG text onto a RocketBook:
On converting to Rocket:
1. Download text file.
2. Using your utility for showing formatting, enter your word processing program's edit mode.
3. Replace all double paragraph marks with some nonsense sequence that can't possibly actually be there, such as @@Z@@.
4. Replace all single paragraph marks with one single space (enter).
5. Replace your nonsense sequence with one paragraph mark.
6. Convert all your double spaces to single spaces. Repeat this until you get "0" for how many replacements were made.
7. Save in HTML.
8. Go into your Rocket Librarian. Use "import file using Rocket Librarian." Go and pick up the file, which will be automatically converted to .rb in this process.
This sounds long, but it usually takes me under three minutes except for a very long text. I've never taken longer than five minutes. You can just go in and pick up the text file with Rocket Librarian, but what you get onscreen doing this looks very odd. Steps 2-7 are not essential, and if I'm in a hurry to read something once I might skip them, but if it's something I know I want to keep I use them.
This formula is not ideal for poetry or blank verse; if you want to keep the original line breaks, you should avoid removing the paragraph marks.
Another volunteer, who reads on Mobipocket [http://www.mobipocket.com](http://www.mobipocket.com) offered this suggestion:
I use the MobiPocket Publisher, available free from [http://www.mobipocket.com](http://www.mobipocket.com). It wants to take a HTML file as input, so the first thing I have to do is convert my PG text to HTML.
I usually do this by running GutenMark, available at [http://www.sandroid.org/GutenMark](http://www.sandroid.org/GutenMark). I can also do it in Microsoft Word using the following sequence:
- Edit / Replace / Special and choose Paragraph Mark twice (or, from replace, you can type in ^p^p to get two Paragraph Marks) and replace with @@@@. Replace All. This saves off real paragraph ends by marking them with a nonsense sequence.
- Now Replace one Paragraph Mark (^p) with a space. Replace All. This removes the line-ends.
- Finally, replace @@@@ with one Paragraph Mark. Replace All. This brings back the Paragraph Ends.
- Now I can Save As HTML.
GutenMark does a better job of converting to HTML than my simple Word formula, since it recognizes standard PG features, and sometimes Mobipocket doesn't like the HTML produced from Word — it complains of a missing file, or doesn't recognize quotation marks.
Having got my HTML file, I open Mobipocket Publisher, choose "Project Gutenberg", Add the File I created, and just Publish it to MobiPocket .PRC format. Then I pick it up on my iPaq the next time I sync. The whole process takes two or three minutes, and the results, since I discovered GutenMark, are good.
I recently came across InterParse 4 at [http://www.interparse.com](http://www.interparse.com). It doesn't have the built-in knowledge of GutenMark, so the results aren't as good, but it's really easy to use, and you can see the effect of your changes onscreen as you do it. For most PG books, all you have to do is just Open the text file and choose Options / Remove all CRLFs (Except at Paragraph End), then Convert / Text to HTML and Save As the HTML filename you want. Quick and painless.
## About the Files
### What types of files are there, and how do I read them?
The vast majority of our files are plain text. You can read these with any editor or text viewer or browser. Some are HTML. You can read these with any browser.
The vast majority of our eBooks are available as plain text. You can read these with any editor or text viewer or browser. Some are HTML. You can read these with any browser.
For a fuller listing of other file types, and how to read them, please see the Formats FAQ [F.2].
### What do the filenames of the texts mean?
We have to divide this question into two answers, for books up to 10,000, and books after 10,000 (or older books reposted after we hit 10,000).
#### Books after 10,000 — the new naming scheme
Since eBook number 10,000, we name our files based on the PG etext number; thus, the base of the name simply reflects the order in which the book was posted. 12345.txt is just the 12,345th book posted.
Also, when we correct an older book, we may repost it into the new naming scheme rather than just replacing it in the old scheme. When we do this, its naming conventions are the same as if it had been numbered after 10,000, and, additionally, we add a subdirectory "old/", into which we put all of the older files, so that they are preserved for anyone who wants to examine them. In this way, we will eventually move all e-books to the new naming scheme.
We name our files based on the PG eBook number; thus, the base of the name simply reflects the order in which the book was posted. 12345.txt is just the 12,345th book posted (more or less: there are some gaps in the number sequence). Friendlier filenames, or variations on the eBook #, are sometimes included.
Formats or character sets other than plain ASCII then get extensions added to indicate the type of file. Character sets get digits; formats get letters. The most common of these are:
- -0 for Unicode
@ -569,8 +255,7 @@ Thus, eBook number 12345 may — fairly typically — have the files 12345.txt,
Other formats get appropriate three-letter extensions, like -pdf.
The complete set of naming rules for post-10K eBooks is:
## Please explain the folder (directory) structure:
1. Directory structure: the directory for the eBook shall be contained in a hierarchy of directories, each one a single digit, being all the digits of the etext number except the last, in order. The name of the directory for the eBook itself shall be the number of the eBook. Thus, eBook #12345 will be contained in:
<pre>
@ -646,15 +331,6 @@ The Release Date in the standard header will be the month and year of the actual
- Example: "12345-pdf-readme.txt" for the file 12345-pdf.pdf. Note: If we were able to add the standard header prior to creating the PDF file, it could be distributed as any other editable format without a readme.
- Example: "12345-m-readme.txt" for the files 12345-m-001.mp3, 12345-m-002.mp3, etc.
7. The GUTINDEX file(s) will have entries of the form:
Title, by Author eBook#
eBook # will be in 5 digits, followed by a "C" if copyrighted and "*" if reserved. "by " will be omitted if there is not enough space. Any additional data, such as a translator or subtitle, will be on a following line or lines surrounded by square brackets [] and indented by two spaces.
GUTINDEX will have approximate date indicators such as:
MARCH 2004: 822 eBooks
The following is an example for etext #12345, assuming it has ASCII, 8-bit and Unicode text files; HTML, both as a single file and broken into pages; XML, PDF, TeX and LIT formats; and MP3 audio. Assume that we couldn't edit the LIT, and so had to add a "readme" for that containing the header as in point 6 above.
The directory 12345 for the eBook will be at
@ -694,57 +370,6 @@ and in its subdirectories the further files
1/2/3/4/12345/12345-m/12345-m-001.mp3
1/2/3/4/12345/12345-m/12345-m-002.mp3
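The directory rule in point 1 (one single-digit directory for each digit of the number except the last, then a directory named after the number) is easy to compute. This is a small Python sketch only; the function names `ebook_directory` and `ebook_file_path` are hypothetical, and because the rule as quoted here does not say where numbers below 10 go, the sketch refuses them rather than guessing.

```python
def ebook_directory(number: int) -> str:
    """Return the relative directory for an eBook number, e.g. 1/2/3/4/12345."""
    if number < 10:
        # No leading digits to build the hierarchy from; the rule quoted
        # above does not cover this case, so do not guess.
        raise ValueError("rule not specified for eBook numbers below 10")
    digits = str(number)
    # Every digit except the last becomes its own directory level.
    parents = list(digits[:-1])
    return "/".join(parents + [digits])


def ebook_file_path(number: int, filename: str) -> str:
    """Return the relative path of a file stored directly in the eBook's directory."""
    return f"{ebook_directory(number)}/{filename}"
```

For example, `ebook_file_path(12345, "12345.txt")` gives `1/2/3/4/12345/12345.txt`, matching the layout shown above.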
#### Books up to 10,000 — the old naming scheme
Older PG files are named for the text, the edition, and the format type.
Nearly all of these PG files are named in "8.3" format — that is, up to eight characters, a dot, and three more characters. (It should have been all of them, by the rules, but we had to break a few.)
The first five characters in the filename are simply a unique name for that text, for example, "Ulysses" by Joyce begins with "ulyss".
If the text has been posted as both a 7-bit and 8-bit text, then the first character of the filename will be a 7 or an 8, to indicate that. For example, we have both 7crmp10 and 8crmp10 for Dostoevsky's Crime and Punishment.
The 6th and 7th characters of the name are the edition number — 01 through 99. We normally start at edition 10 (1.0); numbers lower than that indicate that we think the text needs some more work; numbers higher than that mean that someone has corrected the original edition 10.
The 8th character of the filename, if it exists, indicates either the version or the format of the file. When we get a different version of the text based on a different source, we give it an a, b, c, as for example if the text is from a different translation. Where we have posted a text in a different format, we also add an eighth character — "h" for HTML, "x" for XML, "r" for RTF, "t" for TeX, "u" for Unicode are established formats. There have been some experimental postings with "l" for LIT, and "p" for either PRC or PDB.
So, for example:
<pre>
7crmp10 is our first edition of Crime and Punishment in plain ASCII
8sidd10 is our first edition of Siddhartha, as an 8-bit text
dyssy10b is our first edition of our third translation of Homer's Odyssey, in plain ASCII
jsbys11 is our second edition of Jo's Boys, in plain ASCII
vbgle10h is our HTML format of our first edition of Darwin's Voyage of the Beagle
7ldv110 is our 7-bit ASCII version of the first volume of the Notebooks of Leonardo da Vinci
</pre>
To make it worse, we don't always stick to these rules, for example:
<pre>
1ddc810 is our first edition of the first book of Dante's Divina Commedia in Italian, as an 8-bit text
80day10 is our first edition of Verne's Around the World in 80 days, in plain 7-bit ASCII in English.
emma10 is our first edition of Jane Austen's "Emma" — with a 4-character basename instead of 5.
</pre>
Some series have special, non-standard names. Shakespeare is named with a digit representing the overall source (First Folio, etc), then "ws", then a series number, so for example 0ws2610, 1ws2610 and 2ws2610 are all versions of "Hamlet". The Tom Swift series is named with a two-digit prefix denoting the series number, then "tom", so for example 01tom10 is "Tom Swift and his Motor-Cycle".
And what should we do with a text from a different source that is formatted as HTML? For example, if dyssy10b is the name of the third translation, what should the HTML version be named? dyssy10bh is obvious, but it uses 9 characters.
The problem, of course, is that we are trying to fit a lot of information into an 8-character filename, and as the collection grows, and the number of formats and versions increases, we come across more pressure on filenames, so while the filename is a good guide to the contents, it's not definitive.
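To make the regular old-scheme rule concrete, here is a rough Python sketch that splits a name into its parts. It is an illustration only: `parse_old_name`, `OldName` and `FORMAT_LETTERS` are invented names, the sketch assumes any letter in the format list is a format and any other letter is a version, and it follows only the standard pattern. Irregular names are either rejected (`emma10`) or misread (`80day10` looks like an 8-bit file to it), which is exactly why the filename is a guide and not a definitive answer.

```python
import re
from typing import NamedTuple, Optional

# Eighth-character format letters listed in the text above.
FORMAT_LETTERS = {
    "h": "HTML", "x": "XML", "r": "RTF", "t": "TeX",
    "u": "Unicode", "l": "LIT", "p": "PRC or PDB",
}

# Five-character basename, two-digit edition, optional letter, optional extension.
_PATTERN = re.compile(r"^([0-9a-z]{5})([0-9]{2})([a-z])?(?:\.[a-z]{3})?$")


class OldName(NamedTuple):
    basename: str                # first five characters, e.g. "7crmp"
    bits: Optional[int]          # 7 or 8 if the name starts with that digit
    edition: int                 # 10 means edition 1.0, 11 means 1.1
    version: Optional[str]       # "a", "b", "c" for a different source
    file_format: Optional[str]   # e.g. "HTML" when the eighth character is "h"


def parse_old_name(filename: str) -> Optional[OldName]:
    """Split a regular old-scheme name into parts; return None if it does not fit."""
    match = _PATTERN.match(filename.lower())
    if match is None:
        return None
    base, edition, letter = match.groups()
    bits = int(base[0]) if base[0] in "78" else None
    file_format = FORMAT_LETTERS.get(letter) if letter else None
    version = letter if letter and file_format is None else None
    return OldName(base, bits, int(edition), version, file_format)
```

With this sketch, `parse_old_name("dyssy10b")` reports version `b` of edition 10, and `parse_old_name("vbgle10h")` reports the HTML format.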
### What is the difference within PG between an "edition" and a "version"?
We give the name "edition" to a corrected file made from an existing PG text. For example, if someone points out some typos in our file of "War and Peace", we will fix them, and, if enough are found to warrant a "new edition", then instead of just replacing the file wrnpc10.txt, we may make a new file wrnpc11.txt, and leave the original alone. A new edition is always filed under the same year and etext number as the original — it's just an update.
We give the name "version" to a completely independent e-text made from the same original book, but a different source. For example, Homer's Odyssey was translated by many different people, but they all worked from the same book. The translations by Lang, Butler, Pope and Chapman are very different, but they all come from the same root.
Thus, these are all "versions" of Homer's Odyssey. We give them all the same basename — dyssy — and each gets a new number, but we keep the original basename, and add a letter to the filename to indicate that they are "versions" of the same original book:
<pre>
dyssy10.txt Butler's Translation
dyssy10a.txt Butcher & Lang's Translation
dyssy10b.txt Pope's Translation
</pre>
The differences don't have to be as extreme as this for us to create a new version. "Clotelle"/"Clotel", for example, was a book published multiple times in English by William Wells Brown, and each time, he changed the text. We preserve three different texts of the same book as different versions: clotl10, clotl10a and clotl10b.
### What is the difference between an "etext" and an "eBook"?
If there is any, it seems to be in the eye of the Marketing Department! Michael Hart started the whole thing, and coined the word "Etext". The term "eBook" is gaining in popularity, even for texts that are not full books, so we've started using that more now.
### What are the "Etext/Ebook numbers" on the texts?
These are simply a series of numbers. We give one to each etext as it is posted, so the earliest etexts have low numbers and later etexts have higher numbers. Etext number 1 is the Declaration of Independence, the first text that Michael Hart typed in to the mainframe that he was using in 1971.
@ -753,7 +378,7 @@ A few numbers are reserved for books that we hope to have in the PG archive some
When we improve a text by making some corrections, we call it a new EDITION, and it keeps the same etext number, but when we post a different VERSION of the same text, from a different paper book (like different translations of Homer's Odyssey), each new version gets a new etext number.
### What do the month and year on the text mean?
Project Gutenberg sets a production target for itself. The idea is that we try to produce X texts in a month, and in books before #10,000, we dated the texts according to what month of our schedule they appear in. For example, if our target for September 2000 was 50 texts, and we actually produced 55, then the last five would be dated October 2000, and we'd get a head-start on the month. At the time of writing the original FAQ, in July 2002, that target was the publication of 200 books per month. However, our actual production far outpaced our targets, with the result that the "head-start" had accumulated so much that in July 2002, we were releasing books scheduled for March, 2004!
Project Gutenberg used to set a production target for itself. The idea was to produce X texts in a month, and in books before #10,000, we dated the texts according to what month of our schedule they appear in. For example, if our target for September 2000 was 50 texts, and we actually produced 55, then the last five would be dated October 2000, and we'd get a head-start on the month. At the time of writing the original FAQ, in July 2002, that target was the publication of 200 books per month. However, our actual production far outpaced our targets, with the result that the "head-start" had accumulated so much that in July 2002, we were releasing books scheduled for March, 2004!
The fact that we were so far ahead of schedule makes this quite confusing for newcomers. If it bothers you, just don't think about it! But at least it's better than being behind schedule. We didn't always produce so many books. In the September 1994 newsletter, Michael Hart wrote: