---
layout: default
title: History and Philosophy | Project Gutenberg
permalink: /background/history_and_philosophy.html
---
The History and Philosophy of Project Gutenberg, by Michael Hart
================================================================

© August 1992

## The Beginning

Project Gutenberg began in 1971 when Michael Hart was given an
operator's account with $100,000,000 of computer time in it by the
operators of the Xerox Sigma V mainframe at the Materials Research
Lab at the University of Illinois.

This was totally serendipitous, as it turned out that two of a four
operator crew happened to be the best friend of Michael's and the best
friend of his brother. Michael just happened "to be at the right place
at the right time" at the time there was more computer time than
people knew what to do with, and those operators were encouraged to do
whatever they wanted with that fortune in "spare time" in the hopes
they would learn more for their job proficiency.

At any rate, Michael decided there was nothing he could do, in the way
of "normal computing," that would repay the huge value of the computer
time he had been given ... so he had to create $100,000,000 worth of
value in some other manner. An hour and 47 minutes later, he
announced that the greatest value created by computers would not be
computing, but would be the storage, retrieval, and searching of what
was stored in our libraries.

He then proceeded to type in the
"Declaration of Independence" and tried to send it to everyone on the
networks ... which can only be described today as a not so narrow miss
at creating an early version of what was later called the "Internet
Virus."

A friendly dissuasion from this yielded the first posting of a
document in electronic text, and Project Gutenberg was born as
Michael stated that he had "earned" the $100,000,000 because a copy of
the Declaration of Independence would eventually be an electronic
fixture in the computer libraries of 100,000,000 of the computer
users of the future.

## The Beginning of the Gutenberg Philosophy

The premise on which Michael Hart based Project Gutenberg was:
anything that can be entered into a computer can be reproduced
indefinitely ... what Michael termed "Replicator Technology." The
concept of Replicator Technology is simple: once a book or any other
item (including pictures, sounds, and even 3-D items) can be stored in
a computer, then any number of copies can and will be
available. Everyone in the world, or even not in this world (given
satellite transmission), can have a copy of a book that has been
entered into a computer.

This philosophical premise has created several offshoots:

1. Electronic Texts (Etexts) created by Project Gutenberg are to be
made available in the simplest, easiest to use forms available.

Suggestions to make them less readily available are not to be treated
lightly.  Therefore, Project Gutenberg Etexts are made available in
what has become known as "Plain Vanilla ASCII," meaning the low set of
the American Standard Code for Information Interchange, i.e. the same
kind of character you read on a normal printed page; italics,
underlines, and bolds have been capitalized.
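
As a rough illustration of what "capitalized" means here, the sketch below turns emphasized words into capitals and checks that the result stays within the low (7-bit) ASCII set. The `_word_` and `*word*` markers and the function names are assumptions made for this example only; they are not a markup convention or a tool that Project Gutenberg is known to have used.

```python
import re

# Hypothetical input convention for this sketch only:
# _word_ marks italics or underlining, *word* marks bold.
EMPHASIS = re.compile(r"([_*])(?=\S)(.+?)(?<=\S)\1")


def to_plain_vanilla(text: str) -> str:
    """Drop the emphasis markers and capitalize what they enclosed."""
    return EMPHASIS.sub(lambda match: match.group(2).upper(), text)


def is_low_ascii(text: str) -> bool:
    """True when every character is in the 7-bit ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in text)


sample = "Alice was _very_ tired of *sitting* by her sister."
plain = to_plain_vanilla(sample)
print(plain)
print(is_low_ascii(plain))
```

Anything that survives this kind of conversion can be read by a program that knows nothing about fonts or markup, which is the point of the format.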

The reason for this is that 99% of the hardware and software a person
is likely to run into can read and search these files.

Any other system of etext storage is going to fall short of an
audience of 99%.

This does not mean there are not other valid means of doing the etext
business ... after all, over half the computers are DOS, so one could
address a wide audience by just doing DOS. Plain Vanilla ASCII,
however, addresses the audience with Apples and Ataris all the way to
the old homebrew Z80 computers, while an audience of Mac, UNIX and
mainframers is still included.

In this same vein, Project Gutenberg selects etexts targeted a bit on
the "bang for the buck" philosophy ... we choose etexts we hope
extremely large portions of the audience will want and use
frequently. We are constantly asked to prepare etext from out of print
editions of esoteric materials, but this does not provide for usage by
the audience we have targeted, 99% of the general public.

Also in the same vein, Project Gutenberg has avoided requests,
demands, and pressures to create "authoritative editions." We do not
write for the reader who cares whether a certain phrase in Shakespeare
has a ":" or a ";" between its clauses. We put our sights on a goal to
release etexts that are 99.9% accurate in the eyes of the general
reader. Given the preferences your proofreaders have, and the general
lack of reading ability the public is currently reported to have, we
probably exceed those requirements by a significant amount. However,
for the person who wants an "authoritative edition" we will have to
wait some time until this becomes more feasible. We do, however,
intend to release many editions of Shakespeare and the other classics
for the comparative study on a scholarly level, before the end of the
year 2001, when we are scheduled to complete our 10,000 book Project
Gutenberg Electronic Public Library.

Project Gutenberg has been a part of celebrations of the 100th
Anniversary of Public Libraries, starting in 1995. Project Gutenberg
hopes to found "The Public Domain Register," after the 100th
Anniversary of The U.S. Copyright Register in 1997.

We hope you will be part of it, too. You are all invited.

Footnote:

Our eventual goal is to provide Public Domain Etext editions a short
time after they enter the Public Domain. Of course, the period before
a copyrighted work entered the Public Domain was extended from 28
years (with a 28 year extension available) to 50 years more than the
life of the author, so this put a kink, to put it mildly, into our
plans. (The original copyright was for 14 years, in the U.S.) Thus, a
person could originally make a reasonable prediction that anything under
copyright would be in the Public Domain while it could still be used. Under
the new law it is impossible to predict the length of a copyright, and
the likelihood of a new book entering the Public Domain during the
lifetime of the average reader is minimal. Suppose you are 25
when you read a new book and the author is 50: wait the average 25
years for the author to die (what a thought!). Now you have to wait
another 50 years to have access to that book; it doesn't matter when
it was written (unless it is an old one ... before the period the law
retroacted to) ... so you would have to wait (on the average) until
you were 100 years old. A 25-year-old under the original law would
only have to wait for 14 years ... until the age of 39. Quite a
difference, between the ages of 39 and 100. Not only that, but the
copyright laws would have to stay the same for all that time
... something in serious doubt, seeing how much they have changed in
the recent century.
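
The arithmetic in this footnote can be checked with a short sketch. The function names and parameters are made up for illustration, and the figures are only the averages assumed in the footnote (a reader of 25, an author of 50 who lives to 75, and a brand-new book).

```python
def age_under_original_law(reader_age: int, term_years: int = 14) -> int:
    """Reader's age when a new book leaves copyright under a fixed term."""
    return reader_age + term_years


def age_under_life_plus_law(reader_age: int, author_age: int,
                            author_age_at_death: int,
                            years_after_death: int = 50) -> int:
    """Reader's age when the book leaves copyright under 'life plus N years'."""
    years_until_death = max(author_age_at_death - author_age, 0)
    return reader_age + years_until_death + years_after_death


# Figures from the footnote: reader is 25, author is 50 and lives 25 more years.
print(age_under_original_law(25))           # 25 + 14
print(age_under_life_plus_law(25, 50, 75))  # 25 + 25 + 50
```

With those inputs the two functions give 39 and 100, the two ages compared in the footnote.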

## The Project Gutenberg Philosophy (continued)

The Project Gutenberg Philosophy is to make information, books and
other materials available to the general public in forms a vast
majority of the computers, programs and people can easily read, use,
quote, and search.

This has several ramifications:

1. The Project Gutenberg Etexts should cost so little that no one will
really care how much they cost. They should be a general size that
fits on the standard media of the time ...

2. The Project Gutenberg Etexts should be so easily used that no one
should ever have to care about how to use, read, quote and search them
...

## The Project Gutenberg Philosophy (continued)

[...] This has several ramifications:

1. The Project Gutenberg Etexts should cost so little that no one will
really care how much they cost. They should be a general size that
fits on the standard media of the time.

i.e. when we started, the files had to be very small as a normal 300
page book took one meg of space which no one in 1971 could be expected
to have (in general). So doing the U.S. Declaration of Independence
(only 5K) seemed the best place to start. This was followed by the
Bill of Rights — then the whole US Constitution, as space was getting
large (at least by the standards of 1973). Then came the Bible, as
individual books of the Bible were not that large, then Shakespeare (a
play at a time), and then into general work in the areas of light and
heavy literature and references.

By the time Project Gutenberg got famous, the standard was 360K disks,
so we did books such as Alice in Wonderland or Peter Pan because they
could fit on one disk. Now the 1.44 MB disk is the standard disk and ZIP is the
standard compression; the practical filesize is about three million
characters, more than long enough for the average book.
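
A quick sanity check on these figures, as a sketch only: a "1.44 MB" disk holds 1,474,560 bytes when formatted, and one Plain Vanilla ASCII character takes one byte, so the three million character figure implies that ZIP roughly halves the size of a text. The 2:1 ratio below is inferred from the numbers in the paragraph above, not a measured result, and the function names are illustrative.

```python
DISK_BYTES = 1440 * 1024        # formatted capacity of a "1.44 MB" disk
PRACTICAL_CHARS = 3_000_000     # "about three million characters"


def implied_compression_ratio(chars: int, disk_bytes: int = DISK_BYTES) -> float:
    """Ratio ZIP would need to reach for `chars` one-byte characters to fit."""
    return chars / disk_bytes


def fits_on_disk(chars: int, ratio: float, disk_bytes: int = DISK_BYTES) -> bool:
    """True when `chars` characters, compressed by `ratio`, fit on one disk."""
    return chars / ratio <= disk_bytes


print(round(implied_compression_ratio(PRACTICAL_CHARS), 2))
# A 300 page book at roughly one million characters, assuming 2:1 compression:
print(fits_on_disk(1_000_000, ratio=2.0))
```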

However, pictures are still so bulky to store on disk that it will
be a while before we include even the low-res Tenniel
illustrations in Alice and Looking-Glass. We ARE very
interested in doing them, though, and are only waiting for advances in
technology to release a test edition. The market will have to
establish SOME standards for graphics, however, before we can attempt
to reach general audiences, at least on the graphics level.

To illustrate our faith in graphics, and in the future, we have gone
one step further in our pursuit of what we named "Replicator
Technology" TM a few years ago. We would like to end this phase of
Project Gutenberg with a first 3D application of Replicator
Technology, by doing CAT, MRI and X-ray fluoroscopy scans of
something, perhaps a painting, and printing 3D copies. If anyone can
get us access to a hundred year old masterpiece ... the average book.

## The Project Gutenberg Philosophy (continued, 2)

[...] This has several ramifications:

2. The Project Gutenberg Etexts should be so easily used that no one
should ever have to care about how to use, read, quote and search
them.

This has created a need to present these Project Gutenberg Etexts in
"Plain Vanilla ASCII" as we have come to call it over the years.

The reason for this is simple ... it is the only text mode that is
easy on both the eyes and the computer.

However, this encourages others to improve our etexts in a variety of
ways and to distribute them in a variety of the available media, as
follows: Once an etext is created in Plain Vanilla ASCII, it is the
foundation for as many editions as anyone could hope to do in the
future. Anyone desiring an etext edition matching, or not matching, a
particular paper edition can readily do the changes they like without
having to prepare that whole book again. They can use the Project
Gutenberg Etext as a foundation, and then build in any direction they
like.

Thus any complaints about how we do italics, bold, and
underscoring, or whether we should use this or that markup formula, are
sent back with encouragement to do it any way any person wants it,
and with the basic work already done, with our compliments.

The same goes for media. We have had a long-standing work ethic of providing
our etexts in any medium people wanted: Amiga, Apple, Atari ... to
IBM, to Mac, to TRS-80 ... However, now that our etexts are carried
on so many BBS's, networks and other locations, it is easier for you to
download the files in a manner that puts them in your format than it is for us
to make and mail a disk, so we don't really do that too much.

The major point of all this is that years from now Project Gutenberg
Etexts are still going to be viable, but program after program, and
operating system after operating system are going to go the way of the
dinosaur, as will all those pieces of hardware running them. Of
course, this is valid for all Plain Vanilla ASCII etexts ... not just
those your access has allowed you to get from Project Gutenberg. The
point is that a decade from now we probably won't have the same
operating systems, or the same programs and therefore all the various
kinds of etexts that are not Plain Vanilla ASCII will be obsolete. We
need to have etexts in files a Plain Vanilla search/reader program can
deal with; this is not to say there should never be any markup
... just those forms of markup should be easily convertible into
regular, Plain Vanilla ASCII files so their utility does not expire
when programs to use them are no longer with us. Remember all the
trouble with CONVERT programs to get files changed from old word
processor programs into Plain Vanilla ASCII?

Do you want to go through all that again with every book a whole world
ever puts into etext?

The value of Plain Vanilla ASCII is obvious ... so is very much of the
value of most of the various markup systems we have in the world. But
until some real standards arrive — we would be limiting our options a
great deal if we do not keep copies of all etexts in Plain Vanilla
ASCII as well.  We don't have anything against markup. Not vice versa.

Alice in Wonderland, the Bible, Shakespeare, the Koran and many others
will be with us as long as civilization ... an operating system, a
program, a markup system ... will not.

This includes the many requests we have for compression in particular
formats. There are only two formats we know of that are suitable for
transfer to a wide general audience: Plain Vanilla ASCII (.txt files)
and ZIPped files of them, (.zip files). Requests for other compression
formats must be ignored as they are appropriate only for small
portions of our target audience. However, (programmers take note: we
will need help) we are planning to put some compression links on our
files so they can be transmitted in any of an assortment of compression
formats on the fly, i.e. we should be able to generate any kind of
file asked for, but we can keep only one copy of each etext on our
servers ... as the .Z compression format does in a similar manner
today.

## The Selection of Project Gutenberg Etexts

There are three portions of the Project Gutenberg Library, which can
basically be described as:

Light Literature; such as Alice in Wonderland, Through the
Looking-Glass, Peter Pan, Aesop's Fables, etc.

Heavy Literature; such as the Bible or other religious documents,
Shakespeare, Moby Dick, Paradise Lost, etc.

References; such as Roget's Thesaurus, almanacs, and a set of
encyclopedia, dictionaries, etc.

The Light Literature Collection is designed to get persons to the
computer in the first place, whether the person may be a pre-schooler
or a great-grandparent. We love it when we hear about kids or
grandparents taking each other to an etext of Peter Pan when they
come back from watching HOOK at the movies, or when they read Alice in
Wonderland after seeing it on TV. We have also been told that nearly
every Star Trek movie has quoted current Project Gutenberg etext
releases (from Moby Dick in The Wrath of Khan; a Peter Pan quote
finishing up the most recent, etc.) not to mention a reference to
Through the Looking-Glass in JFK. This was a primary concern when we
chose the books for our libraries.

We want people to be able to look up quotations they heard in
conversation, movies, music, other books, easily with a library
containing all these quotations in an easy to find etext format.

With Plain Vanilla ASCII you will be easily able to search an entire
library, without any program more sophisticated than a plain search
program. In fact, these Project Gutenberg Etext files are so plain
that you can do a search on them without even using an intermediate
search program (i.e. a program between you and the disk). Norton's and
other direct disk access programs can search every one of your files
without you even naming them, pointing to an etext directory, or
whatever. You can simply search a raw output from the disk ... I do
this on a half gigabyte disk partition, containing all our editions.
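
To show how little is needed, here is a minimal sketch of a "plain search program" over a directory of Plain Vanilla ASCII files. The function name, the `.txt` layout and the sample file are assumptions for this example; it is not a Project Gutenberg tool, and it reads files through the operating system rather than searching the raw disk as described above.

```python
import tempfile
from pathlib import Path


def search_etexts(directory, phrase):
    """Yield (file name, line number, line) for each case-insensitive match."""
    needle = phrase.lower()
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps the search going if a stray non-ASCII byte appears.
        with path.open(encoding="ascii", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip("\n")


# Demo on a throwaway directory holding one hypothetical etext file.
with tempfile.TemporaryDirectory() as library:
    Path(library, "alice.txt").write_text(
        "Down the Rabbit-Hole\n'Curiouser and curiouser!' cried Alice.\n",
        encoding="ascii",
    )
    for name, number, line in search_etexts(library, "curiouser"):
        print(f"{name}:{number}: {line}")
```

Because the files carry no markup, a quotation can be matched with a plain substring test; nothing has to be parsed or converted first.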