---
layout: default
title: History and Philosophy | Project Gutenberg
permalink: /background/history_and_philosophy.html
---
The History and Philosophy of Project Gutenberg, by Michael Hart
================================================================

© August 1992

## The Beginning

Project Gutenberg began in 1971 when Michael Hart was given an
operator's account with $100,000,000 of computer time in it by the
operators of the Xerox Sigma V mainframe at the Materials Research
Lab at the University of Illinois.

This was totally serendipitous, as it turned out that two of a
four-operator crew happened to be Michael's best friend and the best
friend of his brother. Michael just happened "to be at the right place
at the right time," at a time when there was more computer time than
people knew what to do with, and those operators were encouraged to do
whatever they wanted with that fortune in "spare time" in the hopes
they would learn more for their job proficiency.

At any rate, Michael decided there was nothing he could do, in the way
of "normal computing," that would repay the huge value of the computer
time he had been given ... so he had to create $100,000,000 worth of
value in some other manner. An hour and 47 minutes later, he
announced that the greatest value created by computers would not be
computing, but would be the storage, retrieval, and searching of what
was stored in our libraries.

He then proceeded to type in the
"Declaration of Independence" and tried to send it to everyone on the
networks ... which can only be described today as a not so narrow miss
at creating an early version of what was later called the "Internet
Virus."

A friendly dissuasion from this yielded the first posting of a
document in electronic text, and Project Gutenberg was born as
Michael stated that he had "earned" the $100,000,000 because a copy of
the Declaration of Independence would eventually be an electronic
fixture in the computer libraries of 100,000,000 of the computer
users of the future.

## The Beginning of the Gutenberg Philosophy

The premise on which Michael Hart based Project Gutenberg was:
anything that can be entered into a computer can be reproduced
indefinitely ... what Michael termed "Replicator Technology." The
concept of Replicator Technology is simple: once a book or any other
item (including pictures, sounds, and even 3-D items) can be stored in
a computer, then any number of copies can and will be
available. Everyone in the world, or even not in this world (given
satellite transmission), can have a copy of a book that has been
entered into a computer.

This philosophical premise has created several offshoots:

1. Electronic Texts (Etexts) created by Project Gutenberg are to be
   made available in the simplest, easiest to use forms available.

Suggestions to make them less readily available are not to be treated
lightly. Therefore, Project Gutenberg Etexts are made available in
what has become known as "Plain Vanilla ASCII," meaning the low set of
the American Standard Code for Information Interchange: i.e. the same
kind of character you read on a normal printed page, with italics,
underlines, and bolds capitalized.

The reason for this is that 99% of the hardware and software a person
is likely to run into can read and search these files.

Any other system of etext storage is going to fall short of an
audience of 99%.

This does not mean there are not other valid means of doing the etext
business ... after all, over half the computers are DOS, so one could
address a wide audience by just doing DOS. Plain Vanilla ASCII,
however, addresses the audience with Apples and Ataris all the way to
the old homebrew Z80 computers, while an audience of Mac, UNIX and
mainframers is still included.

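As a rough illustration of the "Plain Vanilla ASCII" convention described earlier in this section, here is a minimal Python sketch. It assumes, purely for illustration, that emphasis in a source text is marked with asterisks or underscores; the function names `to_plain_vanilla` and `is_low_ascii` are hypothetical and do not come from any Project Gutenberg tool.

```python
import re

# Assumed input convention (not from the original essay): emphasis is
# written as *italic*, **bold**, or _underline_.
_EMPHASIS = re.compile(r"(\*\*|\*|_)(.+?)\1")


def to_plain_vanilla(text: str) -> str:
    """Replace each marked-up span with the same words in capitals."""
    return _EMPHASIS.sub(lambda match: match.group(2).upper(), text)


def is_low_ascii(text: str) -> bool:
    """Return True when every character is 7-bit ASCII (codes 0-127)."""
    return text.isascii()


if __name__ == "__main__":
    sample = "Alice was *very* tired of **sitting** by her _sister_."
    plain = to_plain_vanilla(sample)
    print(plain)
    print(is_low_ascii(plain))
```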
In this same vein, Project Gutenberg selects etexts targeted a bit on
the "bang for the buck" philosophy ... we choose etexts we hope
extremely large portions of the audience will want and use
frequently. We are constantly asked to prepare etexts from out of print
editions of esoteric materials, but this does not provide for usage by
the audience we have targeted, 99% of the general public.

Also in the same vein, Project Gutenberg has avoided requests,
demands, and pressures to create "authoritative editions." We do not
write for the reader who cares whether a certain phrase in Shakespeare
has a ":" or a ";" between its clauses. We put our sights on a goal to
release etexts that are 99.9% accurate in the eyes of the general
reader. Given the preferences our proofreaders have, and the general
lack of reading ability the public is currently reported to have, we
probably exceed those requirements by a significant amount. However,
the person who wants an "authoritative edition" will have to
wait some time until this becomes more feasible. We do, however,
intend to release many editions of Shakespeare and the other classics
for comparative study on a scholarly level, before the end of the
year 2001, when we are scheduled to complete our 10,000 book Project
Gutenberg Electronic Public Library.

Project Gutenberg has been a part of celebrations of the 100th
Anniversary of Public Libraries, starting in 1995. Project Gutenberg
hopes to found "The Public Domain Register," after the 100th
Anniversary of The U.S. Copyright Register in 1997.

We hope you will be part of it, too. You are all invited.

Footnote:

Our eventual goal is to provide Public Domain Etext editions a short
time after they enter the Public Domain. Of course, the period before
a copyrighted work entered the Public Domain was extended from 28
years (with a 28 year extension available) to 50 years more than the
life of the author, so this put a kink, to put it mildly, into our
plans. (The original copyright was for 14 years, in the U.S.) Thus, a
person could originally make a reasonable prediction that anything under
copyright would be in the Public Domain while it could still be used;
under the new law it is impossible to predict the length of a copyright,
and the likelihood of a new book entering the Public Domain during the
lifetime of the average reader is minimal. Suppose you are 25
when you read a new book and the author is 50: wait the average 25
years for the author to die (what a thought!). Now you have to wait
another 50 years to have access to that book; it doesn't matter when
it was written (unless it is an old one ... before the period the law
retroacted to) ... so you would have to wait (on the average) until
you were 100 years old. A 25-year-old under the original law would
only have to wait for 14 years ... until the age of 39. Quite a
difference, between the ages of 39 and 100. Not only that, but the
copyright laws would have to stay the same for all that time
... something in serious doubt, seeing how much they have changed in
the last century.

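The waiting-time arithmetic in the footnote can be written out as a short Python sketch. The function names are hypothetical, and the inputs are simply the footnote's own example figures (a 25-year-old reader, an author with 25 years left to live, a 50-year term after death, and the original 14-year term); this is an illustration of the sums, not a statement of copyright law.

```python
def age_under_life_plus_term(reader_age: int,
                             years_until_author_dies: int,
                             term_after_death: int = 50) -> int:
    """Reader's age when a new book enters the Public Domain under a
    'life of the author plus N years' rule."""
    return reader_age + years_until_author_dies + term_after_death


def age_under_fixed_term(reader_age: int, term: int = 14) -> int:
    """Reader's age when a new book enters the Public Domain under a
    fixed-length copyright term."""
    return reader_age + term


if __name__ == "__main__":
    # Figures taken from the footnote's example.
    print(age_under_life_plus_term(25, 25))  # 25 + 25 + 50
    print(age_under_fixed_term(25))          # 25 + 14
```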
## The Project Gutenberg Philosophy (continued)

The Project Gutenberg Philosophy is to make information, books and
other materials available to the general public in forms a vast
majority of the computers, programs and people can easily read, use,
quote, and search.

This has several ramifications:

1. The Project Gutenberg Etexts should cost so little that no one will
really care how much they cost. They should be a general size that
fits on the standard media of the time ...

2. The Project Gutenberg Etexts should be so easily used that no one
should ever have to care about how to use, read, quote and search them
...

## The Project Gutenberg Philosophy (continued)

[...] This has several ramifications:

1. The Project Gutenberg Etexts should cost so little that no one will
really care how much they cost. They should be a general size that
fits on the standard media of the time.

That is, when we started, the files had to be very small, as a normal 300
page book took one meg of space, which no one in 1971 could be expected
to have (in general). So doing the U.S. Declaration of Independence
(only 5K) seemed the best place to start. This was followed by the
Bill of Rights, then the whole US Constitution, as space was getting
large (at least by the standards of 1973). Then came the Bible, as
individual books of the Bible were not that large, then Shakespeare (a
play at a time), and then into general work in the areas of light and
heavy literature and references.

By the time Project Gutenberg got famous, the standard was 360K disks,
so we did books such as Alice in Wonderland or Peter Pan because they
could fit on one disk. Now the 1.44 MB disk is the standard and ZIP is
the standard compression; the practical filesize is about three million
characters, more than long enough for the average book.

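The "does it fit on one disk" reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the capacities are the nominal sizes of 360K and 1.44 MB floppy disks, `zlib` (DEFLATE) stands in for ZIP compression, and the name `fits_on_disk` is hypothetical. Three million characters on a 1.44 MB disk implies a compression ratio of roughly 2:1.

```python
import zlib

DISK_360K = 360 * 1024     # nominal capacity of a 360K floppy, in bytes
DISK_1_44M = 1440 * 1024   # nominal capacity of a "1.44 MB" floppy, in bytes


def fits_on_disk(text: str, capacity: int, compress: bool = False) -> bool:
    """Return True when the ASCII text, optionally DEFLATE-compressed,
    occupies no more than `capacity` bytes."""
    data = text.encode("ascii")  # Plain Vanilla ASCII: one byte per character
    if compress:
        data = zlib.compress(data, 9)
    return len(data) <= capacity


if __name__ == "__main__":
    # A repetitive stand-in for a book; real prose compresses far less well.
    book = "Alice was beginning to get very tired. " * 20000
    print(len(book), fits_on_disk(book, DISK_360K))
    print(fits_on_disk(book, DISK_360K, compress=True))
```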
However, pictures are still so bulky to store on disk that it will
still be a while before we include even the lowres Tenniel
illustrations in Alice and Looking-Glass. We ARE very
interested in doing them, though, and are only waiting for advances in
technology to release a test edition. The market will have to
establish SOME standards for graphics before we can attempt
to reach general audiences, at least on the graphics level.

To illustrate our faith in graphics, and in the future, we have gone
one step further in our pursuit of what we named "Replicator
Technology" TM a few years ago. We would like to mark the end of this
phase of Project Gutenberg with a first 3D application of Replicator
Technology, by doing CAT, MRI and XRAY Fluoroscopy scans of
something, perhaps a painting, and printing 3D copies. If anyone can
get us access to a hundred year old masterpiece ... the average book.

## The Project Gutenberg Philosophy (continued, 2)

[...] This has several ramifications:

2. The Project Gutenberg Etexts should be so easily used that no one
should ever have to care about how to use, read, quote and search
them.

This has created a need to present these Project Gutenberg Etexts in
"Plain Vanilla ASCII" as we have come to call it over the years.

The reason for this is simple ... it is the only text mode that is
easy on both the eyes and the computer.

However, this encourages others to improve our etexts in a variety of
ways and to distribute them in a variety of the available media, as
follows: Once an etext is created in Plain Vanilla ASCII, it is the
foundation for as many editions as anyone could hope to do in the
future. Anyone desiring an etext edition matching, or not matching, a
particular paper edition can readily make the changes they like without
having to prepare that whole book again. They can use the Project
Gutenberg Etext as a foundation, and then build in any direction they
like.

Thus any complaints about how we do italics, bold, and the
underscoring, or whether we should use this or that markup formula are
sent back with encouragement to do it any way any person wants it,
and with the basic work already done, with our compliments. The same
goes for media. We have had a long-standing work ethic of providing
our etexts in any medium people wanted: Amiga, Apple, Atari ... to
IBM, to Mac, to TRS-80 ... However, now that our etexts are carried
in so many BBS's, networks and other locations, it is easier to
download the files in a manner that puts them in your format than for
us to make and mail a disk, so we don't really do that too much.

The major point of all this is that years from now Project Gutenberg
Etexts are still going to be viable, but program after program, and
operating system after operating system are going to go the way of the
dinosaur, as will all those pieces of hardware running them. Of
course, this is valid for all Plain Vanilla ASCII etexts ... not just
those your access has allowed you to get from Project Gutenberg. The
point is that a decade from now we probably won't have the same
operating systems, or the same programs, and therefore all the various
kinds of etexts that are not Plain Vanilla ASCII will be obsolete. We
need to have etexts in files a Plain Vanilla search/reader program can
deal with; this is not to say there should never be any markup
... just that those forms of markup should be easily convertible into
regular, Plain Vanilla ASCII files so their utility does not expire
when programs to use them are no longer with us. Remember all the
trouble with CONVERT programs to get files changed from old word
processor programs into Plain Vanilla ASCII?

Do you want to go through all that again with every book the whole world
ever puts into etext?

The value of Plain Vanilla ASCII is obvious ... so is very much of the
value of most of the various markup systems we have in the world. But
until some real standards arrive, we would be limiting our options a
great deal if we do not keep copies of all etexts in Plain Vanilla
ASCII as well. We don't have anything against markup. Not vice versa.

Alice in Wonderland, the Bible, Shakespeare, the Koran and many others
will be with us as long as civilization ... an operating system, a
program, a markup system ... will not.

This includes the many requests we have for compression in particular
formats. There are only two formats we know of that are suitable for
transfer to a wide general audience: Plain Vanilla ASCII (.txt files)
and ZIPped files of them (.zip files). Requests for other compression
formats must be ignored as they are appropriate only for small
portions of our target audience. However (programmers take note: we
will need help), we are planning to put some compression links on our
files so they can be transmitted in any of an assortment of compression
formats on the fly. That is, we should be able to generate any kind of
file asked for, but we can keep only one copy of each etext on our
servers ... as the .Z compression format does in a similar manner
today.

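The idea of keeping a single Plain Vanilla ASCII copy and generating compressed files on request might look something like the following Python sketch. It is an assumption-laden illustration, not a description of any actual Project Gutenberg server: the function `serve`, the format keys, and the archive member name `etext.txt` are all hypothetical, and gzip, bzip2 and xz are modern stand-ins chosen because they ship with Python's standard library.

```python
import bz2
import gzip
import io
import lzma
import zipfile


def _zip_bytes(data: bytes) -> bytes:
    """Wrap the text in an in-memory .zip archive with a single member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("etext.txt", data)  # hypothetical member name
    return buffer.getvalue()


# One stored copy, many output formats; each encoder maps bytes to bytes.
ENCODERS = {
    "txt": lambda data: data,
    "zip": _zip_bytes,
    "gz": gzip.compress,
    "bz2": bz2.compress,
    "xz": lzma.compress,
}


def serve(plain_text: bytes, fmt: str) -> bytes:
    """Return the stored plain-text etext encoded in the requested format."""
    encoder = ENCODERS.get(fmt)
    if encoder is None:
        raise ValueError(f"unsupported format: {fmt}")
    return encoder(plain_text)


if __name__ == "__main__":
    etext = b"When in the Course of human events ...\n" * 100
    for name in ENCODERS:
        print(name, len(serve(etext, name)))
```

Only the plain text is ever stored; every other format is derived from it at request time, so adding a format means adding one entry to the table.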
## The Selection of Project Gutenberg Etexts

There are three portions of the Project Gutenberg Library, which can
basically be described as:

Light Literature, such as Alice in Wonderland, Through the
Looking-Glass, Peter Pan, Aesop's Fables, etc.

Heavy Literature, such as the Bible or other religious documents,
Shakespeare, Moby Dick, Paradise Lost, etc.

References, such as Roget's Thesaurus, almanacs, and a set of
encyclopedias, dictionaries, etc.

The Light Literature Collection is designed to get persons to the
computer in the first place, whether the person may be a pre-schooler
or a great-grandparent. We love it when we hear about kids or
grandparents taking each other to an etext of Peter Pan when they
come back from watching HOOK at the movies, or when they read Alice in
Wonderland after seeing it on TV. We have also been told that nearly
every Star Trek movie has quoted current Project Gutenberg etext
releases (from Moby Dick in The Wrath of Khan; a Peter Pan quote
finishing up the most recent, etc.), not to mention a reference to
Through the Looking-Glass in JFK. This was a primary concern when we
chose the books for our libraries.

We want people to be able to look up quotations they heard in
conversation, movies, music, or other books easily, with a library
containing all these quotations in an easy to find etext format.

With Plain Vanilla ASCII you will be easily able to search an entire
library, without any program more sophisticated than a plain search
program. In fact, these Project Gutenberg Etext files are so plain
that you can do a search on them without even using an intermediate
search program (i.e. a program between you and the disk). Norton's and
other direct disk access programs can search every one of your files
without you even naming them, pointing to an etext directory, or
whatever. You can simply search a raw output from the disk ... I do
this on a half gigabyte disk partition, containing all our editions.
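
To show how little machinery a "plain search program" needs, here is a minimal Python sketch that looks for a phrase in every `.txt` file under a directory. The function name `search_library` and the directory layout are assumptions made for illustration; it searches files through the file system rather than reading raw disk output as described above.

```python
from pathlib import Path


def search_library(root, phrase):
    """Yield (file name, line number, line) for every line under `root`
    whose text contains `phrase`, ignoring case."""
    needle = phrase.lower()
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps the search going if a stray non-ASCII
        # byte turns up in an otherwise plain file.
        with path.open(encoding="ascii", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip("\n")


if __name__ == "__main__":
    # "etexts" is a hypothetical directory of Plain Vanilla ASCII files.
    for name, number, line in search_library("etexts", "curiouser"):
        print(f"{name}:{number}: {line}")
```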