known issues updated
parent 564c9a020e
commit 19181b4511
|
@ -1,113 +1,80 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Volunteers' FAQ (old) | Project Gutenberg
|
title: Volunteers' FAQ | Project Gutenberg
|
||||||
permalink: /attic/volunteers_faq.html
|
permalink: /help/volunteers_faq.html
|
||||||
---
|
---
|
||||||
|
|
||||||
Volunteers' FAQ (old)
|
Volunteers' FAQ
|
||||||
=====================
|
===============
|
||||||
|
|
||||||
Project Gutenberg welcomes contributions of eBooks from people with the interest, time,
|
Project Gutenberg welcomes contributions of eBooks from people with
|
||||||
and skillset needed to meet our submission standards. Details of the process and the standards
|
the interest, time, and skillset needed to meet our submission
|
||||||
are at our copyright clearance site (https://copy.pglaf.org) and upload site (https://upload.pglaf.org).
|
standards. Details of the process and the standards are at our
|
||||||
|
copyright clearance site (https://copy.pglaf.org) and upload site
|
||||||
|
(https://upload.pglaf.org).
|
||||||
|
|
||||||
Join Distributed Proofreaders, Instead
|
Join Distributed Proofreaders, Instead
|
||||||
--------------------------------------
|
--------------------------------------
|
||||||
|
|
||||||
For most people interested in producing eBooks, we recommend starting with Distributed Proofreaders (https://www.pgdp.net).
|
For most people interested in producing eBooks, we recommend starting
|
||||||
With Distributed Proofreaders, you can get involved with different portions of the production pipeline
|
with Distributed Proofreaders (https://www.pgdp.net). With
|
||||||
described below. This is a much easier way to get started, and results in very high quality eBooks.
|
Distributed Proofreaders, you can get involved with different portions
|
||||||
|
of the production pipeline described below. This is a much easier way
|
||||||
|
to get started, and results in very high quality eBooks.
|
||||||
|
|
||||||
|
If you simply want to suggest a book for digitization, DP has online
|
||||||
|
forums for this, or you can simply send an email (contact information
|
||||||
|
is on the site).
|
||||||
|
|
||||||
|
Distributed Proofreaders maintains canonical guidance on production.
|
||||||
|
See especially:
|
||||||
|
|
||||||
|
* The Post-Processing FAQ --
|
||||||
|
https://www.pgdp.net/wiki/DP_Official_Documentation:PP_and_PPV/Post-Processing_FAQ
|
||||||
|
* Easy Epub -- https://www.pgdp.net/wiki/DP_Official_Documentation:PP_and_PPV/Easy_Epub (It's a guide to how best to handle the HTML that goes through epubmaker to lead to passable epubs/mobis)
|
||||||
|
* HTML Best Practices -- https://www.pgdp.org/~jana/best-practices/ (this was written a while back but DP tries to keep it up-to-date)
|
||||||
|
|
||||||
If you simply want to suggest a book for digitization, DP has online forums for this, or you can
|
|
||||||
simply send an email (contact information is on the site).
|
|
||||||
|
|
||||||
Being a Solo Producer
|
Being a Solo Producer
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
If you might be interested in producing an eBook yourself, here is some guidance.
|
If you might be interested in producing an eBook yourself, without involving
|
||||||
|
Distributed Proofreaders, here is some guidance. But start with what's above,
|
||||||
These steps are for eBooks that are in the public domain in the US, usually because the source
|
including the DP links.
|
||||||
printed book was published, and then the term of copyright protection expired. **Project Gutenberg
|
|
||||||
does not accept contemporary or copyrighted works. See our How-To on submitting your own work,
|
|
||||||
for information about our self-publishing portal.**
|
|
||||||
|
|
||||||
Note also that Project Gutenberg does not have a corps of volunteers who can take
|
|
||||||
your partially-completed work and turn it into a completed eBook. Instead, join Distributed
|
|
||||||
Proofreaders to be part of a larger group with many shared roles. But if there is a problem
|
|
||||||
with your ability to complete a digitization process, and you would like to
|
|
||||||
submit partial work for safekeeping for possible future completion, contact us.
|
|
||||||
|
|
||||||
Generally, and as detailed at the upload site, Project Gutenberg eBooks are submitted as
|
|
||||||
fully valid HTML, with accompanying plain text. Two other formats are less frequently utilized:
|
|
||||||
ReStructured text (RST), and LaTeX (which is used mainly for mathematical works). Automated
|
|
||||||
conversion to derivative formats, including the MOBI (Kindle) and EPUB formats, occurs
|
|
||||||
on the Project Gutenberg website back-end.
|
|
||||||
|
|
||||||
In a nutshell, the production process typically involves the following:
|
In a nutshell, the production process typically involves the following:
|
||||||
- Identify a candidate printed book. Confirm it is not already in the collection,
|
- Identify a candidate printed book. Confirm it is not already in the
|
||||||
or in process by other volunteers.
|
collection, or in process by other volunteers. Use the [Collection
|
||||||
- Obtain a copyright clearance for the printed book. Usually this is based on
|
Development Policy](/policy/collection_development.html) to guide
|
||||||
scanned title page and verso page demonstrating the printed book was published
|
you on eligibility.
|
||||||
more than 95 years ago.
|
- Obtain a copyright clearance for the printed book. Usually this is
|
||||||
- Obtain scans of the book. This may be done using your own scanner, or there
|
based on scanned title page and verso page demonstrating the printed
|
||||||
might be online scans from one of the eBook projects. Scans must come from the
|
book was published more than 95 years ago. See the [Copyright
|
||||||
exact same print edition as your copyright clearance.
|
How-To](/help/copyright.html).
|
||||||
- Perform optical character recognition (OCR) on the scans, to make an approximate
|
- Obtain scans of the book. This may be done using your own scanner,
|
||||||
representation of the book in plain text.
|
or there might be online scans available for reuse. Scans
|
||||||
- Proofread, proofread, proofread: "Fix" the OCR output by carefully fixing any
|
must come from the exact same print edition as your copyright
|
||||||
errors it made. Remove page headers & footers. De-hyphenate. Add back italics or
|
clearance.
|
||||||
other formatting.
|
- Perform optical character recognition (OCR) on the scans, to make an
|
||||||
- Format: Generate the HTML source. Different tools are available for this, and
|
approximate representation of the book in plain text.
|
||||||
usually involve editing the HTML source code directly. Note that many tools produce
|
- Proofread, proofread, proofread: "Fix" the OCR output by carefully
|
||||||
convoluted, non-standard, or non-valid HTML, which can be very difficult to clean
|
fixing any errors it made. Remove page headers &
|
||||||
up for Project Gutenberg.
|
footers. De-hyphenate. Add back italics or other formatting.
|
||||||
- Check, and recheck. The upload site has various tools, including to test proper
|
- Format: Generate valid and well-formed HTML source. Different tools
|
||||||
conversion to derived formats.
|
are available for this, and usually involve editing the HTML source
|
||||||
- Upload your work, using the copyright clearance key generated earlier.
|
code directly. Note that many tools produce convoluted, non-standard,
|
||||||
- Coordinate with the Project Gutenberg production volunteers (known as "whitewashers,"
|
or non-valid HTML, which can be very difficult to clean up for Project
|
||||||
after the Mark Twain book) on final formatting and presentation.
|
Gutenberg: poor HTML is not accepted, even if it is valid.
|
||||||
- Once the eBook is added to the Project Gutenberg collection, confirm it is
|
- Check, and recheck. The upload site has various tools, including to
|
||||||
appearing correctly, and all metadata are correct.
|
test proper conversion to derived formats.
|
||||||
- If possible, stay in touch into the future. If we receive errata reports that
|
- Upload your work, using the copyright clearance key generated
|
||||||
require access to source material, or are stylistic or subjective in nature, we
|
earlier.
|
||||||
might get in touch to discuss potential changes.
|
- Coordinate with the Project Gutenberg production volunteers (known
|
||||||
|
as "whitewashers," after the Mark Twain book) on final formatting and
|
||||||
More about HTML
|
presentation.
|
||||||
---------------
|
- Once the eBook is added to the Project Gutenberg collection, confirm
|
||||||
|
it is appearing correctly, and all metadata are correct.
|
||||||
Project Gutenberg is not in the business of teaching HTML,
|
- If possible, stay in touch into the future. If we receive errata
|
||||||
however nearly all Project Gutenberg's editions now have
|
reports that require access to source material, or are stylistic or
|
||||||
both text and HTML files. The HTML files are not only
|
subjective in nature, we might get in touch to discuss potential
|
||||||
important in themselves but are the master files for the
|
changes.
|
||||||
auto-production of each eBook's mobile viewer files: .mobi
|
|
||||||
and .epub. Almost one third of Project Gutenberg eBooks are
|
|
||||||
now read in these eBook formats.
|
|
||||||
|
|
||||||
A good starting point is to look at the HTML of a similar eBook from
|
|
||||||
the Project Gutenberg collection - or peruse a few. See how the simplicity
|
|
||||||
of structural markup is presented, perhaps by inclusion of cascading style
|
|
||||||
sheets (CSS), into a beautiful and functional ebook.
|
|
||||||
|
|
||||||
It is fascinating to learn how to produce valid HTML files and
|
|
||||||
there are many programs to help: https://www.w3schools.com/html/html_editors.asp
|
|
||||||
is one of many sources.
|
|
||||||
|
|
||||||
In Project Gutenberg HTML eBooks, you can see the HTML code in many
|
|
||||||
different tools, including Notepad, NoteTab or word
|
|
||||||
processors. The headings and footers of all our eBooks are
|
|
||||||
very similar. The body of the file has text with various
|
|
||||||
HTML tags such as
|
|
||||||
<p>....</p>, <pre>....</pre>, <h2>....</h2>
|
|
||||||
to make the html display give the text
|
|
||||||
enclosed between these html tags appear differently.
|
|
||||||
|
|
||||||
If you start with plain text, there are two very helpful programs that producers utilize: "guiguts"
|
|
||||||
at https://sourceforge.net/projects/guiguts/available
|
|
||||||
and "pg2html.exe," an early DP program. These are valuable in the
|
|
||||||
first steps of producing an html file from a properly
|
|
||||||
formatted text file.
|
|
||||||
|
|
||||||
Try to turn a very simple text file into an HTML ebook
|
|
||||||
and you are started on your way to more complex files with
|
|
||||||
the aid of the instructions in the references in
|
|
||||||
https://www.w3schools.com/html/html_editors.asp.
|
|
||||||
|
|
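As a rough illustration of the "Proofread" step in the production list above (remove page headers and footers, de-hyphenate), here is a minimal Python sketch. The function names and the example header pattern are hypothetical and are not taken from any Project Gutenberg or Distributed Proofreaders tool. The de-hyphenation rule is deliberately naive: it also joins genuinely hyphenated compounds that happen to break at a line end, so the output still needs human proofreading.

```python
import re


def dehyphenate(text: str) -> str:
    """Join words that OCR left split across a line break by a trailing hyphen.

    Naive assumption: every "letter-hyphen-newline-letter" sequence is a
    broken word. True compounds split at a line end get joined too.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def strip_page_headers(lines, header_pattern):
    """Drop lines that consist entirely of a running header or footer.

    header_pattern is a regular expression supplied by the caller
    (hypothetical here); it must match the whole stripped line.
    """
    pattern = re.compile(header_pattern)
    return [line for line in lines if not pattern.fullmatch(line.strip())]


if __name__ == "__main__":
    page = ["THE BOOK 12", "This is an exam-", "ple of OCR output.", "13 THE BOOK"]
    body = strip_page_headers(page, r"(\d+\s+)?THE BOOK(\s+\d+)?")
    print(dehyphenate("\n".join(body)))
```

Matching the whole line rather than a substring is a cautious choice: a sentence in the body that merely mentions the book title is left alone.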
|
@ -37,6 +37,8 @@ Here are pages that give background about Project Gutenberg, including how new i
|
||||||
<h2>T</h2>
|
<h2>T</h2>
|
||||||
<li><a href="/help/mobile.html">Tablets, Phones and eReaders How-To</a></li>
|
<li><a href="/help/mobile.html">Tablets, Phones and eReaders How-To</a></li>
|
||||||
<li><a href="/policy/license.html">Trademark License</a></li>
|
<li><a href="/policy/license.html">Trademark License</a></li>
|
||||||
|
<h2>V</h2>
|
||||||
|
<li><a href="/help/volunteers_faq.html">Volunteers' FAQ</a></li>
|
||||||
<h2>W</h2>
|
<h2>W</h2>
|
||||||
<li><a href="/help/new_website.html">Website redesign 2020</a></li>
|
<li><a href="/help/new_website.html">Website redesign 2020</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
|
@ -23,21 +23,13 @@ THANK YOU for your patience as we continue to update the website to fix remainin
|
||||||
### Functionality issues
|
### Functionality issues
|
||||||
1. Roboting not working properly. URLs such as http://www.gutenberg.org/robot/harvest?filetypes[]=txt sometimes work, and sometimes generate a 500 server error. This might be a missing package on one of the back end servers. Status: Checking with sysadmins.
|
1. Roboting not working properly. URLs such as http://www.gutenberg.org/robot/harvest?filetypes[]=txt sometimes work, and sometimes generate a 500 server error. This might be a missing package on one of the back end servers. Status: Checking with sysadmins.
|
||||||
2. "Authors" match in search yields 404. For example, from this page: https://www.gutenberg.org/ebooks/search/?query=a.roosevelt&submit_search=Go%21 the "Authors" link (top left) should list all Roosevelts, but instead gives a 404 at https://www.gutenberg.org/ebooks/authors/search/?query=a.roosevelt . Status: This is a template error in autocat3, being investigated.
|
2. "Authors" match in search yields 404. For example, from this page: https://www.gutenberg.org/ebooks/search/?query=a.roosevelt&submit_search=Go%21 the "Authors" link (top left) should list all Roosevelts, but instead gives a 404 at https://www.gutenberg.org/ebooks/authors/search/?query=a.roosevelt . Status: This is a template error in autocat3, being investigated.
|
||||||
3. "Titles" have the same issue as #4. Same status.
|
3. "Titles" have the same issue as above. Same status.
|
||||||
4. Bookshelf editing is not currently available. Bookshelves only have older entries. Most bookshelves had not been updated recently anyway, and we hope to add bookshelf editing capabilities soon. Status: Under development.
|
|
||||||
|
|
||||||
### Content issues
|
### Content issues
|
||||||
1. Revise the Volunteer's FAQ (currently in "the attic" since it was outdated). **Status: The Whitewashers team is looking into this.**
|
1. Bookshelf editing is not currently available. Bookshelves only have older entries. Most bookshelves had not been updated recently anyway, and we hope to add bookshelf editing capabilities soon. Status: Under development.
|
||||||
2. Add these links to the DP HTML documentation, to the Volunteer's FAQ. **Status: awaiting the Volunteer's FAQ mentioned just above.**
|
|
||||||
The Post-Processing FAQ --
|
|
||||||
https://www.pgdp.net/wiki/DP_Official_Documentation:PP_and_PPV/Post-Processing_FAQ
|
|
||||||
Easy Epub -- https://www.pgdp.net/wiki/DP_Official_Documentation:PP_and_PPV/Easy_Epub (It's a guide to how best to handle the HTML that goes through epubmaker to lead to passable epubs/mobis)
|
|
||||||
HTML Best Practices -- https://www.pgdp.org/~jana/best-practices/ (this was written a while back but DP tries to keep it up-to-date)
|
|
||||||
|
|
||||||
### User interface and user experience issues
|
### User interface and user experience issues
|
||||||
1. Selecting text is challenging on landing pages. For example, on this page: https://www.gutenberg.org/ebooks/13930 it is hard to select the title text ("African and European Addresses by Theodore Roosevelt") to copy-and-paste. Instead, the book image and "Download this eBook" are selected. Status: not yet determined.
|
1. At https://www.gutenberg.org/, the 'box of latest books' contains 10 books, but only shows 9, so I get a scroll bar. When I scroll to reveal the 10th book, the covers all shift left, but the titles below don't move. (And in fact, the 10th title is hanging off the right edge of the box.) Status: CSS issue, same as #2 under "Functionality issues."
|
||||||
2. At https://www.gutenberg.org/, the 'box of latest books' contains 10 books, but only shows 9, so I get a scroll bar. When I scroll to reveal the 10th book, the covers all shift left, but the titles below don't move. (And in fact, the 10th title is hanging off the right edge of the box.) Status: CSS issue, same as #2 under "Functionality issues."
|
|
||||||
|
|
||||||
|
|
||||||
### Search-related issues
|
### Search-related issues
|
||||||
|
|
||||||
|
|
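The roboting entry under "Functionality issues" quotes a harvest URL with a `filetypes[]` query parameter. The sketch below only shows how such a URL could be assembled in Python; it makes no network request. The base URL and parameter name are copied from that entry, while the helper name `harvest_url` is hypothetical. Any value other than `txt` is an assumption about what the endpoint accepts.

```python
from urllib.parse import urlencode

# Base URL as quoted in the known-issues entry.
BASE = "http://www.gutenberg.org/robot/harvest"


def harvest_url(filetypes):
    """Build a harvest URL with one filetypes[] parameter per requested type.

    safe="[]" keeps the square brackets unescaped so the result matches the
    form shown in the issue text.
    """
    query = urlencode([("filetypes[]", ft) for ft in filetypes], safe="[]")
    return f"{BASE}?{query}"


if __name__ == "__main__":
    print(harvest_url(["txt"]))
```

Since the entry reports intermittent 500 responses, a client built on this would probably want to retry with a delay rather than treat a single failure as final.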