This checklist contains steps for all types of books including those
with illustrations and indexes, and it assumes you will create an HTML
version. Not all projects need all steps. As you gain experience you
may decide to do things in a different sequence.
FYI: Here is the general workflow sequence for a PGDP book.
(1) First-round proofers correct individual pages. (2) Second-round
proofers correct individual pages. (3) Post-proofers like you (PPers)
do the work listed below to assemble the entire book into finished form.
(4) Post-proofing Verifiers (PPVers) do a quality check on the uploaded
book, then pass it to (5) Project Gutenberg "whitewashers" (WWers), who
inspect the book and install it on the PG site.
1. Initial Research (1 hr.)
- Go to the project page:
- Read details and requirements.
- Bookmark the project URL and note the project ID:
http://www.pgdp.net/c/tools/post_proofers/post_comments.php?project=projectID
- Click through to the project forum page and note any issues proofers raised.
- Make a project folder, e.g. C:\dp\pp\bookname (Windows) or
/dp/pp/bookname (Mac/Linux).
- Download the text and image files and unpack them in the new folder:
- Text to bookname.txt.
(Mac/Linux: run dos2unix bookname.txt.)
- Page images (nnn.png) in subfolder pngs.
- Illustrations (imagenn.png) in subfolder originals.
- An empty subfolder images.
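The folder layout above can be scripted. The following is only a sketch, not part of Guiguts or the PGDP tools: the function names are made up for illustration, and to_unix_newlines is a plain stand-in for dos2unix.

```python
from pathlib import Path


def make_project_folder(root, bookname):
    """Create root/bookname with the three image subfolders from step 1."""
    project = Path(root) / bookname
    # pngs: page images; originals: illustration scans; images: left empty for now.
    for sub in ("pngs", "originals", "images"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def to_unix_newlines(path):
    """Rewrite a text file with LF line endings (a stand-in for dos2unix)."""
    data = Path(path).read_bytes()
    Path(path).write_bytes(data.replace(b"\r\n", b"\n"))
```

For example, make_project_folder("/dp/pp", "bookname") builds the Mac/Linux layout shown above; unpack the downloaded files into it afterwards.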
2. Sequential Inspection of Text (4-20 hr.)
This is the only step in which you will examine the whole ASCII text in
sequence; hereafter you navigate with searches. (Most post-proofers read
the book, often finding errors that got past the proofers; figure 2 minutes
per page. Some merely skim the text, comparing it to the page images;
figure 2 pages per minute.) Either way, look for:
- Proper markup of <i>italic</i> and <b>bold</b>.
- Watch for punctuation wrongly contained in markups:
<i>(ibid.</i> or
<b>Subtopic.</b>.
- Proper markup of Greek and other transliterations (content check later)
- Block material all marked in some fashion:
- poetry, misc. tabular in /* */
- block quotes in /# #/
- you may fix block markups that cross page boundaries now or in the next step
- Figures properly in [Illustration: caption]
- check: caption text agrees with List of Illustrations (if any)
- consistent spelling, abbreviation, capitalization in captions
- Fix footnotes and illustrations that are still inside a paragraph:
- move each outside the paragraph, to the next or prior page as appropriate
- don't worry about duplicate footnote numbers or symbols now
- sidenotes are handled later
- Make notes of things that will need attention in the HTML:
- Author cross-references like "(p. 150)" and "see page 222"
that should become links.
- How the editor laid out special sections such as sidebars.
3. Fix Block Markups and Proofer Notes (15-60 min.)
- Use the Search menu to step through all /* */ blocks.
- Use the Search menu to step through all /# #/ blocks
- check for a blank line before and after markup
- make sure correct type of markup used
- close up blocks that are broken at page boundaries
- apply specific margin values if desired
- check consistent indentation of block text
- Use the Find Orphaned Brackets and Markup dialog to check and correct
orphans of each type in turn, especially:
- square brackets
- /* */
- /# #/
- /p p/
- /$ $/
- Search&Replace: text: [^/]\* (anything but a slash,
followed by a literal asterisk), regex;
keep clicking "Search" to check all asterisks in document.
- look for malformed thought-breaks (5 stars)
- resolve proofers' notes, which are marked with an asterisk
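To see what the asterisk search stops on, here is a rough sketch outside Guiguts. It uses the same pattern on the whole text, so a closing */ at the start of a line is reported too, just as it is in the editor. The thought-break check assumes a thought break is a line holding only asterisks and spaces, with five stars; both function names are hypothetical.

```python
import re

# Same pattern as the checklist: any character except a slash, then an asterisk.
ASTERISK = re.compile(r"[^/]\*")


def find_asterisk_lines(text):
    """Return the sorted line numbers on which the pattern finds an asterisk."""
    lines = set()
    for match in ASTERISK.finditer(text):
        star = match.end() - 1  # index of the asterisk itself
        lines.add(text.count("\n", 0, star) + 1)
    return sorted(lines)


def malformed_thought_breaks(text, stars=5):
    """Return line numbers of star-only lines that do not have `stars` asterisks."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Assumption: a thought break contains nothing but asterisks and spaces.
        only_stars = bool(stripped) and set(stripped) <= {"*", " "}
        count = stripped.count("*")
        if only_stars and count > 1 and count != stars:
            bad.append(number)
    return bad
```

An asterisk that is the very first character of the file has no character before it, so the pattern skips it; check the first line by eye.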
4. Basic Fixup (10 min.)
5. Format Front Matter (15 min.)
- Format the title page, preserving as much of the original material
as possible. Protect in /f...f/ (no rewrap, center).
- Edit the TOC. Find each matching chapter head; make sure heads are 1:1
with TOC. Protect TOC with /$...$/.
- If book has illustrations, edit or create List of Illustrations.
Make sure it is 1:1 with [Illustration] captions.
Protect with /$...$/.
6. Edit Transliterations (0-? hr.)
- Search&Replace: text: \[[^FIS] (left-bracket followed by anything
other than F, I or S), regex.
Check the content of each transliteration.
For Greek, use the Greek Transliteration Tool.
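The bracket pattern presumably excludes F, I and S so that the search skips [Footnote, [Illustration and [Sidenote markers and stops only on other bracketed material. A small sketch shows the effect; the helper name and sample text are invented for illustration.

```python
import re

# Left bracket followed by anything other than F, I or S, as in the checklist.
BRACKET = re.compile(r"\[[^FIS]")


def find_bracket_candidates(text):
    """Return each bracketed snippet the pattern stops on."""
    found = []
    for match in BRACKET.finditer(text):
        end = text.find("]", match.start())
        # If the bracket is never closed, show a short excerpt instead.
        stop = end + 1 if end != -1 else match.start() + 20
        found.append(text[match.start():stop])
    return found
```

Note that the search also stops on any leftover proofer note in brackets, which is useful in its own right.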
7. Remove Visible Page Breaks (10-30 min.)
8. Apply Word-Frequency Checks (10-60 min.)
Open the Word Frequency report.
See this page for usage.
- Turn off the Sort Alpha button; click All Words. The list is now sorted by word
frequency; scroll to the end and skim up through the words that appear only once,
looking for oddities and obvious misspellings.
- Click Character Cnts.
- Note characters that appear only once, check usage.
- Check for equal counts of left & right parens and brackets.
- Turn on Sort Alpha; click All Words. Scroll to the word
Footnote and write down count for later use.
- Click Check Emdashes.
Conflicting usages are marked with asterisks; check against text and page
images. Preserve author's intent even when inconsistent.
- Click Check Hyphens. Resolve conflicts as above.
- Click Alpha/num. Scan list for one/ell and oh/zero errors.
- Click ALL CAPS. Scan list looking for oddities.
- Click Check MiXeD Case. Scan the list for letters such as o
that OCR sometimes renders wrongly as uppercase. Oh/zero errors can show up here, too.
- Click Check Accents. Scan list looking for mistakes, inconsistent usages.
- Click Check , Upper. Scan list for comma-for-period errors.
- Click Check . Lower. Scan list for period-for-comma errors.
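Two of the checks above, the words that appear only once and the paren and bracket counts, are simple enough to sketch outside Guiguts. This is an illustration of the idea, not the Word Frequency report itself; the function names are hypothetical and a "word" is assumed to be a run of letters.

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-sensitive words, taking a word to be a run of letters."""
    return Counter(re.findall(r"[^\W\d_]+", text))


def singletons(counts):
    """Return, in sorted order, the words that appear exactly once."""
    return sorted(word for word, n in counts.items() if n == 1)


def unbalanced_pairs(text):
    """Return {pair: (left_count, right_count)} for pairs whose counts differ."""
    chars = Counter(text)
    result = {}
    for left, right in ("()", "[]"):
        if chars[left] != chars[right]:
            result[left + right] = (chars[left], chars[right])
    return result
```

Equal counts do not prove the pairs are correctly nested; they only catch the common case of a dropped or doubled bracket.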
9. Apply Scanno Checks (1-3 hr.)
See this topic for usage of the
automated scanno checks.
10. Apply Gutcheck (10-45 min.)
Start the Gutcheck Process.
- Work through the list, correcting as appropriate.
11. Apply Spellcheck (30-90 min.)
Start the spellcheck process.
- Proceed through the document, correcting words
or adding them to the project dictionary as appropriate.
12. Fix Sidenotes (0-? hr.)
Read the discussion on this page.
Step through sidenotes with: Search&Replace of [S, not regex,
not whole word, ignore case. Click Search to find each Sidenote.
- Compare to page image. Move note above paragraph if feasible.
- Otherwise, position it above
the sentence to which it applies, with blank lines to prevent
rewrapping if you decide that is best.
13. Fix Footnotes (0-? hr.)
- Use Fixup>Footnote Fixup and follow the procedures on
this page.
14. Fix Poetry Line Numbers (0-20 min.)
- If the book has poetry that uses line numbers, read this page
and align the line numbers consistently.
15. Fix ASCII Tables (0-? hr.)
- Use Search>Find Next /**/ Block to step through all tabular material.
16. Save Edited Markup (2 min.)
- Save any unsaved changes.
Then use File>Save As to make bookname.markup.txt; this will
be the starting file for the HTML version. You can also use it as fallback
in case you mess up and need to start the following steps over.
- Re-open bookname.txt.
17. Convert Italic, Bold, and Smallcap (10 min.)
- Fix italics: use Search&Replace, text </?i> (<i> or
</i>), regex, ignore case. Replacement: underscore. Click Replace
All. Italic markup is replaced with underscores.
- Fix bold. Decide if you want to mark bold with $, or =,
or by all uppercase.
- For $, use Search&Replace, text </?b> (<b> or
</b>), regex. Replacement: $. Click Replace All.
- For =, use Search&Replace, text </?b> (<b> or
</b>), regex. Replacement: =. Click Replace All.
- For uppercase, use a regex search for
<b>([\w\s\p{IsPunct}\n]+?)</b>
(<b> then words, numbers, whitespace, and punctuation up to the first </b>).
Replacement: \U$1\E.
Click Search, then Replace until you are confident it works; then Replace All.
Afterward, search for
b>
and hand-edit any remaining bold.
- Uppercase the small caps, which proofers have marked as
<sc>Title-Cased-Text</sc>:
regex find
<sc>([\w\s\p{IsPunct}\n]+?)</sc>
(<sc> then words, numbers, whitespace, and punctuation up to the first </sc>).
Replacement: \U$1\E.
Click Search, then Replace until you are confident it works; then Replace All.
Afterward, search for
sc>
and hand-edit any remaining small caps.
- Save the document.
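The replacements in this step can be previewed on a scrap of text with the sketch below. It is not the Guiguts implementation: Python's re module has no \p{IsPunct} class and no \U...\E replacement, so the sketch swaps in a non-greedy .+? with DOTALL and an uppercasing function, and the function name and bold_marker parameter are invented for illustration.

```python
import re


def convert_markup(text, bold_marker="="):
    """Convert <i>, <sc> and <b> markup to plain-text conventions."""
    # <i> and </i> both become underscores (ignoring case, as in the checklist).
    text = re.sub(r"</?i>", "_", text, flags=re.IGNORECASE)
    # <sc>...</sc> becomes uppercase; DOTALL lets a span cross a line break.
    text = re.sub(r"<sc>(.+?)</sc>", lambda m: m.group(1).upper(),
                  text, flags=re.DOTALL)
    # <b> and </b> both become the chosen marker ($ or =).
    text = re.sub(r"</?b>", bold_marker, text)
    return text
```

Because .+? accepts any character, it is looser than the checklist pattern; an unclosed <sc> would swallow text up to the next </sc>, so check for orphaned markup first.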
18. Rewrap and Clear Rewrap Markers (10-30 min.)
- Use Edit>Select All then Selection>Rewrap Selection.
Wait while rewrap completes.
- Page through the entire text, looking for improper indentation. If you find any,
re-open the file, clicking NO when asked if you want to
save the edits. Find and fix the broken rewrap markup.
Save, then repeat this step.
- Remove all rewrap markers: see this page.
- Open Fixup>Footnote Fixup; tidy up footnotes. See
this discussion.
- Use Fixup>Remove End-of-line Spaces.
- Use Fixup>Run Gutcheck and resolve any new issues.
- Save the document.
19. Determine Character Coding (5-60 min.)
Character codes are described on
this page. You need to
be certain which coding your etext uses.
Search&Replace, text [\x7f-\xff], regex. If nothing is
found, the book contains only characters from the 7-bit ASCII set.
If 8-bit characters are found, use Fixup> Run Word Frequency Routine.
In the report window, click the Unicode>FF button.
Words containing a multi-byte (Unicode) character are listed.
If none are shown, the text is probably, but not certainly, Latin-1;
it is possible that you have inserted Unicode punctuation
that is not part of a word. But you should be aware if you have
used the Unicode menu or pasted a Unicode symbol.
If your text is Latin-1 or UTF-8, read or reread
this item
of the Gutenberg FAQ. Decide whether you will upload a single version
or divide it into ASCII and high-bit versions.
If you divide it, then:
- Open bookname.txt.
- Save-as bookname_asc.txt (not as
bookname.asc, or the .bin file for
bookname.txt will be lost)
- Search with the regex \P{IsAscii} (note uppercase P) to step through
each character that is not 7-bit ASCII
- Replace each, using some consistent substitution scheme
- Add a "Transcriber's Note" to the head of the text to document
your substitution scheme.
ASCII etext bookname.txt and optional bookname_asc.txt
are now complete!
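The decision in this step comes down to the highest character code in the text. The sketch below makes that explicit and shows one way to apply a substitution scheme; the function names and the example substitutions are hypothetical, not a PG-mandated scheme. Unlike the [\x7f-\xff] search above, it treats code 0x7F (DEL) as 7-bit.

```python
def classify_encoding(text):
    """Name the smallest character set that can hold every character."""
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        return "7-bit ASCII"
    if highest <= 0xFF:
        return "Latin-1"
    return "Unicode (UTF-8)"


def to_ascii(text, substitutions):
    """Replace each non-ASCII character using the given scheme.

    Raises ValueError for a character the scheme does not cover, so that
    nothing is dropped silently.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        elif ch in substitutions:
            out.append(substitutions[ch])
        else:
            raise ValueError(f"no substitution for {ch!r}")
    return "".join(out)
```

Whatever scheme you pass in, for example {"é": "e", "œ": "[oe]"}, is what the Transcriber's Note should document.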
20. Prepare HTML Edition (4-? hr.)
- Open bookname.markup.txt; save it as
bookname.html.
- Open the HTML Fixup Palette; see this page
for use.
- Apply Automatic HTML conversion
and wait while it completes.
- Save the file and open it in a browser.
Scroll through looking for systematic errors.
(Title pages, tables, etc. will look terrible; no matter).
If automatic conversion messed up, delete the
.html file and start this step over.
- Page through the book looking for text that was not handled well
by automatic HTML generation, in particular:
- Title pages.
- Tables.
- Tables of Contents and Indexes, which are best formatted
using unordered lists, rather than the
markup Guiguts generates for /$..$/.
- Illustrations, either using the
Guiguts markup tool or your own HTML.
- Use the element-markup buttons
in the HTML Palette to mark up these areas.
Use regex replacements to make systematic changes.
- Open the file in one or more web browsers
(Internet Explorer and at least one other such as Firefox or Netscape).
Page through the entire book.
- Where you see a problem, make a correction in Guiguts, save the file,
and click the "reload" button in each browser.
- Hyperlink page references in text, TOC, and index
(discussed here).
- Apply the Link Checker and
correct all issues found.
- Optionally, apply HTML Tidy.
- Open the W3C Validator, upload
the file, and correct the nits it picks.
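For a sense of what a check on hyperlinked page references involves, here is a minimal sketch that lists internal links with no matching target. It is not the Guiguts Link Checker and covers far less; the class and function names are hypothetical, as are anchor names such as Page_150 in the usage note.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect internal link targets and the #fragment links that use them."""

    def __init__(self):
        super().__init__()
        self.targets = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element's id, or an <a name="...">, can serve as a target.
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.links.append(href[1:])


def broken_internal_links(html):
    """Return the sorted fragment names that are linked to but never defined."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return sorted(set(collector.links) - collector.targets)
```

If the text links "see page 222" to #Page_222 but no element carries that id or name, the function returns ["Page_222"].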
21. Process High-resolution Images (? hr.)
If the project manager provided high-resolution scans of the
images in the text, use an image-processing program such as The Gimp or
Adobe Photoshop Elements to optimize them—see
DPWiki's Guide to Image Processing.
You can do this before, during, or after
HTML step 20. For each image:
- Load image from the originals folder (see step 1)
- Straighten it (almost all scanned images are off-perpendicular;
some are trapezoidal owing to the page not being flat on the scan window).
- Crop it to remove all redundant white space and borders
(provide margins and borders with CSS styling of the <img> markup).
- Correct the contrast (you must have calibrated your monitor,
see this page).
- Sharpen.
- Correct any major scratches, freckles, dirt, etc.
- Save in the subfolder images using appropriate type:
- Line drawings in .png at 8 bits per pixel (not
the default 24-bit RGB format).
- Photographs as .jpg with an appropriate compression level
such as (Photoshop) level 6.
- Page through entire HTML book making sure that each image is
being loaded correctly.
22. Upload the Finished Project
- Prepare a new folder whose name contains the full project ID,
e.g. projectID40213e6231ac4
- Move into it only the files to be uploaded:
- the etext file bookname.txt
- the bookname.bin matching bookname.txt
(some PPVers use Guiguts too!)
- the optional 7-bit ASCII version
bookname_asc.txt
- the HTML file if one was made
- the images folder if required by HTML
- Do not include the original images
or the page images; do not include any work files or scratch files
or auto-backup editions. If you have been told to
upload directly to the Gutenberg site for a whitewasher, do not include
the .bin file.
- Mac OS X users: the Finder creates hidden files named
.DS_Store in any folder you display as a window.
Although harmless, these files are not wanted by PG. Get rid of them
as follows: In a terminal window, cd
into the project folder. Run this command, copying its arcane syntax
precisely:
find . -name ".DS_Store" -exec rm '{}' \;
- Linux and Mac users: cd into this folder and
use the command unix2dos *.txt; unix2dos *.html.
- Use a zip utility to make a zip archive of this folder.
(OS X users: do not use the Finder command
File> Create Archive of...; it creates a gzip file that PG cannot use.
Use a zip command in a terminal window.)
- Open the project page in your web browser and at the bottom,
select Change Project State: Upload for Verification.
- On the next page, write comments noting any unusual features of
the book. Especially note the character code (7-bit, Latin-1, or Unicode)
of the .txt file.
Mention the _asc.txt file if you made one.
- Use the Browse button to navigate to the
zipped file. Wait while it uploads, which can take quite a while.
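The packaging steps above (dropping .DS_Store files, converting line endings, zipping the folder) can also be done in one pass. This sketch is an alternative to the find, unix2dos and zip commands, not a PGDP-supplied tool; the function name is hypothetical, and unlike unix2dos *.txt it also converts text files in subfolders.

```python
import zipfile
from pathlib import Path


def build_upload_zip(folder, zip_path):
    """Zip the upload folder, skipping .DS_Store and writing CRLF text files.

    zip_path should lie outside the folder so the archive does not
    try to include itself.
    """
    folder = Path(folder).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_dir() or path.name == ".DS_Store":
                continue
            data = path.read_bytes()
            if path.suffix in (".txt", ".html"):
                # Collapse any existing CRLF first so no line ends in CR CR LF.
                data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
            # Keep the folder name as the top-level entry in the archive.
            archive.writestr(path.relative_to(folder.parent).as_posix(), data)
    return zip_path
```

The files on disk are left untouched; only the copies written into the archive get CRLF line endings.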
Finished! Treat yourself to your favorite beverage! When refreshed,
return to Step 1.