If you are typing your text, use a word processing program such as Microsoft Word. As you type, leave out the non-prose categories of text listed in Step 2: Prepare your text for measurement. Once you've typed your text, you can go on to Step 4: Convert your text into a plain text file.
If you are scanning your text, save the pages as a PDF file and load the file into an optical character recognition (OCR) program. Newer versions of Adobe Acrobat include an OCR feature:
- Open your PDF scan file in Adobe Acrobat.
- Under the Document menu, select Recognize Text Using OCR.
- When the OCR process is finished running, select Save As... from the File menu.
- In the Save as type drop-down box, choose Text (Plain) (*.txt).
- Open the resulting file in a text editor such as Notepad (double-click and your computer should automatically know how to open it).
- Check the plain-text file very carefully for errors, paying particular attention to sentence beginnings and endings.
OCR results can be inconsistent. In particular, punctuation marks such as periods are sometimes not recognized at all (see "OCR caveats" on page 8). These inconsistencies reduce the accuracy of the Lexile measure. Unfortunately, repairing a poorly OCRed file can sometimes take as long as typing the sample.
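As a rough aid to the manual check described above, the following Python sketch flags paragraphs that do not end in a period, exclamation point, or question mark. It is only a heuristic under stated assumptions: the function name is hypothetical, paragraphs are assumed to be separated by blank lines, and it cannot detect a period missing from the middle of a paragraph. It does not replace careful proofreading.

```python
import re

# Characters that legitimately end a sentence.
TERMINATORS = (".", "!", "?")
# Closing quotes and brackets that may follow the final punctuation mark.
CLOSERS = "\"')]\u201d\u2019"


def paragraphs_missing_end_punctuation(text):
    """Return (paragraph_number, last_words) for each paragraph that does
    not end in a period, exclamation point, or question mark.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    suspects = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Ignore trailing quotation marks or brackets before checking.
        core = paragraph.rstrip(CLOSERS)
        if not core.endswith(TERMINATORS):
            last_words = " ".join(paragraph.split()[-5:])
            suspects.append((number, last_words))
    return suspects


if __name__ == "__main__":
    sample = 'The dog ran home.\n\nIt was late\n\n"Come here!"'
    for number, last_words in paragraphs_missing_end_punctuation(sample):
        print(f"Paragraph {number} may be missing a period: ...{last_words}")
```

Each flagged paragraph should still be compared against the printed page, since headings or lines of dialogue can trigger false alarms.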
A better OCR option is ABBYY FineReader, for which a free trial version is available online. Go to http://www.abbyy.com/ and select the "Downloads" tab. This program is not simple to use, but it does enable you to convert a complete book (or a larger portion of it than typing would give you) to the plain text format that the Lexile Analyzer requires.
If a text is converted from hard copy to electronic format using an OCR application, some problems may occur in the conversion process. These tend to relate to the specific software used, and special care should be taken to ensure the accuracy of the electronic facsimile. Some examples of common OCR errors are as follows:
- A letter "m" might convert as the two letters "rn".
- A comma followed by a quotation mark (,") might be interpreted as a slash with an apostrophe (/').
- Punctuation might be dropped. Verify that all the intended punctuation is in place, with no periods missing, semicolons omitted, etc.
- If a polysyllabic word is split between two lines with a hyphen, the hyphen should be removed and the word made whole.
These examples are not exhaustive but are representative of the kinds of OCR errors to look out for when preparing a text for measurement.
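Some of the errors listed above are mechanical enough to handle with a short script. The Python sketch below is one possible approach, not part of the Lexile Analyzer: all function names and the `known_words` parameter are hypothetical. It rejoins words split across lines with a hyphen, replaces the slash-apostrophe pattern with a comma and quotation mark, and lists words containing "rn" for manual review. Note that the rejoining step also removes the hyphen from a genuinely hyphenated compound that happens to break at the end of a line, so the output should still be proofread.

```python
import re


def rejoin_hyphenated_words(text):
    """Join a word split across two lines with a hyphen: 'poly-\\nsyllabic'
    becomes 'polysyllabic'. Caveat: a true compound such as 'well-\\nknown'
    would also lose its hyphen."""
    return re.sub(r"([A-Za-z])-[ \t]*\r?\n[ \t]*([a-z])", r"\1\2", text)


def repair_slash_apostrophe(text):
    """Replace the misread pattern /' with a comma and quotation mark."""
    return text.replace("/'", ',"')


def flag_rn_words(text, known_words=frozenset()):
    """List words containing 'rn' that may really contain 'm'.

    known_words is an optional, user-supplied set of lowercase words that
    are known to be spelled correctly (an assumption for illustration).
    """
    candidates = re.findall(r"\b[A-Za-z]*rn[A-Za-z]*\b", text)
    flagged = {w.lower() for w in candidates} - set(known_words)
    return sorted(flagged)


if __name__ == "__main__":
    raw = "\"Yes/' she said on a warrn surnmer morning of poly-\nsyllabic words."
    cleaned = repair_slash_apostrophe(rejoin_hyphenated_words(raw))
    print(cleaned)
    print(flag_rn_words(cleaned, known_words={"morning"}))
```

The flagged words are only candidates. Many correctly spelled words contain "rn", so each one should be checked against the original page rather than replaced automatically.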