A scanner is an electromechanical device that converts hard-copy text and graphics into digital form for processing and storage in a computer.
Scanners can save untold hours of tedious manual retyping. Suppose you wrote a book a long time ago, and have lost the digital files (or used an old-fashioned typewriter, and never had any digital files!). The hard copy of that book sits in your basement, awaiting the massive editing that can turn it into a great novel. It needs the power of your computer’s word processor. But you can’t deal with the prospect of retyping its 1000 pages. A scanner, equipped with optical character recognition (OCR), can do away with most of the hard labor involved in getting a hard-copy manuscript onto disk.
A good scanner can be had for a couple of hundred dollars. But the value of optical scanning is hard to measure. For many entrepreneurs, it can make the difference between staying afloat and going bankrupt. It can do the work of one or two full-time typists for a tiny fraction of the long-term cost. Big companies can save money, too. Lawyers and doctors find scanners invaluable for backing up files of all kinds. Aside from storing text, scanners can make digital copies of vital papers, records, and receipts, which can be easily backed up on CD-R or CD-RW media.
A typical scanner can render color images, text, photographs, and everything else needed to make a complete, accurate digital record of any document. Color scanners use three different light beams (red, blue, and green) to get three different images, which are processed and combined in much the same way as a color television camera works. The image resolution of a scanner is measured in dots per inch (dpi), just as is done with printers. The higher the dpi specification, the more detail the scanner can see.
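The idea of combining three single-color captures into one color image can be illustrated with a minimal Python sketch. It is not a description of any scanner's actual processing: the function name `combine_channels`, the nested-list image format, and the 0 to 255 intensity range are all assumptions made for illustration.

```python
def combine_channels(red, green, blue):
    """Merge three same-sized grayscale images into one image of (r, g, b) pixels."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("channel images must have the same number of rows")
    return [
        list(zip(r_row, g_row, b_row))
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


# Three tiny one-row, two-pixel captures, one per light color (hypothetical values).
red = [[255, 0]]
green = [[255, 0]]
blue = [[0, 255]]
print(combine_channels(red, green, blue))  # [[(255, 255, 0), (0, 0, 255)]]
```

Each pixel of the result carries one intensity from each capture, so the first pixel (full red and green, no blue) is yellow and the second is pure blue.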
For reliable scanning of text and most images, a resolution of at least 300 dpi is recommended. Virtually all scanners meet this requirement. For images, greater detail means more memory or storage consumed. Color adds to this: at the same resolution, a color image takes up more memory or storage than a monochrome one.
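A rough calculation shows how resolution and color affect image size. The sketch below estimates the raw, uncompressed size of a scan. The page size (8.5 by 11 inches) and the bit depths (1 bit per pixel for black and white, 8 for grayscale, 24 for color) are assumed example values, and real files are usually smaller because most image formats compress the data.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw (uncompressed) size of a scanned image in bytes."""
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    total_bits = pixels_wide * pixels_high * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# An assumed 8.5 x 11 inch page scanned at 300 dpi, at three assumed bit depths.
for label, bpp in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
    size = uncompressed_size_bytes(8.5, 11, 300, bpp)
    print(f"{label:>15}: {size / 1_000_000:.1f} MB")
```

Under these assumptions the color scan comes to roughly 25 MB before compression, 24 times the size of the 1-bit scan. Because the pixel count grows with the square of the resolution, doubling the dpi quadruples the size.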
Scanners come in three basic configurations. The scanner that’s best for you will depend on what you want to do with it, and on how much money you’re willing to spend for it.
The cheapest type of scanner is a handheld scanner. It looks something like a miniature vacuum-cleaner head, or one of the bar-code readers in retail stores. You roll the unit over the paper containing the text and/or graphics you want to scan. Because the unit is not as wide as most pages, you’ll have to make two or three passes over the page. Handheld scanners are preferred by people who scan small images, such as snapshots. They are light in weight, and need almost no desk space.
One potential problem with a handheld scanner is that you might try to scan too fast. Some handheld scanners have speed indicators that tell you if you’re going too fast. Another potential difficulty is not getting a straight-line scan. Most handheld units have built-in guides (like miniature rolling pins) that minimize this problem.
If you want to scan a book or magazine, a flatbed scanner is much easier to use than a handheld scanner. The unit looks, and works, something like a small photocopier. You lay the page, photo, or sheet down on a clear glass, and the scanning head moves past it, picking up the image. Flatbed scanners consume desk space, which, if you have a couple of printers and a fax machine, might already be at a premium.
A sheet scanner, also called a feedthrough scanner, resembles a fax machine (and in fact, many of these units can do double duty as fax machines). As its name implies, this type of scanner pulls sheets of paper through, one by one. You can stack several pages, one on top of the other, and the machine will automatically feed and scan them. However, you can’t scan bound books or magazines as you can with a flatbed scanner—unless you’re willing to rip out individual pages.
Even the best scanners make some mistakes when used with OCR. This is especially true if text contains nonstandard symbols. Highly technical material presents the worst problems. Some mathematical symbols are so esoteric that the average person (let alone a machine) is befuddled by them. Ink spots, stray markings, and smudges on a page can cause scanning errors, in much the same way as background noise confuses a speech recognition system.
A human reader can often tell what a printed letter should be, even when it is severely mutilated. But computers lack human intuition. To some extent, a built-in spell checker can compensate for this. Some OCR programs include spell checking, but this introduces its own set of problems because spell checking, too, is imperfect.
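The following Python sketch illustrates both the benefit and the pitfall of spell checking. It is not how any particular OCR program works: the tiny `VOCABULARY` list, the function name `suggest_correction`, and the 0.8 similarity cutoff are assumptions for illustration. The only library call is `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Tiny stand-in vocabulary (hypothetical); a real spell checker uses a full dictionary.
VOCABULARY = ["scanner", "character", "recognition", "optical", "manuscript"]


def suggest_correction(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    lowered = word.lower()
    if lowered in vocabulary:
        return word
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(suggest_correction("scanncr"))      # misread "e" as "c": becomes "scanner"
print(suggest_correction("rnanuscript"))  # misread "m" as "rn": becomes "manuscript"
print(suggest_correction("scanned"))      # correct word, wrongly changed to "scanner"
```

The last call shows the imperfection: "scanned" was read correctly, but because it is missing from this small vocabulary, the checker replaces it with the nearest word it knows.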
If a scanner doesn’t recognize a character, it will usually print a “tag”—a blank space, underline, or default symbol such as @ or #. Scanned text must always be carefully proofread, and corrections made with word-processing software, after the data has been stored on the hard drive.
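Proofreading can be made a little easier by listing where such tags appear. The sketch below assumes the tag is one of the symbols @, #, or an underscore; the actual symbol depends on the OCR program, and the names `TAG_PATTERN` and `find_tags` are hypothetical.

```python
import re

# Assumed placeholder symbols; the actual tag depends on the OCR program.
TAG_PATTERN = re.compile(r"[@#_]")


def find_tags(text):
    """Return (line_number, column, symbol) for each suspected tag, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TAG_PATTERN.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits


sample = "The sc@nner reads the page.\nIt stores the te#t on disk."
for line_no, col, symbol in find_tags(sample):
    print(f"line {line_no}, column {col}: {symbol!r}")
```

A search like this only narrows the job. It also flags legitimate uses of these symbols, such as e-mail addresses, and it cannot detect a tag that is a blank space or a character that was recognized incorrectly, so the full text still needs to be proofread.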