Machine Reading of Hand-Written Forms
Machine reading of forms is a natural continuation of the
optical character-recognition work carried out in recent
years. Reading forms by hand
is one of the most time-consuming and expensive office activities.
For this reason, form reading has been an active area of research
in Office Automation for many years.
In general, the form-reading activity can
be roughly described in terms of the following subtasks: form
identification, field isolation, bounding-box removal, line isolation,
segmentation, segment re-combination, recognition and correction. Some
of these operations are strictly problem dependent, such as form
identification or field isolation, because the form types and their
layouts cannot be generalized. Other activities, such as bounding-box
removal or recognition, are more general and can be reused for
different forms and frameworks. The form reader developed in this
research produces answers from digital images of forms
scanned from paper and microfilm, showing the Industry or
Employer section of the official 1990 U.S. Census forms.
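The subtask sequence listed above can be sketched as a simple staged pipeline. This is only an illustration of the ordering given in the text: the stage names come from the list above, while `run_pipeline`, the `handlers` mapping, and the data passed between stages are hypothetical and say nothing about how the actual form reader is implemented.

```python
# Stage names follow the order given in the text.
# Everything else in this sketch is hypothetical.
STAGES = [
    "form identification",
    "field isolation",
    "bounding-box removal",
    "line isolation",
    "segmentation",
    "segment re-combination",
    "recognition",
    "correction",
]


def run_pipeline(data, handlers):
    """Apply one handler per stage, in order.

    `handlers` maps a stage name to a callable that takes the output of
    the previous stage and returns the input of the next one.
    """
    trace = []
    for name in STAGES:
        if name not in handlers:
            raise KeyError(f"no handler registered for stage: {name}")
        data = handlers[name](data)
        trace.append(name)
    return data, trace
```

Keeping the stages behind a name-to-callable mapping mirrors the distinction made in the text: form-specific handlers (such as form identification or field isolation) can be swapped per form type, while general ones (such as bounding-box removal or recognition) can be shared.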
The research on optical character recognition led to the
development of an analog k-nearest-neighbor chip by SGS Thomson, Agrate.
Its use on a real character-image database was also considered. The
character-image database is obtained by extracting 48k feature vectors
from the images. The 48k vectors were split into blocks of 2k elements
in order to improve the overall recognition capabilities and
to meet the hardware constraints of the chips. The 24 analog
k-nearest-neighbor chips are installed on 6 VME boards, and the
overall system is driven by a commercial PC.
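A minimal software sketch of a k-nearest-neighbor search over a database split into blocks may help make the arrangement above concrete. It rests on several assumptions not stated in the text: that each block holds 2k vectors (48k / 2k = 24, which matches the 24 chips), that each block returns its own k best candidates, that the host merges them and takes a majority vote, and that squared Euclidean distance is the metric. The function names are hypothetical, and the analog chips themselves are not modeled.

```python
import heapq
from collections import Counter


def split_into_blocks(database, block_size):
    """Cut the database into consecutive blocks of at most block_size items."""
    return [database[i:i + block_size]
            for i in range(0, len(database), block_size)]


def block_knn(block, query, k):
    """Return the k nearest (squared_distance, label) pairs within one block."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vector, query)), label)
        for vector, label in block
    ]
    return heapq.nsmallest(k, scored)


def classify(database, query, k, block_size):
    """Classify query by majority vote over the k globally nearest vectors.

    Each block reports its local top k; merging those candidates and keeping
    the k smallest gives the same result as searching the whole database.
    """
    candidates = []
    for block in split_into_blocks(database, block_size):
        candidates.extend(block_knn(block, query, k))
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With a database of `(feature_vector, label)` pairs, a call such as `classify(database, query, k=3, block_size=2000)` would search each block independently and combine the results on the host, which is the role a driving PC could play in a multi-chip setup.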
- Zs. Kovács
- R. Guerrieri
- G. Baccarani