Machine reading of forms is a natural continuation of the optical character recognition (OCR) work carried out in past years. Reading forms by hand is among the most time-consuming and expensive office tasks, which is why automatic form reading has long been an active research area in office automation. In general, form reading can be broken down into the following subtasks: form identification, field isolation, bounding-box removal, line isolation, segmentation, segment recombination, recognition, and correction. Some of these operations, such as form identification and field isolation, are strictly problem dependent, because form types and their layouts cannot be generalized. Others, such as bounding-box removal and recognition, are more general and can be reused across different forms and frameworks. The form reader developed in this research produces answers from digital images of forms scanned from paper and microfilm, specifically the Industry or Employer section of the official 1990 U.S. Census forms.
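The subtask list above amounts to a sequential pipeline in which the form-dependent stages change with the form type while the general stages are shared. The Python sketch below illustrates only that organization. The stage names, the `build_pipeline`/`run`/`make_stub` helpers, the dict-based state, and the assumption that form identification and field isolation are the only form-specific stages are all hypothetical; this is not the structure of the reader described here.

```python
from typing import Callable, Dict, List

# Hypothetical stage signature: each stage takes and returns a state dict.
Stage = Callable[[dict], dict]

# Subtasks in the order listed in the text.
STAGE_ORDER: List[str] = [
    "form_identification",
    "field_isolation",
    "bounding_box_removal",
    "line_isolation",
    "segmentation",
    "segment_recombination",
    "recognition",
    "correction",
]

# Assumption: only these two stages depend on the form type.
FORM_SPECIFIC = {"form_identification", "field_isolation"}


def build_pipeline(generic: Dict[str, Stage], form_specific: Dict[str, Stage]) -> List[Stage]:
    """Assemble stages in order, taking form-dependent ones from a per-form table."""
    stages: List[Stage] = []
    for name in STAGE_ORDER:
        table = form_specific if name in FORM_SPECIFIC else generic
        if name not in table:
            raise KeyError(f"no implementation for stage {name!r}")
        stages.append(table[name])
    return stages


def run(stages: List[Stage], image) -> dict:
    """Feed the scanned image through every stage and return the final state."""
    state = {"image": image, "trace": []}
    for stage in stages:
        state = stage(state)
    return state


def make_stub(name: str) -> Stage:
    """Placeholder stage that only records that it ran."""
    def stage(state: dict) -> dict:
        state["trace"].append(name)
        return state
    return stage


if __name__ == "__main__":
    generic = {n: make_stub(n) for n in STAGE_ORDER if n not in FORM_SPECIFIC}
    census_1990 = {n: make_stub(n) for n in FORM_SPECIFIC}  # hypothetical per-form table
    result = run(build_pipeline(generic, census_1990), image=None)
    print(result["trace"])
```

Under this arrangement, supporting another form type would mean supplying a new form-specific table while reusing the generic one unchanged.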

Research on optical character recognition also led to the development of an analog k-nearest-neighbor (k-NN) chip by SGS-Thomson in Agrate, and its use on a real character-image database was investigated. The database was built by extracting 48k feature vectors from the character images. To improve overall recognition performance and to meet the hardware constraints of the chips, the 48k vectors were split into blocks of 2k, which yields 24 blocks, matching the number of chips. The 24 analog k-NN chips are installed on six VME boards, and the whole system is controlled by a commercial PC.
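The block splitting can be illustrated in software with an exact k-NN search over a partitioned reference set: each block plays the role of one chip and returns its local k nearest neighbours, and the host merges these candidates into the global result. The Python sketch below is an analogue under stated assumptions, not a model of the SGS-Thomson chip. Euclidean distance, majority voting, and the helper names (`split_into_blocks`, `local_knn`, `partitioned_knn`, `classify`) are hypothetical, and the synthetic data merely stands in for the real feature vectors. Exact merging returns the same neighbours as a single global search, so it shows the partitioning only; if the original system combined blocks differently (for example, by voting across per-block decisions), its results could differ.

```python
import heapq
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# A labelled reference vector: (feature vector, class label).
Reference = Tuple[Sequence[float], Hashable]
Neighbour = Tuple[float, Hashable]  # (distance, label)


def split_into_blocks(refs: List[Reference], block_size: int) -> List[List[Reference]]:
    """Partition the reference set into consecutive blocks of at most block_size."""
    return [refs[i:i + block_size] for i in range(0, len(refs), block_size)]


def local_knn(query: Sequence[float], block: List[Reference], k: int) -> List[Neighbour]:
    """What one chip would report: the k references in its block nearest to query.

    Euclidean distance is an assumption; the analog chip's metric may differ.
    """
    dists = ((math.dist(query, vec), label) for vec, label in block)
    return heapq.nsmallest(k, dists, key=lambda n: n[0])


def partitioned_knn(query: Sequence[float], blocks: List[List[Reference]], k: int) -> List[Neighbour]:
    """Merge per-block results into the global k nearest neighbours.

    The global k nearest always lie in the union of each block's local k nearest,
    so the merged distances equal those of a search over the whole set.
    """
    candidates: List[Neighbour] = []
    for block in blocks:  # on the hardware, blocks would be searched in parallel
        candidates.extend(local_knn(query, block, k))
    return heapq.nsmallest(k, candidates, key=lambda n: n[0])


def classify(query: Sequence[float], blocks: List[List[Reference]], k: int) -> Hashable:
    """Majority vote among the k nearest neighbours (voting rule assumed)."""
    votes = Counter(label for _, label in partitioned_knn(query, blocks, k))
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic stand-in data: 240 references split into 24 blocks of 10,
    # mirroring the 24-chip layout at a much smaller scale.
    refs = [([float(i), float(i % 7)], i % 3) for i in range(240)]
    blocks = split_into_blocks(refs, 10)
    print(len(blocks), classify([12.0, 5.0], blocks, k=5))
```

At full scale, blocks of 2k vectors drawn from the 48k set give the same count of 24 blocks, one per chip.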



