Automatic identification of handwritten script supports important applications such as automatic transcription of multilingual documents and web search for documents containing a particular script. The growing use of handheld devices that accept handwritten input is creating a large volume of handwritten data. This work proposes a method to classify words and lines in an on-line handwritten document by script. The classification is based on 11 spatial and temporal features extracted from the strokes of the words.
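To make the idea of stroke-level spatial and temporal features concrete, here is a minimal Python sketch. The abstract does not list the 11 features, so the four computed here (stroke count, mean stroke length, bounding-box aspect ratio, total writing time) are illustrative assumptions, as are the names `word_features` and `stroke_length` and the `(x, y, t)` point format.

```python
from math import hypot
from typing import Dict, List, Tuple

# Assumed input format: a stroke is a pen-down trace of (x, y, t) samples.
Point = Tuple[float, float, float]
Stroke = List[Point]


def stroke_length(stroke: Stroke) -> float:
    """Sum of Euclidean distances between consecutive samples."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:])
    )


def word_features(strokes: List[Stroke]) -> Dict[str, float]:
    """Compute a few hypothetical spatial and temporal features of a word."""
    xs = [x for s in strokes for x, _, _ in s]
    ys = [y for s in strokes for _, y, _ in s]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return {
        # Spatial features
        "num_strokes": float(len(strokes)),
        "mean_stroke_length": sum(map(stroke_length, strokes)) / len(strokes),
        "aspect_ratio": width / height if height > 0 else 0.0,
        # Temporal feature: pen-down time summed over all strokes
        "writing_time": sum(s[-1][2] - s[0][2] for s in strokes),
    }


word = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.1)],
    [(3.0, 4.0, 0.2), (3.0, 0.0, 0.3)],
]
features = word_features(word)
```

A feature vector like this could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.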
We present a hierarchical approach for extracting homogeneous regions in on-line documents. It addresses the problem of identifying and processing ruled and unruled tables, text, and drawings. The on-line document is first segmented into regions containing only text strokes and regions containing both text and non-text strokes. Each text-only region is further classified as an unruled table or plain text. Stroke clustering is used to segment the regions that contain non-text strokes. Each resulting segment is then classified as a drawing, a ruled table, or an underlined keyword using stroke properties. The individual regions are processed and the results are assembled to identify the structure of the on-line document.
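The control flow of this hierarchy can be sketched in Python as follows. This is only a skeleton under stated assumptions: the abstract does not give the decision rules, so every step (`split_regions`, `has_non_text`, `is_unruled_table`, `cluster_strokes`, `classify_segment`) is a hypothetical callable supplied by the caller, and `analyze_document` is an illustrative name, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed input format: a stroke is a sequence of (x, y) points.
Stroke = Sequence[Tuple[float, float]]
Region = List[Stroke]


def analyze_document(
    strokes: List[Stroke],
    split_regions: Callable[[List[Stroke]], List[Region]],
    has_non_text: Callable[[Region], bool],
    is_unruled_table: Callable[[Region], bool],
    cluster_strokes: Callable[[Region], List[Region]],
    classify_segment: Callable[[Region], str],
) -> List[Tuple[str, Region]]:
    """Label each homogeneous region of a document, top-down."""
    labeled: List[Tuple[str, Region]] = []
    for region in split_regions(strokes):
        if not has_non_text(region):
            # Text-only branch: unruled table versus plain text.
            label = "unruled table" if is_unruled_table(region) else "plain text"
            labeled.append((label, region))
        else:
            # Mixed branch: cluster strokes, then label each segment as
            # "drawing", "ruled table", or "underlined keyword".
            for segment in cluster_strokes(region):
                labeled.append((classify_segment(segment), segment))
    return labeled


# Toy usage with stand-in rules (all purely illustrative).
doc = [
    [(0, 0), (1, 0)],
    [(0, 1), (1, 1)],
    [(0, 5), (10, 5)],  # one long horizontal stroke, treated as non-text
    [(0, 6), (1, 6)],
]
result = analyze_document(
    doc,
    split_regions=lambda s: [s[:2], s[2:]],
    has_non_text=lambda r: any(abs(st[-1][0] - st[0][0]) > 5 for st in r),
    is_unruled_table=lambda r: False,
    cluster_strokes=lambda r: [r],
    classify_segment=lambda seg: "underlined keyword",
)
labels = [label for label, _ in result]
```

Passing the decision rules in as callables keeps the hierarchy itself separate from the stroke-property heuristics, which would have to come from the full paper.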