- BCR 2D
- MICR CMC7
- MICR E13B
- Form Identification
- Image Enhancement
- TWAIN and ISIS Scanning
- Black Border Removal
- Lines Removal
- Dynamic Thresholding
- Layout Analysis
- Quality Control
- File Format Conversion
- Book Curvature Correction
- Keystone Correction
CHR - Cursive Handwritten Recognition
The optical recognition of cursive handwritten text is an open problem and an active area of research.
Within cursive handwriting recognition, a further important distinction
can be made between online recognition and offline recognition.
Online handwriting recognition refers to the technology
used to recognize words written on devices capable of capturing the
spatio-temporal information of the pen movements made by the writer.
In practice, the recognition is based on vector data
consisting of the coordinates of the ink strokes and the pen-up/pen-down
events, sampled in real time during writing.
It is therefore the analysis of a dynamic representation of the writing.
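As an illustration of this kind of dynamic data, the following minimal sketch stores pen samples and groups them into strokes between pen-down and pen-up. The `PenSample` type, its field names and `split_into_strokes` are hypothetical names chosen for the example, not part of any real device API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    x: float         # horizontal pen coordinate
    y: float         # vertical pen coordinate
    t: float         # timestamp in seconds
    pen_down: bool   # True while the pen touches the surface


def split_into_strokes(samples: List[PenSample]) -> List[List[PenSample]]:
    """Group consecutive pen-down samples into strokes (hypothetical helper)."""
    strokes, current = [], []
    for s in samples:
        if s.pen_down:
            current.append(s)
        elif current:            # pen lifted: close the current stroke
            strokes.append(current)
            current = []
    if current:                  # close a stroke still open at the end
        strokes.append(current)
    return strokes


samples = [
    PenSample(0.0, 0.0, 0.00, True),
    PenSample(1.0, 0.5, 0.01, True),
    PenSample(1.0, 0.5, 0.02, False),   # pen lifted
    PenSample(3.0, 0.0, 0.05, True),
    PenSample(4.0, 1.0, 0.06, True),
]
strokes = split_into_strokes(samples)
print(len(strokes))  # 2
```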
Offline handwriting recognition, instead, refers to the technology used
to recognize words written on paper,
where the only available representation of the text to be recognized is an image.
In practice, the recognition is done on raster data, made up of just on/off pixels
obtained by digitizing paper documents with a scanner. It is therefore the analysis of a static representation of the writing.
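To make the "on/off pixels" concrete, here is a minimal sketch that turns a tiny grayscale image into a binary bitmap with a fixed threshold. The function name `binarize` and the threshold value are assumptions for illustration; a real scanning pipeline would typically use more robust thresholding.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 values) into on/off pixels.

    1 = ink (dark pixel), 0 = background (light pixel).
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


gray = [
    [250, 240, 30, 245],
    [248, 20, 25, 250],
    [35, 240, 245, 40],
]
bitmap = binarize(gray)
for row in bitmap:
    print("".join("#" if pixel else "." for pixel in row))
```

Unlike the online case, nothing in this bitmap says in which order, or in which direction, the ink was laid down.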
With a PDA or a tablet PC you can easily experiment with text input
using the online cursive handwriting recognition that is sometimes built in;
this technology is, in fact, widely used on devices that require input through a pen, a graphics tablet or a touch screen.
By contrast, it is very rare to find software packages able to perform
offline handwriting recognition on images and documents acquired with a scanner:
this is due to the intrinsic difficulty of offline recognition, which has to work with
much less information, and much noisier information, than online recognition does.
The first-generation technology, the only one available up to now,
has a fundamental requirement which in many cases is not a problem but in
some circumstances can be a serious one:
each type of data to be read requires a dictionary that contains all the possible words that can be recognized.
In practice, to recognize a first name and a last name it is necessary to have
a dictionary of first names and a dictionary of last names that contain all the recognizable items.
Therefore, if the system were given an image containing a word not in the dictionary, the result would be
the most similar word among those contained in the dictionary, and the system could not even attempt to output the unknown word.
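The behaviour described above can be sketched as follows. This is only an illustration of the dictionary constraint: it assumes a raw character reading is already available and uses the standard-library `difflib` similarity as a stand-in for whatever similarity measure a real engine applies. `recognize_with_dictionary` and the sample names are hypothetical.

```python
import difflib


def recognize_with_dictionary(raw_reading, dictionary):
    """Return the dictionary entry most similar to the raw reading."""
    def similarity(word):
        return difflib.SequenceMatcher(None, raw_reading, word).ratio()
    return max(dictionary, key=similarity)


dictionary = ["maria", "marco", "paolo", "giulia"]

# "marta" is not in the dictionary, so it can never be returned:
print(recognize_with_dictionary("marta", dictionary))  # maria
```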
In addition to the difficulty of having to obtain the dictionaries
to use, and to the inability to read out-of-dictionary words
even when they are written in an ideal way,
a further problem is the high
error rate found in real applications as the size of the dictionary increases.
Starting from these considerations, Recogniform Technologies, a fully Italian company,
decided several years ago to start a new research project on the data capture
of cursive handwriting, with the objective of achieving a technology
that is also usable without vocabularies and whose recognition rates are high enough to be used profitably in every application field.
At the base of this innovative research project is the simple idea of being
able to recognize words by identifying their sub-sequences of ink strokes
through comparison with the ink strokes contained in a reference set.
A very simplified example is a reference set containing only the Italian words
"problema" (problem) and "valore" (value), with which it is possible to recognize words such as "prova" (test), "male" (illness), "prore" (bows), "mare" (sea), "arma" (weapon), "roma" (Rome) and so on.
However, the variability in the shapes of cursive handwriting produced
by a population of writers means that writers belonging to the same group can use N different graphemes to represent the same n-gram.
In other words, the variability in handwriting is such that the same thing can be written in ways that differ greatly from one another, depending on each person's writing style.
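The reference-set idea can be illustrated with a toy sketch in which letters stand in for ink strokes. It uses the Italian words "problema" and "valore" as an assumed reference set and checks whether a word can be assembled entirely from letter sequences found in them. The function names and the minimum fragment length are hypothetical; real handwriting, with the variability just described, is far harder than exact letter matching.

```python
def fragments(reference_words, min_len=2):
    """All contiguous letter sequences of at least min_len letters."""
    found = set()
    for word in reference_words:
        for i in range(len(word)):
            for j in range(i + min_len, len(word) + 1):
                found.add(word[i:j])
    return found


def can_compose(word, frags):
    """True if word can be split entirely into known fragments."""
    # reachable[k] is True when the first k letters can be covered
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            reachable[start] and word[start:end] in frags
            for start in range(end)
        )
    return reachable[len(word)]


frags = fragments(["problema", "valore"])
for candidate in ["prova", "mare", "roma"]:
    print(candidate, can_compose(candidate, frags))
```

Here "prova" is covered by "pro" + "va", "mare" by "ma" + "re", and "roma" by "ro" + "ma", none of which is a whole reference word.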
The offline cursive handwriting recognition technology
developed by Recogniform Technologies was named CHR, Cursive Handwritten Recognition,
and required the collaboration of prestigious Italian university laboratories,
huge investments and more than three years of research and experimentation to solve this problem.
The architecture consists of several specialized sub-systems that, starting from the image of the word to be recognized,
produce its representation in terms of characters.
To explain briefly how it works, we describe in sequence the phases carried out by the different sub-systems, which act in a cascade.
The first step deals with the pre-processing of the image, namely the cleaning and normalization of the ink trace.
Each word is straightened, cleaned of image noise and dirt, corrected for the slant of the strokes,
and the ratio between the core zone, ascenders and descenders is normalized.
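One of the normalizations mentioned above, slant correction, is commonly modelled as a horizontal shear. The sketch below applies such a shear to ink point coordinates. It assumes the slant angle has already been estimated by some other means, and it is not a description of how CHR itself performs the correction; `deslant` and its parameters are hypothetical.

```python
import math


def deslant(points, slant_deg):
    """Shear ink points so that strokes leaning by slant_deg become vertical.

    points: (x, y) pairs with y measured upward from the baseline.
    slant_deg: assumed slant from the vertical, positive when leaning right.
    """
    k = math.tan(math.radians(slant_deg))
    return [(x - y * k, y) for x, y in points]


# A stroke leaning 45 degrees to the right
leaning = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
upright = deslant(leaning, 45)
print([(round(x, 6), y) for x, y in upright])
```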
Next comes the routing step: in practice, an attempt is made to reconstruct the dynamic sequence of ink strokes as they were presumably traced by the writer.
This is the most important operation: the degree of success of this step, which depends on the quality of the image
and on the writing style used, determines the quality of the final interpretation.
At the beginning of this process, the word is divided into sequences of strokes,
each of which corresponds to the ink trace produced by the writer between the instant in which the pen is placed on the sheet and the instant in which the pen is lifted, momentarily interrupting the writing sequence.
The next step, segmentation, generates atomic sections that
correspond to the elementary motor acts performed by the writer.
In fact, according to studies on the generation of cursive writing, the complex movements
required to produce it can be seen as a composition of elementary movements that correspond to elementary shapes, called strokes.
These strokes are drawn one after the other, and the writing is fluent thanks to the temporal overlap of the elementary movements that produce the strokes.
The strokes represent the primitive shapes that each writer
uses in the writing process, and relevant studies conducted in the field of vision
have shown that the curvature of the strokes plays a key role in the perception of shapes and of their composition.
It is known that significant changes of curvature occur at the junctions between adjacent strokes;
the segmentation process therefore places the cut points on the ink trace where the most relevant curvature changes occur, while trying to discard the spurious curvature changes that are nonetheless generated by the writing and digitization processes.
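A minimal sketch of this kind of segmentation, under simplifying assumptions: the ink trace is a polyline, the turning angle between consecutive segments is used as a proxy for curvature change, and a fixed threshold separates relevant changes from spurious ones. The names `turning_angles` and `segment` and the 60-degree threshold are illustrative choices, not details of CHR.

```python
import math


def turning_angles(points):
    """Signed turning angle (radians) at each interior point of a polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        incoming = math.atan2(y1 - y0, x1 - x0)
        outgoing = math.atan2(y2 - y1, x2 - x1)
        delta = outgoing - incoming
        # wrap the difference into the range [-pi, pi)
        angles.append((delta + math.pi) % (2 * math.pi) - math.pi)
    return angles


def segment(points, min_turn=math.radians(60)):
    """Cut the polyline wherever the turning angle reaches min_turn."""
    cuts = [i + 1 for i, d in enumerate(turning_angles(points))
            if abs(d) >= min_turn]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(points[start:cut + 1])   # the cut point ends this piece
        start = cut                            # and starts the next one
    pieces.append(points[start:])
    return pieces


# A small wiggle at the second point, then a sharp corner at (2, 0)
trace = [(0, 0), (1, 0.1), (2, 0), (2, 1), (2, 2)]
pieces = segment(trace)
print(len(pieces))  # 2
```

The wiggle turns the trace by only about 11 degrees and is ignored, while the sharp corner produces a cut.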
In the next step, called description, each stroke previously identified by the segmentation process is labeled according to its curvature change, suitably quantized.
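As a rough illustration of labeling by quantized curvature change, the sketch below sums the signed turning angles along a stroke and rounds the total to a multiple of a fixed step. The 45-degree step and the integer labels are assumptions made for the example; the source does not specify the quantization actually used.

```python
import math


def total_turning(points):
    """Sum of the signed turning angles along a polyline, in radians."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        delta = math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0)
        total += (delta + math.pi) % (2 * math.pi) - math.pi
    return total


def label_stroke(points, step_deg=45):
    """Quantize the curvature change into an integer label.

    Positive labels mean counter-clockwise turning, negative clockwise.
    """
    return round(math.degrees(total_turning(points)) / step_deg)


print(label_stroke([(0, 0), (1, 0), (2, 0)]))    # straight stroke
print(label_stroke([(0, 0), (1, 0), (1, 1)]))    # left turn of 90 degrees
print(label_stroke([(0, 0), (1, 0), (1, -1)]))   # right turn of 90 degrees
```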
The following step, matching, consists of comparing the ink strokes of the word to be recognized
with those of a set of reference words whose transcription is also available in ASCII code.
In this way, sequences of similar strokes are extracted and, with them, the possible interpretations.
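The matching idea can be sketched on sequences of stroke labels. The example assumes each reference word is stored as a list of stroke labels, a parallel list saying which character of the transcription each stroke belongs to, and the transcription itself. This data layout and the function `match_strokes` are hypothetical, and exact label equality stands in for a real similarity measure.

```python
from difflib import SequenceMatcher


def match_strokes(unknown, ref_labels, ref_owner, ref_text, min_len=2):
    """Find runs of stroke labels shared with one reference word.

    ref_owner[i] is the index in ref_text of the character that
    reference stroke i belongs to.
    Returns (position in unknown, run length, matched characters) tuples.
    """
    matcher = SequenceMatcher(None, unknown, ref_labels, autojunk=False)
    matches = []
    for a, b, size in matcher.get_matching_blocks():
        if size >= min_len:
            first, last = ref_owner[b], ref_owner[b + size - 1]
            matches.append((a, size, ref_text[first:last + 1]))
    return matches


# Hypothetical reference word "me": four strokes for "m", two for "e"
ref_labels = [2, -2, 2, -1, 3, 0]
ref_owner = [0, 0, 0, 0, 1, 1]

unknown = [1, 3, 0, -3]
print(match_strokes(unknown, ref_labels, ref_owner, "me"))  # [(1, 2, 'e')]
```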
Finally, the process ends with the classification stage, which produces the possible interpretations of the word by considering all possible combinations of the matches obtained in the previous step and calculating a reliability level for each.
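A toy version of this combination step is sketched below. It enumerates ordered, non-overlapping combinations of matches and scores each one by the fraction of strokes it covers. That coverage score is only an assumed stand-in for the reliability level, which the source does not define; `interpretations` and the sample matches are hypothetical.

```python
def interpretations(matches, n_strokes):
    """Enumerate combinations of matches and score each interpretation.

    matches: (start, length, text) tuples over the unknown word's strokes.
    Returns (score, text) pairs sorted from most to least reliable, where
    score is the fraction of strokes covered (an assumed measure).
    """
    matches = sorted(matches)
    results = []

    def extend(position, text, covered, index):
        if text:
            results.append((covered / n_strokes, text))
        for i in range(index, len(matches)):
            start, length, fragment = matches[i]
            if start >= position:   # keep matches ordered and non-overlapping
                extend(start + length, text + fragment, covered + length, i + 1)

    extend(0, "", 0, 0)
    return sorted(results, reverse=True)


# Hypothetical matches on a word made of 5 strokes
matches = [(0, 3, "m"), (3, 2, "a"), (1, 2, "n")]
ranked = interpretations(matches, 5)
print(ranked[0])  # (1.0, 'ma')
```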
As can be seen, the architecture is quite complex, mainly because of the difficulty of the task, but the results obtained in the first laboratory applications are surprising and extremely encouraging.
Our products that implement the CHR technology
For more information on the CHR technology and on our solutions that implement it, you can send us an e-mail at email@example.com or fill in the form below.