Optimal affine image normalization approach for optical character recognition

Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it were captured at an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for the camera. Thus, in theory, the normalization should be projective. Usually, the camera optical axis is approximately perpendicular to the document surface, so the projective normalization can be replaced with an affine one without a significant loss of accuracy. An affine image transformation is performed significantly faster than a projective one, which is important for OCR on mobile devices. In this work, we propose a fast approach for image normalization. It utilizes an affine normalization instead of a projective one if there is no significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: root mean square (RMS) coordinate discrepancies over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We have established that this unconstrained optimization problem is quadratic and can be reduced to the integration of fractional quadratic functions over the ROI. The latter problem was solved analytically for the case of OCR, where the ROI consists of rectangles. The proposed approach is generalized to the cases where, instead of the full affine transform, its special cases are used: scaling, translation, shearing, and their superposition, allowing the image normalization procedure to be further accelerated.


Projective image normalization
Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation resulting in an image as if it were captured from an angle suitable for OCR. In most cases, a surface containing characters can be considered flat, and a pinhole model can be adopted for the camera. Thus, in theory, the normalization should be projective. Projective normalization is commonly employed as a part of image preprocessing for various computer vision tasks, such as document OCR [1,2,3,4,5], vehicle license plate recognition [6], TV-stream recognition based on a picture of a TV screen [7], chessboard recognition [8], detection of artificial on-road obstacles [9], object detection using shape features (detecting the shape of an object within an image and matching that shape with an object from a database) [10,11,12,13,14,15], monitoring of surface parameters from satellites (temporal variability of the sea surface temperature, determination of the velocity of cloud masses, etc.) [16], reconstruction of plans and maps from aerial photographs [17,18], and many more. In addition, the projective normalization of photographs of documents helps human perception [19].

Affine approximation of projective normalization
Usually, the camera optical axis is approximately perpendicular to the document surface. In such cases, the projection model of the affine camera can be utilized [20], and a projective normalization can be replaced with a commonly used affine normalization without a significant loss of accuracy [21,22]. An affine image transformation is performed significantly faster than a projective one [22,23], which is helpful for fast image normalization. The latter is important for OCR on mobile devices [24].
The idea of replacing the projective transformation with the affine one in practice was mentioned in [25] back in 1985. This property was used in [26] to simplify the mathematical calculations. The affine approximation is commonly used in image completion [27] and rendering [23,28,29]. In [30] the projective transformation is replaced with the simpler affine transformation in order to avoid overfitting. A similar idea is utilized in the "weak-perspective projection" [31,32,33], where the approximation is partial. The use of affine invariant methods instead of significantly more complicated projective invariant methods is common in keypoint-based methods [34,35,36], as well as in the related problem of salient region detection [37], and both of these methods are essentially camera angle invariant. There is also a division into affine and projective methods in stereo reconstruction [38]. The utilization of an affine transformation for image rendering and normalization results in a loss of accuracy [22,39], but the accuracy was not formally introduced in these works.
An affine approximation of a given projective normalization, aimed at accelerating the latter, is considered for the first time in this work.

Definitions and notation
Let $I_{input}$ be an input image (usually a photograph) to be normalized. Let its known projective normalization $H$ be regarded as the perfect normalization, and let the image obtained by applying $H$ to $I_{input}$ be the projectively normalized image $I_{proj}$ (see Fig. 1). An arbitrary affine approximation of the projective normalization $H$ is denoted by $A$: $A \approx H$. Thus $A$ is an affine normalization of the image $I_{input}$, and the image $I_{affin}$ obtained by applying $A$ to $I_{input}$ is the affinely normalized image. Let $r = (x, y)^T \in \mathbb{R}^2$ be the Cartesian coordinates of pixels on the plane of $I_{proj}$. We define the residual projective distortion as
$$V = A H^{-1}, \tag{1}$$
which for each point of the scene transforms the coordinates $r$ of its image on $I_{proj}$ into the coordinates $V(r)$ of its image on $I_{affin}$. Ideally, the residual distortion $V$ is the identity transformation. To formalize the pointwise error of the affine normalization, we define the coordinate discrepancy [40] (see Fig. 2):
$$d(r) = V(r) - r. \tag{2}$$
In some cases, it is possible to evaluate beforehand which part of the projectively normalized image $I_{proj}$ is of interest. Such a region of interest (ROI) is denoted by $R \subset \mathbb{R}^2$. Otherwise, $R$ denotes the entire $I_{proj}$.
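The definitions above can be illustrated with a minimal Python sketch. It assumes that matrices are stored as nested lists and that the matrix of the inverse transformation $H^{-1}$ is already available; the names `P_inv`, `apply_homography`, `apply_affine`, `residual_distortion` and `discrepancy` are hypothetical and serve illustration only.

```python
def apply_homography(M, r):
    """Apply a 3x3 homogeneous matrix M to the 2D point r = (x, y)."""
    x, y = r
    w = M[2][0] * x + M[2][1] * y + M[2][2]  # projective denominator
    return ((M[0][0] * x + M[0][1] * y + M[0][2]) / w,
            (M[1][0] * x + M[1][1] * y + M[1][2]) / w)


def apply_affine(A, q):
    """Apply a 2x3 affine matrix A to the 2D point q = (x, y)."""
    x, y = q
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])


def residual_distortion(A, P_inv, r):
    """V(r) = A(H^-1(r)): maps I_proj coordinates to I_affin coordinates.

    P_inv is an assumed name for a matrix proportional to H^-1.
    """
    return apply_affine(A, apply_homography(P_inv, r))


def discrepancy(A, P_inv, r):
    """Coordinate discrepancy V(r) - r at the point r of I_proj."""
    vx, vy = residual_distortion(A, P_inv, r)
    return (vx - r[0], vy - r[1])
```

If $H$ itself happens to be affine, an affine $A$ can reproduce it exactly and the discrepancy vanishes at every point; for a genuinely projective $H$ the discrepancy is in general non-zero and depends on $r$.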

Root mean square criterion of normalization accuracy
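As the criterion of normalization accuracy, we choose the widely used root mean square (RMS) of the coordinate discrepancies $d(r) = V(r) - r$. For an ROI of finite nonzero area, $0 < S(R) < \infty$, and for a non-empty finite ROI, $0 < |R| < \infty$, the criterion is defined as follows:
$$L_2(V, R) = \begin{cases} \sqrt{\dfrac{1}{S(R)} \displaystyle\int_R \|V(r) - r\|_2^2 \, dr} & \text{for } 0 < S(R) < \infty, \\ \sqrt{\dfrac{1}{|R|} \displaystyle\sum_{r \in R} \|V(r) - r\|_2^2} & \text{for } 0 < |R| < \infty. \end{cases} \tag{3}$$
Such a criterion was used, for example, for the automatic correction of distortion caused by the lens and by camera movement [41]. The same criterion was also employed to calculate the accuracy of aligned image formation via a matrix of projectors [42]. Using definitions (1) and (2), we establish the dependence of the criterion on the affine transformation $A$:
$$L_2(A H^{-1}, R) = \begin{cases} \sqrt{\dfrac{1}{S(R)} \displaystyle\int_R \|A H^{-1}(r) - r\|_2^2 \, dr} & \text{for } 0 < S(R) < \infty, \\ \sqrt{\dfrac{1}{|R|} \displaystyle\sum_{r \in R} \|A H^{-1}(r) - r\|_2^2} & \text{for } 0 < |R| < \infty. \end{cases} \tag{4}$$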
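For the case of a non-empty finite ROI, the RMS criterion can be sketched in a few lines of Python. The function name `rms_discrepancy` and the representation of the residual distortion as a callable `V` are assumptions made for this illustration.

```python
import math


def rms_discrepancy(V, roi_points):
    """RMS of ||V(r) - r|| over a non-empty finite ROI given as a list of points.

    V is any callable mapping a point (x, y) to a point (x', y').
    """
    if not roi_points:
        raise ValueError("the ROI must be non-empty")
    total = 0.0
    for x, y in roi_points:
        vx, vy = V((x, y))
        total += (vx - x) ** 2 + (vy - y) ** 2  # squared discrepancy at r
    return math.sqrt(total / len(roi_points))
```

For the identity transformation the criterion is zero, and for a pure shift by a vector $t$ it equals $\|t\|_2$ regardless of the ROI.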

Problem formulation
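Now that the criterion of normalization accuracy is set, we can formulate the problem of searching for the optimal affine approximation of the projective normalization $H$:
$$A^* = \arg\min_{A} L_2(A H^{-1}, R). \tag{5}$$
We will also refer to it as the optimal affine normalization. The corresponding optimum is denoted as
$$L_2^* = L_2(A^* H^{-1}, R). \tag{6}$$
The projective normalization $H$ is parametrized by a $3 \times 3$ homography matrix; let $P$ be the matrix of the inverse transformation, so that $P \sim H^{-1}$, and let $P(r)$ denote the result of applying this transformation to the point $r$. Because the matrices $P$ and $H$ are homogeneous, i.e. defined up to a nonzero factor, their scale is assumed to be fixed. The affine transformation $A$ is parametrized by the matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}.$$
Thus, problem (5) of the optimal transformation search can be formulated as the problem of the optimal matrix search:
$$A^* = \begin{cases} \arg\min\limits_{A \in \mathbb{R}^{2 \times 3}} \displaystyle\int_R \|A(P(r)) - r\|_2^2 \, dr & \text{for } 0 < S(R) < \infty, \\ \arg\min\limits_{A \in \mathbb{R}^{2 \times 3}} \displaystyle\sum_{r \in R} \|A(P(r)) - r\|_2^2 & \text{for } 0 < |R| < \infty. \end{cases} \tag{10}$$
Earlier, in [43], we proved that this problem is convex.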

The applicability limits
Consider the function
$$Z(r) \overset{\text{def}}{=} p_{31} x + p_{32} y + p_{33},$$
where $p_{ij}$ are the elements of the matrix $P$ and $r = (x, y)^T$. The line $Z(r) = 0$ on the plane of the image $I_{proj}$ is called the horizon. Let us consider an ROI $R$ that does not lie strictly on one side of the horizon. Points on the horizon turn the denominator of the transformation (7) into zero, which corresponds to their being infinitely remote on the plane of the input image $I_{input}$. Hence these points cannot be present in the image $I_{input}$ because of its finite size. In reality, these points of the scene are seen at an angle of $\pi/2$ to the camera optical axis. Points that belong to different sides of the horizon cannot be simultaneously present in the image $I_{input}$, because the points on one of these sides are seen at an angle greater than $\pi/2$, i.e. they are located behind the camera. Thus, at least a part of the ROI is absent from the input image $I_{input}$. In this case, the RMS criterion of accuracy (4) is meaningless. Hence we will consider only the cases when the ROI lies strictly on one side of the horizon:
$$\forall r \in R: \; Z(r) > 0 \quad \text{or} \quad \forall r \in R: \; Z(r) < 0.$$
This condition also guarantees the correctness of the definition (4) of the RMS accuracy criterion.
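The applicability condition can be checked directly. The sketch below assumes that the ROI is a convex polygon (for example, a rectangle) given by its vertices; since $Z(r)$ is an affine function of $r$, its extreme values over a convex polygon are attained at the vertices, so testing the vertices is sufficient. The function names are hypothetical.

```python
def horizon_value(P, r):
    """Z(r) = p31*x + p32*y + p33 for a 3x3 matrix P given as nested lists."""
    x, y = r
    return P[2][0] * x + P[2][1] * y + P[2][2]


def roi_is_admissible(P, vertices):
    """True if a convex polygonal ROI lies strictly on one side of the horizon.

    Assumption: the ROI is the convex hull of `vertices`.
    """
    if not vertices:
        raise ValueError("the ROI must have at least one vertex")
    values = [horizon_value(P, v) for v in vertices]
    return all(z > 0 for z in values) or all(z < 0 for z in values)
```

An ROI that touches or crosses the horizon is rejected, because for such an ROI a part of the region cannot be present in the input image.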

ROI of non-zero finite area
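Let us consider an ROI of non-zero finite area; then from (10) it follows that
$$A^* = \arg\min_{A \in \mathbb{R}^{2 \times 3}} \int_R \|A(P(r)) - r\|_2^2 \, dr. \tag{13}$$
We will express the affine transformation matrix $A$ as the vector
$$a = (a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23})^T.$$
Let us specify the transformation $P$ through its components:
$$P(r) = \begin{pmatrix} P_x(r) \\ P_y(r) \end{pmatrix}, \qquad P_x(r) = \frac{p_{11} x + p_{12} y + p_{13}}{p_{31} x + p_{32} y + p_{33}}, \qquad P_y(r) = \frac{p_{21} x + p_{22} y + p_{23}}{p_{31} x + p_{32} y + p_{33}},$$
and introduce a matrix function $Q$:
$$Q(r) = \begin{pmatrix} P_x(r) & P_y(r) & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & P_x(r) & P_y(r) & 1 \end{pmatrix},$$
so that $A(P(r)) = Q(r)\, a$. This allows the vectorization of problem (13) to be defined as follows:
$$a^* = \arg\min_{a \in \mathbb{R}^6} \int_R \|Q(r)\, a - r\|_2^2 \, dr. \tag{17}$$
Note that the target function of problem (17) is quadratic in $a$:
$$\int_R \|Q a - r\|_2^2 \, dr = a^T \left( \int_R Q^T Q \, dr \right) a - 2 \left( \int_R Q^T r \, dr \right)^T a + \int_R r^T r \, dr.$$
We will refer to the coefficients $K$ of this quadratic function as the target coefficients. As was shown above, the target coefficients are defined by the homography matrix and the ROI, $K = K(H, R)$: each of them is the integral over $R$ of a fractional quadratic function of $r$. If the target coefficients are calculated, problem (17) can be presented as
$$a^* = \arg\min_{a \in \mathbb{R}^6} \left[ a^T \left( \int_R Q^T Q \, dr \right) a - 2 \left( \int_R Q^T r \, dr \right)^T a \right]$$
and can be solved analytically:
$$a^* = \left( \int_R Q^T Q \, dr \right)^{-1} \int_R Q^T r \, dr. \tag{22}$$
Thus, the problem of unconstrained normalization (13) is quadratic and can be reduced to the integration of fractional quadratic functions over the ROI. Obviously, for an arbitrary ROI this integration can be performed only numerically.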
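One consequence of the vectorized form that is convenient in practice can be written out explicitly. The following is a sketch in assumed notation: $P_x(r)$ and $P_y(r)$ denote the two coordinates of the point $P(r)$, the vector $a$ is assumed to be ordered as $(a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23})^T$, and the symbols $u(r)$, $G$, $a_1^*$ and $a_2^*$ are introduced here for illustration only.

```latex
u(r) = \begin{pmatrix} P_x(r) \\ P_y(r) \\ 1 \end{pmatrix}, \qquad
Q(r) = \begin{pmatrix} u(r)^T & 0_{1 \times 3} \\ 0_{1 \times 3} & u(r)^T \end{pmatrix}, \qquad
Q(r)^T Q(r) = \begin{pmatrix} u u^T & 0_{3 \times 3} \\ 0_{3 \times 3} & u u^T \end{pmatrix},

G = \int_R u(r)\, u(r)^T \, dr, \qquad
G\, a_1^* = \int_R x\, u(r) \, dr, \qquad
G\, a_2^* = \int_R y\, u(r) \, dr.
```

Here $a_1^*$ and $a_2^*$ are the first and the second rows of $A^*$ written as columns. Under these assumptions the $6 \times 6$ system splits into two $3 \times 3$ systems with the common matrix $G$. Every element of $G$ and of the right-hand sides is an integral of a ratio of polynomials of degree at most two in $x$ and $y$, which agrees with the fractional quadratic functions mentioned above.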

Non-empty finite ROI
Similar reasoning applies to a non-empty finite ROI $R$. In this case, according to (10), the integrals over $R$ in the target coefficients are replaced with sums over the points $r \in R$, while the definition (20) of the functions $f$ and the expression (22) for the analytical calculation of $a^*$ are preserved. Hence the RMS criterion (3) can be calculated from the same target coefficients. Thus, in all considered cases (3), the optimal affine normalization is calculated according to the general Algorithm 1.
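For a non-empty finite ROI the whole procedure can be sketched in pure Python. This is an illustration under stated assumptions, not the authors' implementation: the least-squares problem is solved row by row (the two rows of $A$ are independent because each discrepancy component depends on one row only), the sums `G`, `bx`, `by` are an assumed arrangement of the target coefficients, and the RMS value is evaluated directly from the residuals. All function names are hypothetical.

```python
import math


def project(P, r):
    """Apply the 3x3 matrix P ~ H^-1 to the point r of I_proj."""
    x, y = r
    z = P[2][0] * x + P[2][1] * y + P[2][2]
    return ((P[0][0] * x + P[0][1] * y + P[0][2]) / z,
            (P[1][0] * x + P[1][1] * y + P[1][2]) / z)


def solve3(G, b):
    """Solve the 3x3 system G t = b by Gaussian elimination with pivoting."""
    M = [list(G[i]) + [b[i]] for i in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:  # crude absolute test, enough for a sketch
            raise ValueError("degenerate ROI: the normal matrix is singular")
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, 3):
            f = M[i][c] / M[c][c]
            for j in range(c, 4):
                M[i][j] -= f * M[c][j]
    t = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        s = sum(M[i][j] * t[j] for j in range(i + 1, 3))
        t[i] = (M[i][3] - s) / M[i][i]
    return t


def rms_error(A, P, roi_points):
    """RMS coordinate discrepancy of the 2x3 matrix A over the ROI points."""
    total = 0.0
    for x, y in roi_points:
        px, py = project(P, (x, y))
        vx = A[0][0] * px + A[0][1] * py + A[0][2]
        vy = A[1][0] * px + A[1][1] * py + A[1][2]
        total += (vx - x) ** 2 + (vy - y) ** 2
    return math.sqrt(total / len(roi_points))


def optimal_affine(P, roi_points):
    """Return (A_opt, rms) for a finite ROI; A_opt is a 2x3 nested list."""
    G = [[0.0] * 3 for _ in range(3)]  # sum of u u^T with u = (Px, Py, 1)
    bx, by = [0.0] * 3, [0.0] * 3      # sums of x*u and y*u
    for x, y in roi_points:
        px, py = project(P, (x, y))
        u = (px, py, 1.0)
        for i in range(3):
            bx[i] += x * u[i]
            by[i] += y * u[i]
            for j in range(3):
                G[i][j] += u[i] * u[j]
    A = [solve3(G, bx), solve3(G, by)]
    return A, rms_error(A, P, roi_points)
```

The ROI points must not all be mapped by $P$ onto one line, otherwise the normal matrix is singular and the sketch raises an error.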
Algorithm 1. Search for the optimal affine image normalization
Input: matrix $H \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$; ROI $R \subset \mathbb{R}^2$: $0 < S(R) < \infty$ or $0 < |R| < \infty$.
Output: matrix $A^* \in \mathbb{R}^{2 \times 3}$ of the optimal affine approximation of $H$ on $R$: (9); the optimal value $L_2^*$ of the RMS accuracy criterion: (6).
Step 1. The target coefficients $K = K(H, R)$ are calculated from $H$ and $R$.
Step 2. The optimal vector $a^*$, and hence the matrix $A^*$, is calculated analytically from $K$ according to (22).
Step 3. The optimal value $L_2^*$ of the criterion (3) is calculated.

Notes on Step 1 regarding the calculation of the target coefficients $K = K(H, R)$. The cases of a non-empty finite ROI and of an ROI of non-zero finite area are discussed above. Let us specify the corresponding Algorithms 2 and 3 for the calculation of the target coefficients.

Algorithm 2. Calculation of the target coefficients for a non-empty finite ROI
Input: matrix $H \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$; non-empty finite ROI: $0 < |R| < \infty$.
Output: target coefficients $K = K(H, R)$.
Step 1. The target coefficients $K$ are calculated by summation over the points of $R$ according to (23).

Algorithm 3. Numerical estimation of the target coefficients for an ROI of non-zero finite area
Input: matrix $H \in \mathbb{R}^{3 \times 3}$ of the projective normalization $H$; ROI $R \subset \mathbb{R}^2$ of non-zero finite area: $0 < S(R) < \infty$.
Output: numerical estimation of the target coefficients $K = K(P, R)$.
Step 1. A set of $n$ points uniformly distributed on $R$ is generated.
Step 2. The target coefficients $K$ are estimated on the generated set of points according to (23).
To obtain the conventional statistical estimate, the result of (23) should be multiplied by $S(R)/n$ at the final step of Algorithm 3. This multiplication is skipped intentionally: on the one hand, it does not change the output of Algorithm 1 (see expression (22)), and on the other hand, the accurate calculation of the area $S(R)$ in special cases complicates Algorithm 3 and in general might not even be possible.
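The sampling-based estimation can be sketched as follows. The sketch assumes, for simplicity, that the ROI is an axis-aligned rectangle (so that uniform sampling is trivial) and arranges the accumulated sums as a $3 \times 3$ matrix `G` and two vectors `bx`, `by`; this arrangement and all names are assumptions of the illustration. The factor $S(R)/n$ is deliberately omitted: multiplying `G`, `bx` and `by` by one common factor does not change the solution of the linear systems `G a = b`.

```python
import random


def estimate_target_sums(P, rect, n, seed=0):
    """Accumulate sums over n points sampled uniformly from a rectangle.

    rect = (x_min, y_min, x_max, y_max); P is a 3x3 matrix (nested lists).
    Returns (G, bx, by) without the S(R)/n normalization factor.
    """
    x0, y0, x1, y1 = rect
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    G = [[0.0] * 3 for _ in range(3)]
    bx, by = [0.0] * 3, [0.0] * 3
    for _ in range(n):
        x, y = rng.uniform(x0, x1), rng.uniform(y0, y1)
        z = P[2][0] * x + P[2][1] * y + P[2][2]
        u = ((P[0][0] * x + P[0][1] * y + P[0][2]) / z,
             (P[1][0] * x + P[1][1] * y + P[1][2]) / z,
             1.0)
        for i in range(3):
            bx[i] += x * u[i]
            by[i] += y * u[i]
            for j in range(3):
                G[i][j] += u[i] * u[j]
    return G, bx, by
```

In this arrangement the element `G[2][2]` simply counts the samples and equals `n`; after multiplication by $S(R)/n$ it would become the area $S(R)$, which shows why the conventional estimate needs the area while the optimal matrix does not.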
Further we will analytically calculate the target coefficients K for some special cases of the ROI R with non-zero finite area.

Rectangular ROI
We have analytically calculated the target coefficients $K$ for the orthotropic rectangular ROI above. Now we will generalize this solution to an arbitrarily oriented rectangular ROI. Let us introduce the latter as the image of an orthotropic rectangle $R_0$ under a rotation $U$: $R = U(R_0)$.
Thus we can define sets of scaling, translation, and shearing matrices, as well as sets of their superpositions. But we cannot introduce, for example, a set of rotation matrices. Examples of such sets are the isotropic scaling, the superposition of translation and shearing, the superposition of shearing and anisotropic scaling, and the full affine transformation.
If the loss of accuracy is not significant, i.e. the optimal value $L_2^*$ of the criterion is sufficiently small, then $I^*_{affin}$ is calculated via the application of the transformation $A^*$ to the image $I_{input}$. Otherwise, $I_{proj}$ is calculated via the application of the transformation $H$ to the image $I_{input}$.
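As an illustration of restricting the affine transform to one of its special cases, the sketch below fits a superposition of anisotropic scaling and translation, i.e. a matrix of the assumed form `[[s_x, 0, t_x], [0, s_y, t_y]]`, to a finite set of point pairs. This family is linear in its free parameters, so the target function stays quadratic and each axis reduces to an ordinary one-dimensional linear regression. The function names and the input layout are hypothetical.

```python
def fit_line(us, vs):
    """Least-squares (s, t) minimizing sum((s*u + t - v)^2)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    if var_u == 0.0:
        raise ValueError("degenerate data: all abscissas coincide")
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    s = cov / var_u
    return s, mean_v - s * mean_u


def fit_scale_translation(input_points, proj_points):
    """Fit A = [[s_x, 0, t_x], [0, s_y, t_y]] mapping input_points to proj_points.

    input_points are the points P(r) on I_input, proj_points the matching r.
    """
    sx, tx = fit_line([p[0] for p in input_points], [r[0] for r in proj_points])
    sy, ty = fit_line([p[1] for p in input_points], [r[1] for r in proj_points])
    return [[sx, 0.0, tx], [0.0, sy, ty]]
```

A restricted matrix of this kind has fewer non-zero elements than the full affine one, which is what makes a cheaper image transformation possible.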
An example of the optimal affine normalization is illustrated in Fig. 1. Each of the three images $I_{input}$, $I_{proj}$ and $I_{affin}$ has three channels of 1434 × 966 pixels. Computations were performed on a computer with an Intel Core i3 4030U processor. The OpenCV library was used for image normalization. As the ROI $R$ we chose a combination of three rectangles of text fields on a credit card. The analytical search for the optimal affine normalization (Algorithm 4) in this case took $t_c = 0.191$ milliseconds on average over $10^4$ repetitions. The application of the resulting affine normalization took $t_a = 5.90$ milliseconds, while the projective normalization took $t_p = 9.91$ milliseconds. Thus, Algorithm 5 made the normalization $t_p / (t_c + t_a) \approx 1.63$ times faster.
Fig. 1 also shows that, even though the camera optical axis is oriented significantly off the perpendicular to the document surface, the text fields of the credit card were normalized with high accuracy.
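The reported acceleration can be re-derived from the three measured times. The short sketch below only repeats this arithmetic; the function name `speedup` is hypothetical, and the timings are the ones quoted above.

```python
def speedup(t_projective, t_search, t_affine):
    """Ratio of the projective time to the total time of the affine path."""
    return t_projective / (t_search + t_affine)


t_c, t_a, t_p = 0.191, 5.90, 9.91  # milliseconds, as reported in the text
ratio = speedup(t_p, t_c, t_a)
search_share = t_c / (t_c + t_a)   # fraction of the affine path spent on the search

print(f"speedup: {ratio:.2f}")     # prints "speedup: 1.63"
```

The search for the optimal affine matrix accounts for only about 3% of the affine path, so almost all of the gain comes from the cheaper image transformation itself.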

Conclusion
In this work, we propose a fast approach for image normalization. It utilizes an affine normalization instead of a projective one if there is no significant loss of accuracy. The approach is based on a proposed criterion for the normalization accuracy: root mean square (RMS) coordinate discrepancies over the region of interest (ROI). The problem of optimal affine normalization according to this criterion is considered. We have established that this unconstrained optimization problem is quadratic and can be reduced to the integration of fractional quadratic functions over the ROI. The latter problem was solved analytically for the case of OCR, where the ROI consists of rectangles. The proposed approach is generalized to the cases where, instead of the full affine transform, its special cases are used: scaling, translation, shearing, and their superposition, allowing the image normalization process to be further accelerated.