
Recognition of handwritten letters using neural networks. Construction of trained logical neural networks. What to do if the child does not remember letters


This project does not claim to be the first of its kind and is not meant as a competitor to FineReader, but I hope that the idea of recognizing character patterns using the Euler characteristic will be new.

Introduction to the Euler characteristic of an image.

The basic idea is that we take a black and white image and, assuming that 0 is a white pixel and 1 is a black pixel, treat the whole image as a matrix of zeros and ones. Such an image can then be represented as a set of 2 by 2 pixel fragments. All possible combinations are shown in the figure:

On each image pic1, pic2, ... a red square marks the current counting step of the algorithm; inside it is one of the fragments F from the picture above. At each step the count for the matching fragment is incremented. As a result, for the image Original we get a set of counts, one per fragment type, which from here on is called the Euler characteristic of the image, or the characteristic set.


COMMENT: in practice, the value F0 (for the Original image this value is 8) is not used, since it represents the background of the image. Therefore, 15 values are used, from F1 to F15.
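As a rough illustration of the counting step, here is a minimal sketch in Python (the project itself is written in C#). The numbering of the fragments F0 to F15 comes from a figure that is not reproduced here, so the bit order used below, the one-pixel window step and the white padding around the image are all assumptions rather than the author's exact scheme.

```python
def characteristic_set(image):
    """Count how often each of the 16 possible 2x2 fragments occurs.

    `image` is a list of rows of 0/1 values (0 = white, 1 = black).
    Returns a list of 16 counts; index 0 is the all-white fragment F0.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # Assumption: surround the image with one white pixel on every side
    # so that fragments touching the border are counted as well.
    padded = [[0] * (width + 2)]
    padded += [[0] + list(row) + [0] for row in image]
    padded.append([0] * (width + 2))

    counts = [0] * 16
    for y in range(height + 1):
        for x in range(width + 1):
            # Assumed bit order: top-left, top-right, bottom-left, bottom-right.
            index = (
                (padded[y][x] << 3)
                | (padded[y][x + 1] << 2)
                | (padded[y + 1][x] << 1)
                | padded[y + 1][x + 1]
            )
            counts[index] += 1
    return counts


def feature_vector(image):
    """Drop F0 (the background) and keep F1..F15, as the comment above describes."""
    return tuple(characteristic_set(image)[1:])
```

Returning a tuple from feature_vector is a deliberate choice in this sketch: tuples can be compared and hashed directly, which is convenient when the values are later looked up in a knowledge base.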

Properties of the Euler characteristic of an image.

  1. The characteristic set is treated as unique to an image; in other words, the algorithm assumes that no two different character images share the same Euler characteristic.
  2. There is no algorithm for converting the characteristic set back into the original image; the only way is brute-force search.

What is the text recognition algorithm?

The idea of letter recognition is that we pre-compute the Euler characteristic for all characters of the alphabet of a language and save it in a knowledge base. Then, for each part of the image being recognized, we calculate the Euler characteristic and look it up in the knowledge base.
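The lookup idea can be sketched in a few lines of Python. This is only an illustration: the function names and the use of a dictionary are assumptions, and the sample characteristic sets in the usage note are made-up values, not data from the real knowledge base.

```python
def build_knowledge_base(samples):
    """Map each characteristic set (a tuple of counts) to its character.

    `samples` is an iterable of (characteristic_set, character) pairs.
    """
    knowledge_base = {}
    for characteristic, character in samples:
        # Tuples are hashable, so a lookup is a single dictionary access.
        # setdefault keeps the first character seen for a given set,
        # so identical sets are not stored twice.
        knowledge_base.setdefault(tuple(characteristic), character)
    return knowledge_base


def recognize(characteristic, knowledge_base):
    """Return the stored character, or None if the set is unknown."""
    return knowledge_base.get(tuple(characteristic))
```

For example, with a base built from the hypothetical pairs ((1, 0, 2), "A") and ((0, 3, 1), "B"), recognize((0, 3, 1), base) returns "B", while an unknown set returns None and would be passed on to the heuristic analysis.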

Stages of recognition:

  1. The image can be either black and white or color, so the first step is the approximation of the image, that is, obtaining a black and white image from it.
  2. We pass through the entire image pixel by pixel in order to find black pixels. When a filled pixel is found, a recursive operation is launched that searches for all filled pixels adjacent to the found one and to each subsequent one. As a result, we get a fragment of the image, which can be a whole character, a part of one, or "garbage" that should be discarded.
  3. After all unconnected parts of the image have been found, the Euler characteristic is calculated for each of them.
  4. Next, the analyzer goes through each fragment and determines whether the value of its Euler characteristic is in the knowledge base. If the value is found, the fragment is considered recognized; otherwise it is left for further study.
  5. The unrecognized parts of the image are subjected to heuristic analysis, that is, an attempt is made to find the most suitable value in the knowledge base. If none is found, an attempt is made to "glue" nearby fragments together and to search the knowledge base for the result. What is gluing for? Not all characters consist of one continuous shape. The exclamation mark "!", for example, contains 2 segments (a stroke and a dot), so before looking it up in the knowledge base you need to calculate the total Euler characteristic of both parts. If no acceptable result is found even after gluing with neighboring segments, the fragment is considered garbage and skipped.
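Stage 2 above can be sketched as follows in Python. This is a simplified illustration, not the project's code: it uses an explicit stack instead of the recursive search described in the text, and it assumes that diagonal neighbours count as adjacent, which the text does not specify.

```python
def find_fragments(image):
    """Split a binary image into groups of connected black pixels.

    Returns a list of fragments; each fragment is a list of (row, col)
    coordinates. Diagonal neighbours count as adjacent (an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    visited = [[False] * width for _ in range(height)]
    fragments = []

    for row in range(height):
        for col in range(width):
            if image[row][col] != 1 or visited[row][col]:
                continue
            # Explicit stack instead of recursion, to avoid hitting
            # Python's recursion limit on large fragments.
            stack = [(row, col)]
            visited[row][col] = True
            fragment = []
            while stack:
                r, c = stack.pop()
                fragment.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and image[nr][nc] == 1
                                and not visited[nr][nc]):
                            visited[nr][nc] = True
                            stack.append((nr, nc))
            fragments.append(fragment)
    return fragments
```

An exclamation mark drawn as a one-pixel-wide column with a gap yields two fragments here, which is exactly the case that the gluing step is meant to handle.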

System composition:

  1. Knowledge base: a file or files, created by me or by someone else, that contain the characteristic sets of characters and are required for recognition.
  2. Core: contains the main functions that perform recognition.
  3. Generator: a module for creating a knowledge base.

ClearType and anti-aliasing.

So, at the input we have an image to recognize, and the goal is to make it black and white, suitable for starting the recognition process. It would seem that nothing could be simpler: treat all white pixels as 0 and all the rest as 1. But it is not that simple. The text in the image can be anti-aliased or not. Anti-aliased characters look smooth, without jagged corners, while non-anti-aliased ones show individual pixels along the contour on modern monitors. With the advent of LCD (liquid crystal) screens, ClearType (for Windows) and other types of anti-aliasing were created that take advantage of the structure of the monitor matrix. They change the colors of the pixels of the text image, after which it looks much "softer". To see the result of smoothing, type a letter (or some text) in, for example, mspaint and zoom in: the text turns into a multi-colored mosaic.

What is going on? Why do we see a regular symbol at a small scale? Are our eyes deceiving us? The point is that a pixel of an LCD monitor is not a single element that can take on any color; it consists of 3 subpixels of 3 colors, which together are enough to produce the desired color. The goal of ClearType is therefore to get text that is as pleasing to the eye as possible by exploiting this feature of the LCD matrix, and this is achieved with subpixel rendering. Anyone who has a "Magnifier" tool can, as an experiment, enlarge any place on the screen and see the matrix as in the picture below.

The figure shows a square of 3x3 LCD matrix pixels.

Attention! This feature makes it difficult to obtain a black and white image and greatly affects the result, because the image obtained does not always match the one whose Euler characteristic is stored in the knowledge base. The difference between the images makes heuristic analysis necessary, and that analysis is not always successful.


Obtaining a black and white image.

The algorithms for converting color to black and white that I found on the Internet did not satisfy me in quality. After applying them, symbols rendered with sub-pixel rendering came out with different widths, the strokes of letters had breaks, and incomprehensible garbage appeared. As a result, I decided to obtain the black and white image by analyzing pixel brightness: all pixels brighter than (greater than) 130 units are considered black, and the rest white. This method is not ideal and still gives an unsatisfactory result if the brightness of the text changes, but at least it produces images similar to those in the knowledge base. The implementation can be viewed in the LuminosityApproximator class.
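A brightness threshold of this kind might look like the Python sketch below. It is not the LuminosityApproximator class: the text does not say how brightness is measured, so the Rec. 601 luma weights are an assumption, and the black_if_brighter flag is a hypothetical parameter that defaults to the comparison as worded in the text.

```python
def to_black_and_white(pixels, threshold=130, black_if_brighter=True):
    """Turn rows of (R, G, B) pixels into rows of 0 (white) / 1 (black).

    Brightness uses the common Rec. 601 luma weights, which is an
    assumption; the source does not say how brightness is measured.
    """
    result = []
    for row in pixels:
        binary_row = []
        for red, green, blue in row:
            brightness = 0.299 * red + 0.587 * green + 0.114 * blue
            # With the default flag, pixels brighter than the threshold
            # become black, as worded in the text; False inverts this.
            is_black = (brightness > threshold) == black_if_brighter
            binary_row.append(1 if is_black else 0)
        result.append(binary_row)
    return result
```

With this particular brightness formula, dark text on a light background has low brightness, so such input would need black_if_brighter=False to mark the text pixels as 1.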

Knowledge base.

My initial idea for filling the knowledge base was to calculate, for each letter of the language, the Euler characteristic of the rendered symbol in the 140 fonts installed on my computer (C:\Windows\Fonts), in every font style (Regular, Bold, Italic) and in sizes from 8 to 32. That way I would cover all, or almost all, variations of the letters and the base would become universal. Unfortunately, this turned out not to be as good as it sounds. Under these conditions I got the following:

  1. The knowledge base file turned out to be quite large (about 3 megabytes) for Russian and English, even though the Euler characteristic is stored as a simple string of 15 digits and the file itself is a compressed archive (DeflateStream) that is unpacked in memory.
  2. Deserializing the knowledge base takes about 10 seconds. The time needed to compare characteristic sets also suffered: I could not come up with a suitable GetHashCode() function, so the sets had to be compared bit by bit. Compared with a knowledge base of 3-5 fonts, text analysis with a base of 140 fonts became 30-50 times slower. This is despite the fact that identical characteristic sets are not stored twice, even though some characters look the same in different fonts or, for example, at sizes 20 and 21.

Therefore, I had to create a small knowledge base that ships inside the Core module and makes it possible to test the functionality. Filling the base has a very serious problem: not all fonts render small characters correctly. For instance, the character "e" rendered at font size 8 in the font "Franklin Gothic Medium" comes out as:

This does not look much like the original. Moreover, if it is added to the knowledge base, it greatly worsens the results of the heuristic, because it misleads the analysis of symbols similar to it. The same shape was obtained in different fonts for different letters. The process of filling the knowledge base therefore has to be supervised, so that each image of a symbol is checked by a person for agreement with its letter before it is saved. Unfortunately, I do not have that much energy and time.

Symbol search algorithm.

I will say right away that I initially underestimated the search problem and forgot that characters can consist of several parts. It seemed to me that during the pixel-by-pixel pass I would encounter a symbol, find its parts, if any, combine them and analyze them. A normal pass would look like this: I find the letter "H" (in the knowledge base) and assume that all characters below its top point and above its bottom point belong to the current line and should be analyzed together:

But this is the ideal situation. During recognition I had to deal with torn images which, on top of everything else, could have a huge amount of garbage next to the text:


Using this image of the word "yes", I will try to explain the complexity of the analysis. We will assume that this is a complete line, but that b13 and i6 are garbage fragments produced by the approximation. The character "y" lacks its dot, and none of the characters is present in the knowledge base, so we cannot say with certainty that we are dealing with a line of text running from "c" to "i". The height of the line is very important to us, because for gluing we need to know which of the nearest fragments should be glued together and analyzed. Otherwise we may inadvertently start gluing characters from two different lines, and the results of such recognition will be far from ideal.

Heuristics in pattern analysis.


What is a heuristic in image recognition?
This is the process by which a characteristic set that is not present in the knowledge base is still recognized as the correct letter of the alphabet. I thought for a long time about how to do the analysis, and in the end the most successful algorithm turned out to be this:

  1. I find all characteristic sets in the knowledge base that have the largest number of fragment values F matching those of the image being recognized.
  2. Next, I keep only those characteristic sets whose non-matching fragment values F differ from the image being recognized by no more than 1 unit in either direction. All of this is counted for each letter of the alphabet.
  3. Then I find the symbol that has the largest number of occurrences and take it as the result of the heuristic analysis.
This algorithm does not give the best results on small character images (font sizes 7 to 12). This may be because the knowledge base contains characteristic sets for similar images of different symbols.
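The three steps above can be sketched in Python as follows. The representation of the knowledge base as a list of (characteristic_set, letter) pairs and the function name are assumptions made for the illustration, and ties between letters are resolved arbitrarily here because the text does not say how they are handled.

```python
from collections import Counter


def heuristic_match(candidate, knowledge_base):
    """Guess a letter for a characteristic set missing from the base.

    `knowledge_base` is a list of (characteristic_set, letter) pairs.
    Returns the guessed letter, or None if nothing is close enough.
    """
    scored = []
    for known, letter in knowledge_base:
        # Step 1: how many fragment values F coincide exactly.
        matches = sum(1 for a, b in zip(candidate, known) if a == b)
        # Step 2: the largest deviation among the values that differ.
        worst = max((abs(a - b) for a, b in zip(candidate, known)), default=0)
        scored.append((matches, worst, letter))
    if not scored:
        return None

    best = max(matches for matches, _, _ in scored)
    votes = Counter(
        letter
        for matches, worst, letter in scored
        if matches == best and worst <= 1
    )
    if not votes:
        return None
    # Step 3: the letter with the most surviving sets wins.
    return votes.most_common(1)[0][0]
```

With the made-up base [((1, 1, 1, 1), "A"), ((1, 1, 2, 0), "A"), ((1, 1, 1, 3), "B")] and the candidate (1, 1, 1, 0), all three entries match in three positions, the "B" entry is dropped because one value differs by 3, and "A" wins with two votes.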

Usage example in C#.

An example of starting the recognition of an image. The result variable will contain the text:

var recognizer = new TextRecognizer(container);
var report = recognizer.Recognize(image);

// Raw text.
var result = report.RawText();

// List of all fragments and the recognition state of each one.
var fragments = report.Symbols;

Demo project.

For a visual demonstration of the work, I wrote a WPF application. It is launched from the project named "Qocr.Application.Wpf". An example of a window with the recognition result is shown below:

To recognize an image, you need:

  • Click "New Image" and select an image for recognition.
  • Use the "black and white" mode to see which image will actually be analyzed. If the image is of extremely low quality, do not expect good results. To improve them, you can try writing your own converter from color to black and white.
  • Choose a language under "Language".
  • Click "Recognize".
All image fragments should then be marked with an orange or green border.
English text recognition example:

It is required to create a neural network to recognize 26 letters of the Latin alphabet. We will assume that there is a system for reading characters, which represents each character in the form of a matrix. For example, the character A can be represented as shown in Fig. 2.22.

Fig. 2.22. Symbol representation

A real character reading system does not work perfectly, and the characters themselves differ in style. Therefore, for the symbol A, for example, ones may end up in cells other than those shown in Fig. 2.22. In addition, non-zero values may occur outside the character outline, and the cells that belong to the outline may contain values other than 1. We will call all such distortions noise.

MATLAB has a function prprob, which returns a matrix of size 35×26, each column of which represents a 7×5 matrix written as a vector that describes the corresponding letter (the first column describes the letter A, the second column describes the letter B, and so on). The function prprob also returns a target matrix of size 26×26, each column of which contains a single 1 in the row corresponding to the letter number, with all other elements of the column being zero. For example, the first column, corresponding to the letter A, contains 1 in the first row.

Example. Define a template for the letter A (program Template_A.m).

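% Template for the letter A
[alphabet, targets] = prprob;
i = 1;                          % number of the letter A
v = alphabet(:, i);             % vector corresponding to the letter A
template = reshape(v, 5, 7)';   % 7x5 matrix of the letter
plotchar(v);                    % draw the 35 elements as a lattice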

In addition to the function prprob already described, the program uses the function reshape, which forms a 5×7 matrix and, after transposition, a 7×5 matrix (make sure that the 7×5 matrix cannot be formed directly), and the function plotchar, which draws the 35 elements of the vector as a lattice. After running the program Template_A.m we get the matrix template and the template of the letter A shown in Fig. 2.23.

Fig. 2.23. The generated template of the letter A

To recognize letters of the Latin alphabet, it is necessary to build a neural network with 35 inputs and 26 neurons in the output layer. Let us take the number of neurons in the hidden layer equal to 10 (such a number of neurons was chosen experimentally). If there are difficulties during training, then the number of neurons of this level can be increased.



The pattern recognition network is built by the function patternnet. Please note that when creating a network, the number of neurons in the input and output layers is not specified. These parameters are implicitly set when training the network.

Consider a program for recognizing letters of the Latin alphabet Char_recognition.m

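% Latin alphabet recognition program
[alphabet, targets] = prprob;   % Formation of input and target vectors
size(alphabet);
size(targets);
% Network creation
net = patternnet(10);           % 10 neurons in the hidden layer
P = alphabet; T = targets;
net = train(net, P, T);
% Training in the presence of noise
netn = net;
T = [targets, targets, targets, targets];
for pass = 1:10                 % 10 sets, as described in the text
    P = [alphabet, alphabet, alphabet + randn(35, 26)*0.1, alphabet + randn(35, 26)*0.2];
    netn = train(netn, P, T);
end
% Retraining in the absence of noise
netn = train(netn, alphabet, targets);
% Network test
noise_rage = 0:0.05:0.5;        % Array of noise levels (standard deviations of noise)
max_test = 10; network1 = []; network2 = []; T = targets;
for noiselevel = noise_rage
    errors1 = 0; errors2 = 0;
    for i = 1:max_test
        P = alphabet + randn(35, 26)*noiselevel;
        % Test for network 1
        AA = compet(net(P));
        errors1 = errors1 + sum(sum(abs(AA - T)))/2;
        % Test for network 2
        AAn = compet(netn(P));
        errors2 = errors2 + sum(sum(abs(AAn - T)))/2;
    end
    network1 = [network1, errors1/26/max_test];
    network2 = [network2, errors2/26/max_test];
end
plot(noise_rage, network1*100, noise_rage, network2*100);
title('Network error');
xlabel('Noise level');
ylabel('error percentage');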

The operator [alphabet, targets] = prprob; forms an array of input vectors alphabet of size 35×26 with the character patterns of the alphabet, and an array of target vectors targets.

The network is created by the operator net=patternnet. Let's accept the default network settings. The network is first trained in the absence of noise. It is then trained on 10 sets of ideal and noisy vectors. Two sets of ideal vectors are included so that the network retains the ability to classify ideal (noise-free) vectors. Even so, after training on noisy data the network may "forget" how to classify some of the noise-free vectors, so it should then be trained again on ideal vectors.

The following program fragment performs training in the absence of noise :

% Network training in the absence of noise
P = alphabet;
T = targets;
net = train(net, P, T);
disp('Network training in the absence of noise is completed. Press Enter');
pause;

Training in the presence of noise is carried out using two ideal and two noisy copies of the input vectors. The noise is simulated by pseudo-random normally distributed numbers with zero mean and standard deviations of 0.1 and 0.2. Training in the presence of noise is performed by the following program fragment:

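% Training in the presence of noise
netn = net;   % keep the network trained without noise
T = [targets, targets, targets, targets];
for pass = 1:10   % 10 sets, as described in the text
    P = [alphabet, alphabet, ...
         alphabet + randn(35, 26)*0.1, alphabet + randn(35, 26)*0.2];
    netn = train(netn, P, T);
end
disp('Network training in the presence of noise is completed. Press Enter');
pause;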

Since the network was trained in the presence of noise, it makes sense to repeat the training without noise to ensure the correct classification of ideal vectors:

% Retraining in the absence of noise
P = alphabet;
T = targets;
netn = train(netn, P, T);
disp('Retraining of the network in the absence of noise is completed. Press Enter');
pause;

The network was tested for two network structures: network 1 trained on ideal vectors and network 2 trained on noisy sequences. Noise with a mean value of 0 and a standard deviation of 0 to 0.5 with a step of 0.05 was added to the input vectors. For each noise level, 10 noisy vectors were formed for each symbol, and the network output was calculated (it is desirable to increase the number of noisy vectors, but this will significantly increase the program running time). The network is trained to form a one in the only element of the output vector whose position corresponds to the number of the recognized letter, and fill the rest of the vector with zeros. The network output will never form an output vector consisting of exactly 1 and 0. Therefore, under noise conditions, the output vector is processed by the function compet, which transforms the output vector so that the largest output is set to 1 and all other outputs are set to 0.

The corresponding program fragment looks like:

% Perform test for each noise level
% (noise_rage and max_test are set earlier in the program)
T = targets; network1 = []; network2 = [];
for noiselevel = noise_rage
    errors1 = 0;
    errors2 = 0;
    for i = 1:max_test
        P = alphabet + randn(35, 26)*noiselevel;
        % Test for network 1
        AA = compet(net(P));
        errors1 = errors1 + sum(sum(abs(AA - T)))/2;
        % Test for network 2
        AAn = compet(netn(P));
        errors2 = errors2 + sum(sum(abs(AAn - T)))/2;
    end
    % Average error values (max_test sequences of 26 target vectors)
    network1 = [network1, errors1/26/max_test];
    network2 = [network2, errors2/26/max_test];
end
plot(noise_rage, network1*100, noise_rage, network2*100);
title('Network error');
xlabel('Noise level');
ylabel('error percentage');
legend('Ideal input vectors', 'Noisy input vectors');
disp('Test completed');

When calculating the recognition error, for example errors1=errors1+sum(sum(abs(AA-T)))/2, it is taken into account that in the case of incorrect recognition two elements of the output vector differ from the target vector, which is why the sum is divided by 2. The sum sum(abs(AA-T)) calculates the number of mismatched elements for a single example, and sum(sum(abs(AA-T))) calculates the number of mismatched elements across all examples.
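The division by 2 can be checked on a tiny example. The Python sketch below imitates the winner-take-all step and the error count on plain lists; the function names mirror the MATLAB ones only for readability, and the sample outputs in the usage note are made up.

```python
def compet(outputs):
    """Winner-take-all: 1 at the position of the largest output, 0 elsewhere."""
    winner = max(range(len(outputs)), key=lambda k: outputs[k])
    return [1 if k == winner else 0 for k in range(len(outputs))]


def count_errors(network_outputs, targets):
    """Count misclassified examples from raw outputs and one-hot targets."""
    mismatched = 0
    for outputs, target in zip(network_outputs, targets):
        decided = compet(outputs)
        mismatched += sum(abs(a - t) for a, t in zip(decided, target))
    # Each wrong answer differs from its target in exactly two positions:
    # the 1 that was placed wrongly and the 1 that is missing.
    return mismatched / 2
```

For the outputs [0.9, 0.2, -0.1] and [0.3, 0.4, 0.6] with targets [1, 0, 0] and [0, 1, 0], the first example is classified correctly and the second is not: its decided vector [0, 0, 1] differs from the target in two positions, so the mismatch total is 2 and the error count is 1.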

Graphs of the recognition error for the network trained on ideal input vectors and for the network trained on noisy vectors are shown in Fig. 2.24. It can be seen from Fig. 2.24 that the network trained on noisy images gives a small error, whereas the network trained only on ideal input vectors could not be trained adequately.

Fig. 2.24. Network errors depending on the noise level

Let's check the operation of the trained network (the trained network must be present in the MATLAB workspace). The program Recognition_J.m generates a noisy vector for the letter J and recognizes the letter. The function randn generates a pseudo-random number distributed according to the normal law with zero mean and unit standard deviation. A random number with mean m and standard deviation d is obtained by the formula m+randn*d (in the program m=0, d=0.2).

noisyJ = alphabet(:,10) + randn(35,1)*0.2;   % J is the 10th letter
plotchar(noisyJ);
disp('Noisy character. Press Enter');
pause;
A2 = netn(noisyJ);
A2 = compet(A2);
ns = find(A2 == 1);
disp('Character recognized');
disp(ns);                                    % number of the recognized letter
plotchar(alphabet(:,ns));

The program outputs the number of the recognized letter, the noisy letter template (Fig. 2.25), and the recognized letter template (Fig. 2.26).

Fig. 2.25. Noisy letter template

Fig. 2.26. Recognized letter template

Thus, the considered programs demonstrate the principles of image recognition using neural networks. Training the network on various sets of noisy data made it possible to train the network to work with images distorted by noise.

Tasks

1. Do all the examples given.

2. Test the recognition of various letters.

3. Investigate the effect of noise in programs on the accuracy of character recognition.


Letter recognition exercise. Various difficulty levels. The letter is masked with noise. Sometimes you need to be quick-witted in order to understand by elimination what kind of letter was in the task.

Teaching children to read and to learn the letters of the Russian alphabet. What letter is shown? Choose the correct answer on the right.

Which letter is hidden? An online game for the early development of children. Recognition of the letters of the Russian alphabet

How to learn the letters of the Russian alphabet

The letters of the Russian alphabet are often taught in order, as they are written in the primer. In fact, letters should be learned in the order of their frequency of use. Here is a little hint: the letters in the center of the keyboard are used more often than those on the periphery. Therefore, first memorize A, P, R, O ... and leave letters such as Y, X, F, W ... for last.

What is better - to teach a child to read letters or syllables

Many teachers teach in syllables from the start. I suggest sidestepping this small problem and playing online games instead of drilling syllables. That way the child learns and plays at the same time. More precisely, it seems to the child that it is a game, while the necessary sounds are repeated involuntarily.

The advantage of online games is that if you did not pronounce any letter correctly, then the simulator will patiently repeat the correct answer for you until you remember.

Do alphabet books help to learn letters? Why paper primers are still used in teaching practice

Paper primers are traditionally used to teach letters, and their benefits are undeniable. If you drop a paper primer on the floor, you do not have to worry that it will break. A primer can be opened at a specific page and left in a conspicuous place. Electronic devices offer none of this.

However, programmable reading simulators also have certain advantages, for example, they can speak, unlike paper counterparts. Therefore, both paper and electronic sources can be recommended.

Do online exercises help to memorize letters?

The main point of electronic and online games is that a person involuntarily repeats the same information many times. The more often something is repeated, the more firmly it is fixed in the mind. Therefore, online exercises are a very useful addition to traditional letter cubes and paper books.

At what age should a child be sent to educational centers

Children develop at different rates. Usually, girls up to a certain age are ahead of boys in development: they start talking earlier, are more socially oriented and take to learning more readily. Boys, by contrast, more often keep to themselves and go their own way. From this one might conclude that girls learn to read a little earlier than boys, but this is only a rough outline. Each child is an individual, and readiness for learning can only be tested in practice. Does the child enjoy attending classes? Does anything stay in the child's mind after the lesson is over?

Maybe try to study on your own, especially since riding the bus takes time, and no one understands your baby better than mom and dad.

What to do if the child does not remember letters

Learning is hard. And it doesn't matter if it's an adult or a baby. Learning is very, very difficult. In addition, children learn only in the game. Another fact is that in order to learn something, you need to practice or repeat it many times. Therefore, it is not surprising that children remember letters very poorly.

There is a separate group of children who begin to speak late and confuse not only letters but also sounds. With these children you need to draw letters together, using every material at hand: cereals, matches, pebbles, pencils. You draw a letter, then ask the child to repeat it.

You can perform graphic dictations, you can play draw and repeat.

What to do if the baby confuses letters, for example, D and T

If a child confuses letters, it is too early to move on to reading words. Go back and repeat the letters. Children often confuse voiced and unvoiced letters, or letters that look similar, for example P and R. Repetition helps here. For example, you can sculpt letters together, or make letters with your body, for example by spreading your arms to the sides to represent the letter T.

How to teach a child to memorize letters if he does not want to

Repetition is the mother of learning. Repeat letters in words, repeat letters in syllables, try to guess letters. Let the child write a letter while you try to guess it. You can also do it the other way around: make a letter out of rice grains and let your son or daughter guess which letter it is. You can write with a stick in the sand.

Why is a letter pronounced incorrectly? How to teach a child to pronounce letters clearly and distinctly?

The cause may lie at the level of physiology: the person does not hear himself correctly, or thinks he is speaking correctly. This is very easy to check: record the conversation on a voice recorder and let the child listen to it.

It can also be elementary lack of training. Different people need a different number of times to repeat information before it is remembered, and the child is no exception. It must be repeated many times and in different situations before he begins to pronounce the letters and sounds correctly.

It should also be noted that children need to be loved and worked with regularly. Do not let things slide.

How to teach your child the alphabet to prepare for school

You need to work with children in the form of a game, just as described on this site. Another secret of learning is to do it in small portions. Children cannot hold their attention for more than 5 minutes, so it is simply useless to study for longer.

What letters do you need to start memorizing the alphabet with?

Start memorizing with the most commonly used letters. The second secret is to memorize the letters that make up the name of the child and the names of mom and dad; you can add the names of brothers and sisters and of grandparents. These are the names dearest to the child.

By the way, if you are learning to touch-type, the first words to start your typing practice with are, again, your first and last name.

Does a small child need to memorize the letters of the English alphabet?

Knowing the English alphabet does not hurt. At school the alphabet is not studied; children begin to read immediately, and the alphabet is left to the parents. It is also worth noting that uppercase and lowercase English letters look different and both must be memorized. If your child started talking late, then memorizing Latin letters will most likely be a problem.

Is it possible to teach a child to read whole words right away?

Written Russian looks the same as spoken Russian, unlike English or French, so memorizing words

How to remember numbers for a preschooler

Draw numbers, count sticks. When you go for a walk, count red and white cars, or count whether more men or more women walk down the street. Turn everything into a game.

Try reading a text letter by letter yourself: not only does it take a long time, it also does not sound like the way we actually speak. Grown-ups do not spell words out, except when a word is unfamiliar or foreign. Then, in order to hear it, they read it slowly, pronouncing it carefully.

Why does a preschooler forget letters. Learning to read in games

Why does the child forget letters that were learned only yesterday?

Usually a child remembers some letters easily and others less so. The role of the adult is to notice what is not working for the child and to give additional tasks.

Another important thing is regularity. Since for a child all learning is, frankly, cramming and repetition, the learning process should be arranged so that the information is repeated at certain intervals.

Ebbinghaus (read more about this on Wikipedia) studied how quickly information that is meaningless to a person is forgotten and concluded that 40% of it is forgotten within the first twenty minutes. And if the child cannot say exactly what a given letter means, this is tantamount to the letter being completely unfamiliar. Recognition must be unambiguous, 100%.

Repeat, repeat, repeat

For example, you are practicing the syllable block (a syllable, a combination of letters) NA, and the child has more or less learned to recognize and read it. Add the syllable NO to the tasks and ask the child to read words, helping with the letters that are still unfamiliar. The child can also click on the syllables and listen to the computer read them.

and with a probability of 0.1, to the class $C_2$. The stated problem can be solved using a multilayer perceptron (MLP) with N inputs and M outputs, trained to give the output vector $c$ when the input is $p$.

In the learning process, the network builds a mapping $P \to C$. It is not possible to obtain this mapping as a whole, but one can obtain an arbitrary number of pairs $(p \to c)$ connected by the mapping. For an arbitrary vector $p$ at the input, we can get approximate probabilities of belonging to the classes at the output.

It often turns out that the components of the output vector can be less than 0 or greater than 1, and the second condition (1) is satisfied only approximately. Inaccuracy is a consequence of the analog nature of neural networks. Most of the results obtained with the help of neural networks are inaccurate. In addition, when training the network, the specified conditions imposed on the probabilities are not directly introduced into the network, but are implicitly contained in the set of data on which the network is trained. This is the second reason for the incorrectness of the result.

There are other ways to formalize.

We will represent the letters in the form of bitmaps (Fig.).

Fig. Bitmap image.

A dark cell (pixel) in the image corresponds to $I_{ij} = 1$, a light one to $I_{ij} = 0$. The task is to determine from the image which letter was presented.

Let's build an MLP with $N_i \times N_j$ inputs, where each input corresponds to one pixel: $x_k = I_{ij}$. The pixel brightnesses will be the components of the input vector.

As output signals, we choose the probabilities that the presented image corresponds to a given letter:

The network calculates the output:

where the output $c_1 = 0.9$ means, for example, that the image of the letter "A" is presented and the network is 90% sure of this; the output $c_2 = 0.1$ means that the image corresponds to the letter "B" with a probability of 10%, and so on.

There is another way: the inputs of the network are chosen in the same way, but there is only one output, the number $m$ of the presented letter. The network learns to output the value $m$ for the presented image $I$:



$(I_{ij}) \to m$

The disadvantage of this approach is that letters with close numbers $m$ but dissimilar images may be confused by the network during recognition.