Not easy.
The trick is to pre-process the scan of the text you want to "read". My suggestion:
1. Whip up the contrast to take out as much of the noise as possible. There may be other steps you can take to improve accuracy.
2. Take the image and split out the characters
3. Standardise the size of each character image. 1 pixel => 1 NN input.
4. Process each character separately.
Step 2 is fairly hard, but I'd guess the line running along the top of the script (the shirorekha) will help, as will the fact that each character is discrete.
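Steps 1 and 2 can be sketched roughly as below. This is a minimal NumPy illustration, not production OCR: the function name, the hard threshold, and the "blank out the heaviest rows to remove the header line" heuristic are all my assumptions, and real scans will need smarter binarisation and connected-component analysis.

```python
import numpy as np

def preprocess_and_segment(gray, threshold=128):
    """Binarise a grayscale scan (0 = ink, 255 = paper), strip the
    header line, and split the remaining ink into character slices.
    NOTE: illustrative heuristic only -- assumes a clean, level scan."""
    ink = gray < threshold                    # step 1: hard-contrast threshold

    # The header line shows up as the rows with the most ink; blanking
    # them makes the characters below fall apart into separate columns.
    row_ink = ink.sum(axis=1)
    body = ink.copy()
    body[row_ink > 0.8 * row_ink.max(), :] = False

    # Step 2: split on empty columns -- each run of inked columns is
    # treated as one character (keeping the header in each slice).
    col_has_ink = body.any(axis=0)
    chars, start = [], None
    for x, filled in enumerate(col_has_ink):
        if filled and start is None:
            start = x
        elif not filled and start is not None:
            chars.append(ink[:, start:x])
            start = None
    if start is not None:
        chars.append(ink[:, start:])
    return chars
```

Each returned slice would then be rescaled to a fixed grid (step 3) so that every pixel maps to one NN input.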
Obviously you'll need to train your NN to recognise the characters. I suggest a backpropagation network (BPN), trained against as many examples of each character as you can find. You'll need a hidden layer (the number of neurons is a matter of trial and error), and I suggest 1 output neuron per Devanagari character.
This is a good run-through of training a BPN network.
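For a concrete feel of the training loop, here is a toy backpropagation network in NumPy: one hidden layer, one sigmoid output per character class, trained on squared error. The class name, sizes, and learning rate are illustrative choices, not something from the linked write-up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BPN:
    """Tiny backprop network: n_in pixel inputs, one hidden layer,
    one output neuron per character class."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.5):
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.W2 = rng.normal(0, 0.5, (n_hidden, n_out))
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.W1)
        self.o = sigmoid(self.h @ self.W2)
        return self.o

    def train_step(self, x, target):
        o = self.forward(x)
        # Backpropagate the squared error through both layers.
        d_out = (o - target) * o * (1 - o)
        d_hid = (d_out @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= self.lr * np.outer(self.h, d_out)
        self.W1 -= self.lr * np.outer(x, d_hid)
        return ((o - target) ** 2).sum()
```

At recognition time you'd take the argmax over the output neurons to pick the character.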
Two things I found helpful for my NN were adding a momentum factor during training and, to a lesser extent, a Boltzmann machine (you can Google both of these), though my problem domain was quite different so YMMV.
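The momentum idea is just a small change to the weight update: carry part of the previous update forward so the descent direction is smoothed. A minimal sketch (function name and hyperparameters are my own, not from the answer):

```python
import numpy as np

def sgd_momentum_step(W, grad, velocity, lr=0.1, momentum=0.9):
    """One weight update with a momentum term: part of the previous
    update's direction is carried forward, damping oscillation."""
    velocity[:] = momentum * velocity - lr * grad
    W += velocity
    return W, velocity
```

In the BPN above, you would keep one velocity array per weight matrix and apply this in place of the plain `W -= lr * grad` update.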