Keywords: maximum entropy, total variation, image restoration


Netherlands Forensic Institute

Volmerlaan 17

2288 GD Rijswijk


Email [email protected]




In the last decade many systems for forensic images have been developed. Further in banks, shops and roads many video camera's are making images of robberies and other criminal activities. Often images are blurred due to defocusing, movement or poor lighting condition. For this reason often the question for deblurring the images is requested at forensic laboratories.

In this research report different methods for deblurring of digital images are compared Since the computer speed is getting faster each year, more complex methods for deblurring can be used. In the last five years many new methods for deblurring images are developed. They are developed for aerospace and medicine research (Maximum Entropy method) or for military purposes (Total Variation). A comparison is made with other methods as Wiener-filtering. In preliminary research it appeared however that simple methods as local contrast enhancement might give a better result.

It appears that the Maximum Entropy-method in combination with a premodel of the estimated blur function, will give a better result than the Wiener restoration. The resulting image is more stable. Further for larger areas the Total Variation method works well for noise suppression. Sometimes simpler methods can be more suitable for deblurring the images. A problem in forensic images is that often the blur function is not known. More efforts should be made to improve the Total Variation Method and reduce the computing time.


1         Table of Contents



1 Table of Contents. 1

2 Introduction. 2

3 Blur and Noise. 3

4 Maximum Entropy. 3

4.1 Introduction. 3

4.2 Classical Maximum Entropy. 3

4.3 Bayesian probability theory: the Cox axioms. 4

4.4 The axioms of maximum entropy. 4

4.5 Quantification. 6

4.5.1 Choice of regularization parameter 7

4.6 Premodel by Gull 8

5 Minimal Total Variation. 9

5.1 MTV definitions. 10

5.2 Discretization. 13

5.3 MTV algorithm.. 15

6 Experiments. 15

6.1 Image processing package Khoros. 15

6.2 Images for experiments. 16

6.3 Methods used in Experiments. 16

6.3.1 Experiment with Wiener-filter 17

6.3.2 Experiment with Cannon-filter 17

6.3.3 Experiments with Maximum Entropy. 18

6.3.4 Experiments with MTV.. 19

7 Conclusions. 20


2         Introduction

In forensic investigations, often the visual information is kept in for example handwriting, hairs, fibers, fingerprints, toolmarks and video-images. At present more often the images are stored and digitized by an image acquisition system. By acquiring the image in the system, there will be often some degradation of the image in the system. For that reason, image restoration and enhancement techniques for improving degraded forensic images attract high attention. Under some conditions these techniques are able to enhance obscured, blurred or noisy images. In practice sometimes the following degradations can be restored:

        Over- and Under Exposure

        Pictures Taken out of focus

        Point Spread Function of the lens

        Smear by movement of camera or subject

        Grain structures

        Noise from system

        Jitter in the video system


Since the techniques for over- and under-exposure are well known, in this report the techniques for the Gaussian blur and noise blur will be handled.


In the past often linear techniques (Wiener filter) have been applied to restorate the image. From aerospace research it appeared however that non-linear methods as the Maximum Entropy-method might be more appropriate to restore images. Further Rudin have published on the Total Variation-techniques to restore blurred image. Many new improvements on those methods have been developed. The developers of those algorithms claim that the results are better than the Wiener restoration or linear techniques. This is explained in the fact video-systems and photographic emulsions have non-linearity.

This research report is a result of a literature and experimental study of two months at the National Research Institute of Police Science in Japan by Zeno Geradts and implementation of the MTV-algorithm at the Netherlands Forensic Science Laboratory. The background and the principles of the different methods are described and some results of artificial and real cases are given. The reason for choosing the non-linear methods will be given. In the first paragraphs the physical backgrounds of blur and noise will be given.


The development in integrated image processing systems in new video system also plays an important role in the acquisition of those images. For this reason attention is given to image processing algorithms that are implemented or expected in image acquisition systems. Other sources for degradation in acquiring the digital images are described.

Further the practical implementation of the algorithms in our imaging software is given. A short manual on installation procedures and use of the system is described. Some examples are given for actual cases and simulated cases.

In the final part of this report conclusions and suggestions for further research are given.


3         Blur and Noise

In forensic investigation nowadays often images are acquired by digitizing images. Also in image databases as the system Drugfire in which cartridge cases of firearms are stored. Another example are video images that are taken from for example bank robberies. In this system blur and noise can be added in the acquired image.

Since the astronomical, military and medical researcher could use parallel computers, they developed many new algorithms. Nowadays the faster computers are also available for forensic laboratories and the algorithms can be used for actual cases and databases. In forensics research often the whole procedure of image processing has to be proven in court. For this reason it is important to know the background of the source for blur and noise. The results of image processing should be handled with great care and qualified examiners should interpret the results of the image processing algorithms.

The next sources for blur are often described in literature:

        Motion blur caused by movement of the system or object

        Gaussian blur caused by defocusing of the system

        Blur caused by air pollution

        Blur caused by the lens system


Also the next sources for noise are well known:

        Quantization noise that is caused because the data in each pixel (separate element of an image) is quantified in grey-values from 0 till 255.

        system noise (white noise), for example noise which is a result of under lightening


In the past few decades many image restoration and reconstruction algorithms have been developed. Most of those methods where applied in the astronomy or medical equipment like X-ray photographs. In astronomy the Gaussian blur often can be determined from the system. This is caused because in practice the images of many objects (for example stars) can be used to determine the blur function. In forensic imaging however often the Gaussian blur function is not known. The shape of the blur function is the other unknown parameter.

4         Maximum Entropy


4.1    Introduction

In this chapter the maximum entropy will be explained. First the basics of the classical maximum entropy will be given as described by Gull Since we use a specially developed toolbox from the NASA research fund for the maximum entropy, the next models will be explained in more detail:

        Classical Bayesian maximum entropy

        Premodel by Gull

        Chi-method of determining regularization parameters by Thompson


The maximum entropy-filter will guarantee positive restoration and is based on modeling the object as a probability density function through Maxwell-Boltzmann statistical arguments. If the object f is normalized to unit energy, then each fi scalar value can be interpreted as a probability (possibly the probability that grey-value is present at the i-th sample of the object). Then the entropy of the object would be given by: Equation 3-1


4.2    Classical Maximum Entropy

In this chapter the Maximum Entropy is explained further. Since the maximum Entropy algorithm requires many higher order functions and numerical approximations, this chapter will be rather mathematical. First the base of the Bayesian Maximum Entropy is described, after this the mathematical part is described in detail.

A simple mode of reasoning allows us to construct general theories. If there is a general theory at all, it must apply to particular cases. In particular, if we already know the answer for the simple cases, this constrains the general theory by falsifying all those which gave the wrong answer. If enough such cases can be found to constrain the general theory completely, then there will not be any freedom be left and the theory will be fully defined.

In real life general theories have some "loose ends", such as the cosmological terms in the general relativity. Those loose ends are allowed by experimental errors on the constraining observations. However, sometimes there are some real loose ends in the theory.


This mode of reasoning will be used three times here, leading to the Bayesian probability theory, the maximum entropy (MaxEnt) and finally to images. Firstly, if there is a general language for assumptions, it must be that of ordinary probability theory, with our assumptions being quantified as probabilities. So we will use the probability theory . However, probability theory describes how we must modulate our assumptions in the light of evidence, but it does not tell us how to assign the prior probabilities that we need in order to start the scheme. Secondly, if there is a general way of assigning positive additive distributions (such as probability distributions), then it must be MaxEnt: one source of this argument is Shore and Johnson. This argument does not address in the reliability of MaxEnt distributions, in the sense that the distributions are not quantified. Thirdly if there is a general quantification of distributions, their probabilities must be exponential in their entropy. Frieden, Gull and Daniel prove this


Of course it remains possible that there is no general theory, in which case all arguments would break down. Different problems need different theories. Although in sociological studies are analyzed in different ways, there are no known examples in which Bayesian methods give a demonstrably incorrect result.

The classic MaxEnt formulae in fact will give much freedom, and the underlying mathematics can be turned into advantage.

The following chapters will summarize the use of Classic MaxEnt in the real image reconstruction.


4.3    Bayesian probability theory: the Cox axioms


In science we will base theories based on experiences in the past. Logical reasoning, added by mathematics, is the intellectual tool we bring in for making the new theories applicable to our experiences.

It is well accepted in science that if we have the different possibilities' k, l, m, n, a minimal requirement is that we are able to rank our preferences consistently

.Equation 3-2

In this way the transitive ranking can be mapped onto real number, by assigning numerical codes P(k), P(l):Equation 3-3

If there is a general common language it must apply for simple cases. The next axioms are given by Cox and are difficult to argue against:


Axiom A:

If we first specify our preference for k being true, then specify our preference for l being true (given k), then we have implicitly defined our preference for k and l together.


Axiom B:

If we specify our preference for k being true, then we have implicitly specified our preference for its negation k

Considering these mild requirements, Cox showed that there is a mapping of the original code P into other codes' pr that obey the usual rules of the probability theory.

Equation 3-4

Equation 3-5


Although many efforts are expended on arguments to contradict axioms A en B with an axiom C, no such contradictory axiom has been demonstrated yet. Bays' theorem itself tells us how to modulate the probabilities in accordance with extra evidence. It does not assign probabilities in the first place. For this we use MaxEnt.


4.4    The axioms of maximum entropy

The probability distribution p(x) of a variable x is an example of a positive additive distribution. It is positive in construction. It is additive in the sense that the overall probability in a domain D equals the sum of the probabilities in any decomposition into sub domains, and we write it as: Equation 3-6

In an optical image (with incoherent light) we will have also a positive, additive distribution.

It is simpler to investigate the general problem of assigning a positive additive distribution (PAD) then the specific problem of probability distribution.


We investigate the general problem of assigning PAD f(x), given some definitive but uncompleted constraints on it. If there is a rule for assigning a PAD, then it must give sensible results in simple cases. The "four entropy-axioms"- so-called because they lead to the entropic formulae relate to such cases. Proofs of the consequences of the axioms as formulated are given by Skilling.


Axiom I : "Subset Independence"

A separate treatment of individual separate distributions should give the same assignment as joint treatment of their union. More formally: if constraint C1 applies to f(x) in domain x D1 and C2 applies to a separate domain x D2, then the assignment

should give: Equation 3-7

where f(D|C) means the PAD assigned in domain D on basis of constraints C. The consequence of this constraint is that the PAD f should be assigned by maximizing over f some integral of the form: Equation 3-8

: function as yet unknown

m: Lesbesgue measure associated with x which must be given before the integral can be defined

The effect of this basic axiom is to eliminate all cross-terms between the different domains.



Axiom 2: "co-ordinate invariance"

The PAD should transform as a density under co-ordinate transformations. The consequence of this axiom is that the PAD should be assigned by maximising over f some integral of invariants:Equation 3-9

And still is unknown. The most important axiom is next.


Axiom 3: System Independence

If a portion q of a population has a certain property, then the proportion of any sub-population having that property should properly assign as q. For example if 1/3 of the people have blue eyes in the Netherlands, then the proportion of the left handed people having blue eyes should also be assigned a value of 1/3.


The consequence of this axiom is that the only integral of invariants whose maximum always selects these assignments, regardless of any other subdivision that may be present, is:Equation 3-10

where c is a constant

Axiom 4: Scaling

In the absence of additional information, the PAD should be assigned equal to the given measure (instead of being merely proportional). This a practical condition, since this is more convenient than that it is a deep requirement.

Consequence: The PAD f should be assigned by maximizing over f: Equation 3-11

m(x)dx: ensure that the global maximum of S, at f(x) = m(x) is zero, which is both convenient and

required for other purposes ix

Caused by the entropic form, we call S as defined in Equation 3-11 the entropy of the positive, additive distribution f. It reduces to the usual cross-entropy formula: Equation 3-12

if f and m are normalized.

The MaxEnt-entropy gives sensible results in simple cases, so if there is general assignment method, it should be MaxEnt, since no other axiom 5 has been proven yet.

The arguments above do not proof that MaxEnt assignment is reliable. Would a slightly different PAD be much more inferior? Furthermore experimental data are usually noisy, so they do not constitute testable information about a PAD-function. Noisy images will define the likelihood or conditional probability p(data|f) as a function of f. In a proper Bayesian analysis we need the quantified probability p(f), because we have to set a measure m.


4.5    Quantification

The reliability of an estimate is usually described in terms of ranges and domains, leading us to investigate probability integrals over domains V of possible PADs f(x), digitized for convenience into r cells as (f1, .Equation 3-13

Where M(f) is the measure on the space of PAD's. By definition the single PAD we most prefer is the most probable. We identify this with the PAD assigned by MaxEnt. Hence p(f|m) must be of the form :Equation 3-14



We do not know which function this might be. Now S has the units (dimensions) of the total f, so this monotonic function must incorporate a dimensional constant . This is not an absolute constant, so that :Equation 3-15


Where is a monotonic function of dimensionless argument and:Equation 3-16

In the partition function that ensures that p(f|m) is properly normalized.

To find , we repeat the same kind of reasoning we had before. Finding a simple case for which the results are known and arguing that any general theory must apply for this specific example. Suppose it is raining (drops each with the quantum size q) at r cells (i=1,2....r) at random with Poisson expectation . This arrangement satisfies the entropic axioms, and the probability of occupation numbers n is known (from symmetry and straightforward counting of possible outcomes) to be: Equation 3-17

Define fi = ni q and mi =i.q to remain finite as the quantum size q is allowed to approach zero. Then the image-space of f becomes constructed from micro cells of volume qr, each associated with one lattice-point of integers (n1, So we have, as q tends to 0:
Equation 3-18

Because we take n large, we may Stirling's formula: Equation 3-19

to obtain (accurately to within O(1/n)):Equation 3-20

Here we recognize the entropy on r cells: Equation 3-21

so that Equation 3-22

If we compare this with the previous formula Equation 3-15, then we can identify: Equation 3-23

Further we can identify: Equation 3-24

It is important that the square-root factors in the Stirling's-formula could enable us to derive the measure M. This allowed us to make the passage between the point wise probability comparisons and full probability integrals over domains.

A natural interpretation of the measure is as the invariant volume of a metric g defined on the space. The space of the PADs will be:
Equation 3-25

which equals the entropy curvature S 2S / f.f. This quantity is also known as the Fisher information matrix.

This analysis used large numbers of small quanta q, so that is large. This limit will also ensure us that each ni will almost certainly be close the expected value I. Further we can expect that: Equation 3-26

To summarize if there is a general prior for positive additive distributions in f, it must be: Equation 3-27

and furthermore: Equation 3-28

where Equation 3-29

In this quantified prior only one variable exists which cannot be fixed a priori because it is dimensional.


4.5.1      Choice of regularization parameter

One drawback of the classical maximum entropy-method is that images do not consist out of raindrops of which are thrown randomly as is given in Equation 3-15. The results of the classical maximum entropy restoration were often disappointing. In the literature different ways for better results are described. For this reason Gull developed some better models for choosing the regularization parameter.

For the regularization parameter, Gullx described a way to determine this variable. The problem with this variable is that it is dimensional, and its best-fitting value varies quite strongly with the type and quality of the data that is available. Further the influence of noise in images is very important for the results of this kind of image processing.

For this reason it can only be determined posterior. For this reason Gull went to the other side of the problem, the likelihood that is written as: Equation 3-30


N: number of data

L(f): all details of experimental set-up and accuracy's of measurement

For the common case of independent, Gaussian errors, this reduces to , but other types of errors like Poisson noises are important. In forensic image processing the overall level of noise is most often not known, so we will eventually generalize to: Equation 3-31

However we assume that the errors are known in advance, so that =1. The joint PDF of data and image is: Equation 3-32

Based on Bayes theorem this is also proportional to the posterior probability distribution for f: p(f|D,,m). The maximum of this distribution as a function of f might be our best reconstruction and will occur at the maximum of: Equation 3-33

Methods of choosing by CHI-square-method

For choosing the smoothing parameter the method of regularization is described by Thomson The objective of those methods is to produce an estimate of some unknown quantity f, usually a vector or a function, with the help of the data g. The estimate can be formulated in: Equation 3-34


: a measure of lack of fidelity to the data g

: a regularization functional, measuring roughness in a well-defined sense

: the regularization parameter

The criterion of Equation 3-34 is expressed in very general terms, and specification of a particular method entails making choices of , and . As far as and are concerned, sensible choices are usually made on the basis of context and/or prior knowledge about the true f.

The models of images can be described as linear additive models. If n denotes the number of pixels, we have:

Equation 3-35

in which g and f are n x 1 vectors, H is the n x n point-spread-matrix defining the systematic blur, and n x 1 vector describes the blur. We assume that the noise has a Gaussian distribution.

This will motivate the choice of: Equation 3-36

For we shall choose a quadratic function of the form: Equation 3-37

In this formula C represents a nonnegative-definite matrix. Our estimate f to be denoted as f() is the solution of :Equation 3-38

It follows that: Equation 3-39

In maximum entropy no explicit minimum exists. In practice however this method also seems to work for other problems in which no explicit minimum are existent. To find a value for one can first determine this value by the chi-test and further trying setting the manually. For this we purpose will use the CHI-test by :Equation 3-40

This will be calculated further numerically.


4.6    Premodel by Gull

In practice often the Bayesian choice of will often lead to a reconstruction that is over-fitted. Despite this the classic entropy-model is the correct answer to the problem. If we would modify variable there would be a lack of theoretical grounds to defend this choice. In fact it was the purity of the derivation of the joint pdf that is conditional on the knowledge of the initial model m. This model was first introduced as a measure on the x-scale of pixels, but is a point in the f-space and acts as a "model" here. The only freedom that was left in the hypothesis space is to consider variation in the model. The fact that the model is flat, expresses our lack of prior information about the structure of the picture.

There are large areas of the picture where the lighting is generally light or dark, with interesting details superimposed. There are correlation's from pixel to pixel present in the image that we have ignored so far. Our earlier MaxEnt-Axiom I forbids us to put the pixel-to-pixel-correlation's directly into our p(f|m,). For this reason we must circumvent the axiom in a subtle way.

Suppose we have a case in which four images are combined in one video. We have for example two images of the counter of the bank and two images of the entrance at night. Axiom I is designed to protect us from letting the reconstruction of the images of the counter influence our images of the entrance at night. There is however nothing to prevent us from taking different m0 -levels for each part. In fact is would be extremely desirable to have different values for those levels. So we can subdivide the space further. If we have the counter there might be a desk and some people working behind it. We can subdivide the image further.

If we continue to subdivide, we can get a better model, closer to the reconstruction and we expect that will increase. On the other side by introducing extra parameters there might be some drawbacks. We might get penalties for this in a way that it makes no sense to modify the model. In this model it is very important that no sharp boundaries exist, since this will influence our image.

We are now in a position to formulate a new, flexible hypothesis that is suitable for the reconstruction itself: Equation 3-41


b: model-blur PSF


To complete the analysis we must assign a prior for a premodel m'. We treat m' as an image and use the entropic prior:Equation 3-42


T: S(m',flat)

: new Lagrange multiplier for the m'-space entropy T

We restrict ourselves to the mathematically tractable case of quadratic S and T, gaussian blurs and uniform noise-level, for which L, S and T matrices are all simultaneously diagonal in Fourier space. The Bayesian calculation of ' and ', now yields in :
Equation 3-43

Equation 3-44

where are eigenvalue of BtB and Equation 3-45

The noise level can be estimated as before:Equation 3-46

Here is once again a neat division of the degrees of freedom between S, T and L.

5         Minimal Total Variation

The Total Variation-technique is described by Rudin in a sequence of papers,.. This method claims to improve the blur and remove noise. The method is based on the total variation in the image that is minimized subject to constraints involving the point spread function and the statistics of the noise. The constraints are implemented using the Lagrange multipliers. L. Rudin claims in one of his articles that the method is non-invasive and non-oscillatory.

The theoretical backgrounds are based on shockwave-theories. The TV-based theories involve minimizing:

the absolute integral of the curvature along a feature curve, so also no smooth curves are allowed

The total variation of the magnitude of the gradient of the intensity along and across feature curves

Total variation of curvature along level sets to allow cusps and piecewise convex interfaces

The variational problems (1) and (2) yield fourth order non-linear PDE's for the evolution part of the restoration procedure, while (3) yield in sixth order PDE. The Total Variation Measure will minimize the geometrical oscillations of 'features' in reconstructed images, subject to image degradation constraints.

Noise might be spatially correlated to the solution of this algorithm. A set of global constraints will reduce the resolution of the restoration. Fine features may be grossly altered when the algorithm interprets them to be the results of the statistical variation. A better restoration is obtained by using the Lagrange-multipliers whose steady state solution satisfies constraints that are local to individual curves and regions. The next constraints are considered:

        blurring kernel and noise statistics

        divergence free and hence-area preserving


5.1    MTV definitions

Suppose we have the following problem:
Equation 4-1


: gaussian white noise

u0(x,y): observed intensity function

u(x,y): the image to be restored

Both images are defined on a region in R1. The method is general, since A only needs to be a compact operator in the Lagrange-operator.

The functional measure used in this chapter is the Total Variation that is a functional based on the derivatives of u:

Equation 4-2

The variation of the image is a direct measure of how oscillatory it is. The space of functions of bounded variation is appropriate for discontinuous functions, which is well know from the field of shock calculations. This space is defined on a bounded convex region in Rd, d = 1, 2, 3, whose boundary is Lipschitz continuous.


C(W) is the set of all continuous function's u on W.

C01(W;Rd) is the set of all continuous differentiable Rd functions u on W for which the closure of {u(x) 0} is bounded in W.

Lp(W) is the set of all measurable function u defined on W such that up < for all 1 p < . A measure is given by up = up dW.

Then we define a set of test functions V:
Equation 3V = {v C01(W;Rd) : v(x) 1 for all x W}

The BV semi-norm, or total variation is defined
Equation 4

J0(u) = sup W (-u div v) dx

over all v V. If v C0(W), it follows by using integration by parts that
Equation 5

J0(u) = W u dx

The space of functions of bounded variation on W is defined by
Equation 6

BV(W) = {u L1(W) : J0(u) < }

This functional is to be minimized subject to the constraints
Equation 7

Equation 8

For notational convenience we set:
Equation 9

Equation 10

Equation 11

The reconstruction problem called Minimal Total Variation (MTV) is

Equation 12

minimise {H(u) I(u) = s2, I1(u) = 4-2}

Using Lagrange multipliers this is equivalent to

Equation 13

minimize {H(u) - lI(u) - l1I1(u)}

Euler - Lagrange equations

To solve this minimization problem we need calculus of variations. The variational derivatives are
Equation 14

Equation 15

Equation 16

where A* denotes the ad joint integral operator. The last equation holds if
Equation 17

and equivalently for A*. This is true up to normalization for A convolution. With these results we derive as the Euler Lagrange equations
Equation 18

in W with on d W

Here A* is just the adjoint integral operator.

The quantity is a Lagrange multiplier chosen such that the constraint Integrating both sides lead to l1 = 0 so l1 0.
Equation 19

where the brackets define the integral over W and

dI(u)2 = dI(u), dI(u)

Explicitly this can be written by the gradient-projection method of Rosen


for t > 0, (x,y) W,

, on W and

u(x,y,0) is given such that the constraints are satisfied. The projection step is the gradient projection method which just amounts to updating (t) such that it remains true in time. This follows just if we define:


We now have a method for restoring the image. Suppose that t, the steady state should be the desired restoration.

After this we consider the restoration involving noise. We can begin with pure denoising. We are given an image uo(x,y) in which

The unknown function u(x,y) is the image which we would like to restore, and (x,y) is the noise. This noise is assumed to be white noise. We know the next information:

(Mean one)

(Given variance)

Existence & uniqueness"

A short resumes of existence and uniqueness defined by Rudin and Professor Pierre-Louis Lions will be given in this chapter. First we need some definitions about function spaces.


L(W) is the set of all function's u such that measure {sup u = } = 0.

Hp(W) is the set of all u Lp(W) such that all distributional derivatives Dau, a 2 belong to Lp(W).

Example If u(x) = x then u H1(W), but u Hp(W)

We consider the following minimization problem:

Inf {H(u) I(u) = s2, I1(u) = 4-5

Existence is guaranteed by

Theorem: The minimization problem admits at least one solution if A is compact from L2(W) into L2(W).

Proof: omitted

The fact that A must be compact excludes the case when A is the identity operator. In this case the minimization problem turns out to be highly non-trivial.

Next we study equations of thermo dynamical systems, namely


for x W, t > 0. A is a bounded linear operator from L2(W) into L2(W), A* denotes the ad joint. Further 4-7, i = 1,2, where a is smooth, say C2 with bounded derivatives up to order 2, a is spherically symmetric (rotational invariance). In this case a(p) = p. However, this function induces such singularities that a proper mathematical analysis does not seem to be possible. We therefore study slightly regularized variants of this choice. This regularization also happens when the discretized version of (3.7) is considered. Therefore, we assume that there exists a n (0,1) such that


for all p R2 in the sense of symmetric matrices.

For the real choice of a, a(p) = p, (3.8) does not hold for p near zero and p . In practice we truncate near p = 0 and p large.

The second constraint requires


And l is given by




Furthermore we prescribe an initial condition

ut=0 = u0 in W

and boundary condition

4-12 = 0 on W

Finally we need to assume that u0 satisfies


The existence of some u0 satisfying this equation is not obvious. This existence is certainly ensured by the assumption

R(A) is dense in L2(W)

This condition implies there exist v and w L2(W), or as smooth as we wish by density and continuity of A, satisfying


and it is enough to take u0 = qv + (1 - q)w for some convenient q (0,1).

If A is a convolution operator, i.e. A(u) = h * u then holds if and only if the Fourier transform 4-15 of h satisfies

measure {x R2 4-16(x) = 0} = 0

The image obtained is unique.

The equality holds if u = a w + b, where a, b e R

5.2    Discretization

The numerical method in two spatial dimensions is as follows. We let

xi = ih, yi = jh, i, j = 0, 1, ..., N.

Nh = 1

tn=n D t, n = 0, 1, ...


The modified initial data are chosen that the constraints are satisfied.

The numerical approximation is :
Equation 20


for i,j = 1, ..., N.

The difference operator is defined by 4-19 and similarly for 4-20.

The boundary conditions are

4Equation 21

To determine l we can discrete it by it definition:

un+1 = pn - ln qn

Now the second constraint becomes


using discrete norms. This yields a quadratic equation for l (dropping the superscript n):
Equation 22

For stability a step-size restriction is given :
Equation 23

The solutions are given by

l = c1 c1 4-24





If c2 > 1 we just set l = c1. If c2 < 1, we take l = l-. It has been found [OR93] that this drives the procedure to the manifold defined by the second constraint (3.5).


If ui is on the manifold for i = 1, ..., n, then c2 is close to 0.


In our notation, pn denotes the old value un added by a derivative, whose geometrical interpretation is a line tangent to u. The deviation of the manifold is corrected by lqn. On the manifold we may assume that pn lqn. Then un+1 pn, A(un+1) A(pn) so that s2 = A(un+1) - g2 A(pn) - g2.

With the form of c2 and with the use of the Taylor expansion


A step size restriction is imposed for stability:


which comes from the fact that we assume that our model without constraints can be regarded as a diffusion equation with the MTV-measure as the inverse diffusion constant:


Its explicit difference scheme is

ui,jn+1 := a ui+1,jn + a ui-1,jn + a ui,j+1n + a ui,j-1n + (1- 4a) ui,jn

is stable if 0 4a 1. Furthermore a Dx2 = Dt/MTV, so


And MTV > 1 gives the desired result.


1) If A = I, the solution is not unique: suppose we have the original one-dimensional signal {1, 1, 11, 11, 1, 1}.

Adding N(0,1) noise, the noisy signal is {2, 0, 12, 10, 2, 0}. It's MTV norm is 2 + 12 + -2 + -8 + -2 = 26.

A MTV solution satisfying both constraints is {2, 2, 11, 10, 1, 0}. This solution has MTV norm 20, just like the original signal.

2) Suppose A = {0.25, 0.50, 0.25} and the original signal is the same as in 1), then the blurred signal denotes {1, 3.5, 8.5, 8.5, 3.5, 1}. The 'blur variance' is 4 (2.5)2/6 = 25/6. To satisfy the condition on which a is imaginary,


a noise with variance 1 is sufficient. So in general the second constraint will not be satisfied initially.

5.3    MTV algorithm

The algorithm to find the solution picture is briefly given by

(1) Initialise the pixel matrix u0. Compute the convolution A(g).

(2) (A) Compute 4-32.

Compute vi,jx = 4-33 and similar vi-1,jx, vi,jy and vi,j-1y.

If both 4-34 are 0, then the corresponding v = 0. The MTV norm is zero, so all elements of the picture are equal and nothing can be done anymore.

(B) Compute wx = vi,jx - vi-1,jx and wy = vi,jy - vi,j-1y.

If vi,jx = 1 then wx = 0. Similar for wy. This implies that constant areas remain unaltered.

(C) Compute A(un), A*A(un).

Define p and q and compute A(q), A(p) and l from eq. (3.18).

(D) Determine un+1 and check the constraints and the MTV-norm. Set un = un+1 and go back to (A).

MTV Implementation in Khoros

For the implementation in Khoros the discretization as given in this chapter is implemented by J. Bijhold and A. Lentz at the Netherlands Forensic Science Laboratory. To find a first iteration picture causes problems. In the case of MTV without blur this picture is found by using adding function f to the given signal:


It is chosen such that the modified initial data so that both constraints are satisfied initially, i.e. f has mean zero and L2 norm one. But now information about steps in the order of 4 * s can be lost:

Suppose a one dimensional signal containing an even number of points xi with noise ni, satisfying the constraints, so n has mean zero and variance s2. The observed signal u becomes

ui = xi + ni

We take f as above mentioned and derive as our first approximation

ui0 = xi + ni + fi

Now suppose n and f satisfy fi = ni = (-1)is then

ui+10 - ui0 = xi+1 - xi + 4s

Therefore pictures with low detail and high noise can not be restored without loss of information.

We did not make use of the function m(a,b), as defined in. This function only uses forward difference. It will leave (high) peaks unaltered, while they usually appear due to noise. If necessary, e.g. when a small detail is important, this function can be used.

The min-function should be used, since we want to allow the appearance of discontinuities, i.e. usually a picture contains object that is to be clearly separated from the background. Numerically the border appears as a large number for 4-36. Ignoring this function will lead to a smoothened border and loss of information.

6         Experiments

6.1    Image processing package Khoros

For development of image processing algorithms, the software package Khoros of the University of New Mexico is used. The software is available free from the internet and runs on UNIX-systems. In the Laboratory it is installed on SUN-Sparc-stations (SUN OS 4.1.3) and on a PC with the Linux operating system.

There are two versions of Khoros, version 1.0.5 and version 2.0.2. For the research the version 1.0.5 has been used since this is the most stable version. Further software is available for this version for maximum entropy and some workspaces are available from the National Laboratory of Forensic Science in the Netherlands in which MTV can be tested. A more detailed description on using this software is available in appendix I.

One of the advantages of the Khoros software is that many universities, institutes and companies use the software and develop new algorithms for this. Often the algorithms will be available for other institutes to test them. This works convenient, since the new developments in articles will be available to the scientific community soon. Since the mathematical algorithms get more complex, this saves a lot of time.

Further by the visual development kit cantata, it becomes even easier to modify the algorithms or to implement new algorithms, without having trouble to go deep in the C-source-codes.


6.2    Images for experiments


For the testing of the algorithm experiments we used two kinds of images:

an image of text with blur and noise

an image of blurred Lenna with blur and noise (this image is often used in literature for image processing)

Figure 1: Image of text blurred with a gaussian with a variance of 6 pixels and a noise with a variance of 100 greyvalues


Figure 2: Image of Lenna blurred with a gaussian with a variance of 6 pixels and a noise with a variance of 100 grey-values

For testing the algorithms the images of text and Lenna were blurred with a gaussian blur function with a variance of 6 pixels and gaussian noise with a variance of 100 grey-values. It is assumed that these kinds of blur functions approximate real life experiments. In literature most often these kinds of filters are described for image restorations.

Since our human visual system also operates in many ways like a very good image restoration system, it is important to notice that the results often will be poor. Mathematically spoken the results are good, but for our visual system the results appear not very well.

The real case of the number plate is restored in different ways; with different settings blur functions and noise-distributions. Since the real noise function and the real blur function are not known, we tried different settings for the algorithms for restoring the number plate. This case is difficult, since the number of the car contains only a few pixels of information.


6.3    Methods used in Experiments

For the experiments we used the next methods for deblurring:

        Wiener restoration

        Canon filtering

        Expectation Maximization

        Maximum Entropy - chi-test

        Maximum Entropy - Bayesian method

        Maximum Entropy pre-model by Gull

        MTV-method by L. Rudin

The Maximum Entropy-method (premodel by Gull) and the MTV-method are tested more intensively with different noise-levels and blur functions. The other restoration methods are described and some experiments are shown with a Gaussian blur function with a variance of 6 and a Gaussian noise of 100 to 255 grey-values.

6.3.1      Experiment with Wiener-filter

For the Wiener-filter we used the workspace that is shown in Appendix 2. This workspace contains a normal Wiener-restoration in which the Point Spread Function and the Wiener-parameter can be set. The Wiener-parameter is chosen such that a good result is obtained.

Figure 3: upper part: Wiener-restoration of image, lower part : original image


6.3.2      Experiment with Cannon-filter

The mathematical formulas of the Cannon-filter are not further explained in this report The algorithm is described in Cannon and Trussel compromises between the low pass effect of the Wiener filter and the early singularity if a plain inverse filter. This method will determine the blur function automatically. A nice example is demonstrated of a blurred image of an airline is demonstrated

Figure 4: Left: restored image with Canon-filter; right original image


The example above is the very ideal one. It contains no noise and a good blur-function. If we try to restore our own image with blur and noise, the restoration will not work properly. So this method depends on the kind of blur and the noise in the image. In Figure 5 an example is given of the restoration. The blur function that is determined from this image is not correct.

Figure 5 : left : blurred image, right restored image. This is not working properly since the blur function is determined incorrectly by the Cannon filter

6.3.3      Experiments with Maximum Entropy

With the different methods for restoration with Maximum Entropy experiments have been done on the images. The Bayesian method and the chi-square-method will not use information on the Point Spread Function. However in the tests the noise-level for the images is given. The first two methods appear to work well for the noisy images. Deblurring however is a difficult task without giving a blurring function. I think this is caused by the fact that those algorithms are suitable for astronomy. In those images the information of stars is easy to translate into the point spread function. However in our images, we have some different information. So the methods for chi-square and Bayesian maximum entropy appear not to work very well for forensic images.

Figure 6: Restored test-image with Bayesian method. Left side: original, Right: restored image

Figure 7 : Restored image with chi-square maximum Entropy-method. Left side : original image, right side : restored image

Further some tests were conducted with the premodel of Gull. They gave better results, which are mathematically better than the results with the Wiener-restoration. The instability, which is present in the Wiener-restoration, is not present in this restoration.

Figure 8 : Restoration with premodel, kernelsize 7x7

Figure 9 : Maximum entropy-restoration by premodel. Left side: original image, right side : restored image

Further experiments have been done with series of noise and blur-functions. It appears that the results are not very good. In a few cases the results of the restorations is better. However in most cases the results are worse than the original image. It depends however on the noise-level and the blur-level. For testing the different noise-levels, the commands in Khoros have been translated in a C-source-file. This file is included in Appendix 4. The results are given in Appendix 5. It appears that the filter will give instable results if there is not enough blur in the image. The results are best for a gaussian blur function with a variance of 4.

The computing time on a Sparc 10 is on average 4 minutes per image of 256 x 256.

It appears that the premodel-restoration will give some good results for the photographs and text images. The algorithm will present the image in a more realistic way than with Wiener-restoration. If we compare the text and real images with each other in Appendix 5, it appears that the visual improvement of the image of Lenna, is often not really there. This is caused by psychological factors and the method, which is used for printing this report. Of course our visual system, also will make some image processing, so this will give other results. Since the method of Maximum Entropy will cluster the information, some clusters of information will be created


6.3.4      Experiments with MTV

For the MTV-method a workspace is developed at the Netherlands Forensic Science Laboratory. Since this mathematics for this workspace is complex, the workspace might look complex, because it is programmed visually. For the MTV-method the same images have been tested.

Figure 10 : Result with restoration with MTV-method of Lenna, c=0.1 (Top : original, Bottom : restored image)

In this image the MTV-parameters were stable, and it appears that the restoration is visually not better than the original. The noise-suppression is however very well and this is the good part of the MTV-method : it suppresses the noise very well. Depending on the image better results might be obtained.

Figure 11: Restored text image by MTV-method with c=0.1(bottom) / original (top)


In the text image part, we can see that the MTV-method does not work very well. The larger black blocks and circles are restored better. The noise is suppressed, but visually this image is not very well suited for this method.

The MTV-method takes in average 30 minutes per calculation on a Sparc 10.

To investigate the results of the MTV-method further, we restored more images with different blur-functions and different amounts of noise and different values for the scale-factor c. Since with MTV the noise level and blur have to be large, the results are shown for the optimal working of the algorithm. In appendix 7 the results are given. It appears that for the photograph of Lenna the results are somewhat artificial, and for the text in most of the cases there were no better results. The large black areas operate better with this kind of restoration.

7         Conclusions

The Maximum Entropy method with the premodel of Gulls appears to work fine with blurred images and can restore some of the noise in the image. It will become instable if the noise level is too low.

The MTV-method works fine for images with large black surfaces. It is necessary to improve the algorithm and make experiments with other images (video-images in night-time), before the optimal results are acquired

The higher order algorithms require much computer time. Mathematical solutions or faster hardware (parallel computers) are necessary to improve performance. In practice interactive image processing will improve the final results

Further research is necessary to compare simple methods for restoration with the higher order methods. Sometimes simple methods might work better and faster for enhancement problems, which is often demonstrated in patent literature on image processing.


A problem with real world image processing is that the human system is also capable in good image processing, see for example Brent Baxter. This will often lead to poor results for the human visual system

Research to interaction between new image processing methods in video systems and the real image is necessary, since some of those methods might change the information in the images

It might be useful to develop methods for estimating the blur functions of real video systems in a more appropriate way

More practical experiments are necessary for evaluating the value of the new methods. Sometime simple local contrast enhancement methods might also give a better result

Research in the interaction between the new image processing algorithms and the actual image is necessary, since some of those methods might change the information in the video images