Ill-posedness of inverse problems


Sergey A. Rukolaine


Major peculiarities inherent in inverse problems can be illustrated by the following synthetic example. Consider radiative heat transfer in a nonabsorbing, nonemitting, and nonscattering gray plane-parallel “medium” of thickness h (see Fig. 1). The bounding surfaces, y = 0 and y = h, are assumed to be black. The temperature of the upper surface is T(x) (i.e., it depends on x). The temperature of the bottom surface is equal to zero. In view of the symmetry, radiative heat transfer (Özişik, 1973; Siegel and Howell, 2002; Modest, 2003) is described by the radiative transfer equation

with boundary conditions

where I ≡ I(x,y,ϕ) is the radiation intensity, qhout is the radiosity on the upper surface y = h, σ is the Stefan–Boltzmann constant, T(x) is the temperature at the point x of the surface y = h, and the refractive index is equal to unity. Clearly, I(x,y,ϕ) = 0 for 0 < y < h and 0 < ϕ < π.


Figure 1. Plane-parallel “medium” and bounding surfaces.

The radiative heat flux incident onto the bottom surface,

is “measured”. An inverse problem is posed as follows: given the incident radiative heat flux q0inc(x), measured on the bottom surface, restore the temperature T(x) or, equivalently, the radiosity qhout(x) (which, in this case, is also the blackbody flux σT⁴(x)) on the upper surface. This problem admits an analytic solution, and the incident flux q0inc can be expressed explicitly in terms of the radiosity qhout. Therefore, the inverse problem can be written in the form of the Fredholm integral equation of the first kind

q0inc(x) = ∫_{−∞}^{∞} K(x − x′) qhout(x′) dx′,   −∞ < x < ∞,   (1)

with the kernel

K(x) = h² / [2(x² + h²)^{3/2}].   (2)

Because the integral equation (1) is of convolution type, it can be solved with the help of the Fourier transform (Tikhonov and Arsenin, 1977; Tikhonov et al., 1995; Bertero and Boccacci, 1998)

f̂(ξ) = ∫_{−∞}^{∞} f(x) e^{−iξx} dx.

The Fourier transform of the integral equation (1) gives

q̂0inc(ξ) = K̂(ξ) q̂hout(ξ).

This equation gives the Fourier transform q̂hout of the radiosity, and therefore the inverse Fourier transform gives the sought-for radiosity

qhout(x) = (1/2π) ∫_{−∞}^{∞} [q̂0inc(ξ)/K̂(ξ)] e^{iξx} dξ.   (3)

However, in reality the measured incident flux q0inc always contains an error, i.e., q0inc(x) = q̄0inc(x) + e(x), where q̄0inc is the exact flux and e is the error. Therefore, the obtained radiosity (3) differs from the exact one

q̄hout(x) = (1/2π) ∫_{−∞}^{∞} [(q̂0inc(ξ) − ê(ξ))/K̂(ξ)] e^{iξx} dξ,

and the difference between the obtained and exact radiosities is

qhout(x) − q̄hout(x) = (1/2π) ∫_{−∞}^{∞} [ê(ξ)/K̂(ξ)] e^{iξx} dξ.   (4)

The problem is that K̂(ξ) tends to zero very rapidly as ξ → ∞. Indeed,

K̂(ξ) = h|ξ| K1(h|ξ|),   (5)

where K1 is the modified Bessel function of order 1 (Abramowitz and Stegun, 1972), and

K̂(ξ) ~ √(πh|ξ|/2) e^{−h|ξ|}   as |ξ| → ∞

(the designation of the kernel by K and of the modified Bessel function by K1 is just a coincidence; both designations are conventional). The plot of K̂(ξ), for h = 1, is shown in Fig. 2.


Figure 2. Fourier transform K̂ (5) of the kernel K (2) for h = 1.

Therefore, if the error e contains high-frequency Fourier harmonics, such that ê(ξ) is not small, then ê(ξ)/K̂(ξ) takes very large values at these frequencies ξ, and the difference (4) between the obtained and exact radiosities takes very large values. For example, if e(x) = ε cos(ωx), ω > 0, then ê(ξ) = επ[δ(ξ + ω) + δ(ξ − ω)], where δ is the Dirac δ-function, and the difference (4) is qhout(x) − q̄hout(x) = ε cos(ωx)/[hω K1(hω)] ≡ e(x)/[hω K1(hω)].
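This instability is easy to reproduce numerically. The sketch below is an illustration, not part of the original example: the grid size, the domain length, and the test radiosity exp(−x²) are arbitrary choices. It discretizes the kernel (2), generates exact data by the convolution (1), and then inverts naively by dividing discrete Fourier transforms as in (3); a 0.1% data error destroys the reconstruction.

```python
import numpy as np

N, L, h = 64, 20.0, 1.0
dx = L / N
# spatial grid in wrap-around (FFT) order: 0, dx, ..., then the negative half
x = np.fft.fftfreq(N, d=1.0 / (N * dx))
kernel = h**2 / (2.0 * (x**2 + h**2) ** 1.5)      # K(x), Eq. (2)
Khat = np.fft.fft(kernel) * dx                    # discrete analogue of (5)

q_true = np.exp(-x**2)                            # hypothetical smooth radiosity
f = np.fft.ifft(np.fft.fft(q_true) * Khat).real   # exact incident flux, Eq. (1)

rng = np.random.default_rng(0)
noise = 1e-3 * rng.standard_normal(N)             # 0.1% measurement error

def deconvolve(data):
    # naive inversion (3): divide Fourier transforms
    return np.fft.ifft(np.fft.fft(data) / Khat).real

err_clean = np.linalg.norm(deconvolve(f) - q_true) / np.linalg.norm(q_true)
err_noisy = np.linalg.norm(deconvolve(f + noise) - q_true) / np.linalg.norm(q_true)
print(err_clean, err_noisy)   # the noisy error is many orders of magnitude larger
```

The exact data are inverted essentially perfectly, while the perturbed data give a reconstruction error far larger than the 10⁻³ noise level, exactly as the analysis of (4) predicts.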

Figure 3 shows the errors e(x) = 0.01cos(4πx) and e(x) = 0.01cos(6πx) and the corresponding differences qhout(x) − q̄hout(x) for h = 1. In Fig. 3, the differences between the obtained and exact radiosities are 6.27×10⁴ and 2.77×10⁷ times larger than the errors, respectively. All the quantities are dimensionless.
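The amplification factors quoted above can be checked with a few lines of Python (a sketch using only the standard library; K1 is evaluated from its standard integral representation, and the truncation point t = 6 and the step count are arbitrary numerical choices):

```python
import math

# Modified Bessel function K1 from its integral representation,
# K1(z) = integral_0^inf exp(-z*cosh t)*cosh t dt (trapezoidal rule;
# the integrand is negligible beyond t = 6 for the arguments used here)
def k1(z, t_max=6.0, steps=20000):
    dt = t_max / steps
    total = 0.5 * math.exp(-z)          # t = 0 endpoint, cosh 0 = 1
    for i in range(1, steps):
        t = i * dt
        total += math.exp(-z * math.cosh(t)) * math.cosh(t)
    return total * dt

# amplification of the error e(x) = eps*cos(omega*x): 1/(h*omega*K1(h*omega))
h = 1.0
for omega in (4 * math.pi, 6 * math.pi):
    print(omega, 1.0 / (h * omega * k1(h * omega)))
# prints approximately 6.27e4 for omega = 4*pi and 2.77e7 for omega = 6*pi
```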


Figure 3. Errors e(x) = 0.01cos(4πx) (top) and e(x) = 0.01cos(6πx) (bottom) and the corresponding differences between the obtained and exact radiosities qhout(x) − q̄hout(x) for h = 1. The differences are 6.27×10⁴ and 2.77×10⁷ times larger than the errors, respectively. All quantities are dimensionless.


The example given above has been considered as an inverse problem of interpretation of experimental data. In this inverse problem one needs to restore the temperature T(x) on the upper surface from the radiative heat flux q0inc(x) incident onto the bottom surface. The flux q0inc is obtained experimentally and therefore contains an error, i.e., q0inc(x) = q̄0inc(x) + e(x), where q̄0inc is the exact flux, which is unknown.

The same example can be considered as an optimal design problem. In this case the exact flux q̄0inc, incident onto the bottom surface, is prescribed. The optimal design problem consists in finding the temperature T(x) or, equivalently, the radiosity q̄hout on the upper surface, which provides the flux q̄0inc. The example is synthetic and admits an analytic solution; real optimal design problems are usually solved numerically. In this case an exact solution cannot generally be obtained, and a designer obtains an approximate solution qhout, providing a flux q0inc. Just as in the example above, even if the difference between the flux q0inc and the exact one q̄0inc is very small, the difference between the obtained approximate solution qhout and the exact one q̄hout may be very large. Therefore, optimal design and control problems should be considered as inverse problems.


The integral equation (1) is a particular case of the linear Fredholm integral equation of the first kind (Wing, 1991)

∫_a^b K(x, y) u(y) dy = f(x),   c ≤ x ≤ d,   (6)

where K is a given kernel of the integral equation (the kernel can be considered as the apparatus function of a measuring device), f is a given function (the input data), u is a sought-for solution, and in the general case −∞ ≤ a < b ≤ ∞ and −∞ ≤ c < d ≤ ∞.

A more general form of the linear Fredholm integral equation of the first kind, encountered in practice, is

∫_{D1} K(x, y) u(y) dy = f(x),   x ∈ D2,   (7)

where D1 and D2 are domains in ℝ^n.

The integral equations (6) and (7) are conveniently expressed in the form of a linear operator equation of the first kind

Au = f,   (8)

where the operator (mapping) A is linear [recall that the operator A is linear if A(αu + βv) = αAu + βAv for all u, v and all scalars α, β].

If the dependence of f on u is not linear, then the operator equation of the first kind is conveniently expressed by

A(u) = f   (9)

[though, sometimes, the form (8) is also used if A is not linear]. Equation (9) can be considered as the most general form of inverse problems.

In Eqs. (6)–(9), u is an unknown “cause”, f is an observed (or desired) “effect” or “result” (at the same time, f presents the input data of an inverse problem), and A expresses the dependence of the effect on the cause. “Inverse problems are associated with the reversal of the cause-effect sequence, and consist of finding the unknown causes of known consequences” (Turchin et al., 1971). A more general definition, including optimal design and control problems, is “Inverse problems are concerned with determining causes for a desired or an observed effect” (Engl et al., 1996). Here the word “desired” relates to optimal design or control problems, while the word “observed” relates to problems of interpretation of experimental data, such as restoration (recovery) and parameter identification problems.

The cause of the difficulties in solving Eqs. (8) and (9) is the compactness of the operator A (Engl et al., 1996; Kress, 1999), which is usually the case in practice [the notion of compactness can be easily understood with the help of the singular value expansion (SVE); see Hansen (1998); Engl et al. (1996); Bertero and Boccacci (1998); Kress (1999), which is a generalization of the singular value decomposition (SVD), see details below]. The compactness of the operator A implies the ill-posedness of the inverse problems (8) and (9), particularly, the instability of the solutions u with respect to small variations of f (Engl et al., 1996; Kress, 1999).


The concept of well- and ill-posedness goes back to Hadamard (1923). However, the definition was given by Courant in Courant and Hilbert (1937). According to the definition, a mathematical problem is called well posed (in the sense of Hadamard) or properly posed or correctly posed if:

  1. a solution of the problem exists in a given set of “admissible” solutions (existence condition),

  2. the solution is unique (uniqueness condition),

  3. the solution depends continuously on input data (stability condition).

If at least one of the three conditions fails, then the problem is called ill-posed or improperly posed or incorrectly posed.

It is necessary to emphasize here that Hadamard was never engaged in the study of inverse problems. On the contrary, he had come to this concept when he was studying direct problems (Hadamard, 1923). Moreover, the reasoning behind the concept was mainly mathematical.


Equations (6)–(9) are continuous mathematical models of inverse problems. To solve an inverse problem numerically, one has to discretize the equation in one way or another (Wing, 1991; Hansen, 1998). Note that, in inverse problems of interpretation of experimental data, the input data f are always given in a discrete form because they are obtained experimentally. Major peculiarities of discretized ill-posed inverse problems can be illustrated by linear problems.

Discretization of the integral equations (6) and (7) [or the operator equation (8)] results in a linear algebraic system

Au = f,   (10)

where A ∈ ℝ^{m×n} [i.e., A is a real (m×n)-matrix, or a matrix with m rows and n columns], f ≡ (f1, ..., fm) ∈ ℝ^m is a given vector, and u ≡ (u1, ..., un) ∈ ℝ^n is a sought-for solution.

The matrix A inherits the compactness of the integral operator A: if the discretization is fine enough, then the matrix A is ill-conditioned. The notion of matrix condition is conveniently explained with the help of the SVD.
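A small numerical experiment illustrates this (a sketch: the kernel (2) of the example above is discretized on the arbitrary interval [−5, 5] with h = 1 by a simple quadrature):

```python
import numpy as np

# Collocation-type discretization of Eq. (1): A[i, j] ~ K(x_i - x_j) * dx
def kernel_matrix(n, L=10.0, h=1.0):
    x = np.linspace(-L / 2, L / 2, n)
    dx = x[1] - x[0]
    return h**2 / (2.0 * ((x[:, None] - x[None, :])**2 + h**2) ** 1.5) * dx

for n in (20, 40, 80):
    print(n, np.linalg.cond(kernel_matrix(n)))
# the condition number grows rapidly as the discretization is refined
```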

Suppose for simplicity that m ≥ n. The SVD means that any matrix A ∈ ℝ^{m×n} [real (m×n)-matrix] can be represented in the form (Forsythe and Moler, 1967; Golub and Van Loan, 1996; Björck, 1996; Bertero and Boccacci, 1998; Hansen, 1998)

A = VΣWᵀ,   (11)

where V = [v1 ... vm] ∈ ℝ^{m×m} and W = [w1 ... wn] ∈ ℝ^{n×n} (i.e., V and W are real square matrices of order m and n, having columns v1, ..., vm ∈ ℝ^m and w1, ..., wn ∈ ℝ^n, respectively), the superscript T indicates transposition, and the matrices V and W are orthogonal, i.e.,

VᵀV = Im,   WᵀW = In

(or, equivalently, Vᵀ = V⁻¹ and Wᵀ = W⁻¹), where Im ∈ ℝ^{m×m} and In ∈ ℝ^{n×n} are the identity matrices; in other words, the columns v1, ..., vm of the matrix V and the columns w1, ..., wn of the matrix W are orthonormal, i.e.,

vi · vj = δij,   wi · wj = δij,

where δij is the Kronecker delta.

Σ ∈ ℝ^{m×n} is a rectangular “diagonal” matrix with Σii = σi, i = 1, ..., n, and all other entries zero, where σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0. From here on, it is supposed for simplicity that σn > 0.

The numbers σ1, ..., σn are called the singular values of A, while the vectors v1, ..., vm and w1, ..., wn are called the left and right singular vectors of A, respectively.
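These properties are easy to verify numerically; for instance, NumPy's np.linalg.svd returns the factor V as u and Wᵀ as vh (a sketch with an arbitrary random 6×4 matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))

V, s, Wt = np.linalg.svd(A)          # full SVD: V is m x m, Wt = W^T is n x n
Sigma = np.zeros((m, n))             # rectangular "diagonal" matrix
Sigma[:n, :n] = np.diag(s)

print(np.allclose(V @ Sigma @ Wt, A))      # A = V Sigma W^T, Eq. (11)
print(np.allclose(V.T @ V, np.eye(m)))     # columns of V are orthonormal
print(np.allclose(Wt @ Wt.T, np.eye(n)))   # columns of W are orthonormal
print(np.all(s[:-1] >= s[1:]))             # sigma_1 >= ... >= sigma_n
# all four checks print True
```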

The condition number of the matrix A (with respect to the Euclidean vector norm ‖·‖) is defined by

cond(A) = σ1/σn.

If cond(A) is large, then the matrix A is said to be ill-conditioned or poorly conditioned. The condition number measures the sensitivity of the solutions (in a general sense) of the system (10) to perturbations of A and f.

The sensitivity analysis of the system (10) is conveniently explained with the help of the SVD (11). For a more general sensitivity analysis, see Golub and Van Loan (1996) and Björck (1996).

If m = n, then the matrices V, Σ, and W are square and the (unique) solution of the system (10) is

u = A⁻¹f = WΣ⁻¹Vᵀf = ∑_{i=1}^{n} [(vi · f)/σi] wi.

If m > n [i.e., the number of equations in the system (10) exceeds the number of unknowns, so the system is overdetermined], then a solution of the system (10) does not generally exist (the inverse of the matrix Σ is not defined). In this case a more general definition of the solution of the system (10) is introduced: instead of the system (10), the least-squares problem

min_{u ∈ ℝ^n} ‖Au − f‖²,   (12)

where ‖·‖ is the Euclidean norm, is considered. Clearly, one can assume in (12) that m ≥ n, because in the case m = n the least-squares solution is exactly the solution of the system (10). The least-squares problem (12) has a unique solution (since σn > 0), which is the (unique) solution of the system of normal equations (Forsythe and Moler, 1967; Golub and Van Loan, 1996; Björck, 1996)

AᵀAu = Aᵀf.   (13)

The solution of the system (13) (the least-squares solution) is

u = (AᵀA)⁻¹Aᵀf = ∑_{i=1}^{n} [(vi · f)/σi] wi,   (14)

which is considered as a generalized solution of the system (10) in the case m ≥ n.
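Formula (14) can be checked against standard library routines (a sketch with an arbitrary random 8×5 system; such a matrix has full rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))
f = rng.standard_normal(m)

V, s, Wt = np.linalg.svd(A, full_matrices=False)   # thin SVD of A
u_svd = Wt.T @ ((V.T @ f) / s)                     # Eq. (14): sum (v_i.f)/sigma_i * w_i
u_ne  = np.linalg.solve(A.T @ A, A.T @ f)          # normal equations (13)
u_lsq = np.linalg.lstsq(A, f, rcond=None)[0]       # library least squares (12)

print(np.allclose(u_svd, u_ne), np.allclose(u_svd, u_lsq))   # True True
```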

In reality, f contains an error (i.e., f = f̄ + e, where f̄ is the exact “value” and e is the error). The matrix A may also be known with an error because it depends on radiative properties of a medium, whose numerical values are obtained from experimental data. But here the matrix A is supposed, for simplicity, to be exact. In this case, the solution (14) differs from the exact solution

ū = (AᵀA)⁻¹Aᵀf̄ = ∑_{i=1}^{n} [(vi · f̄)/σi] wi,

and the difference between the obtained and exact solutions is

u − ū = ∑_{i=1}^{n} [(vi · e)/σi] wi.

As written above, the compactness of the integral operator A leads to the poor condition of the matrix A. Moreover, the better the system (10) approximates the integral equation [(6) or (7)], the larger the condition number of the matrix A (Bertero and Boccacci, 1998). The behavior of the singular values σi, as a “function” of the index i, is similar to the behavior of the Fourier transform K̂(ξ) in Fig. 2 as a function of the frequency ξ: the singular values σi tend to zero, more or less gradually, as i increases [examples are given in Hansen (1998)].

Because the matrix A is ill-conditioned, the singular values σi with large indices i are very small. Therefore, it may happen that, for some errors e, the corresponding dot products vi · e are not small, and hence the coefficients (vi · e)/σi are large. In this case the difference u − ū is large. And the larger the condition number of the matrix A, the larger the difference u − ū may be.
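The mechanism can be demonstrated on the discretized kernel of the example above (a sketch: the grid is arbitrary, and the error of size 10⁻⁶ is deliberately aligned with the last left singular vector, the worst case):

```python
import numpy as np

# Discretized kernel (2) on a uniform grid (square system, m = n)
n, L, h = 40, 10.0, 1.0
x = np.linspace(-L / 2, L / 2, n)
dx = x[1] - x[0]
A = h**2 / (2.0 * ((x[:, None] - x[None, :])**2 + h**2) ** 1.5) * dx

V, s, Wt = np.linalg.svd(A)
e = 1e-6 * V[:, -1]                  # data error along the last singular vector
du = np.linalg.solve(A, e)           # resulting perturbation of the solution

amplification = np.linalg.norm(du) / np.linalg.norm(e)
print(amplification, 1.0 / s[-1])    # the two numbers coincide: (v_n . e)/sigma_n
```

A tiny error aligned with a small-σ singular vector is amplified by exactly 1/σn, in agreement with the sum above.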


Abramowitz, M. and Stegun, I. A. (eds.), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, National Bureau of Standards, Washington, 1972.

Bertero, M. and Boccacci, P., Introduction to Inverse Problems in Imaging, Institute of Physics Publishing, Bristol and Philadelphia, 1998.

Björck, Å., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.

Courant, R. and Hilbert, D., Methoden der mathematischen Physik, Bd. II, Julius Springer, Berlin, 1937; revised English ed., Methods of Mathematical Physics, Vol. II, Partial Differential Equations, Wiley, Hoboken, NJ, 1962.

Engl, H. W., Hanke, M., and Neubauer, A., Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.

Forsythe, G. E. and Moler, C. B., Computer Solution of Linear Algebraic Systems, Prentice Hall, Englewood Cliffs, NJ, 1967.

Golub, G. H. and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press, Baltimore, 1996.

Hadamard, J., Lectures on Cauchy’s Problem in Linear Partial Differential Equations, Yale University Press, New Haven, 1923 (reprinted by Dover Publ., New York, 1952); revised French ed., Le problème de Cauchy et les équations aux dérivées partielles linéaires hyperboliques, Hermann, Paris, 1932.

Hansen, P. C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.

Kress, R., Linear Integral Equations, Springer–Verlag, New York, 1999.

Modest, M. F., Radiative Heat Transfer, Academic Press, New York, 2003.

Özişik, M. N., Radiative Transfer and Interactions with Conduction and Convection, Wiley, Hoboken, NJ, 1973.

Siegel, R. and Howell, J. R., Thermal Radiation Heat Transfer, Taylor & Francis, New York, 2002.

Tikhonov, A. N. and Arsenin, V. Y., Solution of Ill-Posed Problems, Wiley, Hoboken, NJ, 1977.

Tikhonov, A. N., Goncharsky, A. V., Stepanov, V. V., and Yagola, A. G., Numerical Methods for Solving Ill-Posed Problems, Kluwer, Dordrecht, 1995.

Turchin, V. F., Kozlov, V. P., and Malkevich, M. S., The Use of Mathematical-Statistics Methods in the Solution of Incorrectly Posed Problems, Sov. Phys. Usp., vol. 13, pp. 681–703, 1971.

Wing, G. M., A Primer on Integral Equations of the First Kind: The Problem of Deconvolution and Unfolding, SIAM, Philadelphia, 1991.

Article added: 7 September 2010; last modified: 24 February 2011.