Fast Algorithms for Total Variation Image Restoration

Introduction

In electrical engineering and computer science, image processing refers to any form of signal processing in which the input is an image and the output is either an image or a set of parameters related to the image. Image processing generally includes image enhancement, restoration and reconstruction, edge and boundary detection, classification and segmentation, object recognition and identification, and compression and communication. Among these, image restoration is a classical problem and usually serves as a preprocessing stage for higher-level processing. In many applications, the measured images are degraded by blur: for example, a camera lens may be out of focus, so that the incoming light is smeared out, and in astronomical imaging the light entering the telescope is slightly bent by turbulence in the atmosphere. In addition, images that occur in practical applications inevitably suffer from noise, which arises from numerous sources such as radiation scattered from the surface before the image is sensed, electrical noise in the sensor or camera, transmission errors, and bit errors introduced when the image is digitized. In such situations, the image formation process is usually modeled by the following equation

$$ f(x) = (k * u)(x) + \omega(x), \quad x \in \Omega, \tag{1} $$

where u(x) is an unknown clean image over a region Ω⊂R², “*” denotes the convolution operation, and k(x), ω(x) and f(x) are real-valued functions from R² to R representing, respectively, the convolution kernel, the additive noise, and the blurry and noisy observation. Usually, the convolution process neither absorbs nor generates optical energy, i.e., ∫_Ω k(x) dx = 1, and the additive noise has zero mean.

Deblurring, or deconvolution, aims to recover the unknown image u(x) from f(x) and k(x) based on (Equation 1). When k(x) is unknown or only an estimate of it is available, recovering u(x) from f(x) is called blind deconvolution. Throughout this module, we assume that k(x) is known and ω(x) is either Gaussian or impulsive noise. When k(x) is equal to the Dirac delta, the recovery of u(x) becomes a pure denoising problem. In the rest of this section, we review the TV-based variational models for image restoration and introduce the notation needed for the analysis.

Total Variation for Image Restoration

The TV regularization was first proposed by Rudin, Osher and Fatemi in [12] for image denoising, and then extended to image deblurring in [11]. The TV of u is defined as

$$ \mathrm{TV}(u) = \int_\Omega \|\nabla u(x)\| \, \mathrm{d}x. \tag{2} $$

When ∇u(x) does not exist, the TV is defined using a dual formulation [18], which is equivalent to (Equation 2) when u is differentiable. We point out that, in practical computation, discrete forms of regularization are always used, in which differential operators are replaced by certain finite difference operators. We refer to TV regularization and its variants as TV-like regularization. In comparison to Tikhonov-like regularization, the homogeneous penalty on image smoothness in TV-like regularization can better preserve sharp edges and object boundaries, which are usually the most important features to recover. Variational models with TV regularization and ℓ₂ fidelity have been widely studied in image restoration; see e.g. [4], [3] and references therein. For ℓ₁ fidelity with TV regularization, the geometric properties are analyzed in [2], [16], [17]. The superiority of TV over Tikhonov-like regularization was analyzed in [1], [5] for recovering images containing piecewise smooth objects.

Besides Tikhonov and TV-like regularization, there are other well studied regularizers in the literature, e.g. the Mumford-Shah regularization [9]. In this module, we concentrate on TV-like regularization. We derive fast algorithms, study their convergence, and examine their performance.

Discretization and Notation

As used before, we let ∥·∥ be the 2-norm. In practice, we always discretize an image defined on Ω and vectorize the two-dimensional digitized image into a long one-dimensional vector. We assume that Ω is a square region in R². Specifically, we first discretize u(x) into a digital image represented by a matrix U∈R^(n×n). Then we vectorize U column by column into a vector u∈R^(n²), i.e.

$$ u_i = U_{pq}, \quad i = 1, \ldots, n^2, \tag{3} $$

where u_i denotes the ith component of u, U_pq is the component of U at the pth row and qth column, and p and q are determined by i=(q−1)n+p with 1≤p≤n and 1≤q≤n. Other quantities such as the convolution kernel k(x), the additive noise ω(x), and the observation f(x) are all discretized correspondingly. Now we present the discrete forms of the previously presented equations. The discrete form of (Equation 1) is

$$ f = Ku + \omega, \tag{4} $$

where in this case u,ω,f∈R^(n²) are all vectors representing, respectively, the discrete forms of the original image, the additive noise, and the blurry and noisy observation, and K∈R^(n²×n²) is a convolution matrix representing the kernel k(x). The gradient ∇u(x) is replaced by a first-order finite difference at pixel i. Let D_i∈R^(2×n²) be a first-order local finite difference matrix at pixel i in the horizontal and vertical directions. For example, when forward finite differences are used, we have

$$ D_i u = \begin{pmatrix} u_{i+n} - u_i \\ u_{i+1} - u_i \end{pmatrix} \in \mathbb{R}^2, \tag{5} $$

for i=1,...,n² (with certain boundary conditions assumed for i>n²−n). Then the discrete form of TV defined in (Equation 2) is given by

$$ \mathrm{TV}(u) = \sum_{i=1}^{n^2} \|D_i u\|. \tag{6} $$
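The column-by-column vectorization of (Equation 3) and the discrete TV of (Equation 6) can be sketched in a few lines of plain Python. This is only an illustration: the function names `vectorize` and `discrete_tv` are hypothetical, and periodic wrap-around is assumed at the image boundary, which is one of several admissible boundary conditions.

```python
import math

def vectorize(U):
    """Stack the columns of an n-by-n image U (a list of rows) into one list,
    so that the 1-based index i = (q-1)*n + p holds U at row p, column q."""
    n = len(U)
    return [U[p][q] for q in range(n) for p in range(n)]

def discrete_tv(u, n):
    """Discrete TV, sum_i ||D_i u||, using forward differences.
    Assumption: periodic boundary conditions (indices wrap around)."""
    total = 0.0
    for q in range(n):
        for p in range(n):
            i = q * n + p                    # 0-based version of i = (q-1)n + p
            right = ((q + 1) % n) * n + p    # pixel i + n: same row, next column
            below = q * n + (p + 1) % n      # pixel i + 1: next row, same column
            dx = u[right] - u[i]             # horizontal difference
            dy = u[below] - u[i]             # vertical difference
            total += math.hypot(dx, dy)      # 2-norm of D_i u
    return total

# A 2-by-2 image with one dark and one bright column.
u = vectorize([[0.0, 1.0],
               [0.0, 1.0]])
print(u, discrete_tv(u, 2))
```

A constant image has zero TV under this definition, while every jump across a column or row boundary contributes its magnitude once per affected pixel.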

We will refer to

$$ \min_u \; \mathrm{TV}(u) + \frac{\mu}{2} \|Ku - f\|^2 \tag{7} $$

with the discretized TV regularization (Equation 6) as TV/L², where μ>0 weights the fidelity term. For impulsive noise, we replace the ℓ₂ fidelity by ℓ₁ fidelity and refer to the resulting problem as TV/L¹.

Now we introduce some additional notation. For simplicity, we let ∑_i be the summation taken over all pixels. The two first-order global finite difference operators in the horizontal and vertical directions are denoted, respectively, by D⁽¹⁾ and D⁽²⁾, which are n²-by-n² matrices (with the same boundary conditions as those assumed on D_i). As such, the two-row matrix D_i is formed by stacking the ith row of D⁽¹⁾ on that of D⁽²⁾. For vectors v₁ and v₂, we let v=(v₁;v₂)≜(v₁^⊤,v₂^⊤)^⊤, i.e. v is the vector formed by stacking v₁ on top of v₂. Similarly, we let D=(D⁽¹⁾;D⁽²⁾)=((D⁽¹⁾)^⊤,(D⁽²⁾)^⊤)^⊤. Given a matrix T, we let diag(T) be the vector containing the elements on the diagonal of T, and F(T)=FTF⁻¹, where F∈C^(n²×n²) is the 2D discrete Fourier transform matrix.

Existing Methods

Since TV is nonsmooth, quite a few algorithms are based on smoothing the TV term and solving an approximation problem. The TV of u is usually replaced by

$$ \mathrm{TV}_\epsilon(u) = \sum_i \sqrt{\|D_i u\|^2 + \epsilon}, \tag{8} $$

where ϵ>0 is a small constant. The resulting approximate TV/L² problem is smooth, so many optimization methods are available. The simplest is the gradient descent method used in [12]. However, this method converges slowly, especially when the iterate is close to the solution. Another important method is the linearized gradient method proposed in [14] for denoising and in [15] for deblurring. Both the gradient descent and the linearized gradient methods are globally convergent with, at best, a linear rate. To obtain superlinear convergence, a primal-dual Newton method was proposed in [13]. Both the linearized gradient method and this primal-dual method need to solve a large system of linear equations at each iteration. As ϵ becomes smaller and/or K becomes more ill-conditioned, the linear system becomes increasingly difficult to solve. Another class of well-known methods for TV/L² consists of the iterative shrinkage/thresholding (IST) based methods [8], which require solving a TV denoising problem at each iteration. Also, in [7] the authors transformed the TV/L² problem into a second-order cone program and solved it by an interior point method.

A New Alternating Minimization Algorithm

In this section, we derive a new algorithm for the TV/L² problem

$$ \min_u \; \sum_i \|D_i u\| + \frac{\mu}{2} \|Ku - f\|^2. \tag{9} $$

In (Equation 9), the fidelity term is quadratic with respect to u. Moreover, K is a convolution matrix and thus can be easily diagonalized by fast transforms (with proper boundary conditions assumed on u). Therefore, the main difficulty in solving (Equation 9) is caused by the nondifferentiability of the TV term and its universal coupling of variables. Our algorithm is derived from the well-known variable-splitting and penalty techniques in optimization. First, we introduce an auxiliary variable w_i∈R² at pixel i to transfer D_iu out of the nondifferentiable term ∥·∥. Then we penalize the difference between w_i and D_iu quadratically. As a result, the auxiliary variables w_i are separable with respect to one another. For convenience, in the following we let w≜[w₁,...,w_n²]. The approximation model to (Equation 9) is given by

$$ \min_{w,u} \; \sum_i \|w_i\| + \frac{\beta}{2} \sum_i \|w_i - D_i u\|^2 + \frac{\mu}{2} \|Ku - f\|^2, \tag{10} $$

where β≫0 is a penalty parameter. It is well known that the solution of (Equation 10) converges to that of (Equation 9) as β→∞. In the following, we concentrate on problem (Equation 10).

Basic Algorithm

The benefit of (Equation 10) is that while either one of the two variables u and w is fixed, minimizing the objective function with respect to the other has a closed-form formula that we will specify below. First, for a fixed u, the first two terms in (Equation 10) are separable with respect to w_i, and thus the minimization for w is equivalent to solving

$$ \min_{w_i} \; \|w_i\| + \frac{\beta}{2} \|w_i - D_i u\|^2, \quad i = 1, 2, \ldots, n^2. \tag{11} $$

It is easy to verify that the unique solutions of (Equation 11) are

$$ w_i = \max\left\{ \|D_i u\| - \frac{1}{\beta}, \, 0 \right\} \frac{D_i u}{\|D_i u\|}, \quad i = 1, \ldots, n^2, \tag{12} $$

where the convention 0·(0/0)=0 is followed. On the other hand, for a fixed w, (Equation 10) is quadratic in u and the minimizer u is given by the normal equations

$$ \left( \sum_i D_i^\top D_i + \frac{\mu}{\beta} K^\top K \right) u = \sum_i D_i^\top w_i + \frac{\mu}{\beta} K^\top f. \tag{13} $$

By noting the relation between D and D_i and a reordering of variables, (Equation 13) can be rewritten as

$$ \left( D^\top D + \frac{\mu}{\beta} K^\top K \right) u = D^\top w + \frac{\mu}{\beta} K^\top f, \tag{14} $$

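where

$$ w \triangleq \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \in \mathbb{R}^{2n^2} \quad \text{and} \quad w_j \triangleq \begin{pmatrix} (w_1)_j \\ \vdots \\ (w_{n^2})_j \end{pmatrix}, \quad j = 1, 2. \tag{15} $$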

The normal equation (Equation 14) can also be solved easily provided that proper boundary conditions are assumed on u. Since both the finite difference operations and the convolution are not well-defined on the boundary of u, certain boundary assumptions are needed when solving (Equation 14). Under the periodic boundary conditions for u, i.e. the 2D image u is treated as a periodic function in both horizontal and vertical directions, D⁽¹⁾, D⁽²⁾ and K are all block circulant matrices with circulant blocks; see e.g. [10], [6]. Therefore, the Hessian matrix on the left-hand side of (Equation 14) has a block circulant structure and thus can be diagonalized by the 2D discrete Fourier transform F, see e.g. [6]. Using the convolution theorem of Fourier transforms, the solution of (Equation 14) is given by

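$$ u = F^{-1}\left( \frac{F\left( D^\top w + (\mu/\beta) K^\top f \right)}{\operatorname{diag}\left( F\left( D^\top D + (\mu/\beta) K^\top K \right) \right)} \right), \tag{16} $$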

where the division is performed componentwise. Since all quantities but w are constant for a given β, computing u from (Equation 16) involves merely the finite difference operation on w and two FFTs (including one inverse FFT), once the constant quantities are computed.

Since minimizing the objective function in (Equation 10) with respect to either w or u is computationally inexpensive, we solve (Equation 10) for a fixed β by an alternating minimization scheme given below.

Algorithm :

Input f, K and μ>0. Given β>0 and initialize u=f.
While “not converged”, Do
1. Compute w according to (Equation 12) for fixed u.
2. Compute u according to (Equation 16) for fixed w (or equivalently its reordering in (Equation 15)).
End Do
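The alternating scheme above can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the penalty and fidelity terms in (Equation 10) carry the weights β/2 and μ/2 (which reproduces the ratio μ/β appearing in (Equation 16)), periodic boundary conditions, a centred real kernel, and a fixed iteration count in place of a convergence test. All function names are hypothetical.

```python
import numpy as np

def kernel_eigenvalues(kernel, shape):
    """FFT eigenvalues of the periodic convolution matrix K for a centred kernel."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel centre to pixel (0, 0) so the blur causes no translation.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def grad(u):
    """Periodic forward differences: horizontal D^(1) u and vertical D^(2) u."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def grad_adjoint(w1, w2):
    """D^T w for the periodic forward differences in grad()."""
    return (np.roll(w1, 1, axis=1) - w1) + (np.roll(w2, 1, axis=0) - w2)

def objective(u, w1, w2, f, k_hat, mu, beta):
    """Objective of the penalized problem, assuming weights beta/2 and mu/2."""
    d1, d2 = grad(u)
    ku = np.real(np.fft.ifft2(k_hat * np.fft.fft2(u)))
    return (np.sum(np.hypot(w1, w2))
            + 0.5 * beta * np.sum((w1 - d1) ** 2 + (w2 - d2) ** 2)
            + 0.5 * mu * np.sum((ku - f) ** 2))

def alternating_minimization(f, kernel, mu, beta, iters=30):
    k_hat = kernel_eigenvalues(kernel, f.shape)
    # Eigenvalues of D^(1) and D^(2): FFTs of the difference stencils.
    h1 = np.zeros(f.shape); h1[0, 0] = -1.0; h1[0, -1] = 1.0
    h2 = np.zeros(f.shape); h2[0, 0] = -1.0; h2[-1, 0] = 1.0
    # Constant quantities: denominator of (16) and the K^T f part of the numerator.
    denom = (np.abs(np.fft.fft2(h1)) ** 2 + np.abs(np.fft.fft2(h2)) ** 2
             + (mu / beta) * np.abs(k_hat) ** 2)
    ktf_hat = (mu / beta) * np.conj(k_hat) * np.fft.fft2(f)
    u = f.copy()
    history = []
    for _ in range(iters):
        # Step 1: two-dimensional shrinkage (12), with 0 * (0/0) treated as 0.
        d1, d2 = grad(u)
        norm = np.hypot(d1, d2)
        scale = np.maximum(norm - 1.0 / beta, 0.0) / np.maximum(norm, 1e-12)
        w1, w2 = scale * d1, scale * d2
        # Step 2: solve the normal equations by one FFT and one inverse FFT (16).
        numer = np.fft.fft2(grad_adjoint(w1, w2)) + ktf_hat
        u = np.real(np.fft.ifft2(numer / denom))
        history.append(objective(u, w1, w2, f, k_hat, mu, beta))
    return u, history
```

Because each step minimizes the objective exactly in one block of variables, the recorded objective values should not increase from one iteration to the next, which gives a simple sanity check on any implementation of this scheme.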

Optimality Conditions and Convergence Results

To present the convergence results of Algorithm "Basic Algorithm" for a fixed β, we make the following weak assumption.

Assumption 1 N(K)∩N(D)={0}, where N(·) represents the null space of a matrix.

Define

$$ M = D^\top D + \frac{\mu}{\beta} K^\top K \quad \text{and} \quad T = D M^{-1} D^\top. \tag{17} $$

Furthermore, we will make use of the following two index sets:

$$ L = \left\{ i : \|D_i u^*\| < \frac{1}{\beta} \right\} \quad \text{and} \quad E = \{1, \ldots, n^2\} \setminus L. \tag{18} $$

Under Assumption 1, the proposed algorithm has the following convergence properties.

Theorem 1 For any fixed β>0, the sequence {(w^k,u^k)} generated by Algorithm "Basic Algorithm" from any starting point (w⁰,u⁰) converges to a solution (w^*,u^*) of (Equation 10). Furthermore, the sequence satisfies

$$ \|w_E^{k+1} - w_E^*\| \le \sqrt{\rho\left( (T^2)_{EE} \right)} \, \|w_E^k - w_E^*\|; \tag{19} $$

$$ \|u^{k+1} - u^*\|_M \le \sqrt{\rho(T_{EE})} \, \|u^k - u^*\|_M; \tag{20} $$

for all k sufficiently large, where T_EE=[T_i,j]_{i,j∈E∪(n²+E)} is a minor of T, ∥v∥_M²=v^⊤Mv and ρ(·) is the spectral radius of its argument.

Extensions to Multichannel Images and TV/L¹

The alternating minimization algorithm given in "A New Alternating Minimization Algorithm" can be extended to the multichannel version of (Equation 9), for images with more than one channel, and to the TV/L¹ problem, for impulsive additive noise.

Multichannel image deconvolution

Let u=[u⁽¹⁾;...;u^(m)]∈R^mn² be an m-channel image, where, for each j, u^(j)∈R^n² represents the jth channel. An observation of u is modeled by (Equation 4), in which case f=[f⁽¹⁾;...;f^(m)] and ω=[ω⁽¹⁾;...;ω^(m)] have the same size and the number of channels as u, and K is a multichannel blurring operator of the form

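$$ K = \begin{bmatrix} K_{11} & K_{12} & \cdots & K_{1m} \\ K_{21} & K_{22} & \cdots & K_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ K_{m1} & K_{m2} & \cdots & K_{mm} \end{bmatrix} \in \mathbb{R}^{mn^2 \times mn^2}, \tag{21} $$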

where K_ij∈R^n²×n², each diagonal submatrix K_ii defines the blurring operator within the ith channel, and each off-diagonal matrix K_ij, i≠j, defines how the jth channel affects the ith channel.

The multichannel extension of (Equation 9) is

$$ \min_u \; \sum_i \|(I_m \otimes D_i) u\| + \frac{\mu}{2} \|Ku - f\|^2, \tag{22} $$

where I_m is the identity matrix of order m, and “⊗” is the Kronecker product. By introducing auxiliary variables w_i∈R^(2m), i=1,...,n², (Equation 22) is approximated by

$$ \min_{w,u} \; \sum_i \|w_i\| + \frac{\beta}{2} \sum_i \|w_i - (I_m \otimes D_i) u\|^2 + \frac{\mu}{2} \|Ku - f\|^2. \tag{23} $$

For fixed u, the minimizer function for w is given by (Equation 12) in which D_iu should be replaced by (I_m⊗D_i)u. On the other hand, for fixed w, the minimization for u is a least squares problem which is equivalent to the normal equations

$$ \left( I_m \otimes (D^\top D) + \frac{\mu}{\beta} K^\top K \right) u = (I_m \otimes D)^\top w + \frac{\mu}{\beta} K^\top f, \tag{24} $$

where w is a reordering of variables in a similar way as given in (Equation 15). Under the periodic boundary condition, (Equation 24) can be block diagonalized by FFTs and then solved by a low complexity Gaussian elimination method.

Deconvolution with Impulsive Noise

When the blurred image is corrupted by impulsive rather than Gaussian noise, we recover u as the minimizer of a TV/L¹ problem. For simplicity, we again assume that u∈R^(n²) is a single-channel image; the extension to the multichannel case is similar to that in "Multichannel image deconvolution". The TV/L¹ problem is

$$ \min_u \; \sum_i \|D_i u\| + \mu \|Ku - f\|_1. \tag{25} $$

Since the data-fidelity term is also not differentiable, in addition to w, we introduce z∈R^n² and add a quadratic penalty term. The approximation problem to (Equation 25) is

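$$ \min_{w,z,u} \; \sum_i \left( \|w_i\| + \frac{\beta}{2} \|w_i - D_i u\|^2 \right) + \mu \left( \|z\|_1 + \frac{\gamma}{2} \|z - (Ku - f)\|^2 \right), \tag{26} $$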

where β,γ≫0 are penalty parameters. For fixed u, the minimization for w is the same as before, while the minimizer function for z is given by the famous one-dimensional shrinkage:

$$ z = \max\left\{ |Ku - f| - \frac{1}{\gamma}, \, 0 \right\} \cdot \operatorname{sgn}(Ku - f). \tag{27} $$
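Both shrinkage operators are simple to state in code. The sketch below assumes the thresholds 1/γ and 1/β, which are what minimizing |z| + (γ/2)(z − t)² and ∥w∥ + (β/2)∥w − v∥² yield; the function names `shrink_1d` and `shrink_2d` are hypothetical.

```python
import math

def shrink_1d(t, gamma):
    """Componentwise shrinkage as in (Equation 27): minimizer of
    |z| + (gamma/2) * (z - t)**2 over real z."""
    magnitude = abs(t) - 1.0 / gamma
    if magnitude <= 0.0:
        return 0.0
    return magnitude if t > 0 else -magnitude

def shrink_2d(v, beta):
    """Two-dimensional shrinkage as in (Equation 12): minimizer of
    ||w|| + (beta/2) * ||w - v||**2 over w in R^2, with 0 * (0/0) taken as 0."""
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:
        return (0.0, 0.0)
    scale = max(norm - 1.0 / beta, 0.0) / norm
    return (scale * v[0], scale * v[1])

# Residuals below the threshold are set to zero; larger ones move toward zero.
print([shrink_1d(t, 2.0) for t in (-3.0, 0.25, 3.0)])
print(shrink_2d((3.0, 4.0), 1.0))
```

In the 2D case the vector keeps its direction and only its length is reduced, which is why the update treats both difference components of a pixel jointly rather than one at a time.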

On the other hand, for fixed w and z, the minimization for u is a least squares problem which is equivalent to the normal equations

$$ \left( D^\top D + \frac{\mu\gamma}{\beta} K^\top K \right) u = D^\top w + \frac{\mu\gamma}{\beta} K^\top (f + z). \tag{28} $$

Similar to previous arguments, (Equation 28) can be easily solved by FFTs.

Experiments

In this section, we present the practical implementation and numerical results of the proposed algorithms. We used two images, Man (grayscale) and Lena (color), in our experiments; see Figure 1. The two images are widely used in the field of image processing because they contain a good mixture of detail, flat regions, shading and texture.

**Figure 1:** Test images: Man (left, 1024×1024) and Lena (right, 512×512).

We tested several kinds of blurring kernels including Gaussian, average and motion. The additive noise is Gaussian for TV/L² problems and impulsive for TV/L¹ problem. The quality of image is measured by the signal-to-noise ratio (SNR) defined by

$$ \mathrm{SNR} \triangleq 10 \log_{10} \frac{\|\bar{u} - E(\bar{u})\|^2}{\|\bar{u} - u\|^2}, \tag{29} $$

where ū is the original image, E(ū) is the mean intensity value of ū, and u is the restored image. All blurring effects were generated using the MATLAB function “imfilter” with periodic boundary conditions, and noise was added using “imnoise”. All the experiments were run under Windows Vista Premium and MATLAB v7.6 (R2008a) on a Lenovo laptop with an Intel Core 2 Duo CPU at 2 GHz and 2 GB of memory.
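As an illustration of the quality measure, the following sketch computes the SNR in decibels from an original image and a restored image, both given as flat lists of intensities. The function name `snr_db` is hypothetical, and returning infinity for a perfect restoration is a choice made here, not something specified in the text.

```python
import math

def snr_db(original, restored):
    """SNR in dB: 10 * log10(||original - mean(original)||^2 / ||original - restored||^2)."""
    mean = sum(original) / len(original)
    signal = sum((x - mean) ** 2 for x in original)
    error = sum((x - y) ** 2 for x, y in zip(original, restored))
    if error == 0.0:
        return math.inf  # assumption: a perfect restoration is reported as infinite SNR
    return 10.0 * math.log10(signal / error)

original = [0.0, 0.0, 10.0, 10.0]
restored = [1.0, 0.0, 10.0, 10.0]
print(snr_db(original, restored))
```

Subtracting the mean in the numerator makes the measure insensitive to a constant brightness offset in the original image, so only its variation counts as signal.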

Practical Implementation

Generally, the quality of the restored image is expected to increase as β increases because the approximation problems become closer to the original ones. However, the alternating algorithms converge slowly when β is large, which is a well-known property of penalty methods. An effective remedy is to gradually increase β from a small value to a pre-specified one. Figure 2 compares the convergence behavior of the proposed algorithm with and without continuation, where we used a Gaussian blur of size 11 and standard deviation 5 and added white Gaussian noise with mean zero and standard deviation 10⁻³.

**Figure 2:** Continuation vs. no continuation: u^* is an “exact” solution corresponding to β=2¹⁴. The horizontal axis represents the number of iterations, and the vertical axis is the relative error e_k=∥u^k−u^*∥/∥u^*∥.

In this continuation framework, we compute a solution of an approximation problem with a smaller β and use it to warm-start the next approximation problem with a larger β. As can be seen from Figure 2, continuation on β greatly speeds up convergence. In our experiments, we implemented the alternating minimization algorithms with continuation on β and call the resulting algorithm “Fast Total Variation deconvolution”, or FTVd. For TV/L², the framework is given below.

[FTVd]:

Input f, K and μ>0. Given β_max>β₀>0.
Initialize u=f, u_p=0, β=β₀ and ϵ>0.
While β≤β_max, Do
1. Run Algorithm "Basic Algorithm" until an optimality condition is met.
2. β←2*β.
End Do
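The continuation loop itself is independent of how the inner problem is solved, which the following sketch makes explicit. Here `inner_solve` is a hypothetical callback standing in for Algorithm "Basic Algorithm": it takes the current image and a value of β and returns the improved image. The defaults β₀=1 and β_max=2⁷ are illustrative choices.

```python
def ftvd_continuation(inner_solve, u0, beta0=1.0, beta_max=2.0 ** 7):
    """Run the inner solver for beta = beta0, 2*beta0, ... up to beta_max,
    warm-starting each stage from the previous stage's result."""
    u, beta = u0, beta0
    schedule = []
    while beta <= beta_max:
        u = inner_solve(u, beta)  # warm start: previous u is the initial guess
        schedule.append(beta)
        beta *= 2.0
    return u, schedule

# Toy stand-in for the inner solver: it only counts how often it is called.
result, betas = ftvd_continuation(lambda u, beta: u + 1, 0)
print(result, betas)
```

With doubling, going from β₀ to β_max takes only log₂(β_max/β₀)+1 stages, so the cost of continuation is a small number of cheap, warm-started inner solves.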

**Figure 3:** SNRs of images recovered from () for different β.

**Figure 4:** Results recovered from TV/L². Image Man is blurred by a Gaussian kernel, while image Lena is blurred by a cross-channel kernel. Gaussian noise with zero mean and standard deviation 10⁻³ is added to both blurred images. The left images are the blurry and noisy observations, and the right ones are recovered by FTVd.

Generally, it is difficult to determine how large β must be to generate a solution that is close to a solution of the original problem. In practice, we observed that the SNR values of images recovered from the approximation problems stabilize once β reaches a reasonably large value. To see this, we plot the SNR values of restored images corresponding to β=2⁰,2¹,⋯,2¹⁸ in Figure 3. In this experiment, we used the same blur and noise as in the continuation test. As can be seen from Figure 3, the SNR values on both images remain essentially constant for β≥2⁷. This suggests that β need not be excessively large from a practical point of view. In our experiments, we set β₀=1 and β_max=2⁷ in Algorithm "Practical Implementation". For each β, the inner iteration was stopped once an optimality condition was satisfied. For TV/L¹ problems, we also implemented continuation on γ and used settings similar to those for TV/L².

Recovered Results

In this subsection, we present results recovered from TV/L² and TV/L¹ problems including (Equation 9), (Equation 25) and their multichannel extensions. We tested various blurs with different levels of Gaussian noise and impulsive noise. Here we present only several test results. Figure 4 gives two examples of blurry and noisy images and the recovered ones, where the blurred images are corrupted by Gaussian noise, while Figure 5 gives the recovered results where the blurred images are corrupted by random-valued noise. For TV/L¹ problems, we set γ=2¹⁵ and β=2¹⁰ in the approximation model and implemented continuation on both β and γ.

**Figure 5:** Results recovered from TV/L¹. Image Lena is blurred by a cross-channel kernel and corrupted by 40% (left) and 50% (right) random-valued noise. The top row contains the blurry and noisy observations and the bottom row shows the results recovered by FTVd.

Concluding Remarks

We proposed, analyzed and tested an alternating algorithm, FTVd, for solving the TV/L² problem. This algorithm was extended to solve the TV/L¹ model and the multichannel extensions of both models by incorporating an extension of TV. Cross-channel blurs are permitted when the underlying image has more than one channel. We established strong convergence results for the algorithms and validated a continuation scheme. Numerical results are given to demonstrate the feasibility and efficiency of the proposed algorithms.

REFERENCES

[1] Acar, R. and Vogel, C. R. (1994). Analysis of total variation penalty methods. Inv. Probl., 10, 1217-1229.
[2] Chan, T. F. and Esedoglu, S. (2005). Aspects of total variation regularized function approximation. SIAM Journal on Applied Mathematics, 65(5), 1817–1837.
[3] Chan, T. F. and Esedoglu, S. and Park, F. and Yip, A. (2004). Recent Developments in Total Variation Image Restoration. (05-01). CAM Report. Department of Mathematics, UCLA.
[4] Chambolle, A. and Lions, P. L. (1997). Image Recovery via Total Variation Minimization and Related Problems. Numer. Math., 76(2), 167-188.
[5] Dobson, D. C. and Santosa, F. (1996). Recovery of blocky images from noisy and blurred data. SIAM J. Appl. Math., 56, 1181–1198.
[6] Gonzalez, R. and Woods, R. (1992). Digital Image Processing. Addison-Wesley.
[7] Goldfarb, D. and Yin, W. (2005). Second-Order Cone Programming Methods for Total Variation-based Image Restoration. SIAM J. Sci. Comput., 27(2), 622-645.
[8] Daubechies, I. and Defrise, M. and De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math., LVII, 1413-1457.
[9] Mumford, D. and Shah, J. (1989). Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math., 42, 577-685.
[10] Ng, Michael K. and Chan, Raymond H. and Tang, Wuncheung. (1999). A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput., 21(3), 851–866.
[11] Rudin, L. and Osher, S. (1994). Total Variation Based Image Restoration with Free Local Constraints. Proc. 1st IEEE ICIP, 1, 31-35.
[12] Rudin, L. and Osher, S. and Fatemi, E. (1992). Nonlinear Total Variation Based Noise Removal Algorithms. Phys. D, 60, 259-268.
[13] Chan, T. F. and Golub, G. H. and Mulet, P. (1999). A nonlinear primal dual method for total variation based image restoration. SIAM J. Sci. Comput., 20, 1964-1977.
[14] Vogel, C. R. and Oman, M. E. (1996). Iterative Methods for Total Variation Denoising. SIAM J. Sci. Comput., 17(1), 227-238.
[15] Vogel, C. and Oman, M. (1998). Fast, robust total variation-based reconstruction of noisy, blurred images. IEEE Trans. Image Processing, 7(6), 813–824.
[16] Yin, W. and Goldfarb, D. and Osher, S. (2005). Image cartoon-texture decomposition and feature selection using the total variation regularized functional. In Lecture Notes in Computer Science: Vol. 3752. Variational, Geometric, and Level Set Methods in Computer Vision. (pp. 73-84). Springer.
[17] Yin, W. and Goldfarb, D. and Osher, S. (2006). The total variation regularized model for multiscale decomposition. SIAM Journal on Multiscale Modeling and Simulation, 6(1), 190–211.
[18] Ziemer, W. P. (1989). Graduate Texts in Mathematics: Weakly Differentiable Functions: Sobolev Spaces and Functions of Bounded Variation. Springer.