Theory

This page explains some of the theory of the implemented methods. Simple or very common operations (with plenty of explanations elsewhere) are not described (e.g. linear convolutional filters, polynomial/spline interpolations). References are also given. Also note that I switch between continuous and discrete notations whenever convenient.

Interpolations
Interpolations are the most common methods of image resizing, and bicubic interpolations are the most common of these. Essentially, the pixels are used to fit a bicubic function; then, the resized image queries the function to get the values of the pixels between the original pixels.

A less common image interpolation, but one better for edge preservation, is to use the lanczos kernel to fit instead, which is given by: $$ L_a(x,y) = L_a(x)L_a(y) $$ where $$ L_a(x)= \begin{cases} 1 & \text{if }x=0\\ \frac{a\sin(\pi x/a)\sin(\pi x)}{\pi^2x^2} & \text{if }0\lt|x|\lt a\\ 0 & \text{elsewhere} \end{cases} $$ for which a is usually 3.

Tristate Median
The tristate median filter is an extension of the standard non-linear median filter, given by: $$ p_{out}(x,y) = \begin{cases} p_{in}(x,y) & T \geq d_1\\ CW_{\Omega}(x,y) & d_2 \leq T \lt d_1\\ M_{\Omega}(x,y) & T \lt d_2 \end{cases} $$ where $p$ is the pixel value, $d_1 = ||p_{in}(x,y)-M_{\Omega}(x,y)||$, $d_2 = ||p_{in}(x,y) - CW_{\Omega}(x,y)||$, $\Omega$ is the local pixel neighbourhood, $M$ is the median, and $CW$ is a center-weighted median. It was introduced in Tristate median filter for median denoising , Chen, Ma, and Chen, 1999, IEEE Transactions on Image Processing.

Bilateral Filter
The bilateral neighbourhood filter is a denoising approach. It is basically a smart averaging method that searches the local neighbourhood and chooses a new pixel value based on the local pixels, weighted differently by both the locational and pixel value distances, as: $$ B_{h,\rho}[{u}(\vec{x})]_i = \frac{1}{C(\vec{x})} \iint\limits_{\Omega(\vec{x})} {u}_i(\vec{y}) \exp\left( -\frac{||\vec{x}-\vec{y}||}{\rho^2} \right) \exp\left( -\frac{|u_i(\vec{x})-u_i(\vec{y})|}{h^2} \right) d\vec{y} $$ where $$ C(\vec{x}) = \iint\limits_{\Omega(\vec{x})} \exp\left( -{||\vec{x}-\vec{y}||}/{\rho^2} \right) \exp\left( -{|u_i(\vec{x})-u_i(\vec{y})|}/{h^2} \right) d\vec{y} $$ is a normalizing factor, $\Omega(\vec{x})$ is the local pixel neighbourhood of $\vec{x}$, $\rho$ and $h$ obviously control the weighting strengths, $i\in\{r,g,b\}$, and $u$ is the input image. It was introduced in Bilateral Filtering for Gray and Color Images by Tomasi and Manduchi, 1998. I find this is probably the best denoising method, and it is not even that slow.

Non-Local Means
The NLM method is a sort of generalization of the bilateral filter. Instead of looking at neighbourhood pixels, it looks at patches across the whole image (actually just in a "research radius" larger than the patch radius, since using the whole image is soul-crushingly slow), via: $$ \mathcal{N}_v(i) = \sum\limits_{p \in I} v(p) \frac{ \exp\left( -{G_\alpha(||v(\mathcal{N}_i)-v(\mathcal{N}_p)||_2)}/{h^2} \right) }{ \sum\limits_{j\in I} \exp\left( -{G_\alpha(||v(\mathcal{N}_i)-v(\mathcal{N}_j)||_2)}/{h^2} \right) } $$ where $I$ is the research radius patch, $v$ is the input image, $\mathcal{N}_i$ is an image patch centered at $i$, and $G_\alpha(||\ldots||)$ denotes taking the vector norm on a Gaussian filtered image with $\sigma=\alpha$. It was introduced in A non-local algorithm for image denoising by Buades, 2005, Computer Vision and Pattern Recognition.

Whole-Image Singular Value Removal
This essentially removes the "least important" information from the image in a very simple fashion via singular value decomposition (SVD). Basically, let the image $M$ be considered an $m\times n$ real-valued matrix, with each colour channel considered separately (i.e. in $\mathbb{R}^{m\times n \times 3}$). Then it has SVD factorization: $$ M = U\Sigma V^T $$ with (rooted) eigenvalues $\mu_i$ on the diagonal of $\Sigma$ called singular values. Then, given some percent $p\in [0,1]$ of the information content to remove, we construct $\Sigma_p$ by setting some number of singular values $k$ to 0, where $k$ is chosen to remove as close to $p$ percent of the (linearly matrix encoded) "information" as possible; that is: $$ \sum\limits_{i=1}^k \mu_i \approx p\sum\limits_{\mu_i\in\text{diag}(\Sigma)}\mu_i $$ Then: $M_{out} = U\Sigma_p V^T$. There are much, much more advanced denoising methods based on SVD, but this crude method can sometimes produce interesting results. And it was easy to write.

Shannon Entropy
The Shannon entropy (classical information content) of an image is computed simplistically here: $$ \mathbb{H}(I) = \sum\limits_{i\in\{r,g,b\}} \sum\limits_{x\in v} p(x)\ln p(x) $$ where $v$ is the domain of pixel values for the probability distribution defined for the random variable $x$. Entropy can also obivously be computed using a trivariate probability distribution over the colours; this will be an option in future versions.

Gradient and curvature
Jsrtaa mainly computes gradients using the fourth order central finite difference equation (or first order central/one-sided finite differences on edges and/or corners). It can also use convolution with the Sobel gradient kernel to do it. Notationally, we write the gradient as: $$ \nabla I_i(x,y) = \left( \frac{\partial I_i}{\partial x}(x,y) , \frac{\partial I_i}{\partial y}(x,y) \right) $$ where $I$ is the image and $i$ is the colour channel. I use standard shortcut notation whenever possible.

As usual, we can then define the Hessian in an obvious way as follows: $$ H_i(x,y) = \begin{bmatrix} \frac{\partial^2}{\partial x \partial x}I_i & \frac{\partial^2}{\partial x \partial y}I_i\\ \frac{\partial^2}{\partial y \partial x}I_i & \frac{\partial^2}{\partial y \partial y}I_i\\ \end{bmatrix} $$
We can also compute the differential geometric curvature quantities. For the sake of my fingers, we write the partial derivatives as $\mathcal{I}_q$ where $q$ is the variable with respect to which the differential is taken, with $\mathcal{I}$ as a vector of colour channels.

The mean curvature is given by: $$ \mathcal{H}(\mathcal{I}) = -\frac{\nabla\cdot\hat{n}}{2} = \frac{ (1 + \mathcal{I}_x^2)\mathcal{I}_{yy} - 2\mathcal{I}_x\mathcal{I}_y\mathcal{I}_{xy} + (1 + \mathcal{I}_y^2)\mathcal{I}_{xx} }{ \left[ \sqrt{ 1 + \mathcal{I}_x^2 + \mathcal{I}_y^2 } \right]^3 } = \frac{\kappa_1 + \kappa_2}{2} $$ and the Gaussian curvature by: $$ G(\mathcal{I}) = \frac{ -\det \begin{vmatrix} H(\mathcal{I}) & \nabla\mathcal{I}^T \\ \nabla\mathcal{I} & 0 \end{vmatrix} }{ |\nabla\mathcal{I}|^4 } = \frac{ \mathcal{I}_{xx}\mathcal{I}_{yy} - \mathcal{I}_{xy}^2 }{ (\mathcal{I}_x^2 + \mathcal{I}_y^2 + 1)^2 } = \kappa_1\kappa_2 $$ where the $\kappa$ values are the principal curvatures (eigenvalues of the shape operator).

In addition, we define the Laplacian operator in the standard way: $$ \Delta \mathcal{I} = \frac{\partial^2 \mathcal{I}}{\partial x^2} + \frac{\partial^2 \mathcal{I}}{\partial y^2} $$ As well as the divergence operator: $$ \nabla\cdot \mathcal{I} = \frac{\partial \mathcal{I}}{\partial x} + \frac{\partial \mathcal{I}}{\partial y} $$

Non-linear Anisotropic Diffusion
These methods solve a discretized PDE to run a physics-based simulation of (surprisingly) diffusion of the pixel values. This essentially treats edges as "walls": i.e. it tries to preserve edges by prohibiting diffusion across them, but it filters (theoretically, denoises) other areas. Jsrtaa tries to implement two simple types: Perona-Malik and curvature-driven. Note that a directional diffusion tensor is not used, so these methods are somewhat isotropic in that sense.

Perona-Malik diffusion obeys the following PDE: $$ \frac{\partial \mathcal{I}}{\partial t} = g_k(\nabla \mathcal{I}) \Delta\mathcal{I} + \nabla g_k(\nabla \mathcal{I}) \cdot \nabla \mathcal{I} $$ where $g_k$ can be any of the exp, invsq, or tukey biweight functions: $$ a_k(\nabla \mathcal{I}) = \exp\left( \frac{-||\nabla \mathcal{I}||^2}{k^2} \right) $$ $$ b_k(\nabla \mathcal{I}) = \left[ 1 + \frac{||\nabla I||^2}{k^2} \right]^{-1} $$ $$ c_k(\nabla \mathcal{I}) = \begin{cases} \frac{1}{2}\left[ 1- \frac{||\nabla I||^2}{k^2} \right]^2 & \text{if }||\nabla I|| \leq k \\ 0 & \text{elsewhere} \end{cases} $$ See Scale-space and edge detection using anisotropic diffusion , Perona and Malik, 1990, and On the choice of the parameters for anisotropic diffusion in image processing , Tsiotsios and Petrou, 2012, for details.

Curvature-driven diffusion obeys the following PDE: $$ \frac{\partial \mathcal{I}}{\partial t} = \nabla \cdot \left[ \Phi\left( \lambda C_I(x,y) \right) \nabla\mathcal{I}(x,y) \right] $$ where $C_I$ can either be the mean $\mathcal{H}$ or gaussian $G$ curvature functions on the image, $\lambda\in\mathbb{R}$, and $\Phi$ can be any of the absolute value, affine invariant (cube root), identity, or total variation diminishing ($tvd$) functions, where: $$ tvd(\kappa) = \kappa/||\nabla \mathcal{I}|| $$ Jsrtaa mostly follows the description given in Noise Removal with Gauss Curvature-driven diffusion , Lee and Seo, 2005.

Image Comparison
We have the following standard measures for computing the distance between images: $$ {RMSE}_{k}(I_1,I_2) = \sqrt{ \frac{1}{|X||Y|} \sum\limits_{i\in \mathcal{X}} \sum\limits_{j\in \mathcal{Y}} \sum\limits_{p\in \mathcal{P}} || I_1(i,j,p) - I_2(i,j,p) ||_{\mathcal{L}_k} } $$ and the peak signal to noise ratio $$ PSNR_k(I_1,I_2) = 20\log_{10}(\max U / RMSE_k(I_1,I_2) ) $$ where $k$ denotes the type of $\mathcal{L}_k$ metric space in which to calculate the pixel distance, $\mathcal{P}$ is the set of colour channels, $\max U$ is the maximum pixel value possible in the image, and $i,j$ range spatially over the image.

We can also compute the normalized information-theoretic difference in a metric space using the Jensen-Shannon divergence: $$ \mathcal{J}(\mathcal{I}_1,\mathcal{I}_2) = \frac{1}{3\ln 2} \sum\limits_{i\in \mathcal{P}} \left[ \frac{1}{2} \int\limits_{-\infty}^{\infty} p_1(x_i) \ln \frac{p_1(x_i)}{q_m(x_i)}\,dx_i + \frac{1}{2} \int\limits_{-\infty}^{\infty} p_2(y_i) \ln \frac{p_2(y_i)}{q_m(y_i)}\,dy_i \right] % P = {r,g,b} % M = probdist of I1+I2 / 2 % p and q are the prob dists of I1 and I2 $$ This has the same caveat as our Shannon entropy (above).

Noise
We can add noise to the images in the standard ways: adding uniformly or gaussian distributed additive noise to each pixel, or (like Matlab) consider each image pixel as being Poisson distributed and drawing a new value from the distribution. Impulse noise (salt-and-pepper noise) replaces a given pixel with a fully randomly valued pixel, with given probability $p$.

We can also estimate the noise levels (useful for some algorithms, especially denoising ones,unsurprisingly). One method is the so-called Canny noise estimator; see, for example, the Perona-Malik anisotropic diffusion paper, which uses it. The Gaussian or non-parametric noise distribution variance can also be estimated by the difference in results of Laplacian kernel convolution, in a simple fashion: see Fast Noise Variance Estimation , J Immerkaer, 1995.

Locally Adaptive Zooming Algorithm
LAZA is a rule-based super-resolution algorithm, defined in A Locally-Adaptive Zooming Algorithm for Digital Images by Battiato et al, 2002.