Motivation:

Within the context of optimisation, differentiable approximations of the $\min$ and $\max$ operators on $\mathbb{R}^n$ are very useful. In particular, I am interested in analytic approximations:

\begin{equation} f_N: \mathbb{R}^n \rightarrow \mathbb{R} \end{equation}

\begin{equation} g_N: \mathbb{R}^n \rightarrow \mathbb{R} \end{equation}

where, for any $\epsilon > 0$ and any $X \in \mathbb{R}^n$, there exists an $N$ such that:

\begin{equation} \forall m > N, \max(\lvert f_m(X)-\max(X) \rvert, \lvert g_m(X)-\min(X) \rvert) < \epsilon \end{equation}

I found a few proposed solutions to a related question on MathOverflow [2], but when I tested these methods none turned out to be numerically stable with respect to the relative error.

This motivated me to come up with a numerically stable solution inspired by properties of the infinity norm:

\begin{equation} \forall X \in \mathbb{R}_+^n, \lVert X \rVert_{\infty} = \max(X) = \bar{x} \in \mathbb{R} \end{equation}

where the reason for (4) is that for $X \in \mathbb{R}_+^n$ with $\bar{x} = \max(X)$ we have:

\begin{equation} \lVert X \rVert_{\infty} = \lim_{N \to \infty} \Big(\sum_{i=1}^n x_i^N\Big)^{\frac{1}{N}} = \lim_{N \to \infty} \bar{x} \cdot \Big(\sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^N\Big)^{\frac{1}{N}} = \bar{x} \end{equation}

which may be used to define analytic approximations to both the min and max operators on $\mathbb{R}^n$, provided that we use an order-preserving transformation from $\mathbb{R}$ to $\mathbb{R}_+$.
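
As a quick numerical illustration of this limit, here is a sketch of my own using the norm function from Julia's LinearAlgebra standard library:

using LinearAlgebra

X = [1.0, 2.0, 7.0, 3.0]

## the p-norms decrease towards max(X) = 7.0 as p grows:
for p in (1, 2, 10, 50, 200)
    println(p, " => ", norm(X, p))
end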

Analysis of several proposed solutions:

Without loss of generality, let’s focus on approximations to the $\max$ operator. In total, I analysed three different analytic approximations to this operator: two taken from MathOverflow [2] and the smooth max, which has its own Wikipedia page [3].

I actually encourage the interested reader to experiment with the following methods which are all correct in principle but all susceptible to overflow errors:

The generalised mean:

\begin{equation} \forall X \in \mathbb{R}_+^n \forall N \in \mathbb{N}, GM(X,N)= \Big(\frac{1}{n}\sum_{i=1}^n x_i^N\Big)^{\frac{1}{N}} \end{equation}

converges to $\max(X) = \bar{x}$ since:

\begin{equation} \lim_{N \to \infty} \Big(\frac{1}{n} \sum_{i=1}^n x_i^N \Big)^{\frac{1}{N}} = \lim_{N \to \infty} \bar{x} \cdot \Big(\frac{1}{n} \sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^N \Big)^{\frac{1}{N}} \end{equation}

where

\begin{equation} n^{-\frac{1}{N}} \leq \Big(\frac{1}{n} \sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^N\Big)^{\frac{1}{N}} \leq 1 \end{equation}

In Julia this may be implemented as follows:

using Statistics

function GM(X::Array{Float64, 1},N::Int64)

	## generalised mean: https://en.wikipedia.org/wiki/Generalized_mean

	## this method throws a DomainError (negative base, fractional exponent) unless all elements of X are positive:
	## https://math.stackexchange.com/questions/317528/how-do-you-compute-negative-numbers-to-fractional-powers/317546#317546

	return mean(X.^N)^(1/N)

end

It is worth noting that this particular method throws a DomainError for odd exponents unless all elements of $X$ are positive.
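
As a quick sanity check, the following hypothetical calls (my own examples, assuming the GM definition above) trigger both failure modes:

## 1000.0^200 overflows to Inf, so the result is Inf rather than roughly 1000.0:
GM([2.0, 5.0, 1000.0], 200)

## a negative mean(X.^N) makes the 1/N-th root undefined over the reals:
GM([-2.0, 1.0], 3)             ## throws a DomainError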

Exponential generalised mean:

\begin{equation} \forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, EM(X,N)= \frac{1}{N} \cdot \log \big(\frac{1}{n}\sum_{i=1}^n e^{N \cdot x_i}\big) \end{equation}

converges to $\bar{x} = \max(X)$ since:

\begin{equation} \frac{e^{N \bar{x}}}{n} \leq \frac{1}{n} \sum_{i=1}^n e^{N x_i} \leq e^{N \bar{x}} \end{equation}

so we have:

\begin{equation} \bar{x} - \frac{\log n}{N} \leq \frac{1}{N} \cdot \log \big(\frac{1}{n}\sum_{i=1}^n e^{N \cdot x_i}\big) \leq \bar{x} \end{equation}

In Julia this may be implemented as follows:

function exp_GM(X::Array{Float64, 1},N::Int64)

	### This method is terrible. Overflow errors everywhere. 

	exp_ = mean(exp.(N*X))

	return (1/N)*log(exp_)

end

Unlike the previous method, this one is defined for all of $\mathbb{R}^n$, but it too is numerically unstable.
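
For instance, the following hypothetical call (my own example) overflows as soon as $N \cdot x_i$ exceeds roughly 709, the largest argument exp can take before returning Inf in double precision:

## exp(2*1000.0) == Inf, so the method returns Inf instead of roughly 1000.0:
exp_GM([2.0, 5.0, 1000.0], 2)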

The smooth max:

\begin{equation} \forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, SM(X,N)= \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot x_i}}{\sum_{i=1}^n e^{N \cdot x_i}} \end{equation}

converges to $\bar{x} = \max(X)$ since we may define the sets $A = \{i : x_i = \bar{x}\}$ and $B = \{i : x_i < \bar{x}\}$ where $\lvert A \rvert = k$, and so we have:

\begin{equation} \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot x_i}}{\sum_{i=1}^n e^{N \cdot x_i}} = \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot (x_i-\bar{x})}}{\sum_{i=1}^n e^{N \cdot (x_i-\bar{x})}} = \frac{k\bar{x} + \sum_{x_i < \bar{x}} x_i \cdot e^{N \cdot (x_i-\bar{x})}}{k + \sum_{x_i < \bar{x}} e^{N \cdot (x_i-\bar{x})}} \end{equation}

and we note that:

\begin{equation} \lim_{N \to \infty} \max \big(\sum_{x_i < \bar{x}} x_i \cdot e^{N \cdot (x_i-\bar{x})},\sum_{x_i < \bar{x}} e^{N \cdot (x_i-\bar{x})}\big) = 0 \end{equation}

In Julia this may be implemented as follows:

function smooth_max(X::Array{Float64, 1},N::Int64)

	## implementation of the smooth maximum: 
	## https://en.wikipedia.org/wiki/Smooth_maximum

	exp_ = exp.(X*N)

	return sum(X.*exp_)/sum(exp_)

end

This method appears to be the industry standard but, like the other methods, it is vulnerable to overflow errors as it fails to normalise the input vector $X$. In fact, the interested reader might want to explore the IJulia notebook [9] where I analysed each method.
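
For instance, in the following hypothetical call (my own example) both sums overflow to Inf so their ratio is NaN:

## both the numerator and the denominator overflow to Inf, so the result is NaN:
smooth_max([2.0, 5.0, 1000.0], 2)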

My proposed solution:

This analysis motivated me to come up with my own solution, inspired by the properties of the infinity norm, where I first rescale the input vector so it has zero mean and unit variance:

\begin{equation} \forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, AM(\hat{X},N) = \sigma_X \cdot \big(\frac{1}{N} \log \big(\sum_{i=1}^n e^{N\cdot \hat{x_i}} \big)\big) + \mu_{X} \end{equation}

where $\hat{X}$ is defined as follows:

\begin{equation} \hat{X} = \frac{X - 1_n\cdot \mu_X}{\sigma_X} \end{equation}

and, up to the constant factor $\sigma_X$, the partial derivative of $AM(\hat{X},N)$ with respect to $\hat{x_i}$ is simply the softmax:

\begin{equation} \frac{\partial}{\partial \hat{x_i}} AM(\hat{X},N) = \sigma_X \cdot \frac{e^{N \cdot \hat{x_i}}}{\sum_{j=1}^n e^{N\cdot \hat{x_j}}} \end{equation}
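
As a sanity check, here is a sketch of my own (not part of the original analysis) that compares this gradient against central finite differences; AM_ is a hypothetical stand-alone helper implementing the AM formula above:

using Statistics

## hypothetical helper implementing AM(X_hat, N), with mu and sigma passed in explicitly:
AM_(Z, N, mu, sigma) = sigma*(log(sum(exp.(N*Z)))/N) + mu

X = [1.0, -2.0, 3.0, 0.5]
N = 5
mu, sigma = mean(X), std(X)
Z = (X .- mu)./sigma

## sigma times the softmax of N*Z:
analytic = sigma*exp.(N*Z)./sum(exp.(N*Z))

## central finite differences in each coordinate of Z:
h = 1e-5
numeric = zeros(length(Z))
for i in 1:length(Z)
    e = zeros(length(Z)); e[i] = 1.0
    numeric[i] = (AM_(Z .+ h*e, N, mu, sigma) - AM_(Z .- h*e, N, mu, sigma))/(2*h)
end

## the analytic and numerical gradients agree to several significant digits:
println(maximum(abs.(analytic .- numeric)))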

This method may be readily generalised to approximate both the $\min$ and $\max$ operators on $\mathbb{R}^n$ in the Julia programming language:

using Statistics

function analytic_min_max(X::Array{Float64, 1},N::Int64,case::Int64)

    """
        An analytic approximation to the min and max operators

        Inputs: 
            X: a vector from R^n where n is unknown

            N: an integer such that the approximation of max(X) 
               improves with increasing N.

            case: if case == 1, approximate min(X); if case == 2, 
                  approximate max(X)

        Output: 

            An approximation to min(X) if case == 1, and max(X) if 
            case == 2
    """

    if (case != 1) && (case != 2)

        return print("Error: case isn't well defined")

    else

        ## q is the signed degree of the approximation:
        ## negative for the min case, positive for the max case
        q = N*(-1)^case

        mu, sigma = mean(X), std(X)

        ## rescale vector so it has zero mean and unit variance:
        Z_score = (X.-mu)./sigma

        exp_sum = sum(exp.(Z_score*q))

        log_ = log(exp_sum)/q

        return (log_*sigma)+mu

    end

end
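
For example, in the following quick check of my own the standardisation keeps the exponents $N \cdot \hat{x_i}$ moderate, so both approximations stay finite even for inputs that overflow the exponential-based methods above:

analytic_min_max([-1000.0, 5.0, 1000.0], 30, 2)   ## approximately 1000.0 (max)
analytic_min_max([-1000.0, 5.0, 1000.0], 30, 1)   ## approximately -1000.0 (min)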

The reader may also check that it passes the following numerical stability test:

function numerical_stability(method,type::Int64)

	"""
		A simple test for numerical stability with respect to the relative error. 

		Input: 
			
			method: the approximation under test, called as method(X, N)

			type: 1 for min() and 2 for max()

		Output: 

			true if the average relative error is less than 10%, false otherwise.  

	"""

	## test will be run 100 times 
	relative_errors = zeros(100)

	for i = 1:100

		## a vector sampled uniformly from [-1000,1000]^100
		X = (2*rand(100).-1.0)*1000

		## the test for min operators
		if type == 1

			min_ = minimum(X)

			relative_errors[i] = abs(min_-method(X,i))/abs(min_)

		## the test for max operators
		else 

			max_ = maximum(X)

			relative_errors[i] = abs(max_-method(X,i))/abs(max_)

		end

	end

	return mean(relative_errors) < 0.1

end
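
For example, since the test calls method(X, N) with two arguments, analytic_min_max can be wrapped in a closure (a hypothetical invocation, assuming the definitions above):

## the max test and the min test are both expected to return true:
numerical_stability((X, N) -> analytic_min_max(X, N, 2), 2)
numerical_stability((X, N) -> analytic_min_max(X, N, 1), 1)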

Discussion:

It took me about an hour to come up with my solution so I doubt this method is either original or state-of-the-art. In fact, I wouldn’t be surprised at all if numerical analysts have a better solution than the one I propose here.

If you know of a more robust method, feel free to join the discussion on MathOverflow [1].

References:

  1. Aidan Rocke (https://mathoverflow.net/users/56328/aidan-rocke), analytic approximations of the min and max operators, URL (version: 2020-02-13): https://mathoverflow.net/q/352548

  2. eakbas (https://mathoverflow.net/users/5223/eakbas), A differentiable approximation to the minimum function, URL (version: 2016-08-12): https://mathoverflow.net/q/35191

  3. Wikipedia contributors. Smooth maximum. Wikipedia, The Free Encyclopedia. March 25, 2019, 21:07 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Smooth_maximum&oldid=889462421. Accessed February 12, 2020.

  4. Wikipedia contributors. (2019, December 3). Generalized mean. In Wikipedia, The Free Encyclopedia. Retrieved 21:37, February 13, 2020, from https://en.wikipedia.org/w/index.php?title=Generalized_mean&oldid=929065968

  5. alwayscurious (https://stats.stackexchange.com/users/194748/alwayscurious), What is the reasoning behind standardization (dividing by standard deviation)?, URL (version: 2019-03-18): https://stats.stackexchange.com/q/398116

  6. Sergey Ioffe & Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015.

  7. J. Cook. Basic properties of the soft maximum. Working Paper Series 70, UT MD Anderson Cancer Center Department of Biostatistics, 2011. https://www.johndcook.com/soft_maximum.pdf

  8. M. Lange, D. Zühlke, O. Holz, and T. Villmann, “Applications of lp-norms and their smooth approximations for gradient based learning vector quantization,” in Proc. ESANN, Apr. 2014, pp. 271-276.

  9. Aidan Rocke. analytic_min-max_operators (2020). GitHub repository, https://github.com/AidanRocke/analytic_min-max_operators