## Motivation:

Within the context of optimisation, differentiable approximations of the $\min$ and $\max$ operators on $\mathbb{R}^n$ are very useful. In particular, I am interested in analytical approximations $f_N,g_N \in C^{\infty}$:

$$f_N: \mathbb{R}^n \rightarrow \mathbb{R}$$

$$g_N: \mathbb{R}^n \rightarrow \mathbb{R}$$

where $\forall X \in \mathbb{R}^n \forall \epsilon > 0 \exists N \in \mathbb{N}$:

$$\forall m > N, \max(\lvert f_m(X)-\max(X) \rvert, \lvert g_m(X)-\max(X) \rvert) < \epsilon$$

I found a few proposed solutions to a related question on MathOverflow but when I tested these methods I found that none are numerically stable with respect to the relative error $\frac{\lvert \delta x \rvert}{\lvert x \rvert}$.

This motivated me to come up with a numerically stable solution inspired by properties of the infinity norm:

$$\forall X \in \mathbb{R}^n, \lVert X \rVert_{\infty} = \max(X) = \bar{x} \in \mathbb{R}$$

where the reason for (4) is that $\forall X \in \mathbb{R_{+}}^n$ where $\max(X) \neq 0$:

$$\lVert X \rVert_{\infty} = \lim_{n \to \infty} (\sum_{i=1}^n x_i^n)^{\frac{1}{n}} = \lim_{n \to \infty} \bar{x} \cdot \big(\sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^n\big)^{\frac{1}{n}} = \bar{x}$$

which may be used to define analytic approximations to both the min and max operators on $\mathbb{R}^n$ provided that we use an order-preserving transformation from $\mathbb{R}^n$ to $\mathbb{R_{+}}^n$.

## Analysis of several proposed solutions:

Without loss of generality, let’s focus on approximations to the $\max$ operator. In total, I analysed three different analytic approximations to this operator. Two which were taken from the MathOverflow [2] and the smooth max, which has its own Wikipedia page [3].

I actually encourage the interested reader to experiment with the following methods which are all correct in principle but all susceptible to overflow errors:

### The generalised mean:

$$\forall X \in \mathbb{R_+}^n \forall N \in \mathbb{N}, GM(X,N)= \Big(\frac{1}{n}\sum_{i=1}^n x_i^N\Big)^{\frac{1}{N}}$$

converges to $\max(X), \forall X \in \mathbb{R_+}^n$ since:

$$\lim_{N \to \infty} \Big(\frac{1}{n} \sum_{i=1}^n x_i^N \Big)^{\frac{1}{N}} = \lim_{N \to \infty} \bar{x} \cdot \Big(\frac{1}{n} \sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^N \Big)^{\frac{1}{N}}$$

where

$$n^{-\frac{1}{N}} \leq \Big(\frac{1}{n} \sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\Big)^N \leq 1$$

In Julia this may be implemented as follows:

function GM(X::Array{Float64, 1},N::Int64)

## generalised mean: https://en.wikipedia.org/wiki/Generalized_mean

## this method returns a type error unless all elements of X are positive:
## https://math.stackexchange.com/questions/317528/how-do-you-compute-negative-numbers-to-fractional-powers/317546#317546

return mean(X.^N)^(1/N)

end


It is worth noting that this particular method returns a type error for odd exponents unless all elements of $X$ are positive.

### Exponential generalised mean:

$$\forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, EM(X,N)= \frac{1}{N} \cdot \log \big(\frac{1}{n}\sum_{i=1}^n e^{N \cdot x_i}\big)$$

converges to $\max(X), \forall X \in \mathbb{R}^n$ since:

$$\frac{e^{N \bar{x}}}{n} \leq \frac{1}{n} \sum_{i=1}^n e^{N x_i} \leq e^{N \bar{x}}$$

so we have:

$$\bar{x} - \frac{\log n}{N} \leq \frac{1}{N} \cdot \log \big(\frac{1}{n}\sum_{i=1}^n e^{N \cdot x_i}\big) \leq \bar{x}$$

In Julia this may be implemented as follows:

function exp_GM(X::Array{Float64, 1},N::Int64)

### This method is terrible. Overflow errors everywhere.

exp_ = mean(exp.(N*X))

return (1/N)*log(exp_)

end


Unlike the previous method, this one is defined $\forall X \in \mathbb{R}^n$ but it too is numerically unstable.

### The smooth max:

$$\forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, SM(X,N)= \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot x_i}}{\sum_{i=1}^n e^{N \cdot x_i}}$$

converges to $\max(X), \forall X \in \mathbb{R}^n$ since $x_i - \bar{x} \leq 0$ so we may define the sets $\{x_i = \bar{x}\}$ and $% $ where $\lvert \{x_i = \bar{x}\} \rvert = k$ and $% $ so we have:

$$\frac{\sum_{i=1}^n x_i \cdot e^{N \cdot x_i}}{\sum_{i=1}^n e^{N \cdot x_i}} = \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot (x_i-\bar{x})}}{\sum_{i=1}^n e^{N \cdot (x_i-\bar{x})}} = \frac{k\bar{x} + \sum_{x_i < \bar{x}} x_i \cdot e^{N \cdot (x_i-\bar{x})}}{k + \sum_{x_i < \bar{x}} e^{N \cdot (x_i-\bar{x})}}$$

and we note that:

$$\lim_{N \to \infty} \max \big(\sum_{x_i < \bar{x}} x_i \cdot e^{N \cdot (x_i-\bar{x})},\sum_{x_i < \bar{x}} e^{N \cdot (x_i-\bar{x})}\big) = 0$$

In Julia this may be implemented as follows:

function smooth_max(X::Array{Float64, 1},N::Int64)

## implementation of the smooth maximum:
## https://en.wikipedia.org/wiki/Smooth_maximum

exp_ = exp.(X*N)

return sum(X.*exp_)/sum(exp_)

end


This method appears to be the industry standard but like the other methods it is vulnerable to overflow errors as it fails to normalise the input vector $X$. In fact, the reader might want to explore an IJulia notebook where I analysed each method.

## My proposed solution:

This analysis motivated me to come up with my own solution inspired by the properties of the infinity norm where I first rescale the vectors so they have zero mean and unit variance:

$$\forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, AM(\hat{X},N) = \sigma_X \cdot \big(\frac{1}{N} \log \big(\sum_{i=1}^n e^{N\cdot \hat{x_i}} \big)\big) + \mu_{X}$$

where $\hat{X}$ is defined as follows:

$$\hat{X} = \frac{X - 1_n\cdot \mu_X}{\sigma_X}$$

and the partial derivative of $AM(\hat{X},N)$ with respect to $\hat{x_i}$ is simply the softmax:

$$\frac{\partial}{\partial \hat{x_i}} AM(\hat{X},N) = \frac{e^{N \cdot \hat{x_i}}}{\sum_{i=1}^n e^{N\cdot \hat{x_i}}}$$

This method may be readily generalised to approximate both the $\min$ and $\max$ operators on $\mathbb{R}^n$ in the Julia programming language:

using Statistics

function analytic_min_max(X::Array{Float64, 1},N::Int64,case::Int64)

"""
An analytic approximation to the min and max operators

Inputs:
X: a vector from R^n where n is unknown

N: an integer such that the approximation of max(X)
improves with increasing N.

case: If case == 1 apply analytic_min(), otherwise
apply analytic_max() if case == 2

Output:

An approximation to min(X) if case == 1, and max(X) if
case == 2
"""

if (case != 1)*(case != 2)

return print("Error: case isn't well defined")

else

## q is the degree of the approximation:
q = N*(-1)^case

mu, sigma = mean(X), std(X)

## rescale vector so it has zero mean and unit variance:
Z_score = (X.-mu)./sigma

exp_sum = sum(exp.(Z_score*q))

log_ = log(exp_sum)/q

return (log_*sigma)+mu

end

end


and the reader may check that it passes the following numerical stability test:

function numerical_stability(method,type::Int64)

"""
A simple test for numerical stability with respect to the relative error.

Input:

method: the approximation used

type: 1 for min() and 2 for max()

Output:

Check that the average relative error is less than 10%.

"""

## test will be run 100 times
relative_errors = zeros(100)

for i = 1:100

## a vector sampled uniformly from [-1000,1000]^100
X = (2*rand(100).-1.0)*1000

## the test for min operators
if type == 1

min_ = minimum(X)

relative_errors[i] = abs(min_-method(X,i))/abs(min_)

## the test for max operators
else

max_ = maximum(X)

relative_errors[i] = abs(max_-method(X,i))/abs(max_)

end

end

return mean(relative_errors) < 0.1

end


## Discussion:

It took me about an hour to come up with my solution so I doubt this method is either original or state-of-the-art. In fact, I wouldn’t be surprised at all if numerical analysts have a better solution than the one I propose here.

If you know of a more robust method, feel free to join the discussion on the MathOverflow.

## References:

1. Aidan Rocke (https://mathoverflow.net/users/56328/aidan-rocke), analytic approximations of the min and max operators, URL (version: 2020-02-13): https://mathoverflow.net/q/352548

2. eakbas (https://mathoverflow.net/users/5223/eakbas), A differentiable approximation to the minimum function, URL (version: 2016-08-12): https://mathoverflow.net/q/35191

3. Wikipedia contributors. Smooth maximum. Wikipedia, The Free Encyclopedia. March 25, 2019, 21:07 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Smooth_maximum&oldid=889462421. Accessed February 12, 2020.

4. Wikipedia contributors. (2019, December 3). Generalized mean. In Wikipedia, The Free Encyclopedia. Retrieved 21:37, February 13, 2020, from https://en.wikipedia.org/w/index.php?title=Generalized_mean&oldid=929065968

5. alwayscurious (https://stats.stackexchange.com/users/194748/alwayscurious), What is the reasoning behind standardization (dividing by standard deviation)?, URL (version: 2019-03-18): https://stats.stackexchange.com/q/398116

6. Sergey Ioffe & Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015.

7. J. Cook. Basic properties of the soft maximum. Working Paper Series 70, UT MD Anderson Cancer Center Department of Biostatistics, 2011. https://www.johndcook.com/soft_maximum.pdf

8. M. Lange, D. Zühlke, O. Holz, and T. Villmann, “Applications of lp-norms and their smooth approximations for gradient based learning vector quantization,” in Proc. ESANN, Apr. 2014, pp. 271-276.

9. Aidan Rocke. analytic_min-max_operators(2020).GitHub repository, https://github.com/AidanRocke/analytic_min-max_operators