Differentiable approximations to the min and max operators
Motivation:
Within the context of optimisation, differentiable approximations of the $\min$ and $\max$ operators on $\mathbb{R}^n$ are very useful. In particular, I am interested in analytic approximations:
\begin{equation} f_N: \mathbb{R}^n \rightarrow \mathbb{R} \end{equation}
\begin{equation} g_N: \mathbb{R}^n \rightarrow \mathbb{R} \end{equation}
where:
\begin{equation} \forall m > N, \max(\lvert f_m(X) - \min(X) \rvert, \lvert g_m(X) - \max(X) \rvert) < \epsilon \end{equation}
I found a few proposed solutions to a related question on MathOverflow [2], but when I tested these methods I found that none are numerically stable with respect to the relative error.
This motivated me to come up with a numerically stable solution inspired by properties of the infinity norm:
\begin{equation} \forall X \in \mathbb{R}^n, \lVert X \rVert_{\infty} = \max(X) = \bar{x} \in \mathbb{R} \end{equation}
where the reason for (4) is that, for $X$ with positive entries:
\begin{equation} \lVert X \rVert_{\infty} = \lim_{N \to \infty} \Big(\sum_{i=1}^n x_i^N\Big)^{\frac{1}{N}} = \lim_{N \to \infty} \bar{x} \cdot \Big(\sum_{i=1}^n \Big(\frac{x_i}{\bar{x}}\Big)^N\Big)^{\frac{1}{N}} = \bar{x} \end{equation}
which may be used to define analytic approximations to both the min and max operators on $\mathbb{R}^n$, provided that we use an order-preserving transformation from $\mathbb{R}$ to $\mathbb{R}_+$.
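As a quick numerical sanity check of this limit (restricted to positive entries, so no absolute values are needed), one can watch the $N$-th power sum approach the largest entry; the helper name `p_norm` is mine:

```julia
## (sum_i x_i^N)^(1/N) approaches max(X) as N grows, for positive X:
p_norm(X, N) = sum(X.^N)^(1/N)

X = [0.5, 1.2, 3.0, 2.7]
println(p_norm(X, 10))   # within a few percent of max(X) = 3.0
println(p_norm(X, 100))  # much closer
```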
Analysis of several proposed solutions:
Without loss of generality, let’s focus on approximations to the $\max$ operator. In total, I analysed three different analytic approximations to this operator: two which were taken from the MathOverflow question [2], and the smooth max, which has its own Wikipedia page [3].
I actually encourage the interested reader to experiment with the following methods which are all correct in principle but all susceptible to overflow errors:
The generalised mean:
\begin{equation} \forall X \in \mathbb{R}_+^n \forall N \in \mathbb{N}, GM(X,N)= \Big(\frac{1}{n}\sum_{i=1}^n x_i^N\Big)^{\frac{1}{N}} \end{equation}
converges to $\max(X)$ since:
\begin{equation} \lim_{N \to \infty} \Big(\frac{1}{n} \sum_{i=1}^n x_i^N \Big)^{\frac{1}{N}} = \lim_{N \to \infty} \bar{x} \cdot \Big(\frac{1}{n} \sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^N \Big)^{\frac{1}{N}} \end{equation}
where
\begin{equation} n^{-\frac{1}{N}} \leq \Big(\frac{1}{n} \sum_{i=1}^n \big(\frac{x_i}{\bar{x}}\big)^N\Big)^{\frac{1}{N}} \leq 1 \end{equation}
In Julia this may be implemented as follows:
using Statistics

function GM(X::Array{Float64, 1}, N::Int64)
    ## generalised mean: https://en.wikipedia.org/wiki/Generalized_mean
    ## this method throws a DomainError for odd exponents unless all
    ## elements of X are positive:
    ## https://math.stackexchange.com/questions/317528/how-do-you-compute-negative-numbers-to-fractional-powers/317546#317546
    return mean(X.^N)^(1/N)
end
It is worth noting that this particular method throws a DomainError for odd exponents unless all elements of $X$ are positive, since a negative base raised to a fractional power is undefined over the reals.
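To see both the convergence and the domain restriction in practice (the definition of `GM` is repeated here so the snippet runs on its own):

```julia
using Statistics

## generalised mean, repeated so this snippet is standalone:
function GM(X::Array{Float64, 1}, N::Int64)
    return mean(X.^N)^(1/N)
end

X = [1.0, 2.0, 5.0, 4.0]
println(GM(X, 50))   # approaches max(X) = 5.0 from below
## GM([-3.0, 1.0, 2.0], 3) throws a DomainError: mean(X.^3) = -6.0,
## and raising a negative number to the power 1/3 is undefined over the reals
```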
Exponential generalised mean:
\begin{equation} \forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, EM(X,N)= \frac{1}{N} \cdot \log \big(\frac{1}{n}\sum_{i=1}^n e^{N \cdot x_i}\big) \end{equation}
converges to $\max(X)$ since:
\begin{equation} \frac{e^{N \bar{x}}}{n} \leq \frac{1}{n} \sum_{i=1}^n e^{N x_i} \leq e^{N \bar{x}} \end{equation}
so we have:
\begin{equation} \bar{x} - \frac{\log n}{N} \leq \frac{1}{N} \cdot \log \big(\frac{1}{n}\sum_{i=1}^n e^{N \cdot x_i}\big) \leq \bar{x} \end{equation}
In Julia this may be implemented as follows:
using Statistics

function exp_GM(X::Array{Float64, 1}, N::Int64)
    ### This method is terrible. Overflow errors everywhere.
    exp_ = mean(exp.(N*X))
    return (1/N)*log(exp_)
end
Unlike the previous method, this one is defined on all of $\mathbb{R}^n$, but it too is numerically unstable.
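The overflow is easy to trigger: once any $N \cdot x_i$ exceeds roughly 709, `exp` returns `Inf` in double precision (the definition is repeated here so the snippet runs standalone):

```julia
using Statistics

## exponential generalised mean, repeated so this snippet is standalone:
function exp_GM(X::Array{Float64, 1}, N::Int64)
    exp_ = mean(exp.(N*X))
    return (1/N)*log(exp_)
end

println(exp_GM([1.0, 2.0, 3.0], 10))        # ≈ 2.89, within log(n)/N of 3.0
println(exp_GM([100.0, 200.0, 300.0], 10))  # exp(3000) overflows, returns Inf
```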
The smooth max:
\begin{equation} \forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, SM(X,N)= \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot x_i}}{\sum_{i=1}^n e^{N \cdot x_i}} \end{equation}
converges to $\max(X)$: setting $\bar{x} = \max(X)$ and letting $k$ denote the number of indices $i$ such that $x_i = \bar{x}$, we have:
\begin{equation} \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot x_i}}{\sum_{i=1}^n e^{N \cdot x_i}} = \frac{\sum_{i=1}^n x_i \cdot e^{N \cdot (x_i-\bar{x})}}{\sum_{i=1}^n e^{N \cdot (x_i-\bar{x})}} = \frac{k\bar{x} + \sum_{x_i < \bar{x}} x_i \cdot e^{N \cdot (x_i-\bar{x})}}{k + \sum_{x_i < \bar{x}} e^{N \cdot (x_i-\bar{x})}} \end{equation}
and we note that:
\begin{equation} \lim_{N \to \infty} \max \big(\sum_{x_i < \bar{x}} x_i \cdot e^{N \cdot (x_i-\bar{x})},\sum_{x_i < \bar{x}} e^{N \cdot (x_i-\bar{x})}\big) = 0 \end{equation}
In Julia this may be implemented as follows:
function smooth_max(X::Array{Float64, 1}, N::Int64)
    ## implementation of the smooth maximum:
    ## https://en.wikipedia.org/wiki/Smooth_maximum
    exp_ = exp.(X*N)
    return sum(X.*exp_)/sum(exp_)
end
This method appears to be the industry standard, but like the other methods it is vulnerable to overflow errors as it fails to normalise the input vector $X$. In fact, the reader might want to explore an IJulia notebook where I analysed each method.
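A standard stabilisation (not part of the implementation above) exploits the shift invariance shown in the derivation: subtracting the largest entry before exponentiating leaves the weights unchanged and keeps every exponent non-positive. The name `shifted_smooth_max` is mine:

```julia
function shifted_smooth_max(X::Array{Float64, 1}, N::Int64)
    ## the ratio is invariant under x_i -> x_i - c, so shifting by the
    ## largest entry keeps every exponent <= 0 and avoids overflow:
    c = maximum(X)
    exp_ = exp.(N*(X .- c))
    return sum(X.*exp_)/sum(exp_)
end

println(shifted_smooth_max([100.0, 500.0, 900.0], 10))  # ≈ 900.0, no overflow
```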
My proposed solution:
This analysis motivated me to come up with my own solution, inspired by the properties of the infinity norm, where I first rescale the vector so it has zero mean and unit variance:
\begin{equation} \forall X \in \mathbb{R}^n \forall N \in \mathbb{N}, AM(\hat{X},N) = \sigma_X \cdot \big(\frac{1}{N} \log \big(\sum_{i=1}^n e^{N\cdot \hat{x_i}} \big)\big) + \mu_{X} \end{equation}
where $\hat{X}$ is defined as follows:
\begin{equation} \hat{X} = \frac{X - 1_n \cdot \mu_X}{\sigma_X} \end{equation}
and the partial derivative of $AM(\hat{X},N)$ with respect to $\hat{x_i}$ is, up to the constant factor $\sigma_X$, simply the softmax:
\begin{equation} \frac{\partial}{\partial \hat{x_i}} AM(\hat{X},N) = \sigma_X \cdot \frac{e^{N \cdot \hat{x_i}}}{\sum_{j=1}^n e^{N\cdot \hat{x_j}}} \end{equation}
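The softmax gradient can be verified numerically with a finite-difference check on the log-sum-exp core of $AM$ (the affine factors $\sigma_X$ and $\mu_X$ are left out here; the names `lse` and `softmax` are illustrative):

```julia
## the log-sum-exp core of AM and its gradient, the softmax:
lse(Z, N) = (1/N)*log(sum(exp.(N*Z)))
softmax(Z, N) = exp.(N*Z)/sum(exp.(N*Z))

Z, N, h, i = [0.1, -0.3, 0.5], 5, 1e-6, 2
Zp = copy(Z); Zp[i] += h
fd = (lse(Zp, N) - lse(Z, N))/h      # forward difference in coordinate i
println(abs(fd - softmax(Z, N)[i]))  # close to zero
```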
This method may be readily generalised to approximate both the $\min$ and $\max$ operators on $\mathbb{R}^n$ in the Julia programming language:
using Statistics
function analytic_min_max(X::Array{Float64, 1}, N::Int64, case::Int64)
    """
    An analytic approximation to the min and max operators
    Inputs:
        X: a vector from R^n where n is unknown
        N: an integer such that the approximation of max(X)
           improves with increasing N.
        case: apply analytic_min() if case == 1, and
              analytic_max() if case == 2
    Output:
        An approximation to min(X) if case == 1, and max(X) if
        case == 2
    """
    if (case != 1) && (case != 2)
        return print("Error: case isn't well defined")
    else
        ## q is the degree of the approximation:
        q = N*(-1)^case
        mu, sigma = mean(X), std(X)
        ## rescale vector so it has zero mean and unit variance:
        Z_score = (X .- mu)./sigma
        exp_sum = sum(exp.(Z_score*q))
        log_ = log(exp_sum)/q
        return (log_*sigma) + mu
    end
end
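As a quick sanity check, here is a compact restatement of the estimator above (the `_demo` name is mine; the definition is repeated so the snippet runs standalone):

```julia
using Statistics

## compact restatement of the rescaled log-sum-exp estimator:
function analytic_min_max_demo(X, N, case)
    q = N*(-1)^case   # q < 0 approximates min, q > 0 approximates max
    mu, sigma = mean(X), std(X)
    return log(sum(exp.(((X .- mu)./sigma)*q)))/q * sigma + mu
end

X = collect(-1000.0:20.0:1000.0)
println(analytic_min_max_demo(X, 100, 1))  # close to minimum(X) = -1000.0
println(analytic_min_max_demo(X, 100, 2))  # close to maximum(X) = 1000.0
```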
The reader may check that it passes the following numerical stability test:
function numerical_stability(method, type::Int64)
    """
    A simple test for numerical stability with respect to the relative error.
    Input:
        method: the approximation used
        type: 1 for min() and 2 for max()
    Output:
        Check that the average relative error is less than 10%.
    """
    ## the test will be run 100 times
    relative_errors = zeros(100)
    for i = 1:100
        ## a vector sampled uniformly from [-1000,1000]^100
        X = (2*rand(100) .- 1.0)*1000
        ## the test for min operators
        if type == 1
            min_ = minimum(X)
            relative_errors[i] = abs(min_ - method(X, i))/abs(min_)
        ## the test for max operators
        else
            max_ = maximum(X)
            relative_errors[i] = abs(max_ - method(X, i))/abs(max_)
        end
    end
    return mean(relative_errors) < 0.1
end
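The same check can be run as a compact, self-contained sketch with a fixed degree $q = 100$ (the helper name `estimate` is illustrative, and the random inputs match the test above):

```julia
using Statistics

## the rescaled log-sum-exp estimator, restated so this snippet runs standalone:
function estimate(X, q)
    mu, sigma = mean(X), std(X)
    return log(sum(exp.(((X .- mu)./sigma)*q)))/q * sigma + mu
end

relative_errors = map(1:100) do i
    ## a vector sampled uniformly from [-1000,1000]^100:
    X = (2*rand(100) .- 1.0)*1000
    abs(maximum(X) - estimate(X, 100))/abs(maximum(X))
end
println(mean(relative_errors) < 0.1)  # expected: true
```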
Discussion:
It took me about an hour to come up with my solution, so I doubt this method is either original or state-of-the-art. In fact, I wouldn’t be surprised at all if numerical analysts have a better solution than the one I propose here.
If you know of a more robust method, feel free to join the discussion on MathOverflow [1].
References:

1. Aidan Rocke (https://mathoverflow.net/users/56328/aidanrocke), analytic approximations of the min and max operators, URL (version: 2020-02-13): https://mathoverflow.net/q/352548

2. eakbas (https://mathoverflow.net/users/5223/eakbas), A differentiable approximation to the minimum function, URL (version: 2016-08-12): https://mathoverflow.net/q/35191

3. Wikipedia contributors. Smooth maximum. Wikipedia, The Free Encyclopedia. March 25, 2019, 21:07 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Smooth_maximum&oldid=889462421. Accessed February 12, 2020.

4. Wikipedia contributors. (2019, December 3). Generalized mean. In Wikipedia, The Free Encyclopedia. Retrieved 21:37, February 13, 2020, from https://en.wikipedia.org/w/index.php?title=Generalized_mean&oldid=929065968

5. alwayscurious (https://stats.stackexchange.com/users/194748/alwayscurious), What is the reasoning behind standardization (dividing by standard deviation)?, URL (version: 2019-03-18): https://stats.stackexchange.com/q/398116

6. Sergey Ioffe & Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015.

7. J. Cook. Basic properties of the soft maximum. Working Paper Series 70, UT MD Anderson Cancer Center Department of Biostatistics, 2011. https://www.johndcook.com/soft_maximum.pdf

8. M. Lange, D. Zühlke, O. Holz, and T. Villmann, “Applications of lp-norms and their smooth approximations for gradient based learning vector quantization,” in Proc. ESANN, Apr. 2014, pp. 271-276.

9. Aidan Rocke. analytic_minmax_operators (2020). GitHub repository, https://github.com/AidanRocke/analytic_minmax_operators