Miguel Lázaro-Gredilla ([email protected])

January 2013
Machine Learning Group, http://www.tsc.uc3m.es/~miguel/MLG/

Contents

Towards a modern machine learning

Probabilistic programming languages

Infer.NET
  The software stack: A digression
  The modeling language

References

Towards a modern machine learning

Classical machine learning

- Large number of tools with diverse backgrounds:
  - k-means
  - Principal Component Analysis
  - Independent Component Analysis
  - Classical Neural Networks
  - Support Vector Machines
  - Density estimation via Parzen windows
  - Recursive Least Squares
  - Least Mean Squares
  - (just an arbitrary sample, we could go on and on...)
- Pragmatic, unsystematic, non-probabilistic


Third generation machine learning

- Data sets described as instances of a probabilistic model:
  - Gaussian Process regression/classification
  - Latent Dirichlet Allocation
  - Bayes Point Machine
  - ...
- We can infer unknowns in a principled way
- Features:
  - Each tool can be expressed as a Bayesian network
  - Systematic, modular, standardized approach
  - Proposals are models, not algorithms
  - Detach inference from model


Updating classical machine learning (I/V)

- Should we just dump classical ML and jump on the Bayesian bandwagon?
- Most classical ML tools can be written as some type of inference on some Bayesian network
- So just update classical ML using a Bayesian interpretation:
  - Gain insight into the model behind the algorithm
  - See overlaps emerge
  - Use other types of inference
  - It may become obvious how to enhance them


Updating classical machine learning (II/V)

- Classical algorithm: k-means
- Bayesian model (assuming normalized data):

    p(x_n | z_n, {μ_k}) = N(x_n | μ_{z_n}, v I)
    p(v) = InvGamma(v | 1, 1)
    p(μ_k) = N(μ_k | 0, 10 I)
    p(z_n | w) = Discrete(z_n | w)
    p(w) = Dirichlet(w | 1_{K×1})

- Classical model corresponds to:
  - Maximum likelihood for p({x_n} | {z_n}, {μ_k}), obtained using hard-EM optimization
  - A particular case of the Gaussian mixture model
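The correspondence is easy to make concrete. Below is a minimal NumPy sketch of my own (not from the slides) of k-means written as hard-EM on the model above: under a covariance v I shared across components, the most likely component for x_n is simply the one with the nearest mean.

```python
import numpy as np

def kmeans_hard_em(X, K, n_iter=50, seed=0):
    """k-means as hard-EM on an isotropic Gaussian mixture: the E-step
    makes a hard assignment z_n to the most likely component, the M-step
    is maximum likelihood for mu_k given those assignments."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Hard E-step: with covariance v*I shared across components, the
        # most likely z_n is just the index of the nearest mean.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)
        # M-step: mu_k = mean of the points currently assigned to k.
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    return mu, z
```

With v fixed and the assignments hard, this is exactly Lloyd's algorithm; replacing the hard E-step with responsibilities recovers EM for the full Gaussian mixture.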


Updating classical machine learning (III/V)

- Classical algorithm: Extended Recursive Least Squares (adaptive)
- Bayesian model:

    p(x_n | w_n) = N(x_n | w_n^T u_n, v)
    p(w_n | w_{n−1}, α, β) = N(w_n | (1 − β) w_{n−1}, β α I)
    p(v) = InvGamma(v | 1, 1)
    p(α) = InvGamma(α | 1, 1)
    p(β/(1 − β)) = InvGamma(β/(1 − β) | 1, 1)

- Classical model corresponds to:
  - Posterior mean for w_n, for some magically selected α and v
  - A particular case of the Kalman filter
- Contrived assumptions are needed to represent exponentially weighted RLS in this framework
- This hints that it might not make sense; see [KRLST]
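Since the model above is linear-Gaussian, its exact posterior follows from Kalman recursions. The sketch below (my notation and function name, not slide code) implements one predict/update step; with β = 0 and a fixed v the weights are static and iterating it is RLS tracking the posterior mean of w_n.

```python
import numpy as np

def kalman_step(m, P, u, x, v, alpha=1.0, beta=0.0):
    """One exact Kalman predict/update for the linear-Gaussian model:
    w_n ~ N((1 - beta) w_{n-1}, beta * alpha * I), x_n ~ N(w_n^T u_n, v).
    With beta = 0 the weights are static and the recursion is the
    Bayesian reading of RLS."""
    # Predict step: p(w_n | x_{1:n-1}).
    m_pred = (1.0 - beta) * m
    P_pred = (1.0 - beta) ** 2 * P + beta * alpha * np.eye(len(m))
    # Update step: condition on the new observation x_n.
    s = u @ P_pred @ u + v          # predictive variance of x_n
    k = P_pred @ u / s              # Kalman gain
    m_new = m_pred + k * (x - u @ m_pred)
    P_new = P_pred - np.outer(k, u) @ P_pred
    return m_new, P_new
```

Choosing β > 0 gives the adaptive variant: a random-walk drift on the weights, i.e. an ordinary Kalman filter.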


Updating classical machine learning (IV/V)

- Classical algorithm: Principal Component Analysis
- Bayesian model:

    p(x_n | y_n, W, v) = N(x_n | W^T y_n, v I)
    p(y_n) = N(y_n | 0, I)
    p(w_d) = N(w_d | 0, diag([α_1, ..., α_D]))   (w_d is the d-th column of W)
    p(v) = InvGamma(v | 1, 1)
    p(α_d) = InvGamma(α_d | 1, 1)

- Classical model corresponds to:
  - Maximum likelihood for p(x_n | W, v) when v → 0, with W the product of an orthogonal matrix and an ordered diagonal matrix
  - Restrictions on W make it unique, but don't change the model
- New possibilities: what if we do maximum likelihood for p(x_n | {y_n}, v)?
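The maximum-likelihood solution of this model is available in closed form (Tipping and Bishop's probabilistic PCA), which makes the v → 0 claim easy to check numerically. The sketch below is my own illustration and uses the convention x_n ≈ W y_n with W of size D×q, i.e. the transpose of the slide's W; only the orientation differs.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum likelihood for probabilistic PCA
    (Tipping & Bishop): keep the top-q eigenvectors of the sample
    covariance; the ML noise variance v is the mean of the discarded
    eigenvalues, and as v -> 0 the solution recovers classical PCA."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)            # D x D sample covariance
    evals, evecs = np.linalg.eigh(S)        # ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]
    v = evals[q:].mean() if q < len(evals) else 0.0
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - v, 0.0))
    return W, v
```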


Updating classical machine learning (V/V)

- Classical algorithm: Support Vector Machines
- Bayesian model:
  - See Sollich, P. (2002). "Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities". Machine Learning, 46:21-52.
- Ingenious approach, but not very natural (it involves using three classes to solve a binary problem)

Probabilistic programming languages


Some observations

According to the previous slides:
- Most ML tools have a Bayesian model description
- The full description takes a few lines
- The type of inference (ML, MAP, point estimates, full Bayesian posterior) is independent of the model
  - Though the tractability of each does depend on the model

In the process of creating new ML tools:
- What makes a new ML tool worthwhile is: a novel model
- What we spend most of our time working on is: making inference tractable on the new model


The idea

Probabilistic programming:
- Define a language to describe Bayesian models ("programs")
- Create a "compiler" that understands those programs and generates inference engines for them

The new workflow (emphasis is on model design):
- Spend more time thinking about the model
- Program it (just a few lines!)
- Optional: sample data from the model
- Feed data to the inference engine and assess the results
- If the model wasn't that good, do it over


Programming paradigms (I/II)

A random assortment of them:
- Imperative (Matlab)
- Procedural (C)
- Object oriented (C++)
- Declarative (SQL)
- Functional (OCaml, F#)
- Metaprogramming (LISP)
- Domain-specific language (Spice)


Programming paradigms (II/II)

For probabilistic programming:
- A domain-specific language may be simpler for the user, but:
  - It doesn't integrate well with existing codebases
  - It doesn't interface well with DB access, plotting capabilities, etc.
- Functional languages with metaprogramming can be useful for writing a "guest probabilistic program" within a "host programming language" (we'll see F# examples)
- This can also be done in an imperative style, but it looks uglier (we'll see Python examples)


An incomplete list of probabilistic programming languages

- BUGS: Bayesian inference using Gibbs sampling
- HANSEI: extends OCaml; discrete distributions only
- Hierarchical Bayesian Compiler (HBC): large-scale models and non-parametric process priors
- PyMCMC: MCMC algorithms for Python classes
- Church: extends Scheme to describe Bayesian models
- Infer.NET: provides a probabilistic language within the .NET platform

See more at http://probabilistic-programming.org

Infer.NET

The software stack: A digression


The Java case

Three big operating systems:
- Linux, OS X, Windows

...and one language to rule them all:
- Sun Microsystems designed Java to run on the JVM
- ...and implemented the JVM to run on Linux, Mac, and Windows
- Platform independence: bliss for programmers


The .NET case

Microsoft reacted and created the JVM counterpart: the CLR.

Which languages target the CLR?
- The .NET languages: VB.NET, C#, F#, IronPython...
- Different languages with a common set of libraries, so it is easy to interface them

Which operating systems does the CLR run on?
- Microsoft released the specification and standardized it
- CLR-like implementations arose for Linux/Mac: Mono
- ...but the Windows version is always ahead: more complete, with additional tools, better tested, etc.

Microsoft tries to win in the cross-platform territory (a sweet spot for developers) while still favoring its flagship product, Windows: opposing objectives.


Infer.NET's interoperability

- Infer.NET targets all .NET languages, with a focus on C#, F#, and IronPython
- It can be used on Mono (Mac/Linux) or the CLR (Windows); the experience is better and less buggy on the latter
- The .NET languages have a growing set of tools for scientific computing, but nowhere near Matlab's yet
- IronPython cannot use NumPy/SciPy/Matplotlib natively
- A new tool called Sho provides an IronPython shell with Matlab-like capabilities (but Windows only)

The modeling language


Using Infer.NET

Infer.NET provides:
- A probabilistic modeling language embedded in other languages
  - F# allows a more natural embedding
- Compilation to three inference engines:
  - EP (expectation propagation): the approximation includes all non-zero probability points
  - VB (variational Bayes): the approximation avoids zero probability points
  - MCMC: Gibbs sampling; slower

Let's browse the Microsoft Research examples and create a new model:
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/
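The EP/VB contrast can be visualized with a toy example of my own (not from the slides): approximate a two-mode Gaussian mixture with a single Gaussian. Minimizing KL(p||q), the direction EP works with locally, matches moments and covers both modes; minimizing KL(q||p), the VB objective, settles on one mode.

```python
import numpy as np

# Target p(x): an equal mixture of N(-3, 1) and N(3, 1), on a grid.
xs = np.linspace(-8.0, 8.0, 1601)
dx = xs[1] - xs[0]

def normal(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * normal(xs, -3.0, 1.0) + 0.5 * normal(xs, 3.0, 1.0)

# EP-like direction: minimizing KL(p || q) over a single Gaussian q
# matches the moments of p, so q spreads over both modes
# ("includes all non-zero probability points").
m_ep = np.sum(xs * p) * dx
s_ep = np.sqrt(np.sum((xs - m_ep) ** 2 * p) * dx)

# VB-like direction: minimizing KL(q || p) penalizes q heavily wherever
# p is near zero, so the best single Gaussian locks onto one mode
# ("avoids zero probability points"). Crude grid search:
best = (np.inf, 0.0, 0.0)
for m in np.linspace(-5.0, 5.0, 101):
    for s in np.linspace(0.5, 3.0, 26):
        q = normal(xs, m, s)
        kl = np.sum(q * (np.log(q + 1e-300) - np.log(p + 1e-300))) * dx
        if kl < best[0]:
            best = (kl, m, s)
_, m_vb, s_vb = best
```

The moment-matched Gaussian sits between the modes with a large variance, while the reverse-KL optimum sits on one of the two modes with variance close to 1.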


Two coins (F#)



Two coins (IronPython)
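The listings themselves did not survive extraction, but the MSR two-coins example poses two queries: the probability that two fair coins both land heads, and the posterior over the first coin after observing that they did not. The same answers can be obtained by brute-force enumeration in plain Python (no Infer.NET involved):

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent fair coins (True = heads).
half = Fraction(1, 2)
joint = {(c1, c2): half * half for c1, c2 in product([True, False], repeat=2)}

# Query 1: probability that both coins came up heads.
p_both = sum(p for (c1, c2), p in joint.items() if c1 and c2)
print(p_both)  # 1/4

# Query 2: posterior of the first coin given bothHeads observed false.
posterior = {k: p for k, p in joint.items() if not (k[0] and k[1])}
z = sum(posterior.values())
p_first = sum(p for (c1, _), p in posterior.items() if c1) / z
print(p_first)  # 1/3
```

An exact engine on this model reports Bernoulli(1/4) for bothHeads and, after the observation, Bernoulli(1/3) for the first coin.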



Learning a Gaussian (F#)



Learning a Gaussian (IronPython)
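Again the listing is lost; the example infers the parameters of a Gaussian from observed data. As a stand-in, here is the exact conjugate computation for the simplest variant, learning the mean with the variance assumed known (the full example also learns the precision, which takes a Gamma prior):

```python
import numpy as np

def posterior_mean(x, mu0=0.0, s0=100.0, s=1.0):
    """Exact conjugate update for the mean of a Gaussian with known
    standard deviation s: prior m ~ N(mu0, s0^2), data x_i ~ N(m, s^2).
    Posterior precision = prior precision + n * data precision; the
    posterior mean is the precision-weighted average."""
    n = len(x)
    prec = 1.0 / s0 ** 2 + n / s ** 2
    mean = (mu0 / s0 ** 2 + x.sum() / s ** 2) / prec
    return mean, np.sqrt(1.0 / prec)
```

With a broad prior (s0 = 100) the posterior mean is essentially the sample mean, and its standard deviation shrinks as 1/sqrt(n).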



Truncated Gaussian (F#)



Truncated Gaussian (IronPython)
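The listing is lost, but the point of the example is inference under a truncation constraint: a Gaussian variable constrained to exceed a threshold. The distribution an exact engine reports has standard closed-form moments; the function below is my own sketch of them.

```python
import math

def truncated_normal_moments(mu, sigma, a):
    """Mean and variance of N(mu, sigma^2) restricted to x > a, via the
    standard closed forms with the hazard ratio
    lam = phi(alpha) / (1 - Phi(alpha)), where alpha = (a - mu) / sigma."""
    alpha = (a - mu) / sigma
    phi = math.exp(-0.5 * alpha ** 2) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    lam = phi / (1.0 - Phi)
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + alpha * lam - lam ** 2)
    return mean, var
```

For a standard Gaussian truncated at 0, this gives mean sqrt(2/pi) and variance 1 - 2/pi.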



Gaussian Mixture (F#)



Gaussian Mixture (IronPython)
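In place of the lost listing, here is a plain-NumPy soft-EM sketch for a mixture of isotropic Gaussians (point estimates, not the full posterior an Infer.NET VB run would return). Compare the E-step with k-means: responsibilities replace the hard argmin. The mu0 argument is my own addition, to make initialization explicit.

```python
import numpy as np

def gmm_em(X, K, n_iter=100, mu0=None, seed=0):
    """EM with point estimates for a mixture of isotropic Gaussians.
    The E-step computes soft responsibilities r_nk = p(z_n = k | x_n)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    if mu0 is None:
        mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    else:
        mu = np.array(mu0, dtype=float)
    var = np.full(K, X.var())          # per-component isotropic variance
    pi = np.full(K, 1.0 / K)           # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        logp = np.log(pi) - 0.5 * D * np.log(2 * np.pi * var) - d2 / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum likelihood updates.
        nk = r.sum(axis=0)
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = (r * d2).sum(axis=0) / (D * nk)
        pi = nk / N
    return mu, var, pi
```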



k-means (F#)

[Demo]



Conclusions

- Probabilistic programming is in its infancy
- We might produce alternative language definitions/implementations
- We can leverage it to test new models faster
- We can build custom models on the fly


References

- [BayesSVM] Sollich, P. (2002). "Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities". Machine Learning, 46:21-52.
- [ProbProg] The probabilistic programming wiki. http://probabilistic-programming.org/wiki/Home
- [InferNET] T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.5, Microsoft Research Cambridge, 2012. http://research.microsoft.com/infernet
- [KRLST] M. Lázaro-Gredilla, S. Van Vaerenbergh, and I. Santamaría. "A Bayesian approach to tracking with kernel recursive least-squares", MLSP 2011.