# What is a Bayesian Neural Network?

A Bayesian neural network (BNN) refers to extending standard networks by treating the weights as random variables. Thus, training a BNN focuses on posterior inference given data. In practice, they arise naturally when priors are placed on the weights of a network. They are relatively simple beasts, and, in my opinion, far more interesting than the model itself is the approximate posterior inference associated with them.

Standard NN training via optimization is (from a probabilistic perspective) equivalent to maximum likelihood estimation (MLE) for the weights. For many reasons this is unsatisfactory. One reason is that it lacks proper theoretical justification from a probabilistic perspective: why maximum likelihood? Why just point estimates? Using MLE ignores any uncertainty that we may have in the proper weight values. From a practical standpoint, this type of training is susceptible to overfitting, as NNs often do.

One partial fix for this is to introduce regularization. From a Bayesian perspective, this is equivalent to inducing priors on the weights (say Gaussian distributions if we are using L2 regularization). Optimization in this case is akin to searching for MAP estimators rather than MLE. Again from a probabilistic perspective, this is not the right thing to do, though it certainly works well in practice.

The correct (i.e., theoretically justifiable) thing to do is posterior inference [^{1}, ^{2}],
though this is very challenging both from a modelling and computational point of view. BNNs
are neural networks that take this approach. In the past this was all but impossible, and we
had to resort to poor approximations such as Laplace’s method (low complexity) or MCMC
(long convergence, difficult to diagnose). However, lately there have been some super-interesting
results on using variational inference to do this [^{3}], and this has sparked a great deal
of interest in the area.

BNNs are important in specific settings, especially when we care about uncertainty very much. Some examples of these cases are decision making systems, (relatively) smaller data settings, Bayesian Optimization, model-based reinforcement learning and others.

*Originally a Quora answer, later a KDnuggets post*