Bayesian Layers: A Module for Neural Network Uncertainty

As a demonstration, we fit a 10-billion parameter ‘Bayesian Transformer’ on 512 TPUv2 cores, replacing attention layers with their Bayesian counterparts. 🔥
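
The core idea behind such a Bayesian layer is that weights become learned distributions rather than point estimates, sampled on each forward pass. Below is a minimal sketch of that idea, not the library's actual API: a dense layer with a factorized Gaussian posterior trained via the reparameterization trick. The class name, initializer constants, and standard-normal prior are all illustrative assumptions.

```python
import tensorflow as tf


class DenseReparameterizationSketch(tf.keras.layers.Layer):
    """Dense layer whose kernel is a learned factorized Gaussian posterior.

    Illustrative sketch only; assumes a standard normal prior on weights.
    """

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        # Variational posterior parameters: mean and (softplus-transformed) scale.
        self.kernel_loc = self.add_weight(
            name="kernel_loc", shape=[in_dim, self.units],
            initializer="glorot_uniform")
        self.kernel_rho = self.add_weight(
            name="kernel_rho", shape=[in_dim, self.units],
            initializer=tf.keras.initializers.Constant(-5.0))
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")

    def call(self, inputs):
        # Reparameterization trick: kernel = loc + scale * eps, eps ~ N(0, I).
        scale = tf.nn.softplus(self.kernel_rho)
        eps = tf.random.normal(tf.shape(self.kernel_loc))
        kernel = self.kernel_loc + scale * eps
        # KL(q || p) against the assumed standard normal prior; registering it
        # via add_loss means training minimizes the negative ELBO.
        kl = 0.5 * tf.reduce_sum(
            tf.square(self.kernel_loc) + tf.square(scale)
            - 2.0 * tf.math.log(scale) - 1.0)
        self.add_loss(kl)
        return tf.matmul(inputs, kernel) + self.bias
```

Because a layer like this is a drop-in replacement for a deterministic dense layer, the same pattern extends to the projections inside an attention block, with each layer's KL term accumulating in the model's loss.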

For more details, visit the source.