The JAX Autodiff Cookbook


JAX’s autodiff is very general: it can compute gradients of numpy-style functions, differentiating them with respect to nested lists, tuples, and dicts. It can also take gradients of gradients, and it even works with complex numbers!
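
A minimal sketch of what that looks like in practice (the toy functions here are illustrative, not taken from the cookbook):

```python
import jax
import jax.numpy as jnp

# Gradient of an ordinary numerical function.
def f(x):
    return jnp.sin(x) * x ** 2

df = jax.grad(f)             # df/dx
ddf = jax.grad(jax.grad(f))  # gradients of gradients
print(df(2.0), ddf(2.0))

# Differentiating with respect to a nested dict of parameters:
# the gradient comes back with the same nested structure.
def loss(params, x):
    return jnp.sum((params["w"] * x + params["b"]) ** 2)

params = {"w": jnp.array(1.5), "b": jnp.array(0.5)}
print(jax.grad(loss)(params, jnp.array([1.0, 2.0])))
```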

Random Search - NAS!


The authors show that random search over architectures is a strong baseline for neural architecture search. In fact, random search gets near state-of-the-art results on PTB (RNNs) and CIFAR-10 (ConvNets).
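
Part of what makes it such a strong baseline is how trivial it is to implement. A toy sketch, where the search space and the `evaluate` function are hypothetical stand-ins for training and scoring a candidate architecture:

```python
import random

def sample_architecture():
    # Hypothetical search space: depth, per-layer width, activation.
    return {
        "widths": [random.choice([64, 128, 256])
                   for _ in range(random.randint(2, 8))],
        "activation": random.choice(["relu", "tanh"]),
    }

def random_search(evaluate, budget=100):
    # Try `budget` random architectures and keep the best-scoring one.
    best, best_score = None, float("-inf")
    for _ in range(budget):
        arch = sample_architecture()
        score = evaluate(arch)  # e.g. validation accuracy after training
        if score > best_score:
            best, best_score = arch, score
    return best, best_score
```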

How Can We Be So Dense?


Most artificial networks today rely on dense representations, whereas biological networks rely on sparse representations. In this paper we show how sparse representations can be more robust to noise and interference, as long as the underlying dimensionality is sufficiently high.
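The intuition is easy to check numerically: in a high-dimensional space, two random sparse binary vectors barely overlap, so a stored pattern stays recognizable even after random interference. A small sketch (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 40  # high dimensionality, few active units

def sparse_vec():
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = 1.0
    return v

a = sparse_vec()
noisy = a.copy()
flips = rng.choice(n, size=20, replace=False)  # random interference
noisy[flips] = 1.0 - noisy[flips]

print(a @ noisy)         # stays close to k: the pattern survives the noise
print(a @ sparse_vec())  # ~ k*k/n: unrelated patterns barely overlap
```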

Common statistical tests are linear models (or: how to teach stats)


The document is summarised in a single table showing the linear models underlying common parametric and “non-parametric” tests. Formulating all the tests in the same language highlights the many similarities between them.
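
The equivalence is easy to verify numerically; for example, the independent-samples t-test is the linear model y = β₀ + β₁·group. A quick check in Python (the data is synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.5, 1.0, size=50)

# Classic equal-variance independent t-test.
t_classic, _ = stats.ttest_ind(a, b)

# The same test as a linear model: y = b0 + b1 * group.
y = np.concatenate([a, b])
g = np.repeat([0.0, 1.0], 50)
X = np.column_stack([np.ones_like(g), g])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
se = np.sqrt((resid @ resid) / (len(y) - 2) / np.sum((g - g.mean()) ** 2))

print(t_classic, beta[1] / se)  # identical t statistics (up to sign)
```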

Visualizing memorization in RNNs


Inspecting gradient magnitudes in context can be a powerful tool for seeing when recurrent units use short-term or long-term contextual understanding.
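
A rough sketch of the underlying measurement, using a toy vanilla RNN in JAX (the cell and shapes are illustrative, not the post's models):

```python
import jax
import jax.numpy as jnp

def final_output(xs, params):
    # Minimal vanilla RNN; returns the last step's output.
    h = jnp.zeros(params["Wh"].shape[0])
    for x in xs:
        h = jnp.tanh(params["Wx"] @ x + params["Wh"] @ h)
    return (params["Wo"] @ h).sum()

def connectivity(xs, params):
    # Gradient magnitude of the final output w.r.t. each input step:
    # large values at distant steps suggest long-term contextual use.
    grads = jax.grad(final_output)(xs, params)
    return jnp.linalg.norm(grads, axis=-1)

key = jax.random.PRNGKey(0)
d, h = 8, 16
params = {
    "Wx": jax.random.normal(key, (h, d)) / jnp.sqrt(d),
    "Wh": jax.random.normal(key, (h, h)) / jnp.sqrt(h),
    "Wo": jax.random.normal(key, (1, h)) / jnp.sqrt(h),
}
xs = jax.random.normal(key, (20, d))  # 20 time steps
print(connectivity(xs, params))       # one magnitude per input step
```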

Neural Transfer Learning for NLP


I finally got around to submitting my thesis. The thesis touches on the four areas of transfer learning that are most prominent in current Natural Language Processing (NLP): domain adaptation, multi-task learning, cross-lingual learning, and sequential transfer learning.

PEARL: Meta-RL


PEARL is 20-100x more sample-efficient than prior meta-RL methods, with better final performance, by combining soft actor-critic with an order-invariant context embedding.
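
“Order-invariant” means the context encoder must produce the same task embedding no matter the order in which transitions arrive. PEARL achieves this with a product of Gaussian factors; here is a simplified sketch of the idea using mean pooling instead (names and shapes are illustrative):

```python
import jax.numpy as jnp

def encode_context(transitions, params):
    # transitions: (N, d) array of (s, a, r, s') features, in any order.
    # Embed each transition independently, then mean-pool: permuting
    # the rows of `transitions` leaves the result unchanged.
    h = jnp.tanh(transitions @ params["W1"] + params["b1"])
    return h.mean(axis=0) @ params["W2"] + params["b2"]
```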

BigGAN!


“Best GAN samples ever yet? Very impressive ICLR submission! BigGAN improves Inception Scores by >100.” The tweet above is from renowned Google DeepMind research scientist Oriol Vinyals. It was retweeted last week by Google Brain researcher and “Father of Generative Adversarial Networks” Ian Goodfellow, and it picked up momentum and praise from AI researchers on social media.

Are Deep Neural Networks Dramatically Overfitted?


If you are like me, entering the field of deep learning with experience in traditional machine learning, you may often ponder this question: since a typical deep neural network has so many parameters and its training error can easily be driven to zero, it should surely suffer from substantial overfitting. How could it ever generalize to out-of-sample data points?

Measuring the Effects of Data Parallelism on Neural Network Training


Data parallelism can improve the training of neural networks, but how to obtain the most benefit from this technique isn’t obvious. New research explores different architectures, batch sizes, and datasets to optimize training efficiency.
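
For reference, a minimal sketch of the synchronous data parallelism being measured, written with JAX’s `pmap` (the toy model and shapes are illustrative):

```python
import jax
import jax.numpy as jnp
from functools import partial

def loss(params, x, y):
    return jnp.mean((x @ params["w"] + params["b"] - y) ** 2)

# Each device gets one shard of the batch, computes local gradients,
# and the gradients are averaged across devices (synchronous SGD).
@partial(jax.pmap, axis_name="devices")
def parallel_grads(params, x, y):
    grads = jax.grad(loss)(params, x, y)
    return jax.lax.pmean(grads, axis_name="devices")

n = jax.local_device_count()
params = {"w": jnp.zeros(4), "b": jnp.zeros(())}
replicated = jax.tree_util.tree_map(lambda p: jnp.stack([p] * n), params)
x = jnp.ones((n, 32, 4))  # global batch of n*32 examples, sharded
y = jnp.ones((n, 32))
grads = parallel_grads(replicated, x, y)  # identical copies, one per device
```

The effective batch size here is n × 32; how far that number can usefully grow before returns diminish is exactly the kind of question the research measures.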