30 Mar 2019, Prathyush SP
  
JAX’s autodiff is very general. It can calculate gradients of numpy functions, differentiating them with respect to nested lists, tuples and dicts. It can also calculate gradients of gradients and even work with complex numbers!
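A minimal sketch of what that looks like in practice (my own toy example, assuming `jax` is installed; the function and parameter names are made up, not from the original post):

```python
import jax
import jax.numpy as jnp

# Differentiating with respect to a nested dict of parameters.
def loss(params, x):
    h = jnp.tanh(x @ params["layer1"]["w"] + params["layer1"]["b"])
    return jnp.sum(h ** 2)

params = {"layer1": {"w": jnp.ones((3, 2)), "b": jnp.zeros(2)}}
x = jnp.arange(3.0)
grads = jax.grad(loss)(params, x)   # grads has the same nested-dict structure as params

# Gradients of gradients: the second derivative of a scalar function.
f = lambda z: jnp.sin(z) * z
print(jax.grad(jax.grad(f))(1.0))
```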
30 Mar 2019, Prathyush SP
  
They show that random search over architectures is a strong baseline for neural architecture search. In fact, random search gets near state-of-the-art results on PTB (RNNs) and CIFAR-10 (ConvNets).
28 Mar 2019, Prathyush SP
  
Most artificial networks today rely on dense representations, whereas biological networks rely on sparse representations. In this paper we show how sparse representations can be more robust to noise and interference, as long as the underlying dimensionality is sufficiently high.
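A small numerical illustration of the intuition (my own toy example with made-up sizes, not the paper's experiments): random sparse vectors in a high-dimensional space almost never overlap much, so a noisy copy of a stored pattern is still easy to recognise.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_vec(n, k):
    """A random binary vector with k of n units active."""
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = 1.0
    return v

n, k = 2000, 40                      # high dimensionality, very sparse activity
stored = sparse_vec(n, k)

# Corrupt the stored pattern: drop a quarter of its active bits and
# activate the same number of random other units instead.
noisy = stored.copy()
noisy[rng.choice(np.flatnonzero(noisy), size=k // 4, replace=False)] = 0.0
noisy[rng.choice(np.flatnonzero(noisy == 0.0), size=k // 4, replace=False)] = 1.0

random_overlaps = [stored @ sparse_vec(n, k) for _ in range(1000)]
print("overlap with noisy copy:", stored @ noisy)                      # ~30 of 40 bits
print("largest overlap with 1000 random vectors:", max(random_overlaps))  # far lower
```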
27 Mar 2019, Prathyush SP
  
This document is summarised in a single table showing the linear models underlying common parametric and “non-parametric” tests. Formulating all the tests in the same language highlights the many similarities between them.
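As a quick illustration of one row of that table (my own example, assuming `scipy` and `statsmodels` are available): an independent-samples t-test is the linear model y = b0 + b1·group.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 30)    # group 0
b = rng.normal(0.5, 1.0, 30)    # group 1

# Classic independent-samples t-test.
t, p = stats.ttest_ind(a, b)

# The same test as the linear model y = b0 + b1 * group.
y = np.concatenate([a, b])
group = np.concatenate([np.zeros(30), np.ones(30)])
fit = sm.OLS(y, sm.add_constant(group)).fit()

print(t, p)                            # t statistic and p-value of the t-test
print(fit.tvalues[1], fit.pvalues[1])  # same p-value; t matches up to sign
```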
25 Mar 2019, Prathyush SP
  
Inspecting gradient magnitudes in context can be a powerful tool for seeing when recurrent units rely on short-term or long-term contextual understanding.
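A minimal sketch of the technique (my own toy example in JAX, not the post's code): differentiate an output at the last time step with respect to the inputs at every time step, and look at the per-step gradient norms.

```python
import jax
import jax.numpy as jnp

def rnn_readout(inputs, params):
    """A toy tanh RNN; returns a scalar readout of the final hidden state."""
    def step(h, x):
        h = jnp.tanh(x @ params["W_x"] + h @ params["W_h"] + params["b"])
        return h, h
    h0 = jnp.zeros(params["W_h"].shape[0])
    h_final, _ = jax.lax.scan(step, h0, inputs)
    return h_final @ params["w_out"]

key = jax.random.PRNGKey(0)
T, d_in, d_h = 20, 4, 8
params = {
    "W_x": 0.3 * jax.random.normal(key, (d_in, d_h)),
    "W_h": 0.3 * jax.random.normal(key, (d_h, d_h)),
    "b": jnp.zeros(d_h),
    "w_out": jax.random.normal(key, (d_h,)),
}
inputs = jax.random.normal(key, (T, d_in))

# d(output at the last step) / d(input at every step), one norm per time step.
grads = jax.grad(rnn_readout)(inputs, params)      # shape (T, d_in)
print(jnp.linalg.norm(grads, axis=1))              # large values far back in time
                                                   # indicate long-term context use
```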
23 Mar 2019, Prathyush SP
  
I finally got around to submitting my thesis. The thesis touches on the four areas of transfer learning that are most prominent in current Natural Language Processing (NLP): domain adaptation, multi-task learning, cross-lingual learning, and sequential transfer learning.
22 Mar 2019, Prathyush SP
  
The proposed method is 20-100x faster than prior methods, with better final performance, by combining soft actor-critic with an order-invariant context embedding.
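The actual context encoder in the work being described is more involved; the mean-pooled sketch below (my own simplification, with made-up shapes) only shows what "order-invariant" means here: embed each context transition separately, then pool with an operation that ignores ordering.

```python
import jax
import jax.numpy as jnp

def context_embedding(transitions, params):
    """Embed each transition separately, then mean-pool: the result is
    invariant to the order in which the context transitions were collected."""
    per_transition = jnp.tanh(transitions @ params["W"] + params["b"])  # (N, d_z)
    return per_transition.mean(axis=0)                                  # (d_z,)

key = jax.random.PRNGKey(0)
N, d, d_z = 32, 10, 5           # N context transitions, each flattened to d numbers
params = {"W": 0.1 * jax.random.normal(key, (d, d_z)), "b": jnp.zeros(d_z)}
transitions = jax.random.normal(key, (N, d))

z = context_embedding(transitions, params)
z_shuffled = context_embedding(transitions[::-1], params)
print(jnp.allclose(z, z_shuffled))   # True: reordering the context changes nothing
```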
22 Mar 2019, Prathyush SP
  
“Best GAN samples ever yet? Very impressive ICLR submission! BigGAN improves Inception Scores by >100.”
The above Tweet is from renowned Google DeepMind research scientist Oriol Vinyals. It was retweeted last week by Google Brain researcher and “Father of Generative Adversarial Networks” Ian Goodfellow, and picked up momentum and praise from AI researchers on social media.
22 Mar 2019, Prathyush SP
  
If you are like me, entering the field of deep learning with experience in traditional machine learning, you may often ponder this question: since a typical deep neural network has so many parameters and its training error can easily be driven to zero, it should surely suffer from substantial overfitting. How can it ever generalize to out-of-sample data points?
21 Mar 2019, Prathyush SP
  
Data parallelism can improve the training of #NeuralNetworks, but how to obtain the most benefit from this technique isn’t obvious. Check out new research that explores different architectures, batch sizes, and datasets to optimize training efficiency.