Uber Manifold

  

A Model-Agnostic Visual Debugging Tool for Machine Learning at Uber

Convolution is all you need.

  

Pay Less Attention with Lightweight and Dynamic Convolutions

AMI Corpus

  

The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. For a gentle introduction to the corpus, see the corpus overview. To access the data, follow the directions given there. Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains. Detailed information can be found in the documentation section.

Towards Federated Learning at Scale: System Design

  

Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.

ProteinNet

  

ProteinNet is a standardized data set for machine learning of protein structure. It provides protein sequences, structures (secondary and tertiary), multiple sequence alignments (MSAs), position-specific scoring matrices (PSSMs), and standardized training / validation / test splits.

https://github.com/aqlaboratory/proteinnet

Flickr-Faces-HQ (FFHQ)

  

Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN)

Responsible Data Science

  

The first wave of data science focused on accuracy and efficiency – on what we can do with data. The second wave focuses on responsibility – on what we should and shouldn’t do. Irresponsible use of data science can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy;

Fixup Initialization: Residual Learning Without Normalization

  

They manage to train deep nets (10k layers!) w/o BatchNorm, by careful init scaling & initializing the 2nd residual conv to 0.

Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet

  

Deep Neural Networks (DNNs) excel on many complex perceptual tasks but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain.

The Evolved Transformer

  

Goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. T