Saturday, November 24, 2018

Measuring the Effects of Data Parallelism on Neural Network Training

An important paper from Google on large-batch optimization. The authors run impressively careful experiments measuring the number of iterations needed to reach a target validation error at various batch sizes. The main "surprise" is the lack of surprises.

https://arxiv.org/abs/1811.03600