Tweeted By @jaschasd

on 2020-11-11 (UTC)
learning survey research

A simple prescription that will improve your models: When using LayerNorm, do mean subtraction *before* rather than after the affine transformation.

This, and an in-depth empirical investigation of statistical properties of common normalizers in https://t.co/poAWNmpSah pic.twitter.com/Ojcj0hsnTM
— Jascha Sohl-Dickstein (@jaschasd) November 11, 2020
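
The tweet does not say which affine transformation it means or how the rest of the normalizer is defined, so the following is only a minimal sketch of one possible reading. It assumes the "affine transformation" is the dense layer z = Wx + b that feeds the normalizer, and that scaling to unit root-mean-square stays in place after it. The helper names (`layernorm_post`, `layernorm_pre`, `scale_to_unit_rms`) are hypothetical and are not taken from the linked paper.

```python
import math

def mean(v):
    return sum(v) / len(v)

def center(v):
    # Subtract the mean over the feature dimension.
    m = mean(v)
    return [x - m for x in v]

def scale_to_unit_rms(v, eps=1e-6):
    # Divide by the root-mean-square of the features (eps avoids division by zero).
    rms = math.sqrt(mean([x * x for x in v]) + eps)
    return [x / rms for x in v]

def affine(W, b, x):
    # Dense layer: z = W x + b, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def layernorm_post(W, b, x):
    # Conventional order: affine, then mean subtraction, then scaling.
    return scale_to_unit_rms(center(affine(W, b, x)))

def layernorm_pre(W, b, x):
    # Assumed reading of the tweet: mean subtraction on the input,
    # then affine, then scaling.
    return scale_to_unit_rms(affine(W, b, center(x)))

# Illustrative values only.
W = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0]]
b = [0.1, -0.2]
x = [1.0, 2.0, 4.0]

post = layernorm_post(W, b, x)
pre = layernorm_pre(W, b, x)
```

Under these assumptions, the "post" variant always yields zero-mean outputs, while the "pre" variant yields outputs that do not change when a constant is added to every input feature; its output mean is then set by W and b rather than forced to zero.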

