Turing-NLG: A 17-billion-parameter language model
— hardmaru (@hardmaru) February 11, 2020
“Any model with more than 1.3B parameters cannot fit into a single GPU (even one with 32GB memory)… The resulting T-NLG model has 78 Transformer layers with a hidden size of 4256 and 28 attention heads.” https://t.co/bRjEacrZma
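
Both quoted figures check out with back-of-the-envelope arithmetic. A minimal sketch, assuming the standard rough estimates of ~12·d² weights per Transformer layer (4·d² for attention, 8·d² for the feed-forward block) and ~16 bytes of model state per parameter when training with Adam in mixed precision; these estimates, and the names `transformer_params` and `BYTES_PER_PARAM_TRAINING`, are my own, not from the announcement:

```python
def transformer_params(layers: int, d_model: int) -> int:
    """Approximate decoder parameter count, ignoring embeddings,
    biases, and layer norms."""
    per_layer = 12 * d_model ** 2  # 4*d^2 attention + 8*d^2 MLP
    return layers * per_layer

# Rough training cost per parameter: fp16 weights + fp16 gradients
# + fp32 Adam moments (a common mixed-precision estimate, assumed here).
BYTES_PER_PARAM_TRAINING = 16

tnlg = transformer_params(layers=78, d_model=4256)
print(f"T-NLG estimate: {tnlg / 1e9:.1f}B parameters")  # ~17.0B

# Why ~1.3B parameters is about the ceiling for one 32 GB GPU:
for params in (1.3e9, 17e9):
    gb = params * BYTES_PER_PARAM_TRAINING / 2**30
    print(f"{params / 1e9:>5.1f}B params -> ~{gb:.0f} GB of model states")
```

At 1.3B parameters the model states alone come to roughly 19 GB, and with activations and working buffers on top, a 32 GB card is effectively full; the full 17B model would need on the order of 250 GB of state, which is why it cannot be trained on a single GPU without splitting the model across devices.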