New Preprint: Diff Pruning (https://t.co/c4yTd7s47W) by Demi Guo / Yoon Kim Lab (😀)
— Sasha Rush (@srush_nlp) December 15, 2020
How many BERT parameters do you really need to change during fine-tuning? Turns out the answer is 0.5%
Allows new task adaptation by shipping extremely small param diffs pic.twitter.com/ZkSQfvBUvg
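
The idea of shipping a tiny parameter diff per task can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a PyTorch-style state dict, and `apply_sparse_diff`, the toy tensors, and the diff format (flat indices plus values) are hypothetical, chosen only to show how a ~0.5% sparse diff could be stored and applied on top of frozen pretrained weights.

```python
import torch

def apply_sparse_diff(pretrained_state, sparse_diff):
    """Reconstruct task-specific weights as pretrained + sparse diff.

    `sparse_diff` maps parameter names to (flat_indices, values) pairs that
    cover only the entries that changed during fine-tuning.
    """
    task_state = {}
    for name, weight in pretrained_state.items():
        new_weight = weight.clone()
        if name in sparse_diff:
            idx, vals = sparse_diff[name]
            # Add the stored deltas at the few positions that were updated.
            new_weight.view(-1)[idx] += vals
        task_state[name] = new_weight
    return task_state

# Toy example: a single 4x4 "layer" where only 2 of 16 entries differ per task.
pretrained = {"layer.weight": torch.zeros(4, 4)}
diff = {"layer.weight": (torch.tensor([0, 5]), torch.tensor([0.1, -0.2]))}
task_weights = apply_sparse_diff(pretrained, diff)
print(task_weights["layer.weight"])
```

The payoff is storage: every task shares the same frozen base model, and each new task only adds the index/value pairs for the small fraction of parameters its diff touches.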