by gneubig on 2019-05-28 (UTC).

What happens when you remove most of BERT's heads? Answer: surprisingly little! Check out @pmichelX's new preprint on pruning heads from multi-head attention models, with interesting analysis and 81% inference-time accuracy gains on BERT-based models! https://t.co/XtnUoP9stc

— Graham Neubig (@gneubig) May 28, 2019
research nlp
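For a concrete picture of what "pruning heads" amounts to, here is a minimal PyTorch sketch (not the preprint's code): a multi-head attention layer with a per-head mask, where setting a head's mask to zero removes that head's contribution to the layer output. The class name, dimensions, and toy usage below are illustrative assumptions; the preprint's actual analysis, which scores head importance and then prunes the least important heads, is more involved.

```python
# Minimal sketch of head masking in multi-head attention (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskableMultiHeadAttention(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # 1.0 = head active, 0.0 = head "pruned" (contributes nothing)
        self.register_buffer("head_mask", torch.ones(n_heads))

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # (B, T, d_model) -> (B, heads, T, d_head)
            return t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = attn @ v                                  # (B, heads, T, d_head)
        # zero out masked heads before mixing them back together
        ctx = ctx * self.head_mask.view(1, -1, 1, 1)
        ctx = ctx.transpose(1, 2).reshape(B, T, self.n_heads * self.d_head)
        return self.out(ctx)

    def prune_heads(self, heads):
        """Mask the given head indices so they no longer affect the output."""
        for h in heads:
            self.head_mask[h] = 0.0

# Toy usage: remove all but one head and see how much the output changes.
mha = MaskableMultiHeadAttention()
x = torch.randn(2, 16, 768)
full = mha(x)
mha.prune_heads(range(1, 12))        # keep only head 0
pruned = mha(x)
print((full - pruned).abs().mean())  # average change in the layer's output
```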
by diegovogeid on 2019-05-28 (UTC).

Did you see this paper (by @perez) where they proved something similar theoretically? (Transformers with a single attention head are Turing complete)

- On the Turing Completeness of Modern Neural Network Architectures https://t.co/DaHP1biIa8

— Diego Francisco Valenzuela Iturra (@diegovogeid) May 28, 2019
research
