We release LLM.int8(), the first 8-bit inference method that saves 2x memory and does not degrade performance for 175B models by exploiting emergent properties. Read More:
— Tim Dettmers (@Tim_Dettmers) August 17, 2022
Paper: https://t.co/eNpinXS0Z5
Software: https://t.co/hBuVyQhLqS
Emergence: https://t.co/oPGRhACNEe
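The memory saving comes from storing weights as 8-bit integers instead of 16/32-bit floats. A minimal sketch of the basic absmax int8 quantization idea is below; this is illustrative only and omits LLM.int8()'s vector-wise scheme and mixed-precision outlier decomposition described in the paper. All function names here are hypothetical.

```python
import numpy as np

def quantize_absmax_int8(x: np.ndarray):
    """Hypothetical helper: row-wise absmax quantization.
    Scale each row so its largest magnitude maps to 127, then round.
    Assumes no row is all zeros (avoids division by zero)."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float values from int8 codes and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

q, s = quantize_absmax_int8(w)
w_hat = dequantize_int8(q, s)

# int8 storage is 4x smaller than float32 (2x smaller than float16),
# and the round-trip error is bounded by half a quantization step.
print(q.dtype)
print(np.all(np.abs(w - w_hat) <= 0.5 * s + 1e-6))
```

The rounding error per element is at most half a scale step (`0.5 * scale`), which is why plain absmax quantization works well until a few large-magnitude outlier features dominate the scale; handling those outliers separately is the key contribution the tweet refers to.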