GLM-130B reaches INT4 quantization w/ no perf degradation, allowing effective inference on 4*3090 or 8*2080 Ti GPUs, the most affordable GPUs ever required for using 100B-scale models!
— Tsinghua KEG (@thukeg) October 10, 2022
Paper: https://t.co/f2bj1N8JTN
Model weights & code & demo & lessons: https://t.co/aKZNGEDmks
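The enabler here is weight-only INT4 quantization: weights are stored in 4 bits and dequantized on the fly, while activations stay in full precision. Below is a minimal PyTorch sketch of one common scheme, symmetric per-row absmax quantization; the function names and the exact scheme are illustrative assumptions, not GLM-130B's actual implementation.

```python
import torch

def quantize_int4(weight: torch.Tensor):
    # Per-row absmax scale so quantized values span the symmetric
    # INT4 range [-7, 7]. (Illustrative scheme, not GLM-130B's code.)
    scale = weight.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(weight / scale), -7, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Weight-only quantization: recover an approximate float weight
    # just before the matmul; activations are never quantized.
    return q.to(scale.dtype) * scale

# Quantize once, then reuse the compact weights at every forward pass.
w = torch.randn(4096, 4096)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("max abs error:", (w - w_hat).abs().max().item())
```

The arithmetic behind the hardware claim: 130B parameters at 4 bits is roughly 65 GB of weights, versus ~260 GB in FP16, which is what brings the model within reach of 4*3090 (4 x 24 GB = 96 GB) or 8*2080 Ti (8 x 11 GB = 88 GB).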