LoRA is an abbreviation for Low-Rank Adaptation of Large Language Models. It freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
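The idea above can be sketched in a few lines. Below is a minimal, illustrative NumPy version of a LoRA-adapted linear layer (not the library's actual implementation, which is PyTorch-based): the pre-trained weight `W0` is frozen, and only the low-rank factors `A` and `B` would be trained. The class name, dimensions, and hyperparameters (`r`, `alpha`) here are assumptions chosen for illustration.

```python
import numpy as np

class LoRALinear:
    """Illustrative sketch of a LoRA-adapted linear layer (NumPy only)."""

    def __init__(self, in_features, out_features, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pre-trained weight W0: never updated during adaptation.
        self.W0 = rng.standard_normal((out_features, in_features)) * 0.02
        # Trainable low-rank factors. B starts at zero, so training begins
        # exactly from the pre-trained model's behavior.
        self.A = rng.standard_normal((r, in_features)) * 0.01  # r x k
        self.B = np.zeros((out_features, r))                   # d x r
        self.scaling = alpha / r

    def forward(self, x):
        # h = x W0^T + (alpha / r) * x A^T B^T  -- only A and B are trained.
        return x @ self.W0.T + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(16, 16)
x = np.ones((1, 16))
# With B initialized to zero, the low-rank update contributes nothing yet,
# so the output equals that of the frozen layer alone.
assert np.allclose(layer.forward(x), x @ layer.W0.T)
```

Because the update is the product of two small matrices, the trainable parameter count per weight is `r * (in_features + out_features)` instead of `in_features * out_features`.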
Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3. LoRA performs on par with or better than full fine-tuning in terms of model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters and higher training throughput, and, unlike adapters, it adds no inference latency.
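The scale of the reduction is easy to see with back-of-the-envelope arithmetic. The sketch below compares the trainable parameters of one full `d x k` weight matrix against a rank-`r` LoRA update; `d = k = 12288` (GPT-3's hidden size) and `r = 4` are illustrative choices, not the paper's exact setup, and the overall 10,000x figure also reflects adapting only a subset of the model's matrices.

```python
# For one d x k weight matrix, full fine-tuning trains d * k parameters,
# while a rank-r LoRA update B @ A trains only r * (d + k).
d, k, r = 12288, 12288, 4  # illustrative dimensions and rank

full_params = d * k        # 150,994,944
lora_params = r * (d + k)  # 98,304

# The low-rank update is over a thousand times smaller per matrix.
print(full_params // lora_params)  # 1536
```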
Applied to RoBERTa (Liu et al., 2019) base and large and DeBERTa (He et al., 2020) XXL 1.5B, LoRA achieves results comparable to or better than full fine-tuning on the GLUE benchmark, while training and storing only a small set of parameters.
The RoBERTa and DeBERTa LoRA checkpoints are available for download.
On GPT-2, LoRA outperforms full fine-tuning and other efficient tuning methods such as adapters (Houlsby et al., 2019) and prefix tuning (Li and Liang, 2021). Here are the evaluation results on the E2E NLG Challenge, DART, and WebNLG: