GLM homepage, documentation and downloads – a general pre-training framework for natural language understanding and generation

GLM (General Language Model) is a general-purpose language model from Tsinghua University, pre-trained with an autoregressive blank-infilling objective, which can be fine-tuned for a variety of natural language understanding and generation tasks.

GLM improves on blank-infilling pre-training in two ways: it adds 2D positional encodings and allows the masked spans to be predicted in arbitrary order, which yields better performance than BERT and T5 on NLU tasks. At the same time, GLM can be pre-trained for different types of tasks by varying the number and lengths of the blanks. Given the same model size and data, GLM outperforms BERT, T5, and GPT on a wide range of tasks spanning NLU, conditional generation, and unconditional generation, and a single pretrained model with 1.25× the parameters of BERT-Large achieves the best performance across all of them, demonstrating its generalizability to different downstream tasks.

For a detailed description of GLM, please refer to the paper GLM: General Language Model Pretraining with Autoregressive Blank Infilling (ACL 2022).
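The blank-infilling objective described above can be sketched in a few lines. This is a simplified, hypothetical illustration of how an input is split into a corrupted Part A and autoregressively predicted spans with 2D positional IDs — the released implementation also samples span lengths and shuffles the span order, which is omitted here:

```python
# Simplified sketch of GLM-style autoregressive blank infilling
# (illustration only, not the released implementation).
def make_glm_example(tokens, spans):
    """tokens: list of tokens; spans: (start, end) index pairs to blank out.
    Returns the packed sequence and its two positional-ID streams."""
    part_a, mask_positions = [], []
    i, remaining = 0, sorted(spans)
    while i < len(tokens):
        if remaining and i == remaining[0][0]:
            mask_positions.append(len(part_a))  # remember where [MASK] sits
            part_a.append("[MASK]")
            i = remaining.pop(0)[1]             # skip the blanked-out span
        else:
            part_a.append(tokens[i])
            i += 1
    sequence = list(part_a)
    pos1 = list(range(len(part_a)))  # position in the corrupted text
    pos2 = [0] * len(part_a)         # intra-span position (0 outside spans)
    # Part B: each span is predicted autoregressively after an [S] token;
    # its pos1 repeats the position of the corresponding [MASK].
    for (start, end), mask_pos in zip(sorted(spans), mask_positions):
        span_tokens = ["[S]"] + tokens[start:end]
        sequence += span_tokens
        pos1 += [mask_pos] * len(span_tokens)
        pos2 += list(range(1, len(span_tokens) + 1))
    return sequence, pos1, pos2
```

For example, blanking out x3 and x5–x6 in the sequence x1…x6 produces Part A `x1 x2 [MASK] x4 [MASK]` followed by `[S] x3 [S] x5 x6`, with the second positional stream counting positions inside each span.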

ChatGLM-6B builds on the GLM framework and is optimized for Chinese QA and dialogue.

Pretrained Models

The pretrained models used in the paper can be downloaded from OneDrive or Tsinghua-Cloud.

| Name | Params | Language | Corpus | Objective | File | Config |
|---|---|---|---|---|---|---|
| GLM-Base | 110M | English | Wiki+Book | Token | glm-base-blank.tar.bz2 | model_blocklm_base.sh |
| GLM-Large | 335M | English | Wiki+Book | Token | glm-large-blank.tar.bz2 | model_blocklm_large.sh |
| GLM-Large-Chinese | 335M | Chinese | WuDao Corpora | Token+Sent+Doc | glm-large-chinese.tar.bz2 | model_blocklm_large_chinese.sh |
| GLM-Doc | 335M | English | Wiki+Book | Token+Doc | glm-large-generation.tar.bz2 | model_blocklm_large_generation.sh |
| GLM-410M | 410M | English | Wiki+Book | Token+Doc | glm-1.25-generation.tar.bz2 | model_blocklm_1.25_generation.sh |
| GLM-515M | 515M | English | Wiki+Book | Token+Doc | glm-1.5-generation.tar.bz2 | model_blocklm_1.5_generation.sh |
| GLM-RoBERTa | 335M | English | RoBERTa | Token | glm-roberta-large-blank.tar.bz2 | model_blocklm_roberta_large.sh |
| GLM-2B | 2B | English | Pile | Token+Sent+Doc | glm-2b.tar.bz2 | model_blocklm_2B.sh |
| GLM-10B | 10B | English | Pile | Token+Sent+Doc | download | model_blocklm_10B.sh |
| GLM-10B-Chinese | 10B | Chinese | WuDao Corpora | Token+Sent+Doc | download | model_blocklm_10B_chinese.sh |

Unzip the downloaded file into a local folder and set CHECKPOINT_PATH in the corresponding script to that folder's path.
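As a rough helper for that step (hypothetical, not part of the GLM scripts; the archive name below is just an example), extraction and the CHECKPOINT_PATH value could be handled like this:

```python
# Hypothetical helper (not part of the GLM codebase): extract a downloaded
# checkpoint archive and report the CHECKPOINT_PATH line to set in the
# matching model_blocklm_*.sh script.
import pathlib
import tarfile


def extract_checkpoint(archive, dest="checkpoints"):
    dest_dir = pathlib.Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    if pathlib.Path(archive).exists():  # no-op if the archive is not there yet
        with tarfile.open(archive, "r:bz2") as tf:
            tf.extractall(dest_dir)
    return f'CHECKPOINT_PATH="{dest_dir.resolve()}"'


# e.g. after downloading glm-large-blank.tar.bz2:
print(extract_checkpoint("glm-large-blank.tar.bz2"))
```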

Results

SuperGLUE

Validation set, single model, single-task fine-tuning

| Model | COPA | WSC | RTE | WiC | CB | MultiRC | BoolQ | ReCoRD |
|---|---|---|---|---|---|---|---|---|
| GLM-10B | 98.0 | 95.2 | 93.1 | 75.7 | 98.7/98.2 | 88.1/63.3 | 88.7 | 94.4/94.0 |
| DeBERTa-XXLarge-v2 | 97.0 | – | 93.5 | – | – | 87.8/63.6 | 88.3 | 94.1/93.7 |

Seq2Seq

CNN/Daily Mail (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 44.7 | 21.4 | 41.4 |
| T5-11B | 43.5 | 21.6 | 40.7 |
| PEGASUS-Large | 44.2 | 21.5 | 41.4 |
| BART-Large | 44.2 | 21.3 | 40.9 |

XSum (test set, no additional data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 48.9 | 25.7 | 40.4 |
| PEGASUS-Large | 47.2 | 24.6 | 39.3 |
| BART-Large | 45.1 | 22.3 | 37.3 |
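The summarization numbers above are ROUGE F1 scores. For intuition, ROUGE-1 is essentially F1 over clipped unigram overlap — a toy sketch, without the stemming and bootstrap resampling of the official scorer:

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Toy ROUGE-1: F1 over clipped unigram overlap between two strings."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 and ROUGE-L are computed analogously over bigrams and the longest common subsequence, respectively.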

Language Modeling

Test set, zero-shot

| Model | LAMBADA (accuracy) | Wikitext103 (perplexity) |
|---|---|---|
| GLM-10B (bi) | 72.35 | 11.33 |
| GLM-10B (uni) | 67.18 | 12.22 |
| GPT-2 | 52.66 | 17.48 |
| Megatron-LM (8.3B) | 66.51 | 10.81 |
| Turing-NLG | 67.98 | 10.21 |
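For reference, the Wikitext103 column is standard token-level perplexity: the exponential of the average negative log-likelihood the model assigns to the test tokens (lower is better), as in this quick sketch:

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that assigned every token probability 0.5 would score perplexity 2.
```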

