The RedPajama project aims to create a set of leading, fully open-source large language models. It has completed its first step: reproducing the LLaMA training dataset, which comprises more than 1.2 trillion tokens. The project is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and the MILA Québec AI Institute.
RedPajama consists of three main components: the pre-training dataset, base models, and instruction-tuning data and models.