While I wish development efforts would coalesce around a single, capable, open source language model, it is undoubtedly interesting to see the variety of new entrants in this space.
Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B… we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+
The base model, the instruction-tuned model, and the 65k StoryWriter model are all licensed for commercial use. The chat model has a Creative Commons non-commercial license because it was finetuned on data that came from OpenAI's models.
I tried out the chat model on Hugging Face. At first glance it seems comparable to the other open source models I've used.
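If you want to poke at it yourself rather than through the hosted demo, here's a rough sketch of loading the chat model with the Hugging Face transformers library. I'm assuming the weights are published as mosaicml/mpt-7b-chat and that the model reuses the EleutherAI/gpt-neox-20b tokenizer; check the model card before relying on either of those details. Because MPT is a custom architecture, trust_remote_code=True is needed to load it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo name for the chat model - verify against the model card.
model_id = "mosaicml/mpt-7b-chat"

# MPT ships its own model definition, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Assumption: the model uses the GPT-NeoX tokenizer rather than shipping one.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "Explain the difference between a llama and an alpaca."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that a 7B parameter model in bfloat16 still wants something like 14GB of memory, so running this on a laptop CPU will be slow at best.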
The most surprising announcement here is certainly StoryWriter-65k. The (still unreleased) large context-length version of GPT-4 will be able to handle 32,000 tokens—less than half of what is possible here.