Open-Source AI Breakthrough 2x ChatGPT-16k | Meta AI LLaMA LLM Context Size 32k, and up to 600x
Title: EXTENDING CONTEXT WINDOW OF LARGE LANGUAGE MODELS VIA POSITION INTERPOLATION
Arxiv link: https://arxiv.org/pdf/2306.15595.pdf
This work is done by:
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
Meta Platforms Inc.
Summary
They say: “We present Position Interpolation that extends the context window sizes of RoPE-based (Rotary Position Embedding) pretrained Large Language Models such as LLaMA models from 2k up to 32k tokens with minimal fine-tuning (that is, within 1000 steps).” And they do this while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization, across LLaMA models from 7 billion to 65 billion parameters.
Meanwhile, the model extended by Position Interpolation preserves quality relatively well on tasks within its original context window. The new approach that lets RoPE-based models extend their context window without degrading performance works as follows: rather than extrapolating position indices beyond the trained context length, which can lead to catastrophically high attention scores that completely ruin the self-attention mechanism, Position Interpolation linearly down-scales the input position indices to fit within the original context window size.
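Concretely, the paper replaces the RoPE function f(x, m) with f'(x, m) = f(x, m·L/L'), where L is the pretrained context length (2048 for LLaMA) and L' is the extended one. Below is a minimal NumPy sketch of that idea (my own illustration, not the authors' code), comparing the rotary angles produced by naive extrapolation with those produced by interpolated positions:

```python
# A minimal sketch of Position Interpolation (illustration only, not the authors' code).
# RoPE rotates query/key pairs by angles m * theta_j; interpolation rescales the
# position index m by L / L' so extended positions stay inside the trained range.
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary angles m * theta_j for every position m and frequency index j."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # shape: (dim // 2,)
    return np.outer(positions, inv_freq)                     # shape: (len(positions), dim // 2)

def interpolated_positions(extended_len, original_len=2048):
    """Linearly down-scale indices 0..extended_len-1 into the trained range [0, original_len)."""
    scale = original_len / extended_len                       # e.g. 2048 / 8192 = 0.25
    return np.arange(extended_len) * scale

L_train, L_new, dim = 2048, 8192, 128

# Extrapolation: feed positions 0..8191 directly; angles go far beyond the training range.
angles_extrapolated = rope_angles(np.arange(L_new), dim)

# Interpolation: squeeze the same 8192 slots back into [0, 2048) before computing angles.
angles_interpolated = rope_angles(interpolated_positions(L_new, L_train), dim)

print(angles_extrapolated.max() / angles_interpolated.max())  # ~4x: interpolation stays inside the trained range
```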
And finally, referring to the stability of the attention scores, they say: “Our theoretical study shows that the upper bound of interpolation is at least approximately 600 times smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.”
The Results:
1. Position Interpolation can easily enable very long context windows (for example, 32k tokens), requiring only fine-tuning for 1000 steps on the Pile to achieve good quality.
The cost of this fine-tuning is negligible compared to pre-training costs, which confirms the authors' hypothesis that it is relatively easy for the models to adapt to interpolated position encodings (see the sketch after this list for how this scaling is typically enabled in practice).
2. Position Interpolation produces strong models that can effectively make use of a much extended context window. Models extended by Position Interpolation enjoy significant perplexity gains from the greatly extended context windows for text modeling, and perplexity decreases gracefully as the context window is enlarged.
They also applied Position Interpolation to a long-text summarization task and demonstrated competitive performance.
3. Position Interpolation preserves model quality relatively well for tasks within the original context window size. The paper presents a variety of evaluation results for the extended LLaMA models on the original LLaMA benchmarks; compared with the original LLaMA models, the extended models show only minor degradation on several standard benchmarks within the 2048-token limit.
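In practice, this linear RoPE scaling has since been adopted by common inference and fine-tuning stacks. As a hedged example (not from the paper; the exact field names vary across library versions, and the checkpoint name below is a placeholder), recent Hugging Face transformers releases expose it on LLaMA-family models through a rope_scaling setting:

```python
# Hedged example: enabling linear RoPE scaling (the Position Interpolation scheme)
# in Hugging Face transformers. Field names and support vary by library version,
# and "your-org/llama-7b" is a placeholder, not a real model id.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("your-org/llama-7b")
config.rope_scaling = {"type": "linear", "factor": 4.0}  # 2048 -> 8192 positions

model = AutoModelForCausalLM.from_pretrained("your-org/llama-7b", config=config)
# Per the paper's first result, a short fine-tuning run (~1000 steps on long
# sequences) is still needed so the model adapts to the interpolated positions.
```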
#ainews
#生成式ai
#生成ai
#gemini
#mosaicml #databricks #xgen7b #stabilityai #stablediffusion #inflectionai #metaai #wizardcoder
#superhot #wizardlm #laion #googleai #googlebrain #deepmind #openai #samaltman #sundarpichai #markzuckerberg #opensource #huggingface #gptengineer #exllama #privategpt
#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#TechInnovation