Open-Source AI Breakthrough 2x ChatGPT-16k | Meta AI LLaMA LLM Context Size 32k, and up to 600x
Title: EXTENDING CONTEXT WINDOW OF LARGE LANGUAGE MODELS VIA POSITION INTERPOLATION
Arxiv link: https://arxiv.org/pdf/2306.15595.pdf
This work is done by:
Shouyuan Chen
Sherman Wong
Liangjian Chen
Yuandong Tian
Meta Platforms Inc.
Summary
They say: “We present Position Interpolation that extends the context window sizes of RoPE-based (Rotary Position Embedding) pretrained Large Language Models such as LLaMA models from 2k up to 32k tokens with minimal fine-tuning (that is, within 1000 steps).” And they do this while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization, across LLaMA models from 7 billion to 65 billion parameters.
Meanwhile, the model extended by Position Interpolation preserves quality relatively well on tasks within its original context window. The new approach that lets RoPE-based models extend their context window without degrading performance works as follows: rather than extrapolating position indices beyond the trained context length, which can lead to catastrophically high attention scores that completely ruin the self-attention mechanism, Position Interpolation linearly down-scales the input position indices to fit within the original context window size.
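Concretely, the paper replaces the RoPE function f(x, m) with f'(x, m) = f(x, m·L/L'), where L is the pretrained context length (2048 for LLaMA) and L' is the extended one. Below is a minimal NumPy sketch of that idea (my own illustration, not the authors' code), comparing the rotary angles produced by naive extrapolation with those produced by interpolated positions:

```python
# A minimal sketch of Position Interpolation (illustration only, not the authors' code).
# RoPE rotates query/key pairs by angles m * theta_j; interpolation rescales the
# position index m by L / L' so extended positions stay inside the trained range.
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary angles m * theta_j for every position m and frequency index j."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # shape: (dim // 2,)
    return np.outer(positions, inv_freq)                     # shape: (len(positions), dim // 2)

def interpolated_positions(extended_len, original_len=2048):
    """Linearly down-scale indices 0..extended_len-1 into the trained range [0, original_len)."""
    scale = original_len / extended_len                       # e.g. 2048 / 8192 = 0.25
    return np.arange(extended_len) * scale

L_train, L_new, dim = 2048, 8192, 128

# Extrapolation: feed positions 0..8191 directly; angles go far beyond the training range.
angles_extrapolated = rope_angles(np.arange(L_new), dim)

# Interpolation: squeeze the same 8192 slots back into [0, 2048) before computing angles.
angles_interpolated = rope_angles(interpolated_positions(L_new, L_train), dim)

print(angles_extrapolated.max() / angles_interpolated.max())  # ~4x: interpolation stays inside the trained range
```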
And finally, referring to the stability of the attention scores, they say: “Our theoretical study shows that the upper bound of interpolation is at least approximately 600 times smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.”
The Results:
1. Position Interpolation can easily enable very long context windows (for example, 32k tokens), requiring only fine-tuning for 1000 steps on the Pile to achieve good quality.
The cost of this fine-tuning is negligible compared to pre-training costs, which confirms the authors' hypothesis that it is relatively easy for the models to adapt to interpolated position encodings (see the sketch after this list for how this scaling is typically enabled in practice).
2. Position Interpolation produces strong models that can effectively make use of a much extended context window. Models extended by Position Interpolation enjoy significant perplexity gains from the greatly extended context windows for text modeling, and perplexity decreases gracefully as the context window is enlarged.
They also applied Position Interpolation to a long-text summarization task and demonstrated competitive performance.
3. Position Interpolation preserves model quality relatively well for tasks within the original context window size. The paper presents a variety of evaluation results for the extended LLaMA models on the original LLaMA benchmarks; compared with the original LLaMA models, the extended models show only minor degradation on several standard benchmarks within the 2048-token limit.
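In practice, this linear RoPE scaling has since been adopted by common inference and fine-tuning stacks. As a hedged example (not from the paper; the exact field names vary across library versions, and the checkpoint name below is a placeholder), recent Hugging Face transformers releases expose it on LLaMA-family models through a rope_scaling setting:

```python
# Hedged example: enabling linear RoPE scaling (the Position Interpolation scheme)
# in Hugging Face transformers. Field names and support vary by library version,
# and "your-org/llama-7b" is a placeholder, not a real model id.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("your-org/llama-7b")
config.rope_scaling = {"type": "linear", "factor": 4.0}  # 2048 -> 8192 positions

model = AutoModelForCausalLM.from_pretrained("your-org/llama-7b", config=config)
# Per the paper's first result, a short fine-tuning run (~1000 steps on long
# sequences) is still needed so the model adapts to the interpolated positions.
```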
#ainews
#生成式ai
#生成ai
#gemini
#mosaicml #databricks #xgen7b #stabilityai #stablediffusion #inflectionai #metaai #wizardcoder
#superhot #wizardlm #laion #googleai #googlebrain #deepmind #openai #samaltman #sundarpichai #markzuckerberg #opensource #huggingface #gptengineer #exllama #privategpt
#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#TechInnovation