Self-Extend
Enhancing LLMs with Self-Extend
Self-Extend offers an innovative approach to increasing the context window of Large Language Models (LLMs) without the usual re-tuning. It modifies the attention mechanism at inference time, remapping positions outside a local neighbor window onto coarser grouped positions, so no additional training or fine-tuning is required.
For in-depth technical insights, refer to the Self-Extend research paper, "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" (arXiv:2401.01325).
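To make the idea concrete, below is a minimal illustrative sketch of the position remapping Self-Extend describes. This is not Nitro's or llama.cpp's actual implementation; the function name is hypothetical, and `group_size` and `neighbor_window` correspond to the `grp_attn_n` and `grp_attn_w` parameters used later on this page.

```python
# Illustrative sketch of Self-Extend's relative-position remapping.
# Not the actual Nitro/llama.cpp implementation.

def self_extend_relative_position(query_pos: int, key_pos: int,
                                  group_size: int, neighbor_window: int) -> int:
    """Relative position used for attention between a query and a key token.

    Tokens within `neighbor_window` keep their exact relative position
    (normal attention). Tokens further apart fall back to grouped attention:
    positions are compressed by floor division with `group_size`, so the
    model never sees a relative distance beyond its training range.
    """
    distance = query_pos - key_pos
    if distance <= neighbor_window:
        return distance  # neighbor attention: precise positions
    # Grouped attention: merge distant positions into coarse groups,
    # then shift so the grouped region lines up with the neighbor region.
    grouped = query_pos // group_size - key_pos // group_size
    return grouped + neighbor_window - neighbor_window // group_size


# Example with group_size=4 and neighbor_window=2048 (mirroring the
# grp_attn_n / grp_attn_w values used below):
print(self_extend_relative_position(8000, 7999, 4, 2048))  # nearby: exact
print(self_extend_relative_position(8000, 100, 4, 2048))   # distant: grouped
```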
Activating Self-Extend for LLMs
To activate the Self-Extend feature while loading your model, use the following command:
Enable Self-Extend
```bash
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 8192,
    "grp_attn_n": 4,
    "grp_attn_w": 2048
  }'
```
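If you prefer to load the model from code, the same request can be issued with Python's `requests` library. This is a sketch under the same assumptions as the curl command above (server listening on `localhost:3928`, placeholder model path):

```python
# Sketch: load a model with Self-Extend enabled via Python's `requests`.
# Assumes a server on localhost:3928; replace the model path with your own.
import requests

payload = {
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 8192,       # extended context window
    "grp_attn_n": 4,       # Self-Extend group size
    "grp_attn_w": 2048,    # Self-Extend neighbor window width
}

resp = requests.post(
    "http://localhost:3928/inferences/llamacpp/loadmodel",
    json=payload,
    timeout=600,  # loading a large model can take a while
)
resp.raise_for_status()
print(resp.json())
```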
Note:
- For optimal performance, `grp_attn_w` should be as large as possible, but it must stay smaller than the model's training context length.
- Setting `grp_attn_n` between 2 and 4 is recommended for peak efficiency; higher values can make the output increasingly incoherent. A worked example of how these two parameters bound the extended context follows below.
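As a rough sanity check for these settings, the Self-Extend paper gives the maximum extended context as (L_train − w) × G + w, where L_train is the training context length, G is `grp_attn_n`, and w is `grp_attn_w`. A quick sketch, assuming a hypothetical model trained with a 4096-token context:

```python
# Sketch: estimate the maximum extended context for given Self-Extend settings.
train_ctx = 4096   # model's training context length (hypothetical example)
grp_attn_n = 4     # group size G
grp_attn_w = 2048  # neighbor window w

max_extended = (train_ctx - grp_attn_w) * grp_attn_n + grp_attn_w
print(max_extended)  # 10240 -> a ctx_len of 8192 fits comfortably
```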