Self-Extend

Enhancing LLMs with Self-Extend

Self-Extend is a method for extending the context window of Large Language Models (LLMs) without re-tuning. It works by adapting the attention mechanism at inference time, so no additional training or fine-tuning is required.
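At a high level (a sketch of the mechanism from the Self-Extend paper, not a description of this server's internals): tokens within a neighbor window of w positions keep their exact relative positions, while positions beyond the window are merged into groups of n via floor division. Because grouped positions are reused, a model trained on a context of C tokens can attend over roughly n × (C − w) + w tokens at inference time. The w and n here correspond to the grp_attn_w and grp_attn_n parameters used below.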

For in-depth technical insights, refer to the Self-Extend research paper, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning.

Activating Self-Extend for LLMs

To activate the Self-Extend feature while loading your model, use the following command:

Enable Self-Extend
curl http://localhost:3928/inferences/llamacpp/loadmodel \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/your_model.gguf",
    "ctx_len": 8192,
    "grp_attn_n": 4,
    "grp_attn_w": 2048
  }'
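
Once the model is loaded with these settings, requests served by it use the extended context automatically. The example below assumes Nitro's standard chat completion endpoint and a minimal payload; adjust the path and body to your setup.

Example Request
curl http://localhost:3928/inferences/llamacpp/chat_completion \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize this long document for me: ..."}
    ]
  }'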

Note:

  • For optimal performance, grp_attn_w should be as large as possible, but smaller than the training context length.
  • Setting grp_attn_n between 2 and 4 is recommended for peak efficiency; higher values can make the output increasingly incoherent.
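
As a quick sanity check (hypothetical numbers, using the rough rule of thumb above): for a model trained with a 4096-token context, grp_attn_w = 2048 and grp_attn_n = 4 give roughly 4 × (4096 − 2048) + 2048 = 10240 attendable positions, so the ctx_len of 8192 in the example fits comfortably.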