stereoplegic's Collections: Long context
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper
• 2310.15494
• Published • 2
A Long Way to Go: Investigating Length Correlations in RLHF
Paper
• 2310.03716
• Published • 10
YaRN: Efficient Context Window Extension of Large Language Models
Paper
• 2309.00071
• Published • 82
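Several entries in this collection (YaRN, LongRoPE, "Base of RoPE Bounds Context Length") revolve around rescaling RoPE's rotary base to stretch the usable position range. Below is a minimal sketch of the underlying arithmetic: plain RoPE angles plus a generic NTK-style base rescaling. This is not YaRN's exact interpolation rule, and all function names are illustrative.

```python
def rope_angles(position, dim=8, base=10000.0):
    """Rotation angle of each 2-D channel pair for one token position
    under plain RoPE: position * base ** (-2*i/dim)."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]

def ntk_scaled_base(base, dim, scale):
    """Generic NTK-style base rescaling: enlarging the base by
    scale ** (dim / (dim - 2)) makes the slowest-rotating pair at
    position scale*p land on the same angle it had at position p."""
    return base * scale ** (dim / (dim - 2))

# At 4x the position with a 4x-rescaled base, the lowest-frequency
# pair's angle is unchanged, so it stays inside the trained range.
orig = rope_angles(4096, dim=8, base=10000.0)
ext = rope_angles(4 * 4096, dim=8, base=ntk_scaled_base(10000.0, 8, 4.0))
```

Note that under a uniform base change the higher-frequency pairs still end up rotating differently than in training, which is part of why YaRN applies a per-frequency interpolation ramp rather than a single rescaling.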
Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper
• 2308.10882
• Published • 1
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Paper
• 2308.16137
• Published • 41
Scaling Transformer to 1M tokens and beyond with RMT
Paper
• 2304.11062
• Published • 3
Investigating Answerability of LLMs for Long-Form Question Answering
Paper
• 2309.08210
• Published • 15
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper
• 2309.14509
• Published • 21
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper
• 2309.12307
• Published • 89
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper
• 2309.10400
• Published • 26
CLEX: Continuous Length Extrapolation for Large Language Models
Paper
• 2310.16450
• Published • 10
Code Llama: Open Foundation Models for Code
Paper
• 2308.12950
• Published • 29
CAT-LM: Training Language Models on Aligned Code And Tests
Paper
• 2310.01602
• Published • 1
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Paper
• 2308.14508
• Published • 2
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper
• 2309.11568
• Published • 11
Paper
• 2309.03450
• Published • 8
Effective Long-Context Scaling of Foundation Models
Paper
• 2309.16039
• Published • 31
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Paper
• 2310.06839
• Published • 4
Context Compression for Auto-regressive Transformers with Sentinel Tokens
Paper
• 2310.08152
• Published • 1
Learning to Compress Prompts with Gist Tokens
Paper
• 2304.08467
• Published • 3
Long-range Language Modeling with Self-retrieval
Paper
• 2306.13421
• Published • 17
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
Paper
• 2212.09146
• Published • 3
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks
Paper
• 2305.18395
• Published • 1
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Paper
• 2304.11477
• Published • 3
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
Paper
• 2308.12682
• Published • 2
Combiner: Full Attention Transformer with Sparse Computation Cost
Paper
• 2107.05768
• Published • 1
Paper
• 2203.08913
• Published • 2
Adapting Language Models to Compress Contexts
Paper
• 2305.14788
• Published • 1
Lost in the Middle: How Language Models Use Long Contexts
Paper
• 2307.03172
• Published • 44
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Paper
• 2307.11088
• Published • 5
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
Paper
• 2302.06218
• Published • 1
Blockwise Parallel Transformer for Long Context Large Models
Paper
• 2305.19370
• Published • 3
Blockwise Self-Attention for Long Document Understanding
Paper
• 1911.02972
• Published • 1
LSG Attention: Extrapolation of pretrained Transformers to long sequences
Paper
• 2210.15497
• Published • 1
Efficient Long-Text Understanding with Short-Text Models
Paper
• 2208.00748
• Published • 1
Cure the headache of Transformers via Collinear Constrained Attention
Paper
• 2309.08646
• Published • 14
Bird-Eye Transformers for Text Generation Models
Paper
• 2210.03985
• Published • 1
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
Paper
• 2310.03052
• Published • 4
Efficient Streaming Language Models with Attention Sinks
Paper
• 2309.17453
• Published • 14
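"Efficient Streaming Language Models with Attention Sinks" (2309.17453) keeps a few initial "sink" tokens in the KV cache alongside a sliding window of recent tokens. A toy sketch of that eviction policy follows; the function name and defaults are illustrative, not the paper's code.

```python
def streaming_cache_keep(seq_len, n_sink=4, window=8):
    """Indices of KV-cache entries retained under an attention-sink
    policy: the first n_sink positions (attention sinks) plus the
    most recent `window` positions. Everything in between is evicted,
    keeping cache size constant regardless of stream length."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

keep = streaming_cache_keep(100, n_sink=4, window=8)
# cache holds 12 entries no matter how long the stream grows
```

The point of the sinks is empirical: softmax attention dumps mass on the first few positions, so evicting them degrades generation even though they carry little content.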
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Paper
• 2310.03294
• Published • 2
Ultra-Long Sequence Distributed Transformer
Paper
• 2311.02382
• Published • 6
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper
• 2310.10638
• Published • 30
Retrieval meets Long Context Large Language Models
Paper
• 2310.03025
• Published • 4
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content
Paper
• 2305.14806
• Published • 1
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences
Paper
• 2305.11129
• Published • 2
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Paper
• 2112.07916
• Published • 2
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System
Paper
• 2304.13343
• Published • 1
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
Paper
• 2305.04241
• Published • 1
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse
Paper
• 2311.07468
• Published • 1
Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering
Paper
• 2311.09198
• Published • 3
SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
Paper
• 2208.02169
• Published • 1
System 2 Attention (is something you might need too)
Paper
• 2311.11829
• Published • 43
Attention Sorting Combats Recency Bias In Long Context Language Models
Paper
• 2310.01427
• Published • 1
CoLT5: Faster Long-Range Transformers with Conditional Computation
Paper
• 2303.09752
• Published • 2
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper
• 2312.12742
• Published • 13
Axiomatic Preference Modeling for Longform Question Answering
Paper
• 2312.02206
• Published • 10
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
Paper
• 2312.01279
• Published • 6
Extending Context Window of Large Language Models via Semantic Compression
Paper
• 2312.09571
• Published • 16
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper
• 2312.08618
• Published • 13
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper
• 2401.18058
• Published • 24
Extending LLMs' Context Window with 100 Samples
Paper
• 2401.07004
• Published • 16
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Paper
• 2401.07872
• Published • 2
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper
• 2401.06951
• Published • 26
Exploring Transformer Extrapolation
Paper
• 2307.10156
• Published • 1
Gated Linear Attention Transformers with Hardware-Efficient Training
Paper
• 2312.06635
• Published • 9
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper
• 2401.01325
• Published • 27
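Self-Extend (2401.01325) maps out-of-range relative positions back into the trained range without tuning: nearby tokens keep their exact offsets, while distant offsets are floor-divided into groups that share a coarser position. A minimal sketch of that remapping follows, with illustrative parameter names and a simplified shift; the paper's exact merging of the two regimes differs.

```python
def self_extend_rel_pos(q, k, group=4, neighbor=8):
    """Relative position used for attention between query index q and
    key index k under a Self-Extend-style remapping. Offsets within
    `neighbor` are exact; beyond it, offsets are bucketed by `group`
    and shifted to continue from the neighbor-window boundary, so the
    largest position the model sees grows ~group times more slowly."""
    rel = q - k
    if rel <= neighbor:
        return rel
    return neighbor + (rel - neighbor) // group

# a query 100 tokens past a key maps to position 31, not 100
pos = self_extend_rel_pos(100, 0, group=4, neighbor=8)
```

The design keeps local attention (which dominates next-token prediction) untouched while compressing only the long tail of distances that would otherwise fall outside the pretrained position range.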
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published • 116
Training-Free Long-Context Scaling of Large Language Models
Paper
• 2402.17463
• Published • 24
LOCOST: State-Space Models for Long Document Abstractive Summarization
Paper
• 2401.17919
• Published
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper
• 2404.08801
• Published • 66
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Paper
• 2402.18508
• Published
HMT: Hierarchical Memory Transformer for Long Context Language Processing
Paper
• 2405.06067
• Published • 2
LLoCO: Learning Long Contexts Offline
Paper
• 2404.07979
• Published • 22
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
Paper
• 2402.10685
• Published • 1
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference
Paper
• 2405.17755
• Published • 1
Base of RoPE Bounds Context Length
Paper
• 2405.14591
• Published
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
Paper
• 2406.05678
• Published • 1
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
Paper
• 2406.00605
• Published • 2
Equipping Transformer with Random-Access Reading for Long-Context Understanding
Paper
• 2405.13216
• Published • 1
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
Paper
• 2406.10996
• Published • 35
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
Paper
• 2402.04617
• Published • 6
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope
Paper
• 2407.15176
• Published • 3
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Paper
• 2407.15891
• Published
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper
• 2408.14906
• Published • 144
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Paper
• 2407.16833
• Published
ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Paper
• 2408.15496
• Published • 12
General-purpose, long-context autoregressive modeling with Perceiver AR
Paper
• 2202.07765
• Published
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Paper
• 2407.15892
• Published
ContextCite: Attributing Model Generation to Context
Paper
• 2409.00729
• Published • 14
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
Paper
• 2406.18173
• Published
MemLong: Memory-Augmented Retrieval for Long Text Modeling
Paper
• 2408.16967
• Published • 3
Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads
Paper
• 2407.17678
• Published
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
Paper
• 2409.06679
• Published • 4
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Paper
• 2406.19707
• Published
ACON: Optimizing Context Compression for Long-horizon LLM Agents
Paper
• 2510.00615
• Published • 35
Global Context Compression with Interleaved Vision-Text Transformation
Paper
• 2601.10378
• Published • 2