VoladorLuYu's Collections: Efficient LLM
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 60
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Paper • 2401.06761 • Published • 1
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 17
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 20
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
Paper • 2401.07324 • Published • 3
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Paper • 2402.10211 • Published • 13
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 116
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Paper • 2402.13720 • Published • 7
Paper • 2402.13144 • Published • 100
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 24
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
Paper • 2402.10685 • Published • 1
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 27
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 26
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
Paper • 2402.04617 • Published • 6
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper • 2402.11131 • Published • 42
Towards Optimal Learning of Language Models
Paper • 2402.17759 • Published • 18
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 189
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Paper • 2403.00818 • Published • 19
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Paper • 2307.02486 • Published • 82
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Paper • 2403.09919 • Published • 21
DiJiang: Efficient Large Language Models through Compact Kernelization
Paper • 2403.19928 • Published • 12
ReFT: Representation Finetuning for Language Models
Paper • 2404.03592 • Published • 101
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 13
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 107
Pre-training Small Base LMs with Fewer Tokens
Paper • 2404.08634 • Published • 36
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 29
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 94
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 66
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Paper • 2307.14430 • Published • 3
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 28
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 69
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper • 2405.11582 • Published • 17
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Paper • 2310.05492 • Published • 2
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 40
Unlocking Continual Learning Abilities in Language Models
Paper • 2406.17245 • Published • 30