Releases · pplmx/llm

08 Jan 12:42

github-actions

v0.0.5

2faa54f

v0.0.5 Latest

Latest

Added

SFT (Supervised Fine-tuning):
- SFTDataset for instruction tuning with input masking
- SFTDataModule for data loading
- SFTTask registered as --task sft in CLI
- Tests for all SFT components
DPO (Direct Preference Optimization):
- DPODataset handling chosen/rejected pairs
- DPODataModule for preference data loading
- DPOTask with reference model management and DPO loss
- Registered as --task dpo in CLI
- Tests for all DPO components
Continuous Batching Engine (Serving):
- src/llm/serving/engine.py with ContinuousBatchingEngine class
- Iteration-level scheduling via Scheduler and SlotAllocator
- Pre-allocated KV cache pool for efficient memory management
- Supports mixed prefill/decode batching with automatic padding
- Clean API: requires model and tokenizer instances upfront
- src/llm/serving/scheduler.py with FCFS scheduling logic
LoRA (Low-Rank Adaptation):
- src/llm/core/lora.py with LoRALinear class for parameter-efficient fine-tuning
- apply_lora(), merge_lora(), get_lora_parameters() helper functions
- Device/dtype handling for CUDA compatibility
- 17 tests covering training and weight merging
QLoRA (Quantized LoRA):
- src/llm/core/qlora.py with QLoRALinear class
- NF4 4-bit quantization for base weights (~4x memory reduction)
- LoRA adapters remain in fp16/bf16 for training stability
- apply_qlora() and get_qlora_parameters() helpers
RoPE (Rotary Position Embedding):
- src/llm/core/rope.py with RotaryPositionEmbedding class
- Linear, dynamic, and NTK-aware scaling methods for extended context
- apply_rotary_pos_emb(), get_rope_scaling_factor() utilities
- 15 tests
ALiBi (Attention with Linear Biases):
- src/llm/core/alibi.py with ALiBiPositionBias class
- get_alibi_slopes(), build_alibi_bias() functions
- Cached bias computation for efficiency
- 13 tests
Sliding Window Attention:
- window_size parameter in scaled_dot_product_attention
- Propagated through MultiHeadAttention, TransformerBlock, DecoderModel
- Reduces memory for long sequences by limiting attention scope
- 10 tests
KV Cache Optimization:
- src/llm/core/kv_cache.py with KVCache class for pre-allocated cache buffers
- In-place updates during autoregressive generation (avoids O(n²) memory operations)
- Integrated into MHA, TransformerBlock, DecoderModel
- Factory method KVCache.from_model_config() for easy instantiation
- Backward compatible: legacy past_key_value tuple format still works
E2E Testing Infrastructure:
- tests/e2e/ directory with comprehensive pipeline tests
- test_training.py, test_sft.py, test_dpo.py
- test_gradient_accumulation.py, test_resume_training.py
- Advanced inference and callback tests
Documentation:
- notebooks/quick_start.ipynb interactive tutorial
- Covers model building, training, inference, and advanced features

Changed

SDPA Refactoring:
- Consolidated scaled_dot_product_attention wrapper into src/llm/core/attn/sdpa.py
- Refactored MultiHeadAttention and MultiLatentAttention to use common sdpa wrapper
- Archived custom implementation to _learning/03_lab/experiments/custom_sdpa.py
Test Suite Refactoring:
- Organized test files into subdirectories (tests/training/, tests/inference/, etc.)
- Converted to functional testing style (real components over mocks)
- Added shared fixtures in tests/conftest.py
- Test count: 385 → 432
TrainingEngine:
- Support for dictionary batches in training/validation loops
- Gradient accumulation implementation
DPO Reference Model:
- Use model reconstruction instead of deepcopy for ref_model creation
Documentation:
- Added docs/README.md as documentation entry point
- Added MkDocs Material configuration (mkdocs.yml) for documentation site
- Added GitHub Actions workflow for automatic GitHub Pages deployment
- Added guide-finetuning.md (LoRA/QLoRA) and guide-inference.md (KVCache/GQA/Continuous Batching)
- Enhanced architecture.md with detailed component diagrams and data flow analysis
- Updated ROADMAP Phase 10.2 (Continuous Batching complete)

Assets 2

07 Jan 07:33

github-actions

v0.0.4

1db2890

v0.0.4

Added

Gradient Checkpointing:
- Memory-efficient training via gradient_checkpointing parameter in DecoderModel
- enable_gradient_checkpointing() / disable_gradient_checkpointing() methods
- Automatic incompatibility check with use_cache=True
E2E Pipeline Automation:
- scripts/e2e_pipeline.py for automated Train → Evaluate → Inference workflow
- src/llm/utils/e2e.py with reusable E2E core functions (E2EConfig, E2EResult, run_e2e_pipeline)
- Rich progress UI and configurable CLI options
OpenAI-Compatible Chat API (/v1/chat/completions):
- Compatible with official OpenAI Python SDK
- Streaming and non-streaming chat completions
- Bearer token authentication support
- Multi-turn conversation handling
- 8 new test cases for compatibility layer
Batch Inference:
- batch_generate function in inference.py with left-padding and batched forward pass
- BatchGenerationRequest / BatchGenerationResponse schemas
- /batch_generate API endpoint
- 3 tests for batch inference (basic, single, empty)
Request Queue and Concurrency Control:
- max_concurrent_requests and request_timeout in ServingConfig
- asyncio.Semaphore for concurrency limiting
- asyncio.timeout for request timeout handling (504 response)
CLI Entry Points:
- llm-train command for training models
- llm-serve command for starting inference server
Testing Infrastructure:
- Pytest markers using decorators: quick, slow, heavy, e2e
- MoE integration tests (6 tests for expert routing, gradient flow)
- E2E pipeline tests (full workflow, streaming consistency)
- Gradient checkpointing tests (8 tests)
- Total test count: 296 → 337
Examples Directory:
- inference_demo.py for basic text generation
- openai_client_demo.py for OpenAI SDK usage
Documentation:
- scripts/README.md documenting all available scripts
- HFTokenizer example in usage.md
- Updated root README.md with links to Examples and Scripts

Changed

Makefile Reorganization:
- make test now runs all tests by default
- make test-fast for daily development (excludes heavy/e2e)
- make test-quick for rapid iteration (~6s)
- make test-cov for CI with coverage and allure reports
- Removed redundant test-all and test-integration
CLI Standardization:
- CLI parameters changed from snake_case to kebab-case (--file-path, --batch-size)
- Replace typer with typer-slim[standard] for reduced dependencies
Code Quality Improvements:
- Translate Chinese docstrings to English in serving module
- Remove ~75 lines of redundant comments
- Simplify section comments while preserving algorithm clarity
Documentation Refactoring:
- Eliminated redundancy between README, usage.md, and development.md
- Clear document responsibility separation
- Updated all docs to use new CLI commands
- Enhanced package metadata (keywords, classifiers)
Module Exports:
- Enhanced llm/__init__.py with public API exports (DecoderModel, generate, etc.)
- Enhanced llm.serving module exports (LLMEngine, ServingConfig, OpenAI schemas)

Fixed

Removed obsolete TODO comment in engine.py
Removed duplicate num_kv_heads field in ModelConfig
Fixed MD051/link-fragments in tutorial-cpu-llm.md and faq.md
Fixed train.py task registration for lm task

Assets 2

23 Dec 06:32

github-actions

v0.0.3

a0621e7

v0.0.3

Added

Inference Serving:
- Production-ready REST API with FastAPI
- Streaming support via Server-Sent Events (SSE)
- Advanced sampling strategies (nucleus sampling/top-p, repetition penalty)
- Prometheus metrics endpoint for monitoring
- API key authentication (X-API-Key header)
- Structured logging with python-json-logger
- Real PyTorch model weights loading from checkpoint files
- Pickled tokenizer object loading support
Component Registry:
- Automatic component registration system (ComponentRegistry)
- Core components (MHA, MLP, MoE) auto-registered via side-effect imports
- Prevents "component not found" errors in simplified scripts
Data Abstraction:
- Formalized BaseTokenizer protocol
- BaseDataModule abstraction for flexible data handling
- Environment variable configuration support (e.g., LLM_TRAINING__EPOCHS)
Testing & CLI:
- --num-samples flag in train.py for rapid regression testing
- Scheduler edge case tests (test_scheduler_edge_cases.py)
- Validation logging tests (test_engine_logging.py)
- Component registry tests (test_init.py)
- Model loading verification tests
- Auto-device detection in training scripts (prioritizes CUDA)
Documentation:
- Comprehensive usage guide (docs/usage.md)
- Architecture documentation (docs/architecture.md)
- Engineering documentation (ADRs, PR templates, FAQ)
- VS Code configuration and extensions

Changed

Architecture Modernization:
- Migrated to Pydantic v2 (BaseSettings, BaseModel) for configuration
- Fully typed and validated configuration system
- CLI migration from argparse to typer for better UX
Naming Standardization:
- Unified ffn_hidden_size → intermediate_size across codebase
- Standardized input parameter x → hidden_states in forward methods
- Applied to MLP, LayerNorm, RMSNorm, DecoderModel, TransformerBlock
- Updated all 309 tests to reflect API changes
Code Quality:
- Standardized punctuation in documentation (full-width → half-width)
- Improved type hints and documentation comments
- Refactored TransformerBlock.forward for clarity

Fixed

Core Bugs:
- CosineAnnealingLR T_max calculation when epochs == warmup_epochs (ZeroDivisionError)
- TrainingEngine validation logging crash when gradient_norms is empty (IndexError)
- PAD token generation issue in inference (logits masking)
- SyntheticDataModule prefetch_factor handling with num_workers=0
- TransformerBlock shared norm instance bug (independent norm1/norm2)
- Scheduler/optimizer step order warnings in tests
- PositionalEncoding support for start_pos in incremental generation
- MLP SwiGLU operation order for numerical consistency
- Prompt truncation respecting max_seq_len with new tokens
- Auto AMP dtype resolution for CPU-only environments
Registry & Imports:
- Package auto-registration via import llm
- Component not found errors in simplified execution

Assets 2

23 Dec 06:32

github-actions

v0.0.2

676cd10

v0.0.2

0.0.3 - 2025-12-23

🚀 Features

(inference) Add simple autoregressive generation loop - (4585a2e)
(scripts) Auto-detect device in train_simple_decoder.py - (ff2ad6f)
(scripts) Add best_train.py for efficient distributed training - (b345f82)
(scripts) Add optimized_train_02.py for efficient distributed training - (e585cf2)
(scripts) Add optimized DDP training script - (02787e6)
Support real weights loading and serving enhancements - (1da2b8a)
Implement production-ready inference serving with streaming and observability - (2c21937)
Expand LLM functional test suite and fix core regressions - (60ce0cd)
Finalize architecture modernization and code quality cleanup - (7954f61)
Enhance inference performance and project quality tools - (03d640f)
Implement and integrate Mixture of Experts (MoE) - (deaed1b)
Enhance training framework with modularity and extensibility - (592e1d9)
Introduce modular training framework - (c944960)
Add a training sample - (a5417f2)
Remove something redundant - (31393e9)
Remove sys.path manipulations. - (a4e4c76)
Implement PositionalEncoding. - (f34f019)
Implement core components for a custom LLM framework - (2a4881f)
Introduce moe - (a26b0c0)

🐛 Bug Fixes

(attn) Correct QKV splitting in MultiHeadAttention - (9b5d634)
(core) Ensure distinct norm instances in TransformerBlock - (5e7f4cc)
(core/mlp) Move provided norm instance to target device/dtype to avoid device mismatch - (6a52399)
Test_engine_auto_amp_dtype case failed on cuda env - (a6406ca)
Failed to run on cpu-only env - (4faee5c)
Resolve multiple training stability and correctness issues - (334d76d)
ARG001 Unused function argument: dummy_input - (ff461df)

🚜 Refactor

(arch) Modernize architecture with registry, data abstraction, and robust config - (6f74012)
(core) Unify deep learning naming conventions - (d2c9237)
(moe) Optimize it - (3a91204)
(moe) Optimize it - (7fee39e)
(scripts) Migrate CLI scripts to typer - (f86ad16)
(scripts) Implement train_02.py with rich logging and optimized config management - (47cf2f8)
(scripts) Improve code readability and structure in optimized_train.py - (69eb300)
(scripts) Optimize and modularize PyTorch DDP training script - (e005d6a)
Consolidate environment variable handling and simplify code - (331bbdd)

📚 Documentation

(changelog) Enhance v0.0.2 release notes with detailed information - (cdd384e)
(changelog) Prepare v0.0.3 release notes - (b3bc1ab)
(llm) Add attn - (222095c)
Add usage - (039bf76)
Optimize documentation comments and standardize punctuation - (c382e34)
Add engineering docs and tooling config - (f3818d8)
Comprehensive documentation overhaul and roadmap expansion - (5bd6dac)
Update the roadmap - (e9d6176)
Add roadmap for learning - (ab81c7d)
Add ROADMAP - (c0e57d4)
Update GEMINI.md - (cd30474)
Standardize training documentation filenames to kebab-case - (de4a99c)
Update project documentation - (ebf95e6)
Streamline documentation entry point - (5b197fc)
Refactor and standardize documentation structure - (b2a46ee)
Refine the structure - (e4097d1)
Create comprehensive training framework documentation - (3ae4ace)
Add GEMINI-example.md template - (8eaf0f4)
Update GEMINI.md with commit workflow - (bdbde1a)
Add some docs about transformer - (16774c9)
Add moe - (f82b734)

🎨 Styling

Ruff - (60ab171)
Ruff - (20fb4e3)
Format code for readability and consistency - (7831aa7)
Add .markdownlint.yaml - (c98f6e1)
Format - (bb7bb41)

🧪 Testing

(core) Make device comparisons robust by comparing device.type - (b5229ed)
Increase the coverage - (85be1e7)

⚙️ Miscellaneous Tasks

(release) Use CHANGELOG.md for GitHub releases instead of git-cliff - (a0621e7)
Some minor changes - (09e3a6f)
Update the prek plugins - (6d97ec3)
Upgrade actions/checkout to v5 - (d34c441)
Update GEMINI.md - (7e36f67)
Some minor changes - (4c7e0e7)
Some minor changes - (55c12c8)
Some minor changes - (c330cec)
Add mlp vs moe - (e2478f0)
Remove something redundant - ([b962d05](https://github.com/pplmx/ll...

Contributors

renovate and google-labs-jules

Assets 2

Releases: pplmx/llm

v0.0.5

Added

Changed

Uh oh!

v0.0.4

Added

Changed

Fixed

Uh oh!

v0.0.3

Added

Changed

Fixed

Uh oh!

v0.0.2

0.0.3 - 2025-12-23

🚀 Features

🐛 Bug Fixes

🚜 Refactor

📚 Documentation

🎨 Styling

🧪 Testing

⚙️ Miscellaneous Tasks

Contributors

Uh oh!