Fast delta compression algorithm for similar data chunks in pure Rust. Achieves 496 MB/s encoding and 4.4 GB/s decoding with competitive compression ratios.
gdelta is a Rust implementation of the GDelta algorithm by Haoliang Tan. It provides efficient delta encoding for
similar data chunks (typically 4KB - 64KB) commonly found in deduplication systems.
Key Features:
Performance:
Synthetic benchmarks (2,097 tests): 496 MB/s encoding, 4.4 GB/s decoding, ~68% space saved.
Real-world git repositories (1.36M comparisons): 94.5-97.0% space saved with encode times of a few microseconds (per-repository tables below).
Benchmarked on an AMD Ryzen 7 7800X3D (8 cores / 16 threads) running Fedora Linux 42. See PERFORMANCE.md for detailed analysis, including comparisons with xpatch, vcdiff, qbsdiff, and zstd_dict.
Add this to your Cargo.toml:
[dependencies]
gdelta = "0.2"
Install using cargo:
cargo install gdelta --features cli
Or build from source:
git clone https://github.com/ImGajeed76/gdelta
cd gdelta
cargo build --release --features cli
# Binary at: target/release/gdelta
use gdelta::{encode, decode};

// Boxing the error assumes gdelta's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create base data and a modified version
    let base_data = b"Hello, World! This is some base data.";
    let new_data = b"Hello, Rust! This is some modified data.";

    // Encode the delta (argument order: new data first, then base)
    let delta = encode(new_data, base_data)?;
    println!("Delta size: {} bytes", delta.len());

    // Decode against the same base to recover the new data
    let recovered = decode(&delta, base_data)?;
    assert_eq!(recovered, new_data);
    Ok(())
}
Use `gdelta help` to see the most up-to-date options and descriptions.
The CLI provides a simple interface for creating and applying delta patches:
Create a delta patch:
# Basic usage
gdelta encode old_file.bin new_file.bin -o patch.delta
# With compression (recommended)
gdelta encode old_file.bin new_file.bin -o patch.delta -c zstd
gdelta encode old_file.bin new_file.bin -o patch.delta -c lz4
# With verification
gdelta encode old_file.bin new_file.bin -o patch.delta --verify
Apply a delta patch:
# Auto-detects compression format
gdelta decode old_file.bin patch.delta -o new_file.bin
# Force specific format (if magic bytes conflict)
gdelta decode old_file.bin patch.delta -o new_file.bin --format zstd
Options:
- `-c, --compress <FORMAT>` - Compression: none, zstd, lz4 (default: none)
- `-v, --verify` - Verify delta after creation (encode only)
- `-y, --yes` - Skip memory warning prompts
- `-f, --force` - Overwrite existing files
- `-q, --quiet` - Suppress output except errors

Example workflow:
# Create compressed delta
gdelta encode database-v1.db database-v2.db -o update.delta -c zstd
# Later, apply the update
gdelta decode database-v1.db update.delta -o database-v2-restored.db
# Verify the result
diff database-v2.db database-v2-restored.db
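The same workflow can also be done in library code. Below is a minimal sketch using `encode`/`decode` together with `std::fs`; note that compression is a CLI feature, so this produces an uncompressed delta, and boxing the error assumes gdelta's error type implements `std::error::Error`:

```rust
use gdelta::{encode, decode};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create the delta (argument order: new data first, then base).
    let v1 = fs::read("database-v1.db")?;
    let v2 = fs::read("database-v2.db")?;
    let delta = encode(&v2, &v1)?;
    fs::write("update.delta", &delta)?;

    // Later: apply the delta against the same base to restore v2.
    let restored = decode(&fs::read("update.delta")?, &v1)?;
    assert_eq!(restored, v2);
    Ok(())
}
```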
Memory Management:
The CLI monitors memory usage and warns when operations might use >80% of available RAM. This is important because gdelta loads entire files into memory.
# For large files, the tool will prompt:
⚠ Memory warning: This operation requires ~12.4 GB
Available: 8.2 GB free (16 GB total)
Continue? [y/N]:
Use -y to skip prompts in automated scripts.
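The library itself performs no such check, so callers embedding gdelta may want a similar guard. A minimal sketch of a pre-flight estimate follows; the 2x multiplier is an illustrative guess at peak usage (inputs plus working buffers), not gdelta's actual accounting:

```rust
use std::fs;

// Rough pre-flight estimate before loading both files into memory.
// The 2x multiplier is an illustrative guess at peak usage, not
// gdelta's exact accounting.
fn estimated_peak_bytes(base_path: &str, new_path: &str) -> std::io::Result<u64> {
    let base_len = fs::metadata(base_path)?.len();
    let new_len = fs::metadata(new_path)?.len();
    Ok((base_len + new_len) * 2)
}
```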
The GDelta algorithm identifies matching regions between the base and new data, then encodes only the differences as a series of copy and literal instructions.
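To make the copy/literal model concrete, here is a minimal sketch; the enum and function names are invented for illustration, and gdelta's real delta is a compact binary format, not this in-memory form:

```rust
// Illustrative model only: a delta is a sequence of instructions that
// either copy a range from the base data or insert literal bytes.
enum Instruction {
    Copy { offset: usize, len: usize }, // reuse `len` bytes of base at `offset`
    Literal(Vec<u8>),                   // bytes with no match in the base
}

fn apply(base: &[u8], delta: &[Instruction]) -> Vec<u8> {
    let mut out = Vec::new();
    for inst in delta {
        match inst {
            Instruction::Copy { offset, len } => {
                out.extend_from_slice(&base[*offset..*offset + *len]);
            }
            Instruction::Literal(bytes) => out.extend_from_slice(bytes),
        }
    }
    out
}
```

At encode time, the work lies in finding long Copy instructions quickly; anything unmatched falls through to a Literal.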
The implementation uses optimized default parameters, tuned for typical deduplication workloads.
| Algorithm | Encode Speed | Space Saved | Memory | Use Case |
|---|---|---|---|---|
| gdelta | 496 MB/s | 68% | Low | Maximum speed |
| gdelta+lz4 | 430 MB/s | 71% | Low | Fast with compression |
| gdelta+zstd | 305 MB/s | 75% | Low | Balanced speed/compression |
| xpatch | 306 MB/s | 75% | Low | Automatic algorithm selection |
| vcdiff | 94 MB/s | 64% | Medium | Standard delta format |
| qbsdiff | 22 MB/s | 84% | Medium | Maximum compression |
| zstd_dict | 14 MB/s | 55% | Medium | Dictionary-based compression |
mdn/content repository (306K comparisons):
| Algorithm | Compression | Encode Time | Notes |
|---|---|---|---|
| xpatch (tags) | 97.5% saved | 319 µs | Best compression |
| xpatch (seq) | 97.5% saved | 14 µs | Fast alternative |
| gdelta | 97.0% saved | 4 µs | Fastest |
| vcdiff | 95.6% saved | 49 µs | Standard format |
tokio repository (33K comparisons):
| Algorithm | Compression | Encode Time | Notes |
|---|---|---|---|
| xpatch (tags) | 97.9% saved | 306 µs | Best compression |
| xpatch (seq) | 95.6% saved | 24 µs | Balanced |
| gdelta | 94.5% saved | 5 µs | Fastest |
| vcdiff | 93.1% saved | 28 µs | Standard format |
Git benchmark data from xpatch test results. See PERFORMANCE.md for detailed analysis.
Synthetic benchmarks: gdelta delivers the highest encode speed (496 MB/s), while qbsdiff achieves the best compression (84% saved) at 22 MB/s.
Git repositories: gdelta encodes in 4-5 µs, far faster than the alternatives, while staying within a few percentage points of the best compression.
Workload matters: Performance characteristics vary significantly between synthetic edits and real file evolution patterns.
High-Speed Applications:
Version Control Systems:
Backup & Storage:
Network Synchronization:
Standards Compliance:
# Run unit tests
cargo test
# Run integration tests
cargo test --test '*'
# Run CLI test suite
./test_gdelta.sh
# Run simple benchmarks (quick verification)
cargo bench --bench simple
# Run comprehensive benchmarks (15-30 min)
cargo bench --bench comprehensive
# Run comprehensive benchmarks with custom filters
BENCH_FORMATS=json,csv BENCH_ALGOS=gdelta,xpatch cargo bench --bench comprehensive
The comprehensive benchmark supports two modes:
# Quick mode (default): smaller sample size, faster
BENCH_MODE=quick cargo bench --bench comprehensive
# Full mode: larger sample size, more accurate
BENCH_MODE=full cargo bench --bench comprehensive
Results are saved to target/benchmark_report_<timestamp>.md and .json.
This is a Rust implementation of the GDelta algorithm by Haoliang Tan.
Original repositories/resources:
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Built with ❤️ by ImGajeed76