Logarithm: Anomaly Detection Agent
Logarithm is a self-learning diagnostics agent for GNU/Linux systems that uses machine learning to detect anomalies in system logs in real time.
Built with Crystal for performance and reliability, it trains an autoencoder on normal log patterns to identify potential issues, security threats, or unusual behavior.
Features
- Multi-source Monitoring: Simultaneous ingestion from systemd journal and syslog files
- Real-time Detection: Continuous anomaly detection with configurable thresholds
- Machine Learning: TF-IDF vectorization and autoencoder-based unsupervised learning
- Memory-Efficient Training: Chunked storage system prevents memory accumulation during training on large datasets (100K+ logs)
- Incremental Retraining: Flexible retraining modes to adapt to evolving log patterns
- Security: AES-256 encryption, audit logging, and input validation
- Resilience: Retry logic, circuit breakers, and comprehensive error handling
- CLI Tools: Simple command-line interface for training and monitoring
Quick Start
Prerequisites: Crystal 1.17.1+, GNU Make, and the systemd development libraries (for journald support)
git clone https://gitlab.com/renich/logarithm.git
cd logarithm
make release
Train the model (defaults to 24 hours on the default log sources):
bin/logarithm train
Train with custom duration:
bin/logarithm train -t '1h' -j
Monitor for anomalies:
bin/logarithm monitor
Monitor with custom threshold:
bin/logarithm monitor -T 0.8 /var/log/syslog
Monitor from stdin:
tail -f /var/log/app.log | bin/logarithm monitor -i
Command Line Interface
Logarithm provides comprehensive CLI options for both training and monitoring operations. All flags support both long and short forms.
Training Command
The `train` command supports extensive customization of the training process:
bin/logarithm train [flags...] [paths...]
Training Flags
| Flag | Short | Description | Example |
|---|---|---|---|
| `--time` | `-t` | Training duration | `-t '30m'`, `-t '2h'`, `-t '90s'` |
| `--vocab-size` | `-V` | TF-IDF vocabulary size | `-V 500` |
| `--batch-size` | `-B` | Training batch size | `-B 5000` |
| `--max-batches` | `-M` | Maximum training batches | `-M 10` |
| `--threshold` | `-T` | Anomaly detection threshold | `-T 0.8` |
| `--retrain-mode` | `-m` | Retraining mode | `-m incremental`, `-m full`, `-m hybrid` |
| `--expand-vocab` | `-e` | Expand vocabulary with new terms | |
| `--rollback` | `-b` | Rollback to previous model version | |
| `--journald` | `-j` | Include systemd journal | |
| `--recursive` | `-r` | Recursively monitor directories | |
| `--since` | `-s` | Start journal from specific time | `-s '1 hour ago'` |
| `--data-dir` | `-D` | Model storage directory | `-D /custom/path` |
| `--log-level` | `-L` | Log level | `-L DEBUG`, `-L INFO` |
| `--encryption-key` | `-K` | Encryption key for models | `-K 'your-key'` |
| `--encryption-fingerprint` | `-P` | Encryption key fingerprint | `-P 'sha256-hash'` |
| `--config` | `-c` | Configuration file path | `-c config.yml` |
| `--verbose` | `-v` | Enable verbose logging | |
Training Examples
Basic training with custom duration:
bin/logarithm train -t '1h' -j
Advanced training with custom parameters:
bin/logarithm train -t '45m' -V 300 -B 2000 -M 8 -T 0.75 -L INFO -v
Full retraining with encryption:
bin/logarithm train --retrain-mode full -K 'my-secret-key' -t '2h'
Training with specific log sources:
bin/logarithm train /var/log/syslog /var/log/auth.log --recursive
Monitoring Command
The `monitor` command provides real-time anomaly detection with flexible output options:
bin/logarithm monitor [flags...] [paths...]
Monitoring Flags
| Flag | Short | Description | Example |
|---|---|---|---|
| `--stdin` | `-i` | Read logs from stdin | |
| `--output-format` | `-f` | Output format | `-f json`, `-f csv`, `-f human` |
| `--filter` | `-F` | Filter logs containing text | `-F "ERROR"` |
| `--threshold` | `-T` | Anomaly detection threshold | `-T 0.8` |
| `--journald` | `-j` | Include systemd journal | |
| `--recursive` | `-r` | Recursively monitor directories | |
| `--since` | `-s` | Start journal from specific time | `-s '1 hour ago'` |
| `--data-dir` | `-D` | Model directory | `-D /custom/path` |
| `--log-level` | `-L` | Log level | `-L DEBUG` |
| `--encryption-key` | `-K` | Encryption key for models | `-K 'your-key'` |
| `--encryption-fingerprint` | `-P` | Encryption key fingerprint | `-P 'sha256-hash'` |
| `--config` | `-c` | Configuration file path | `-c config.yml` |
| `--verbose` | `-v` | Enable verbose logging | |
Monitoring Examples
Basic monitoring:
bin/logarithm monitor /var/log/syslog
Monitor with JSON output and filtering:
bin/logarithm monitor -f json -F "WARNING" -T 0.7 /var/log/app.log
Read from stdin with custom threshold:
tail -f /var/log/app.log | bin/logarithm monitor -i -T 0.9
Monitor multiple sources with journal:
bin/logarithm monitor -j /var/log/syslog /var/log/auth.log
CSV output for data analysis:
bin/logarithm monitor -f csv -T 0.8 /var/log/*.log > anomalies.csv
Model Management
Logarithm also provides model management commands:
bin/logarithm model [subcommand] [flags...]
Available subcommands:
- `info` - Display detailed model information
- `list` - List available model versions
- `delete` - Remove specific model versions
Examples:
# Show model information
bin/logarithm model info
# List all model versions
bin/logarithm model list
# Delete old model versions
bin/logarithm model delete --older-than 30d
Troubleshooting
Common Issues
"No trained models found" error:
# Train models first
bin/logarithm train -t '1h'
# Or specify custom model directory
bin/logarithm monitor -D /path/to/models
Permission denied on log files:
# Run with appropriate permissions or use sudo
sudo bin/logarithm monitor /var/log/syslog
# Or monitor specific accessible files
bin/logarithm monitor ~/logs/app.log
High memory usage during training:
# Use memory-efficient chunked training (recommended for large datasets)
bin/logarithm train -t '2h' # Automatically uses chunked storage for 100K+ logs
# Or reduce batch size and vocabulary for smaller datasets
bin/logarithm train -B 1000 -V 200 -t '30m'
Slow anomaly detection:
# Adjust threshold for fewer false positives
bin/logarithm monitor -T 0.9
# Use filtering to reduce log volume
bin/logarithm monitor -F "ERROR|WARNING" /var/log/app.log
Performance Tuning
For large log volumes:
- Use smaller batch sizes: `-B 1000`
- Reduce vocabulary size: `-V 200`
- Limit training batches: `-M 5` (combined in the example below)
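Putting these flags together, a lower-footprint training run might look like the following (the path and duration here are only placeholders):
# Reduced batch size, vocabulary, and batch count for constrained hosts
bin/logarithm train -B 1000 -V 200 -M 5 -t '1h' /var/log/syslog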
For real-time monitoring:
- Use filtering: `-F "important-pattern"`
- Adjust threshold: `-T 0.8`
- Use JSON output for integration: `-f json` (see the example below)
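JSON output is the easiest format to feed into other tooling. As a sketch, assuming the monitor emits one JSON object per detected anomaly (the exact field names depend on the actual output), the stream can be piped straight into jq:
# Pretty-print each anomaly as it is detected
bin/logarithm monitor -f json -F "ERROR" -T 0.8 /var/log/app.log | jq .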
Advanced Training Options
Logarithm supports flexible retraining strategies to adapt to evolving log patterns:
Incremental retraining (default, loads existing models and trains on new logs):
bin/logarithm train --retrain-mode incremental
Full retraining (ignores existing models, starts fresh training):
bin/logarithm train --retrain-mode full
Hybrid retraining (loads models but forces vocabulary expansion):
bin/logarithm train --retrain-mode hybrid
Expand vocabulary (add new terms to existing vectorizer vocabulary):
bin/logarithm train --expand-vocab
Rollback to previous model version (revert to backup models):
bin/logarithm train --rollback
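As an illustration, these modes can be combined into a simple retraining cycle; the schedule and log sources below are assumptions, and only the flags documented above are used:
# Periodically retrain incrementally on recent journal entries
bin/logarithm train -m incremental -j -s '1 day ago' -t '1h'
# If the refreshed model performs worse, revert to the previous version
bin/logarithm train --rollback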
Memory-Efficient Training
Logarithm v0.9.0 introduces a memory-efficient chunked training system that prevents memory accumulation when processing large datasets. This system automatically activates for datasets with 100K+ logs and provides the following benefits:
- Bounded Memory Usage: Memory usage remains constant regardless of dataset size
- Automatic Chunking: Large log files are automatically split into manageable chunks (~900KB each)
- Temporary Storage: Chunks are stored in temporary files, not memory
- Seamless Integration: No changes required to existing training commands
How It Works
When training on large datasets, Logarithm:
- Collects logs in memory until reaching the chunk threshold
- Writes chunks to temporary files in `/tmp/logarithm_training-*`
- Processes chunks sequentially during training
- Cleans up temporary files automatically after training
Memory-Efficient Training Examples
Training on large log files (automatic chunking):
# Processes 1M logs with bounded memory usage
bin/logarithm train -t '2h' /var/log/large_app.log
# Multiple large files with chunked processing
bin/logarithm train -t '4h' /var/log/app.log /var/log/access.log
Monitoring chunked training progress:
# Check chunk files during training
watch 'ls -la /tmp/logarithm_training-*/chunk_*.log | wc -l'
# Monitor memory usage (should remain stable)
watch 'ps aux | grep logarithm | grep -v grep | awk "{print \$4\"% \"\$6\"KB\"}"'
Performance Characteristics
- Memory Usage: ~25-50MB regardless of dataset size (vs. GBs without chunking)
- Chunk Size: ~900KB per chunk for optimal I/O performance
- Processing Rate: 1000-5000 logs/second depending on hardware
- Storage Overhead: Minimal (chunks are cleaned up automatically)
Configuration
Logarithm supports configuration via YAML files, environment variables, and command-line flags. Settings are applied in this order of precedence:
- Command-line flags (highest priority) - Override all other settings
- Environment variables - Override config file and defaults
- Configuration file - Override built-in defaults
- Built-in defaults (lowest priority)
Configuration File
Example config file (`config.yml`):
data_dir: ~/.local/share/logarithm
threshold: 0.85
duration: 48h
vocab_size: 100
batch_size: 10000
max_batches: 5
Environment Variables
Logarithm respects the following environment variables:
Variable | Description | Example |
---|---|---|
LOGARITHM_DATA_DIR |
Model storage directory | /custom/path |
LOGARITHM_LOG_LEVEL |
Logging verbosity | DEBUG , INFO , WARN |
LOGARITHM_ENCRYPTION_KEY |
Encryption key for models | your-secret-key |
LOGARITHM_CONFIG |
Path to config file | /path/to/config.yml |
CLI Flag Precedence
All command-line flags override their corresponding configuration settings:
# This will use threshold 0.9 regardless of config file or env var settings
bin/logarithm monitor -T 0.9 /var/log/syslog
# This will use custom data directory regardless of LOGARITHM_DATA_DIR
bin/logarithm train -D /tmp/models -t '1h'
Configuration Examples
Using config file:
bin/logarithm train -c /etc/logarithm/config.yml
Using environment variables:
export LOGARITHM_DATA_DIR=/var/lib/logarithm
export LOGARITHM_THRESHOLD=0.8
bin/logarithm monitor
Mixed configuration (CLI overrides all):
LOGARITHM_THRESHOLD=0.7 bin/logarithm monitor -T 0.9 -D /tmp/models
# Result: threshold=0.9, data_dir=/tmp/models
License
This project is licensed under the GNU General Public License v3.0 or later - see the LICENSE file for details.
Authors
- Rénich Bon Ćirić - Creator and maintainer - renich@evalinux.com
Acknowledgments
Development and Testing
Fakelogs Tool
Logarithm includes a `fakelogs` tool for generating synthetic log data for testing:
# Build the fakelogs tool
make fakelogs
# Generate syslog-style logs
./bin/fakelogs --template syslog --count 1000 > test_logs.log
# Generate JSON logs with anomalies
./bin/fakelogs --template json --anomalies 50 > test_data.json
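One way to exercise the whole pipeline end to end is to train on a synthetic baseline and then stream anomalous synthetic logs through the monitor. This is only a sketch; the flag combinations follow the examples above, and the paths are placeholders:
# Generate a baseline of normal-looking logs and train briefly on it
./bin/fakelogs --template syslog --count 10000 > /tmp/baseline.log
bin/logarithm train -t '5m' /tmp/baseline.log
# Stream synthetic logs with injected anomalies into the monitor
./bin/fakelogs --template syslog --anomalies 50 | bin/logarithm monitor -i -f json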
Testing
Run the full test suite:
make test
Run integration tests:
make -C integration test
Building
Build development version:
make build
Build optimized release:
make release
Documentation
- API Documentation: Generated from source code using `make docs` (uses README.rst)
- User Guide: See `USER_GUIDE.rst` for practical usage examples
- Fakelogs Guide: See `integration/README.md` for testing tool documentation