predictor
LLM Algorithm Testing Repository
This repository contains algorithm implementations and tests from various Large Language Models (LLMs) to evaluate their problem-solving capabilities, code quality, and understanding of complex concepts.
Test Overview
The primary test challenges an LLM to create a Bitcoin price prediction algorithm that embodies the principle of "wisdom", defined as being as simple as possible while remaining as accurate as possible.
Test Components
#. Initial Code Analysis: LLMs are given a Crystal language implementation with bugs and asked to fix it
#. Algorithm Improvement: Enhance the prediction algorithm while maintaining simplicity
#. Wisdom Demonstration: Explain why the approach demonstrates wisdom in algorithm design
#. Multi-Horizon Predictions: Generate predictions for 1 hour, 1 day, 1 week, 1 month, 1 year, and 3 years
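The submitted implementations are written in Crystal and are not reproduced here. The Python sketch below is only a rough illustration of a multi-horizon answer that states its uncertainty explicitly. It assumes hourly closing prices, a log-normal random walk with constant drift and volatility estimated from that history, and months and years of 30 and 365 days. The names `HORIZONS_HOURS` and `predict_ranges` are hypothetical. This is a simple baseline, not any model's submitted algorithm.

```python
import math
import statistics

# Hypothetical horizon table in hours (assumption: month = 30 days, year = 365 days).
HORIZONS_HOURS = {
    "1 hour": 1,
    "1 day": 24,
    "1 week": 24 * 7,
    "1 month": 24 * 30,
    "1 year": 24 * 365,
    "3 years": 24 * 365 * 3,
}


def log_returns(prices):
    """Hourly log returns from a list of hourly closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def predict_ranges(prices, z=1.96):
    """Return {horizon: (low, median, high)} under an assumed log-normal
    random walk with constant drift and volatility (z=1.96 ~ nominal 95%)."""
    if len(prices) < 3:
        raise ValueError("need at least 3 prices to estimate volatility")
    rets = log_returns(prices)
    mu = statistics.fmean(rets)     # mean hourly log return (drift)
    sigma = statistics.stdev(rets)  # hourly volatility (sample std dev)
    log_last = math.log(prices[-1])
    out = {}
    for name, h in HORIZONS_HOURS.items():
        center = log_last + mu * h            # drift accumulates linearly with time
        spread = z * sigma * math.sqrt(h)     # uncertainty grows with sqrt(horizon)
        out[name] = (
            math.exp(center - spread),
            math.exp(center),
            math.exp(center + spread),
        )
    return out
```

Calling `predict_ranges(prices)` returns a dict that maps each horizon label to a `(low, median, high)` tuple. The range's width in log-price terms grows with the square root of the horizon. The 3-year range is therefore about 162 times wider than the 1-hour range (sqrt(26280) ≈ 162). This is the kind of honest uncertainty that a single point forecast hides.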
Evaluation Criteria
LLMs are evaluated on:
- Code Quality: Clean, readable, maintainable code
- Bug Fixing: Ability to identify and fix existing issues
- Algorithm Design: Balance between simplicity and effectiveness
- Market Understanding: Incorporation of real-world market behaviors
- Wisdom Principles: Demonstration of knowing what to include vs. what to exclude
- Documentation: Clear explanation of approach and rationale
Test Results
Each LLM's implementation is stored in its own directory containing:
- The complete algorithm implementation (predictor.cr)
- Comprehensive documentation explaining the approach (README.rst)
- Evidence of wisdom in algorithm design
- Performance characteristics and predictions
Reviews and Rankings
The reviews/ directory contains comprehensive evaluations of all implementations:
- gemini.rst: Gemini's evaluation of all LLM implementations
- grok.rst: Grok's evaluation of all LLM implementations
- instructions.rst: Guidelines for reviewing implementations
Key Findings from Reviews:
- Top Performers: Kimi (Bayesian intervals), Gemini (rigorous validation), DeepSeek (Bitcoin fundamentals)
- Critical Wisdom: Epistemic humility (probabilistic ranges) outperforms false precision
- Bitcoin-Specific: Understanding Bitcoin's network properties beats generic technical analysis
- Validation Matters: Walk-forward backtesting is essential for proving effectiveness
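Walk-forward backtesting refits the model using only the data available at each point in time. It then scores only the prediction for the next, unseen point, so no future information leaks into the fit. The following Python sketch is minimal. It uses a one-step log-normal interval as a stand-in model, and the function names and the 50-point window are illustrative assumptions. None of it is taken from any implementation in this repository.

```python
import math
import statistics


def one_step_interval(history, z=1.96):
    """Fit drift and volatility on past prices only and return a
    (low, median, high) forecast for the next price.
    Stand-in log-normal random-walk model, assumed for illustration."""
    rets = [math.log(b / a) for a, b in zip(history, history[1:])]
    mu = statistics.fmean(rets)     # drift per step
    sigma = statistics.stdev(rets)  # volatility per step (needs >= 2 returns)
    center = math.log(history[-1]) + mu
    return (math.exp(center - z * sigma), math.exp(center), math.exp(center + z * sigma))


def walk_forward(prices, window=50):
    """Step through the series: fit on the trailing `window` prices,
    predict the next price, then score it against the realized value."""
    if window < 3 or len(prices) <= window:
        raise ValueError("need window >= 3 and more prices than the window")
    hits, rel_errors = 0, []
    for t in range(window, len(prices)):
        low, mid, high = one_step_interval(prices[t - window:t])  # past data only
        actual = prices[t]
        hits += low <= actual <= high                # count outcomes inside the range
        rel_errors.append(abs(actual - mid) / actual)
    steps = len(prices) - window
    return {
        "steps": steps,
        "coverage": hits / steps,                             # share of outcomes inside the range
        "mean_abs_rel_error": statistics.fmean(rel_errors),  # median's error as a fraction of price
    }
```

Reporting interval coverage next to the median's error shows whether a model's stated uncertainty is honest. A nominally 95% range that contains the realized price far less often than 95% of the time is a sign of false precision.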
Wisdom Definition
For the purpose of this test, "wisdom" in algorithm design is defined as:
#. Simplicity: Using the minimum complexity necessary to achieve the goal
#. Effectiveness: Producing meaningful, useful results
#. Balance: Knowing what features to include and what to exclude
#. Reality: Incorporating real-world constraints and behaviors
#. Humility: Acknowledging limitations and uncertainties
Why This Test Matters
This test evaluates several key LLM capabilities:
- Technical Proficiency: Code debugging and implementation skills
- Domain Knowledge: Understanding of financial markets and algorithmic trading
- Conceptual Thinking: Ability to grasp abstract concepts like "wisdom"
- Communication: Clear documentation and explanation of complex ideas
- Pragmatism: Balancing theoretical perfection with practical utility
- Market Awareness: Incorporating real-world market behaviors and technical analysis
Future Tests
Additional test scenarios may include:
- Different asset classes (stocks, commodities, forex)
- Alternative programming languages
- More complex prediction scenarios
- Real-time data integration
- Performance optimization challenges
- Extended historical data analysis (beyond 365 days)
Contributing
This repository serves as a benchmark for LLM capabilities. Each implementation represents a different approach to the same fundamental problem, providing valuable insights into how different models reason about complex, ambiguous tasks.
The goal is not to find the "perfect" algorithm, but to understand how LLMs approach problems that require both technical skill and conceptual wisdom.
- December 11, 2025
Mon, 15 Dec 2025 22:20:37 GMT