simdish
SIMDish: Learning SIMD with Crystal
This project was generated by AI for use as an educational resource for me.
SIMDish is an educational library that demonstrates how to use SIMD (Single Instruction, Multiple Data) instructions from the Crystal language through its C FFI (Foreign Function Interface). The project serves as a learning tool for understanding how SIMD acceleration works and how to integrate C code with Crystal.
What is SIMD?
SIMD (Single Instruction, Multiple Data) is a computing technique that allows a single instruction to process multiple data elements in parallel. Modern CPUs include SIMD instruction sets like:
- SSE (Streaming SIMD Extensions)
- AVX (Advanced Vector Extensions)
- AVX2
- AVX-512
This library uses SIMD instructions to process 8 single-precision floating-point numbers (Float32) simultaneously.
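To make the "8 values at once" idea concrete, here is a minimal plain-C sketch of what one 8-lane addition computes. The function name `add8_scalar` is hypothetical and not part of SIMDish; it only illustrates the arithmetic. A 256-bit SIMD add produces the same eight results without the loop.

```c
#include <stddef.h>

/* Hypothetical reference function (not part of SIMDish): the scalar
 * equivalent of one 8-lane SIMD addition. A 256-bit SIMD add computes
 * all eight sums with a single instruction instead of eight iterations. */
void add8_scalar(const float *a, const float *b, float *result) {
    for (size_t lane = 0; lane < 8; lane++) {
        result[lane] = a[lane] + b[lane];
    }
}
```

Comparing this loop with the SIMDe version shown later in this README is a useful way to check that both produce identical results.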
Cross-Platform Support with SIMDe
SIMDish uses SIMDe (SIMD Everywhere) to provide cross-platform SIMD support. SIMDe is a header-only library that provides portable implementations of SIMD intrinsics on hardware that doesn't natively support them.
Benefits of using SIMDe:
- Works on multiple platforms (x86, ARM, PowerPC, etc.)
- Automatically uses native SIMD instructions when available
- Falls back to portable implementations when needed
- No need to write different code for different platforms
Project Structure
simdish/
├── c/                    # C implementation of SIMD functions
│   ├── simd_add.c        # Vector addition using SIMD
│   ├── simd_sub.c        # Vector subtraction using SIMD
│   ├── simd_mul.c        # Vector multiplication using SIMD
│   ├── simd_div.c        # Vector division using SIMD
│   ├── simdish.h         # Header file with function declarations
│   ├── Makefile          # For compiling C code into a shared library
│   └── simde/            # SIMDe library (submodule)
├── src/
│   └── simdish.cr        # Crystal FFI bindings and wrapper methods
├── spec/
│   └── simdish_spec.cr   # Tests for the library
└── examples/
    └── basic.cr          # Basic usage example
How It Works
- C Implementation: The core SIMD functions are implemented in C using SIMDe.
- Compilation: The C code is compiled into a shared library (.so file).
- FFI Binding: Crystal's FFI is used to bind to the compiled C functions.
- Crystal Wrapper: User-friendly Crystal methods wrap the FFI calls.
Installation
- Add the dependency to your shard.yml:
dependencies:
  simdish:
    github: kojix2/simdish
- Run shards install
- Initialize and update the SIMDe submodule:
cd lib/simdish
git submodule update --init
- Compile the C code (path relative to your project root):
cd lib/simdish/c
make
Usage
Before running the examples or tests, make sure the dynamic linker can find the compiled shared library. Set DYLD_LIBRARY_PATH (macOS) or LD_LIBRARY_PATH (Linux) to the c directory:
# For macOS
DYLD_LIBRARY_PATH=$(pwd)/c crystal examples/basic.cr
# For Linux
LD_LIBRARY_PATH=$(pwd)/c crystal examples/basic.cr
require "simdish"
# Create Float32 slices with 8 elements (size must be a multiple of 8)
a = Slice[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32, 5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]
b = Slice[8.0_f32, 7.0_f32, 6.0_f32, 5.0_f32, 4.0_f32, 3.0_f32, 2.0_f32, 1.0_f32]
# SIMD addition
result = SIMDish.add(a, b)
puts result # => Slice[9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]
# SIMD subtraction
result = SIMDish.sub(a, b)
puts result
# SIMD multiplication
result = SIMDish.mul(a, b)
puts result
# SIMD division
result = SIMDish.div(a, b)
puts result
Understanding the Code
C Implementation with SIMDe
Let's look at how vector addition is implemented using SIMDe:
// From c/simd_add.c
#define SIMDE_ENABLE_NATIVE_ALIASES
#include "simde/x86/avx2.h"
void simd_add(const float *a, const float *b, float *result) {
    // Load 8 float values into SIMD registers
    simde__m256 va = simde_mm256_loadu_ps(a);
    simde__m256 vb = simde_mm256_loadu_ps(b);

    // Add 8 pairs of floats in parallel with a single instruction
    simde__m256 vresult = simde_mm256_add_ps(va, vb);

    // Store the result back to memory
    simde_mm256_storeu_ps(result, vresult);
}
Key SIMDe functions used:
- simde_mm256_loadu_ps: Load 8 float values into a 256-bit register
- simde_mm256_add_ps: Add two 256-bit registers (8 floats) in parallel
- simde_mm256_storeu_ps: Store 8 float values from a register to memory
Crystal FFI Binding
# From src/simdish.cr
@[Link(ldflags: "#{__DIR__}/../c/libsimdish.so")]
lib LibSIMDish
  # SIMD vector addition for Float32 arrays
  fun simd_add(a : Float32*, b : Float32*, result : Float32*) : Void

  # Other functions...
end
Crystal Wrapper
# From src/simdish.cr
def self.add(a : Slice(Float32), b : Slice(Float32)) : Slice(Float32)
  raise ArgumentError.new("Arrays must have the same size") if a.size != b.size
  raise ArgumentError.new("Array size must be a multiple of 8") if a.size % 8 != 0

  result = Slice(Float32).new(a.size)

  # Process 8 elements at a time
  (0...a.size).step(8) do |i|
    LibSIMDish.simd_add(a.to_unsafe + i, b.to_unsafe + i, result.to_unsafe + i)
  end

  result
end
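The wrapper above crosses the FFI boundary once per block of 8 elements. One possible alternative, sketched here in plain C, is to move the loop into C so that a single call covers the whole array and a scalar loop handles any leftover elements. This is an illustration only: `add_n` is a hypothetical function that SIMDish does not provide, and `add8` is a scalar stand-in for `simd_add` so that the sketch compiles without SIMDe.

```c
#include <stddef.h>

/* Stand-in for simd_add from c/simd_add.c so this sketch compiles without
 * SIMDe: it adds exactly 8 floats. In the real library this role is played
 * by the SIMDe-based kernel. */
static void add8(const float *a, const float *b, float *result) {
    for (size_t lane = 0; lane < 8; lane++) {
        result[lane] = a[lane] + b[lane];
    }
}

/* Hypothetical function (not part of SIMDish): adds n floats in one call.
 * Full blocks of 8 go through the kernel; leftovers use a scalar loop. */
void add_n(const float *a, const float *b, float *result, size_t n) {
    size_t i = 0;

    /* Process complete blocks of 8 elements. */
    for (; i + 8 <= n; i += 8) {
        add8(a + i, b + i, result + i);
    }

    /* Process the remaining 0 to 7 elements one at a time. */
    for (; i < n; i++) {
        result[i] = a[i] + b[i];
    }
}
```

With a function shaped like this, the Crystal side would need a matching `fun` declaration that also passes the element count, and the multiple-of-8 check in the wrapper would no longer be required.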
Performance Considerations
SIMD operations can significantly improve performance for numerical computations:
- Processing 8 elements at once can theoretically provide up to 8x speedup
- Actual performance gains depend on various factors including memory access patterns
- SIMD is most effective for compute-bound operations on large arrays
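- SIMDe compiles to native instructions when the target supports them, so the portability layer should add little overhead there; on other targets it falls back to portable (and typically slower) implementations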
Platform Support
Thanks to SIMDe, this library works on multiple platforms:
- x86/x86_64 (Intel, AMD)
- ARM/ARM64 (Apple M1/M2, Raspberry Pi, etc.)
- PowerPC
- WASM (WebAssembly)
- And more!
Requirements
- Array sizes must be multiples of 8 (because we process 8 Float32 values at once)
- C compiler (gcc, clang, etc.)
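When the input length is not a multiple of 8, one option is for the caller to pad the data up to the next multiple of 8 before passing it in. A minimal sketch of that rounding arithmetic follows; `round_up_to_8` is a hypothetical helper and not part of SIMDish.

```c
#include <stddef.h>

/* Hypothetical helper (not part of SIMDish): returns the smallest multiple
 * of 8 that is greater than or equal to n. Adding 7 and then clearing the
 * low three bits rounds up to the next multiple of 8. */
size_t round_up_to_8(size_t n) {
    return (n + 7) & ~(size_t)7;
}
```

For example, 9 elements would be padded to 16. The padded lanes can hold any value for addition, subtraction, and multiplication, but for division the divisor padding should be non-zero (for instance 1.0) to avoid producing infinities or NaNs in the unused lanes.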
Development
- Clone the repository
- Initialize the SIMDe submodule:
git submodule update --init
- Modify C code in the c directory
- Run make in the c directory to compile
- Run crystal spec to test
Contributing
- Fork it (https://github.com/kojix2/simdish/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Author
- kojix2 - creator and maintainer
- April 6, 2025
MIT License