SIMDish: Learning SIMD with Crystal

This project was generated by AI for use as an educational resource for me.

SIMDish is an educational library that demonstrates how to use SIMD (Single Instruction, Multiple Data) instructions from the Crystal language through its C FFI (Foreign Function Interface). It serves as a learning tool for understanding how SIMD acceleration works and how to integrate C code with Crystal.

What is SIMD?

SIMD (Single Instruction, Multiple Data) is a computing technique that allows a single instruction to process multiple data elements in parallel. Modern CPUs include SIMD instruction sets like:

  • SSE (Streaming SIMD Extensions)
  • AVX (Advanced Vector Extensions)
  • AVX2
  • AVX-512

This library uses 256-bit SIMD operations, which process 8 single-precision floating-point numbers (Float32) at once (8 × 32 bits = 256 bits).
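
To make the "8 at once" idea concrete, here is a minimal C sketch. It assumes a GCC or Clang compiler and uses their generic vector extension rather than SIMDe or any particular instruction set. The names v8f, add8_scalar, and add8_vector are hypothetical and do not come from this library.

```c
#include <string.h>

/* Assumption: GCC/Clang vector extension; 32 bytes = 8 x 32-bit floats. */
typedef float v8f __attribute__((vector_size(32)));

/* Scalar version: eight separate additions, one per loop iteration. */
void add8_scalar(const float *a, const float *b, float *out) {
    for (int i = 0; i < 8; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Vector version: a single `+` on v8f adds all eight lanes. */
void add8_vector(const float *a, const float *b, float *out) {
    v8f va, vb, vr;
    memcpy(&va, a, sizeof va);   /* load 8 floats, no alignment required */
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;                /* the compiler may lower this to SIMD instructions */
    memcpy(out, &vr, sizeof vr); /* store 8 floats, no alignment required */
}
```

Both functions produce the same result; the difference is only in how many additions the source expresses per step.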

Cross-Platform Support with SIMDe

SIMDish uses SIMDe (SIMD Everywhere) to provide cross-platform SIMD support. SIMDe is a header-only library that provides portable implementations of SIMD intrinsics on hardware that doesn't natively support them.

Benefits of using SIMDe:

  • Works on multiple platforms (x86, ARM, PowerPC, etc.)
  • Automatically uses native SIMD instructions when available
  • Falls back to portable implementations when needed
  • No need to write different code for different platforms
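
The "native when available, portable otherwise" idea can be sketched in plain C with a compile-time check. This is a simplified illustration only; SIMDe's real headers are far more elaborate. The function names add8_portable and add8_uses_native are hypothetical.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h> /* native AVX intrinsics, only when the compiler targets AVX */
#endif

/* Hypothetical helper: adds 8 floats, natively if possible, portably otherwise. */
void add8_portable(const float *a, const float *b, float *out) {
#if defined(__AVX__)
    /* Native path: one 256-bit add. */
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
#else
    /* Portable path: same result from a plain loop. */
    for (size_t i = 0; i < 8; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}

/* Reports which path was compiled in: 1 = native AVX, 0 = portable fallback. */
int add8_uses_native(void) {
#if defined(__AVX__)
    return 1;
#else
    return 0;
#endif
}
```

The caller sees one function either way, which is the property that lets the same source build on different platforms.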

Project Structure

simdish/
├── c/                      # C implementation of SIMD functions
│   ├── simd_add.c          # Vector addition using SIMD
│   ├── simd_sub.c          # Vector subtraction using SIMD
│   ├── simd_mul.c          # Vector multiplication using SIMD
│   ├── simd_div.c          # Vector division using SIMD
│   ├── simdish.h           # Header file with function declarations
│   ├── Makefile            # For compiling C code into a shared library
│   └── simde/              # SIMDe library (submodule)
├── src/
│   └── simdish.cr          # Crystal FFI bindings and wrapper methods
├── spec/
│   └── simdish_spec.cr     # Tests for the library
└── examples/
    └── basic.cr            # Basic usage example

How It Works

  1. C Implementation: The core SIMD functions are implemented in C using SIMDe.
  2. Compilation: The C code is compiled into a shared library (.so file).
  3. FFI Binding: Crystal's FFI is used to bind to the compiled C functions.
  4. Crystal Wrapper: User-friendly Crystal methods wrap the FFI calls.

Installation

  1. Add the dependency to your shard.yml:

dependencies:
  simdish:
    github: kojix2/simdish

  2. Run shards install

  3. Initialize and update the SIMDe submodule (from your project root):

cd lib/simdish
git submodule update --init

  4. Compile the C code (from your project root):

cd lib/simdish/c
make

Usage

Before running the examples or tests, make sure the dynamic linker can find the compiled shared library. Set DYLD_LIBRARY_PATH on macOS or LD_LIBRARY_PATH on Linux:

# For macOS
DYLD_LIBRARY_PATH=$(pwd)/c crystal examples/basic.cr

# For Linux
LD_LIBRARY_PATH=$(pwd)/c crystal examples/basic.cr

Then use the library from Crystal:

require "simdish"

# Create Float32 slices with 8 elements (size must be a multiple of 8)
a = Slice[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32, 5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]
b = Slice[8.0_f32, 7.0_f32, 6.0_f32, 5.0_f32, 4.0_f32, 3.0_f32, 2.0_f32, 1.0_f32]

# SIMD addition
result = SIMDish.add(a, b)
puts result # => Slice[9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]

# SIMD subtraction
result = SIMDish.sub(a, b)
puts result

# SIMD multiplication
result = SIMDish.mul(a, b)
puts result

# SIMD division
result = SIMDish.div(a, b)
puts result

Understanding the Code

C Implementation with SIMDe

Let's look at how vector addition is implemented using SIMDe:

// From c/simd_add.c
#define SIMDE_ENABLE_NATIVE_ALIASES
#include "simde/x86/avx2.h"

void simd_add(const float *a, const float *b, float *result) {
    // Load 8 float values into SIMD registers
    simde__m256 va = simde_mm256_loadu_ps(a);
    simde__m256 vb = simde_mm256_loadu_ps(b);
    
    // Add 8 pairs of floats in parallel with a single instruction
    simde__m256 vresult = simde_mm256_add_ps(va, vb);
    
    // Store the result back to memory
    simde_mm256_storeu_ps(result, vresult);
}

Key SIMDe functions used:

  • simde_mm256_loadu_ps: Load 8 float values into a 256-bit register
  • simde_mm256_add_ps: Add two 256-bit registers (8 floats) in parallel
  • simde_mm256_storeu_ps: Store 8 float values from a register to memory
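
To show what these three steps mean without any SIMD header, here is a plain C emulation. It is a sketch for understanding only: reg256 and the emu_* functions are hypothetical names, and the loop in emu_add stands in for what the hardware does in one instruction.

```c
#include <string.h>

/* Hypothetical stand-in for a 256-bit register: 8 lanes of 32-bit floats. */
typedef struct { float lane[8]; } reg256;

/* Like simde_mm256_loadu_ps: copy 8 floats from memory; no alignment required. */
reg256 emu_loadu(const float *p) {
    reg256 r;
    memcpy(r.lane, p, sizeof r.lane);
    return r;
}

/* Like simde_mm256_add_ps: lane i of the result is x.lane[i] + y.lane[i]. */
reg256 emu_add(reg256 x, reg256 y) {
    reg256 r;
    for (int i = 0; i < 8; i++) {
        r.lane[i] = x.lane[i] + y.lane[i];
    }
    return r;
}

/* Like simde_mm256_storeu_ps: copy 8 floats back to memory; no alignment required. */
void emu_storeu(float *p, reg256 r) {
    memcpy(p, r.lane, sizeof r.lane);
}

/* Same shape as simd_add in c/simd_add.c, built from the emulated steps. */
void emu_simd_add(const float *a, const float *b, float *result) {
    emu_storeu(result, emu_add(emu_loadu(a), emu_loadu(b)));
}
```

The "u" in loadu and storeu stands for unaligned: the pointers do not have to sit on a 32-byte boundary, which is why the Crystal side can pass any Float32 pointer.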

Crystal FFI Binding

# From src/simdish.cr
@[Link(ldflags: "#{__DIR__}/../c/libsimdish.so")]
lib LibSIMDish
  # SIMD vector addition for Float32 arrays
  fun simd_add(a : Float32*, b : Float32*, result : Float32*) : Void
  
  # Other functions...
end

Crystal Wrapper

# From src/simdish.cr
def self.add(a : Slice(Float32), b : Slice(Float32)) : Slice(Float32)
  raise ArgumentError.new("Arrays must have the same size") if a.size != b.size
  raise ArgumentError.new("Array size must be a multiple of 8") if a.size % 8 != 0

  result = Slice(Float32).new(a.size)
  
  # Process 8 elements at a time
  (0...a.size).step(8) do |i|
    LibSIMDish.simd_add(a.to_unsafe + i, b.to_unsafe + i, result.to_unsafe + i)
  end
  
  result
end
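
The wrapper above calls into C once for every group of 8 elements. One possible alternative, sketched here in C, is to run the chunk loop on the C side so that Crystal makes a single call per array. This is not how the library is written; add8 and add_array are hypothetical names, and add8 is a scalar stand-in for simd_add so the sketch stays self-contained.

```c
#include <stddef.h>

/* Stand-in for simd_add: adds exactly 8 floats (scalar here for self-containment). */
static void add8(const float *a, const float *b, float *result) {
    for (int i = 0; i < 8; i++) {
        result[i] = a[i] + b[i];
    }
}

/* Hypothetical array-level entry point.
 * Returns 0 on success, -1 if n is not a multiple of 8. */
int add_array(const float *a, const float *b, float *result, size_t n) {
    if (n % 8 != 0) {
        return -1;
    }
    /* Process 8 elements at a time, mirroring the Crystal loop. */
    for (size_t i = 0; i < n; i += 8) {
        add8(a + i, b + i, result + i);
    }
    return 0;
}
```

With this shape, the Crystal wrapper would still validate sizes and allocate the result, but would pass the element count along and check the return value.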

Performance Considerations

SIMD operations can significantly improve performance for numerical computations:

  • Processing 8 elements at once can theoretically provide up to 8x speedup over a scalar loop
  • Actual gains depend on several factors, including memory access patterns and the cost of making one function call per group of 8 elements
  • SIMD is most effective for compute-bound operations on large arrays
  • SIMDe maps its functions to native instructions when the target supports them, so the portability layer should add little overhead in that case

Platform Support

Thanks to SIMDe, this library works on multiple platforms:

  • x86/x86_64 (Intel, AMD)
  • ARM/ARM64 (Apple M1/M2, Raspberry Pi, etc.)
  • PowerPC
  • WASM (WebAssembly)
  • And more!

Requirements

  • Array sizes must be multiples of 8 (because we process 8 Float32 values at once)
  • C compiler (gcc, clang, etc.)
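
If your data length is not a multiple of 8, one common workaround is to pad it up to the next multiple before calling the library and to ignore the extra results afterwards. The following C sketch shows the arithmetic; padded_len and pad_copy are hypothetical helpers, not part of SIMDish.

```c
#include <stddef.h>
#include <string.h>

/* Smallest multiple of 8 that is >= n (e.g. 10 -> 16, 16 -> 16, 0 -> 0). */
size_t padded_len(size_t n) {
    return (n + 7) / 8 * 8;
}

/* Copies n floats from src into dst, then fills the remaining slots
 * up to padded_len(n) with `fill`.
 * dst must have room for padded_len(n) floats. */
void pad_copy(float *dst, const float *src, size_t n, float fill) {
    size_t m = padded_len(n);
    memcpy(dst, src, n * sizeof(float));
    for (size_t i = n; i < m; i++) {
        dst[i] = fill;
    }
}
```

For division, padding the divisor with 1.0 rather than 0.0 avoids dividing by zero in the unused lanes.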

Development

  1. Clone the repository
  2. Initialize the SIMDe submodule: git submodule update --init
  3. Modify C code in the c directory
  4. Run make in the c directory to compile
  5. Run crystal spec to test

Contributing

  1. Fork it (https://github.com/kojix2/simdish/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Author

  • kojix2 - creator and maintainer
License

MIT License
