blah

For-fun pet project building software stack on top of a minimal ISA

Blah

What is this project

This is a toy computational toolchain. It allows to compile, assemble, link, run and debug programs, without relying on existing tech.
It use the RiSC16 instruction set architecture.

RiSC stands for Ridiculously Simple Computer, which makes sense in the context in which the instruction-set is normally used -- to teach simple organization and architecture to undergraduates who do not yet know how computers work. The architecture has a whopping 8 opcodes, uses 8 registers, and is nonetheless general enough to execute fairly sophisticated programs. This makes it a relatively useful teaching tool, as it allows students to look at computer architecture concepts from simple to advanced without the instruction-set getting in the way or cluttering up the picture.
-- Bruce Jacob

Doing so, it involves simple implementation of:

  • A virtual machine implementing the RiSC16 ISA
  • An assembler for a RiSC16 assembly language, outputing custom relocatable object files
  • A linker for mergeing and linking said object files into a raw binary
  • A compiler for a rudimentary imperative C-inspired general purpose language
  • A curses based debugger

This toolchain is implemented as a pure library.
There are two implementation integrating this toolchain:

  • A CLI based on STDIO and the local filesystem
  • A multi user web application

Why is this project

The initial motivation is to experiment with compiler and computer architecture from scratch but without the burden of the complexity of modern computers. This is a pet project and pretext to learn and experiment.

Since the project began to yield some (moderatly) interesting results I decided to use at as my main portfollio item and so I made it a demo project for more marketable skills and added the web application.

Buzzwords

The toolchain is written as a collection of pure functions. Error reporting and filesystem access are provided as dependencies through injection of implementation of simple interfaces.

The web application REST api is fully stateless and scalable horizontally by default.
The default development docker network use several parallel api instances load balanced by a reverse proxy without requiring ip hashing.
Api instances exchange notifications and synchronize locks through a common cache and publish/subscribe.

The webapp api is built with extensive usage of dependency injection and the codebase design draw inspiration from domain driven design (DDD) and clean architecture fundamentals without falling into a counter-productive dogmatism tarpit.

The webapp api handles quotas and ACL for limiting resource impact of users.

The webapp support colaborative features and server notifications through websockets.
The webapp support long-running interactive tasks.

The webapp frontend is built with react.

Core Concepts

The most down to earth objective one can have with this toolchain is to build and run a program.

Computer specification

An instruction set architecture is a description of a CPU capabilities, not a description of a full computers. We need some additional information to describe event the simplest computer. Since RiSC16 use memory mapped IO we need to define the address space of our computers.
Address space are defined in spec files:

spec.ini

[hardware.segment.ram]
kind=ram
start=0x0
size=0xffff

This describe a computer whith no I/O and whose full address space (from 0 to 0xffff as RiSC16 has a 16 bits address bus) is mapped to ram.

Most of the time we want at least on I/O port to be able to interact with our program:

spec.ini

[hardware.segment.ram]
kind=ram
start=0x0
size=0xfffe

[hardware.segment.tty]
kind=io
start=0xfffe
tty=true

The tty = true property will be recognized by toolchain implementation and will be mapped to a suitable user input/output channel.

A simple program

Lets write a simple program that output a constant string.

hello.blah

# Define a global start symbol. This is not required but it is a practical hint for the linker
export start:

# Load the address of the symbol 'text', defined at the end of the file
movi r1 :text

# Load the addrss of the symbol '__io_tty'
# This symbol is automtically made available by the linker based on the specification file.
# In our case, its valeu will be the address of the 'tty' io port, AKA 0xfffe
movi r4 :__io_tty

loop:
	
	# Read a 16 bit word from 'text'
	lw r3 r1 0
	
	# if the word has the value 0, jump to 'end
	beq r3 r0 :end

	# Write the word we read to '__io_tty'
	sw r3 r4 0

	# Increment the pointer to text by 1
	addi r1 r1 1

	# Jump to the beginning of the loop
	beq r0 r0 :loop

# Define an 'end' symbol that poitn to an isntruction that stop the program
end: halt

# define a symbol 'text' and embbed a string data in the program
text: .ascii "Hello world !\n"

We can run our program: cli asm --spec=spec.ini -build-dir=build --output=hello --also-run hello.blah

Output:

Success: Successfully assembled hello.blah into 'build/hello.ro'
Success: Successfully merged 'build/hello.ro'
Success: Successfully linked 'hello'
Success: Running ... 'hello'
Hello world !
Success: Ran 'hello'

Usage

CLI

Build the command line with the crystal shard tool:
shards build cli
Or with docker:
docker run -v './:/root' -w /root crystallang/crystal:1.15.0-alpine shards build cli

The CLI binary will be found in ./bin/cli.

Web Application

Building and running the API with development parameters for local usage:
docker-compose --env-file local.env up

Building, running and serving the frontend web application: npm run dev-serv

Open the webapp on http://localhost:8080

Project cross-referencing

It is possible to reference a file from another accessible project from any project.
There are five different category of path that be used when referencing a file from within another file:

  • Regular relative file path: dependency.sl, ./lib/function.sl, ../project.ini
  • Absolute project file path: /build/output.ro
  • User scoped project file path: my-std-lib:/stdlib.lib
    The absolute path is prefixed by the name of a project of the current user and seperated by a :
  • Project scoped file path: user:project:/path The absolute path is prefixed by the name of a user and the name of a project of this user, seperated by a ':'
  • Job temporary filesystem: @/build/output.ro, @canbeany/thi/ng File path prefixed by a @ will be read/written from a temporary filesystem that is initialized for each job.
    User running job on project where they don't have write access can write to @ files As for project filesystem, this temporary filesystem is subject to quotas for maximum total fiel size and file numbers.

Toolchain

Full example

Here is an example of what can be currently done:
Running

./bin/cli -l stdlib/left_bitshift.sl stdlib/multiply.blah stdlib/putword.sl stdlib/right_bitshift.sl stdlib/stacklang_startup.blah -o build/stdlib.lib
./bin/cli -b build -s examples/brainfuck/brainfuck.ini -d bf-source=examples/brainfuck/hello.bf examples/brainfuck/brainfuck.sl build/stdlib.lib -o build/brainfuck

Will output

Hello World!

It compile and run a small program that will execute another program written in Brainfuck

Sources can be found here or in this readme:

examples/brainfuck/brainfuck.ini

[linker.section.text]
start=0x0

[linker.section.stack]
start=0xff00
size=0x00fe

[hardware.segment.ram]
kind=ram
start=0x0
size=0xfffe

[hardware.segment.tty]
kind=io
start=0xfffe
tty=true

[hardware.segment.brainfuck]
kind=io
start=0xffff
source=$bf-source

examples/brainfuck/hello.bf:

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

examples/brainfuck/brainfuck.sl:

require "../stdlib/prototypes.sl"

fun load_io(io:*, destination:*, size):_ {
  var i = 0
  var buffer
  while ((buffer = *io) != 0xff00 ) {
    if (i == size)
      return 0
    *(destination + i) = buffer
    i += 1
  }
  return i
}

var program: [0x1000]
var ram: [0x1000]

fun main:_ {
  var pc = 0
  var ptr = 0
  var program_size = load_io(&__io_brainfuck, &program, 0x1000)
  var loop_count = 0

  if (program_size == 0x10)
     return 1

  while (1) {
    if (pc == program_size)
      return 0
    if (program[pc] == 0x3E)
      ptr += 1
    if (program[pc] == 0x3c)
      ptr -= 1
    if (program[pc] == 0x2B)
      ram[ptr] += 1
    if (program[pc] == 0x2D)
      ram[ptr] -= 1
    if (program[pc] == 0x2E)
      __io_tty = ram[ptr]
    if (program[pc] == 0x2C)
      ram[ptr] = __io_tty
    if (program[pc] == 0x5B) {
      if (ram[ptr] == 0) {
       	while (program[pc] != 0x5D || loop_count != 0) {
 	        pc += 1
	        if (pc == program_size)
      	    return 1
	        if (program[pc] == 0x5B)
	          loop_count += 1
	        if (program[pc] == 0x5D)
	          loop_count -= 1
        }
      }
    }
    if (program[pc] == 0x5D) {
      if (ram[ptr] != 0) {
        loop_count = 1
       	while (program[pc] != 0x5B || loop_count != 0) {
          if (pc == 0)
	          return 1
	        pc -= 1
	        if (program[pc] == 0x5D)
      	    loop_count += 1
	        if (program[pc] == 0x5B)
	          loop_count -= 1
	      }
      }
    }

    pc += 1
  }
  return 0
}

Other sources files used for building the stdlib can be found in the stdlib directory

TODO

  • Group functionalities behind a presentation independent application
    • Build a CLI presentation layer
  • Improve app instrumentation (stream events)
    • Multiple sources per errors
    • Text effects
    • Linker errors
    • Native codegen errors Unfinished but enough for now
    • Spec errors
  • Split projects into toolchain/app toolchain/clients app/api app/client app/database, ...
  • Build language servers
  • Rename the project
  • Recipe files / books (makeshift makefiles)
  • Debugger server
  • Design a reasonnably doable webappified version:
    • API with:
      • Docker, Migration, User with in-house auth and social logins
      • Project with a file tree (total size quotas per project / per user)
      • Websocket for streaming warning / error / compilation progress, and runtime tty (websocket quotas)
      • Projects ACL, for sharing projects with various privilege (private, shared, public, read/write priviliege)
        • Browse public projects (such as stdlib, wiki, demo)
        • Clone project
        • Language feature to allow references to other projects
    • Frontend client with:
      • Toolbar for account/project management (create, switch, delete, export, log out, account details)
      • File explorer
      • Code editor
      • Log console
      • Up to several tty console
      • A debugger client
      • Markdown rendering
    • integration tests
  • Full refactor
    • Rewrite assembler parser
    • Improve assembler/stacklang lexers to stream and parsers to have a fixed low look-ahead
    • Full compiler refactor
      • Three address code intermediary
      • Various simples optimizations
      • Smarter register handling
    • Consider switching to a custom 20 bits words ISA
  • Write an assembler able to ouput raw bitcode
    • Better error output for linker
  • Write a dummy virtual machine that can execute this raw bitcode
    • IO
    • IO with both data and control register
    • Hello World
    • Propose memory paging mechanisms
    • Propose memory protection mechanisms
  • Design and write a compiler for a small stack language
    • Finish parser refactor to full-featurness
    • Detect unterminating functions
    • Rework compiler error handling
    • Define a way to debug at link time (A non-loaded section that can contain an index of informations)
    • Operators overload on non-word or pointer types
    • Implement else, elsif, next, break
    • Global variable initialization
    • Scoped variables (in statement blocks) allocated as needed
    • Add error for stack size exceding small immediate size
    • Fix && and ||
    • Fix >= and <=
    • Refactor compiler to be simpler
    • Inline small functions
  • Stdlib
    • IO handling
    • Basic math
  • Write an OS
    • Implement a file system
    • Relocate and load another program
    • Relocate and load a dynamic library
    • Load, relocate a program with link to a dynamic library
    • Handle syscalls
    • Run multiple programs
    • Memory management
    • Fork and threads
    • Memory paging trhough bank switch or file system
    • Memory protection (isolate programs and limit jump between programs)
  • Define fixed spec for a computer
  • Define a standard configuration/description mechanism for detecting 'hardware'
Repository

blah

Owner
Statistic
  • 1
  • 0
  • 0
  • 0
  • 8
  • about 17 hours ago
  • January 9, 2021
License

MIT License

Links
Synced at

Sun, 23 Feb 2025 00:00:58 GMT

Languages