triki
Triki
You want to develop against real production data, but you don't want to violate your users' privacy. Enter Triki: standalone Crystal code for the selective rewriting of SQL dumps in order to protect user privacy.
Documentation
API documentation is available at https://josacar.github.io/triki/.
Table of Contents
- Supported databases and versions
- Install
- Quick Start
- Column Types & Configuration Options
- Error Handling & Troubleshooting
- Scaffolding
- Speed
- Contributing
- Thanks
- License
Supported databases and versions
Servers
- MySQL
- Postgres
- SQL Server
Clients
pg_dump( Postgresql ) up to 15.xmysqldump( MySQL ) up to 8.2mysqldump( MariaDB ) akamariadb-dump( since v0.3.0 ) up to 10.11
Note: Clients may break current SQL dump parsing as by now there is no proper integration testing in CI with all combinations of servers and clients versions, above versions have been partially manually tested.
Install
Add this in your shard.yml
dependencies:
triki:
github: josacar/triki
And then run shards install
Quick Start
1. Create an obfuscator script
# obfuscator.cr
require "triki"
obfuscator = Triki.new({
"users" => {
"email" => :email,
"password" => { :type => :fixed, :string => "password123" },
"name" => :name,
},
"sessions" => :truncate,
"roles" => :keep,
})
obfuscator.obfuscate(STDIN, STDOUT)
2. Pipe a database dump through it
mysqldump -c -u user -ppassword mydb | crystal run obfuscator.cr > obfuscated.sql
That's it — every row in users gets anonymized, sessions is emptied, and roles passes through unchanged.
Tip: For large dumps, compile once and use the binary for much faster processing:
crystal build --release obfuscator.cr
mysqldump -c -u user -ppassword mydb | ./obfuscator > obfuscated.sql
Example Usage
Make an obfuscator.cr script:
require "triki"
obfuscator = Triki.new({
"people" => {
"email" => { :type => :email, :skip_regexes => [/^[\w\.\_]+@my_company\.com$/i] },
"ethnicity" => :keep,
"crypted_password" => { :type => :fixed, :string => "SOME_FIXED_PASSWORD_FOR_EASE_OF_DEBUGGING" },
"bank_account" => { :type => :fixed, :string => ->(row : Triki::RowAsHash) { "#{row["bank_account"].to_s[0..4]}#{"*" * (row["email"].to_s.size - 5)}".as(Triki::RowContent) } },
"salt" => { :type => :fixed, :string => "SOME_THING" },
"remember_token" => :null,
"remember_token_expires_at" => :null,
"age" => { :type => :null, :unless => ->(row : Triki::RowAsHash) { row["email"] == "hello@example.com" } },
"photo_file_name" => :null,
"photo_content_type" => :null,
"photo_file_size" => :null,
"photo_updated_at" => :null,
"postal_code" => { :type => :fixed, :string => "94109", :unless => ->(person : Triki::RowAsHash) { person["postal_code"] == "12345"} },
"name" => :name,
"full_address" => :address,
"bio" => { :type => :lorem, :number => 4 },
"relationship_status" => { :type => :fixed, :one_of => ["Single", "Divorced", "Married", "Engaged", "In a Relationship"] },
"has_children" => { :type => :integer, :between => 0..1 },
},
"invites" => :truncate,
"invite_requests" => :truncate,
"tags" => :keep,
"relationships" => {
"account_id" => :keep,
"code" => { :type => :string, :length => 8, :chars => Triki::USERNAME_CHARS }
}
})
obfuscator.fail_on_unspecified_columns = true # if you want it to require every column in the table to be in the above definition
obfuscator.globally_kept_columns = %w[id created_at updated_at] # if you set fail_on_unspecified_columns, you may want this as well
obfuscator.obfuscate(STDIN, STDOUT)
And to get an obfuscated dump:
mysqldump -c --add-drop-table --hex-blob -u user -ppassword database | obfuscator > obfuscated_dump.sql
For details on required mysqldump flags and troubleshooting common issues, see Error Handling & Troubleshooting below.
PostgreSQL & SQL Server
By default the database type is assumed to be MySQL, but you can use the builtin SQL Server and Postgres support by specifying:
obfuscator.database_type = :sql_server
obfuscator.database_type = :postgres
If using Postgres, use pg_dump to get a dump:
pg_dump database | obfuscator > obfuscated_dump.sql
Column Types & Configuration Options
Simple types (shorthand)
Use the symbol directly: "column_name" => :email
Table of all types
| Type | Description | Options | Example |
|---|---|---|---|
:email |
Generates a random email | :skip_regexes |
{ :type => :email, :skip_regexes => [/@mycompany\.com$/i] } |
:string |
Random string of given length | :length, :chars |
{ :type => :string, :length => 8, :chars => Triki::USERNAME_CHARS } |
:lorem |
Lorem ipsum text | :number (sentences) |
{ :type => :lorem, :number => 4 } |
:like_english |
Real English sentences | :number (sentences) |
{ :type => :like_english, :number => 3 } |
:name |
Full random name | — | :name |
:first_name |
First name only | — | :first_name |
:last_name |
Last name only | — | :last_name |
:address |
Full address (street, city, state, zip) | — | :address |
:street_address |
Street address only | — | :street_address |
:secondary_address |
Apt/Suite number | — | :secondary_address |
:city |
City name | — | :city |
:state |
State abbreviation | — | :state |
:zip_code |
US zip code | — | :zip_code |
:phone |
Phone number | — | :phone |
:company |
Company name | — | :company |
:ipv4 |
IPv4 address | — | :ipv4 |
:ipv6 |
IPv6 address | — | :ipv6 |
:url |
URL | — | :url |
:integer |
Random integer | :between (range) |
{ :type => :integer, :between => 18..65 } |
:fixed |
Always the same value | :string or :one_of |
{ :type => :fixed, :string => "secret" } or { :type => :fixed, :one_of => ["A", "B"] } |
:null |
Sets value to NULL | — | :null |
:keep |
Keeps the original value | — | :keep |
Conditional options
These options can be combined with any type above:
| Option | Type | Description | Example |
|---|---|---|---|
:skip_regexes |
Array(Regex) |
Skip obfuscation if value matches any regex | { :type => :email, :skip_regexes => [/@internal\.com$/] } |
:unless |
Proc or :nil or :blank |
Skip obfuscation when condition is true | { :type => :null, :unless => :nil } (only nullify non-null values) |
:if |
Proc or :nil or :blank |
Only obfuscate when condition is true | { :type => :email, :if => :blank } (only fill blank emails) |
The :fixed type also supports :one_of to randomly pick from an array:
"status" => { :type => :fixed, :one_of => ["active", "inactive", "pending"] }
And :fixed supports procs for dynamic values based on the row:
"masked_ssn" => { :type => :fixed, :string => ->(row) { "***-**-" + row["ssn"].to_s[-4..-1] } }
Table-level actions
Instead of a column hash, use a symbol for the entire table:
| Action | Description |
|---|---|
:truncate |
Remove all rows from the table |
:keep |
Pass the table through unchanged |
Error Handling & Troubleshooting
Configuration errors
| Error | Cause | Fix |
|---|---|---|
Column 'X' could not be found in table 'Y' |
A column in your config doesn't exist in the dump | Check your config for typos, or use scaffold to see the actual column names |
Column 'X' defined in table 'Y', but not found in table definition |
fail_on_unspecified_columns = true and a dump column has no config entry |
Add the missing column to your config or add it to globally_kept_columns |
ERROR: Column X does not exist |
The config references a column not present in the INSERT statement | Same as above — verify column names match the dump |
ERROR: 'unless' definition does not return Bool or Symbol |
:unless value isn't a Proc, :nil, or :blank |
Use one of the supported conditional values |
ERROR: 'if' definition does not return Bool or Symbol |
Same for :if |
Same fix |
ERROR: 'length' or 'range' es not an Integer or a Range |
:length on a :string type isn't an integer or range |
Pass an integer or range (e.g. 5..15) |
table_config is not a hash |
Internal error — a table config is neither :truncate, :keep, nor a hash |
Check your config structure |
| Unknown type warning | You used a :type symbol that isn't recognized |
Replace with a valid type from the Column Types table, or use :keep |
Parsing errors
| Error | Cause | Fix |
|---|---|---|
Parse error: unexpected token begginning at … |
The MySQL dump wasn't generated with -c (column names) flag |
Always use mysqldump -c |
Cannot obfuscate Postgres dumps containing INSERT statements |
Postgres dump used INSERT instead of COPY |
Use pg_dump without --inserts or --column-inserts |
Warnings (logged to STDERR)
| Warning | Cause | Fix |
|---|---|---|
X was not specified in the config |
A table in the dump has no config entry and isn't :keep |
Add the table to your config, or set it to :keep or :truncate. A future release may treat this as an error |
Common issues
mysqldump without -c won't work Triki relies on knowing column names to decide what to obfuscate. The -c flag (--complete-insert) is mandatory:
mysqldump -c --hex-blob -u user -ppassword database | obfuscator > out.sql
Special characters in strings causing parse issues Use --hex-blob with mysqldump to hex-encode binary/blob content, which avoids quote-escaping edge cases:
mysqldump -c --hex-blob -u user -ppassword database | obfuscator > out.sql
Very long lines causing MySQL errors For large dumps with many rows per INSERT, try:
mysqldump -c --hex-blob --max_allowed_packet=128M --single-transaction --skip-extended-insert --quick -u user -ppassword database | obfuscator > out.sql
Newlines in MariaDB >= 10.7.1 dumps Triki handles newlines within INSERT statements automatically (since v0.3.0). No extra flags needed.
Scaffolding
If you don't want to type all those table names and column names into your obfuscator.cr script, you can use triki to do some of that work for you. It can consume your database dump file and create a "scaffold" for the script. To run triki in this mode, start with an "empty" scaffolder.cr script as follows:
obfuscator = Triki.new
obfuscator.scaffold(STDIN, STDOUT)
Then feed in your database dump:
mysqldump -c --hex-blob -u user -ppassword database | scaffolder > obfuscator_scaffold_snippet
pg_dump database | scaffolder > obfuscator_scaffold_snippet
The output will be a series of configuration statements of the form:
"table_name" => {
"column1_name" => :keep # scaffold
"column2_name" => :keep # scaffold
... etc.
Scaffolding also works if you have a partial configuration. If your configuration is missing some tables or some columns, a call to 'scaffold' will pass through the configuration that exists and augment it with scaffolding for the missing tables or columns.
Speed
The main motivation to rewrite this from Ruby to Crystal was speed, here is an example obfuscating 16 tables and 15 columns in total.
MySQL dump 160MB (gzip'ed)
Ruby
real 1m56.980s
user 1m57.080s
sys 0m2.660s
Crystal
real 0m26.579s
user 0m28.220s
sys 0m1.748s
MySQL dump 1.4G
Ruby
real 1m52.974s
user 1m49.824s
sys 0m4.560s
Crystal
real 0m17.642s
user 0m17.952s
sys 0m2.192s
That's about 6.40x speedup compared to the Ruby version.
Contributing
- Fork the project.
- Create a feature branch for your change.
- Add tests for any new functionality.
- Run the test suite (
crystal spec) and make sure everything passes. - Run the linter (
bin/ameba) and fix any issues. - Send a pull request.
Thanks
Forked from https://github.com/cantino/my_obfuscate
Thanks to all of the authors and contributors of the original Ruby gem
LICENSE
This work is provided under the MIT License. See the included LICENSE file.
The included English word frequency list used for generating random text is provided under the Creative Commons – Attribution / ShareAlike 3.0 license by http://invokeit.wordpress.com/frequency-word-lists/
triki
- 45
- 4
- 1
- 0
- 3
- 7 days ago
- May 26, 2020
MIT License
Thu, 16 Apr 2026 21:22:35 GMT