unicode_blocr

Identify the Unicode block to which a character belongs.

This library can be used to identify the type of a character and, optionally, to filter characters by block (for example, to strip emoticons).

In Unicode, a block is defined as one contiguous range of code points (https://en.wikipedia.org/wiki/Unicode_block).
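To make the "contiguous range" idea concrete, here is a minimal sketch of a block lookup written without the library. The `BLOCKS` table and the `block_name` helper are hypothetical names chosen for illustration, the table lists only three blocks typed in by hand, and none of this is meant to describe how unicode_blocr is implemented.

```crystal
# Hypothetical, hand-written table covering only three blocks.
# It is an illustration, not the library's own data.
BLOCKS = {
  "Basic Latin"        => 0x0000..0x007F,
  "Latin-1 Supplement" => 0x0080..0x00FF,
  "Emoticons"          => 0x1F600..0x1F64F,
}

# Returns the name of the block whose range contains `char`,
# or nil when the character is outside the blocks listed above.
def block_name(char : Char) : String?
  BLOCKS.each do |name, range|
    return name if range.includes?(char.ord)
  end
  nil
end

puts block_name('a')  # Basic Latin
puts block_name('é')  # Latin-1 Supplement
puts block_name('😊') # Emoticons
```

Because blocks never overlap, at most one range can match a given code point, so the order of the table does not affect the result.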

Installation

Add the dependency to your shard.yml:

dependencies:
  unicode_blocr:
    github: j8r/unicode_blocr

Usage examples

Basic

To print the block to which a character belongs:

require "unicode_blocr"

puts UnicodeBlock.new 'a' #=> UnicodeBlock::BasicLatin
puts UnicodeBlock.new 'é' #=> UnicodeBlock::Latin1Supplement

Filter characters

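To keep only the characters that sit below a given block, here MiscellaneousSymbolsandPictographs (and therefore also below Emoticons), delete every character whose code point is greater than or equal to the value of EnclosedIdeographicSupplement, the block immediately preceding MiscellaneousSymbolsandPictographs.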

require "unicode_blocr"

puts "hi😊".delete &.ord.>= UnicodeBlock::EnclosedIdeographicSupplement.value #=> hi
puts "café".delete &.ord.>= UnicodeBlock::BasicLatin.value #=> caf
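The same kind of filtering can be sketched without the library, which may help when only one or two blocks matter. The constants below are typed in by hand from the published Unicode block ranges, and `strip_from` and `strip_block` are hypothetical helper names, not part of unicode_blocr.

```crystal
# Hand-written constants (assumptions for this sketch, not read from the library):
# last code point of Enclosed Ideographic Supplement (U+1F200..U+1F2FF)
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT_END = 0x1F2FF
# last code point of Basic Latin (U+0000..U+007F)
BASIC_LATIN_END = 0x007F
# full range of the Emoticons block
EMOTICONS = 0x1F600..0x1F64F

# Removes every character whose code point is >= limit.
def strip_from(text : String, limit : Int32) : String
  text.delete { |char| char.ord >= limit }
end

# Removes only the characters that fall inside the given block range.
def strip_block(text : String, range : Range(Int32, Int32)) : String
  text.delete { |char| range.includes?(char.ord) }
end

puts strip_from("hi😊", ENCLOSED_IDEOGRAPHIC_SUPPLEMENT_END) # hi
puts strip_from("café", BASIC_LATIN_END)                     # caf
puts strip_block("hi😊!", EMOTICONS)                         # hi!
```

`strip_from` drops everything at or above a threshold, while `strip_block` removes a single block and leaves characters on both sides of it untouched.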

License

Copyright (c) 2018-2019 Julien Reichardt - ISC License
