crystal-emoji-regex v3.2.3
Crystal Emoji Regex
A set of Crystal regular expressions for matching Unicode Emoji symbols.
Background
This is based upon the fantastic work from Jessica Stokes' [ruby-emoji-regex](https://github.com/ticky/ruby-emoji-regex),
which was based on Mathias Bynens' [emoji-regex](https://github.com/mathiasbynens/emoji-regex)
JavaScript package. emoji-regex is cleverly assembled based upon data from the Unicode Consortium.
The regular expressions provided herein are derived from that package.
Installation
- Add the dependency to your shard.yml:

    dependencies:
      emoji_regex:
        github: watzon/emoji_regex

- Run shards install
Usage
emoji_regex provides these regular expressions:

- EmojiRegex::RGIEmoji is the regex you most likely want. It matches all emoji recommended for general interchange, as defined by the Unicode standard's RGI_Emoji property. In a future version, this regular expression will be renamed to EmojiRegex::Regex and all other regexes removed.
- EmojiRegex::Regex is deprecated, and will be replaced with RGIEmoji in a future major version. It matches emoji which present as emoji by default, and those which present as emoji when combined with U+FE0F VARIATION SELECTOR-16.
- EmojiRegex::Text is deprecated, and will be removed in a future major version. It matches emoji which present as text by default (regardless of variation selector), as well as those which present as emoji by default.
RGI vs Emoji vs Text Presentation
RGI_Emoji is a property of emoji symbols, defined in Unicode Technical Report #51, which marks emoji as being supported by major vendors and therefore expected to be usable generally. In most cases, this is the property you will want when seeking emoji characters.
Emoji_Presentation is another such property, defined in UTR#51, which controls whether symbols are intended to be rendered as emoji by default.
Generally, for emoji which re-use Unicode code points which existed before Emoji itself was introduced to Unicode, Emoji_Presentation is false. Emoji_Presentation may be true but RGI_Emoji false for characters with non-standard emoji-like representations in certain conditions. Notable cases are the Emoji Keycap Sequences (#️⃣, 1️⃣, 9️⃣, *️⃣, etc.), which are sequences composed of three characters: the base character, a U+FE0F VARIATION SELECTOR-16, and finally the U+20E3 COMBINING ENCLOSING KEYCAP.
These characters, therefore, are matched to varying degrees of precision by each of the regular expressions included in this package:

- # is matched only by EmojiRegex::Text, as it is considered to be a text part of a possible emoji.
- #️ (# followed by U+FE0F) is matched by EmojiRegex::Regex as well as EmojiRegex::Text, as it has Emoji_Presentation despite not being a generally accepted Emoji or recommended for general interchange.
- #️⃣ is matched by all three regular expressions, as it is recommended for general interchange.
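
The three stages in the list above are successive prefixes of one keycap sequence. As a minimal sketch using only the Crystal standard library (it does not call emoji_regex, and the variable names are illustrative), the code points involved can be inspected like this:

```crystal
# The full keycap sequence for "#", written with escapes so that the
# invisible U+FE0F and the combining U+20E3 are easy to see.
keycap = "#\u{FE0F}\u{20E3}"

# List each code point in U+XXXX notation.
code_points = keycap.chars.map { |char| "U+%04X" % char.ord }
puts code_points.join(" ") # prints "U+0023 U+FE0F U+20E3"

# The three stages from the list: "#", "#" + U+FE0F, and the full sequence.
stages = [keycap[0, 1], keycap[0, 2], keycap]
puts stages.map { |stage| stage.size }.join(", ") # prints "1, 2, 3"
```

String#size counts code points here, which is the same figure the example below reports for each match.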
It's most likely that the regular expression you want is EmojiRegex::RGIEmoji! ☺️
Example
require "emoji_regex"

# Each line holds the same sequence twice: once as an escape and once as a
# literal character, so every match is reported twice in the output.
text = <<-TEXT
\u{231A}: ⌚ default Emoji presentation character (Emoji_Presentation)
\u{2194}: ↔ default text presentation character
\u{2194}\u{FE0F}: ↔️ default text presentation character with Emoji variation selector
#: # default text presentation character
#\u{FE0F}: #️ default text presentation character with Emoji variation selector
#\u{FE0F}\u{20E3}: #️⃣ default text presentation character with Emoji variation selector and combining enclosing keycap
\u{1F469}: 👩 Emoji modifier base (Emoji_Modifier_Base)
\u{1F469}\u{1F3FF}: 👩🏿 Emoji modifier base followed by a modifier
TEXT

puts "EmojiRegex::RGIEmoji"
text.scan EmojiRegex::RGIEmoji do |match|
  puts "Matched sequence #{match[0]} - code points: #{match[0].size}"
end
puts

puts "EmojiRegex::Regex"
text.scan EmojiRegex::Regex do |match|
  puts "Matched sequence #{match[0]} - code points: #{match[0].size}"
end
puts

puts "EmojiRegex::Text"
text.scan EmojiRegex::Text do |match|
  puts "Matched sequence #{match[0]} - code points: #{match[0].size}"
end
Console output:
EmojiRegex::RGIEmoji
Matched sequence ⌚ - code points: 1
Matched sequence ⌚ - code points: 1
Matched sequence ↔️ - code points: 2
Matched sequence ↔️ - code points: 2
Matched sequence #️⃣ - code points: 3
Matched sequence #️⃣ - code points: 3
Matched sequence 👩 - code points: 1
Matched sequence 👩 - code points: 1
Matched sequence 👩🏿 - code points: 2
Matched sequence 👩🏿 - code points: 2

EmojiRegex::Regex
Matched sequence ⌚ - code points: 1
Matched sequence ⌚ - code points: 1
Matched sequence ↔️ - code points: 2
Matched sequence ↔️ - code points: 2
Matched sequence #️ - code points: 2
Matched sequence #️ - code points: 2
Matched sequence #️⃣ - code points: 3
Matched sequence #️⃣ - code points: 3
Matched sequence 👩 - code points: 1
Matched sequence 👩 - code points: 1
Matched sequence 👩🏿 - code points: 2
Matched sequence 👩🏿 - code points: 2

EmojiRegex::Text
Matched sequence ⌚ - code points: 1
Matched sequence ⌚ - code points: 1
Matched sequence ↔ - code points: 1
Matched sequence ↔ - code points: 1
Matched sequence ↔️ - code points: 2
Matched sequence ↔️ - code points: 2
Matched sequence # - code points: 1
Matched sequence # - code points: 1
Matched sequence #️ - code points: 2
Matched sequence #️ - code points: 2
Matched sequence #️⃣ - code points: 3
Matched sequence #️⃣ - code points: 3
Matched sequence 👩 - code points: 1
Matched sequence 👩 - code points: 1
Matched sequence 👩🏿 - code points: 2
Matched sequence 👩🏿 - code points: 2
Development
Requirements
Initial setup
To install all the Ruby and JavaScript dependencies, you can run:
bin/setup
To update the Ruby source files based on the emoji-regex library:
bundle exec rake regenerate
Specs
A spec suite is provided, which can be run as:
crystal spec
Versioning Policy
The version of crystal-emoji-regex will always track the upstream version from ruby-emoji-regex.
- January 5, 2022
MIT License