Regex patterns
Matching emoji characters within a string can be difficult, as multiple codepoints, surrogate pairs,
variation selectors, zero width joiners, so on and so forth, must be taken into account. To make
this whole process easier, pre-built regex patterns are available in the emojibase-regex
package.
- Yarn
- NPM
yarn add emojibase-regex
npm install emojibase-regex
Usage
As stated, there are 7 regex patterns. One for matching emoji presentation characters, one for matching text presentation characters, one for matching both types of characters, and the last for matching shortcodes or emoticons.
emojibase-regex
- Matches both emoji and text presentation characters.emojibase-regex/emoji
- Matches only emoji presentation characters.emojibase-regex/emoji-loose
- Like the above but also includes characters withoutFE0F
.emojibase-regex/text
- Matches only text presentation characters.emojibase-regex/text-loose
- Like the above but also includes characters withoutFE0E
.emojibase-regex/emoticon
- Matches supported emoticons and their permutations.emojibase-regex/shortcode
- Matches supported shortcodes.emojibase-regex/shortcode-native
- Matches supported shortcodes in their native language (cldr-native
).
Each of these imports return a RegExp
instance with no flags defined.
import EMOJI_REGEX from 'emojibase-regex';
import EMOTICON_REGEX from 'emojibase-regex/emoticon';
import SHORTCODE_REGEX from 'emojibase-regex/shortcode';
import SHORTCODE_NATIVE_REGEX from 'emojibase-regex/shortcode-native';
`🙂`.match(EMOJI_REGEX);
':)'.match(EMOTICON_REGEX);
':pleased:'.match(SHORTCODE_REGEX);
':гвинея:'.match(SHORTCODE_NATIVE_REGEX);
The
u
(unicode) andg
(global) flags are not defined on these patterns.
The emoticon regex does not include word boundaries.
Unicode codepoint support
By default, regex patterns are generated using hexadecimal Unicode ranges. If desired, ES2015+
Unicode codepoint aware regex patterns can be used, which can be found in the codepoint
directory.
import CODEPOINT_EMOJI_REGEX from 'emojibase-regex/codepoint';
The
u
(unicode) flag is required (defined by default) when using these patterns.
Codepoint regex patterns are only supported in Node.js and modern browsers.
Unicode property support
An ECMAScript proposal to
support Unicode property escapes within regex is currently in the works. This proposal, if passed,
would enable regex patterns like the following: /\p{Emoji}/
. This feature would greatly reduce the
filesize of our regex patterns while being more accurate to the Unicode standard.
These patterns can be found in the property
directory, but use at your own risk!
import PROPERTY_EMOJI_REGEX from 'emojibase-regex/property';
Filesizes
File | Size | Gzipped |
---|---|---|
shortcode.js | 34 B | 54 B |
property/text.js | 60 B | 76 B |
property/emoji.js | 103 B | 93 B |
property/index.js | 115 B | 102 B |
emoticon.js | 463 B | 244 B |
shortcode-native.js | 652 B | 411 B |
text.js | 1.55 kB | 627 B |
codepoint/text.js | 1.89 kB | 648 B |
emoji.js | 11.89 kB | 2.41 kB |
emoji-loose.js | 12.26 kB | 2.21 kB |
text-loose.js | 12.26 kB | 2.21 kB |
codepoint/emoji.js | 12.46 kB | 2.45 kB |
codepoint/emoji-loose.js | 12.66 kB | 2.25 kB |
codepoint/text-loose.js | 12.66 kB | 2.25 kB |
index.js | 16.02 kB | 2.43 kB |
codepoint/index.js | 16.13 kB | 2.47 kB |