Regex patterns

Matching emoji characters within a string can be difficult, because multiple codepoints, surrogate pairs, variation selectors, zero width joiners, and more must be taken into account. To make this easier, pre-built regex patterns are available in the emojibase-regex package.

yarn add emojibase-regex

Usage

There are 8 regex patterns, grouped by what they match: emoji presentation characters, text presentation characters, both types of characters, and shortcodes or emoticons.

  • emojibase-regex - Matches both emoji and text presentation characters.
  • emojibase-regex/emoji - Matches only emoji presentation characters.
  • emojibase-regex/emoji-loose - Like the above but also includes characters without FE0F.
  • emojibase-regex/text - Matches only text presentation characters.
  • emojibase-regex/text-loose - Like the above but also includes characters without FE0E.
  • emojibase-regex/emoticon - Matches supported emoticons and their permutations.
  • emojibase-regex/shortcode - Matches supported shortcodes.
  • emojibase-regex/shortcode-native - Matches supported shortcodes in their native language (cldr-native).

Each of these imports returns a RegExp instance with no flags defined.

import EMOJI_REGEX from 'emojibase-regex';
import EMOTICON_REGEX from 'emojibase-regex/emoticon';
import SHORTCODE_REGEX from 'emojibase-regex/shortcode';
import SHORTCODE_NATIVE_REGEX from 'emojibase-regex/shortcode-native';

// Each call returns a match array, or null when nothing matches.
'🙂'.match(EMOJI_REGEX);
':)'.match(EMOTICON_REGEX);
':pleased:'.match(SHORTCODE_REGEX);
':гвинея:'.match(SHORTCODE_NATIVE_REGEX);

The u (unicode) and g (global) flags are not defined on these patterns.
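
Because no flags are set, a pattern only finds the first occurrence in a string. One way to find every occurrence is to rebuild the pattern from its source with the g flag. The sketch below uses a small stand-in pattern (an assumption, not the real export) so that it runs without the package installed; in practice EMOJI_REGEX would be the import from emojibase-regex.

```javascript
// Stand-in for a pattern imported from emojibase-regex (hypothetical and far
// smaller than the real one). Like the real patterns, it has no flags.
const EMOJI_REGEX = /\uD83D[\uDE00-\uDE4F]/;

// Rebuild the pattern from its source, adding the global flag.
const GLOBAL_EMOJI_REGEX = new RegExp(EMOJI_REGEX.source, 'g');

const text = 'Hello 🙂 and goodbye 😢';

// Without `g`, match() stops at the first occurrence.
const first = text.match(EMOJI_REGEX);

// With `g`, match() returns every occurrence as a plain array of strings.
const all = text.match(GLOBAL_EMOJI_REGEX);

console.log(first[0]); // 🙂
console.log(all); // [ '🙂', '😢' ]
```

Building a new RegExp leaves the imported instance untouched, which avoids sharing lastIndex state between callers.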

The emoticon regex does not include word boundaries.
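
Without boundaries, the emoticon pattern can match inside other text, such as the ":/" in a URL. If that is a concern, the pattern source can be wrapped with custom boundaries. The sketch below is one possible approach, using a small stand-in pattern and a hypothetical findEmoticon helper (neither is part of the package). It relies on whitespace checks rather than \b, since \b only matches between a word and a non-word character and most emoticons start with punctuation.

```javascript
// Stand-in for the emoticon pattern (hypothetical; the real one covers many
// more emoticons and permutations).
const EMOTICON_REGEX = /:\)|:\(|:D|:\//;

// Require start-of-string or whitespace before the emoticon, and whitespace
// or end-of-string after it. The trailing check is a lookahead so that the
// whitespace is not consumed by the match.
const BOUNDED_EMOTICON_REGEX = new RegExp(
  `(?:^|\\s)(${EMOTICON_REGEX.source})(?=\\s|$)`,
);

// Hypothetical helper: returns the emoticon text, or null when none stands alone.
function findEmoticon(text) {
  const match = text.match(BOUNDED_EMOTICON_REGEX);
  return match ? match[1] : null;
}

console.log(findEmoticon('well done :)')); // :)
console.log(EMOTICON_REGEX.test('http://example.com')); // true (matches ":/")
console.log(findEmoticon('http://example.com')); // null
```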

Unicode codepoint support

By default, regex patterns are generated using hexadecimal Unicode ranges. If desired, ES2015+ Unicode codepoint aware regex patterns can be used, which can be found in the codepoint directory.

import CODEPOINT_EMOJI_REGEX from 'emojibase-regex/codepoint';

These patterns require the u (unicode) flag, which is already set on them by default.

Codepoint regex patterns are only supported in Node.js and modern browsers.
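
To illustrate the difference between the two styles, the sketch below writes the same range (U+1F600 to U+1F64F) both ways. Both patterns are stand-ins written for this example, and it assumes the default patterns are expressed with UTF-16 code unit escapes; the real generated sources are much larger.

```javascript
// UTF-16 code unit style: an astral range is spelled as a surrogate pair.
const SURROGATE_PAIR_REGEX = /\uD83D[\uDE00-\uDE4F]/;

// ES2015+ codepoint style: \u{...} escapes, which require the u flag.
const CODEPOINT_REGEX = /[\u{1F600}-\u{1F64F}]/u;

const face = '🙂'; // U+1F642, stored as the code units D83D DE42

// Both styles match the same character.
const surrogateMatch = face.match(SURROGATE_PAIR_REGEX);
const codepointMatch = face.match(CODEPOINT_REGEX);

console.log(surrogateMatch[0] === codepointMatch[0]); // true
console.log(CODEPOINT_REGEX.flags); // u
console.log(SURROGATE_PAIR_REGEX.flags); // (empty string)
```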

Unicode property support

An ECMAScript proposal to support Unicode property escapes within regex is currently in the works. This proposal, if passed, would enable regex patterns like the following: /\p{Emoji}/u (the u flag is required for property escapes). This feature would greatly reduce the file size of our regex patterns while being more accurate to the Unicode standard.

These patterns can be found in the property directory, but use at your own risk!

import PROPERTY_EMOJI_REGEX from 'emojibase-regex/property';
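
As a rough illustration of what property escapes look like, the sketch below uses two hand-written patterns (not the package's exports) and assumes a runtime that already implements the feature, such as a recent Node.js version. It also shows one reason for caution: the Emoji property covers characters like digits, which are only emoji as part of keycap sequences.

```javascript
// Property escapes are only recognized when the u flag is set.
const EMOJI_PROPERTY_REGEX = /\p{Emoji}/u;
const EMOJI_PRESENTATION_REGEX = /\p{Emoji_Presentation}/u;

console.log(EMOJI_PRESENTATION_REGEX.test('🙂')); // true
console.log(EMOJI_PROPERTY_REGEX.test('🙂')); // true

// Digits carry the Emoji property, but not Emoji_Presentation.
console.log(EMOJI_PROPERTY_REGEX.test('7')); // true
console.log(EMOJI_PRESENTATION_REGEX.test('7')); // false
```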

Filesizes

File                        Size       Gzipped
shortcode.js                34 B       54 B
property/text.js            60 B       76 B
property/emoji.js           102 B      92 B
property/index.js           114 B      101 B
emoticon.js                 461 B      247 B
shortcode-native.js         507 B      348 B
text.js                     1.49 KB    609 B
codepoint/text.js           1.82 KB    639 B
emoji.js                    8.49 KB    2.06 KB
emoji-loose.js              8.77 KB    1.87 KB
text-loose.js               8.77 KB    1.88 KB
index.js                    8.79 KB    1.88 KB
codepoint/emoji.js          9.42 KB    2.11 KB
codepoint/emoji-loose.js    9.54 KB    1.91 KB
codepoint/text-loose.js     9.54 KB    1.91 KB
codepoint/index.js          9.55 KB    1.91 KB