Datasets

Emojis are generated into JSON files called datasets, with each dataset belonging to one of the following groups: localized data, versioned data, or metadata. These datasets can be found within the emojibase-data package, or loaded from a CDN.

yarn add emojibase-data

JSON files will need to be parsed manually unless handled by a build/bundle process.
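When no build step handles the import, the raw file contents have to go through JSON.parse. The following is a minimal sketch, assuming the text has already been read from disk or fetched. The names parseEmojiDataset and MinimalEmoji are hypothetical and not part of emojibase.

```typescript
// Hypothetical minimal shape; real emoji objects carry more properties.
interface MinimalEmoji {
  hexcode: string;
  annotation?: string;
}

// Parse raw dataset text and check that it is an array of objects,
// each with a string hexcode.
function parseEmojiDataset(raw: string): MinimalEmoji[] {
  const parsed: unknown = JSON.parse(raw);

  if (!Array.isArray(parsed)) {
    throw new TypeError('Expected the dataset to be a JSON array');
  }

  return parsed.map((item: unknown, index: number) => {
    const candidate = item as { hexcode?: unknown } | null;
    if (typeof candidate !== 'object' || candidate === null || typeof candidate.hexcode !== 'string') {
      throw new TypeError(`Entry ${index} is missing a string hexcode`);
    }
    return candidate as MinimalEmoji;
  });
}

// An inline string stands in for the contents of a data.json file.
const sampleDataset = parseEmojiDataset('[{"hexcode":"1F600","annotation":"grinning face"}]');
```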

Usage

There are three groups of datasets, each serving a specific purpose. The first group, localized data, contains datasets localized with data provided by CLDR (see Supported locales). Each of these datasets is an array of emoji objects that adhere to the defined data structure.

import emojis from 'emojibase-data/<locale>/data.json';
import compactEmojis from 'emojibase-data/<locale>/compact.json';
import groupsSubgroups from 'emojibase-data/<locale>/meta.json';
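Because a localized dataset is a plain array, it can be filtered directly. The sketch below searches the annotation, tags, and shortcodes of each entry. The searchEmojis helper and the SearchableEmoji type are hypothetical, and a small inline array stands in for the imported dataset.

```typescript
// Subset of the emoji object used for searching.
interface SearchableEmoji {
  annotation?: string;
  hexcode: string;
  shortcodes?: string[];
  tags?: string[];
}

// Case-insensitive substring match against annotation, tags, and shortcodes.
function searchEmojis(emojis: SearchableEmoji[], query: string): SearchableEmoji[] {
  const needle = query.trim().toLowerCase();
  if (needle === '') {
    return [];
  }

  return emojis.filter((emoji) => {
    const haystack = [emoji.annotation ?? '', ...(emoji.tags ?? []), ...(emoji.shortcodes ?? [])];
    return haystack.some((value) => value.toLowerCase().includes(needle));
  });
}

// Inline stand-in for `import emojis from 'emojibase-data/en/data.json'`.
const localizedEmojis: SearchableEmoji[] = [
  {
    annotation: 'man lifting weights',
    hexcode: '1F3CB-FE0F-200D-2642-FE0F',
    shortcodes: ['man_lifting_weights'],
    tags: ['weight lifter', 'man'],
  },
  { annotation: 'grinning face', hexcode: '1F600', tags: ['face', 'grin'] },
];

const weightMatches = searchEmojis(localizedEmojis, 'Weight');
```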

The second group, versioned data, provides datasets for emoji and Unicode release versions. These datasets return a map, with the key being the version, and the value being an array of emoji hexcodes included in the associated release version.

  • emojibase-data/versions/emoji.json - Emoji characters grouped by emoji version.
  • emojibase-data/versions/unicode.json - Emoji characters grouped by Unicode version.

import unicodeVersions from 'emojibase-data/versions/unicode.json';
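Since each versioned dataset maps a version to a list of hexcodes, finding the release that introduced an emoji is a lookup over the map entries. This is a sketch only: findVersion is a hypothetical helper, and the sample map below is illustrative rather than copied from the real files.

```typescript
// Shape described above: version key -> array of emoji hexcodes.
type VersionMap = Record<string, string[]>;

// Return the first version whose list contains the hexcode, or undefined.
function findVersion(versions: VersionMap, hexcode: string): string | undefined {
  for (const [version, hexcodes] of Object.entries(versions)) {
    if (hexcodes.includes(hexcode)) {
      return version;
    }
  }
  return undefined;
}

// Illustrative stand-in for emojibase-data/versions/emoji.json.
const sampleVersions: VersionMap = {
  '1': ['1F600'],
  '4': ['1F3CB-FE0F-200D-2642-FE0F'],
};

const introducedIn = findVersion(sampleVersions, '1F3CB-FE0F-200D-2642-FE0F');
```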

The third and last group, metadata, provides specialized datasets for unique use cases.

  • emojibase-data/meta/groups.json - A map of non-localized emoji groups (Smileys & People), subgroups (Sky & Weather), and hierarchy, according to the official Unicode data files.
  • emojibase-data/meta/hexcodes.json - An array of all emoji hexcodes (hexadecimal codepoints).
  • emojibase-data/meta/unicode.json - An array of all emoji unicode characters, including text and emoji presentation characters.
  • emojibase-data/meta/unicode-names.json - A map of hexcodes to official Unicode names for each emoji.

import { groups, subgroups, hierarchy } from 'emojibase-data/meta/groups.json';
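The groups, subgroups, and hierarchy exports can be combined to list the subgroups that belong to a group. The sketch below assumes that groups and subgroups map a numeric index to a key, and that hierarchy maps a group index to an array of subgroup indexes. That shape is an assumption and should be checked against the actual file. subgroupsOf is a hypothetical helper.

```typescript
// Assumed shape of emojibase-data/meta/groups.json (verify against the file).
interface GroupsMeta {
  groups: Record<string, string>;
  subgroups: Record<string, string>;
  hierarchy: Record<string, number[]>;
}

// Resolve the subgroup keys that belong to a group index.
function subgroupsOf(meta: GroupsMeta, group: number): string[] {
  const ids = meta.hierarchy[String(group)] ?? [];
  return ids
    .map((id) => meta.subgroups[String(id)])
    .filter((name): name is string => name !== undefined);
}

// Illustrative sample, not the real file contents.
const sampleGroupsMeta: GroupsMeta = {
  groups: { '0': 'smileys-emotion' },
  subgroups: { '0': 'face-smiling', '1': 'face-affection' },
  hierarchy: { '0': [0, 1] },
};

const smileySubgroups = subgroupsOf(sampleGroupsMeta, 0);
```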

Data structure

Each emoji character found within the pre-generated datasets is represented by an object composed of the properties listed below. To reduce the overall dataset file size, most property values are implemented as integers with associated constants.

  • annotation (string) - A localized description, provided by CLDR, primarily used for text-to-speech (TTS) and accessibility.
  • emoji (string) - The emoji presentation Unicode character.
  • emoticon (string) - If applicable, an emoticon representing the emoji character.
  • gender (number) - If applicable, the gender of the emoji character. 0 for female, 1 for male.
  • group (number) - The categorical group the emoji belongs to. Undefined for uncategorized emojis.
  • hexcode (string) - The hexadecimal representation of the emoji Unicode codepoint. If the emoji supports both emoji and text variations, the hexcode will not include the variation selector. If a multi-person, multi-gender, or skin tone variation, the hexcode will include zero width joiners and variation selectors.
  • order (number) - The order in which emoji should be displayed on a device, through a keyboard or emoji picker. Undefined for unordered emojis.
  • shortcodes (string[]) - An array of community curated shortcodes. Does not include surrounding colons.
  • skins (emoji[]) - If applicable, an array of emoji objects for each skin tone modification, starting at light skin, and ending with dark skin.
  • subgroup (number) - The categorical subgroup the emoji belongs to. Undefined for uncategorized emojis.
  • tags (string[]) - An array of localized keywords, provided by CLDR, to use for searching and filtering.
  • text (string) - The text presentation Unicode character.
  • tone (number | number[]) - If applicable, the skin tone of the emoji character. 1 for light skin, 2 for medium-light skin, 3 for medium skin, 4 for medium-dark skin, and 5 for dark skin. Multi-person skin tones will be an array of values.
  • type (number) - The default presentation of the emoji character. 0 for text, 1 for emoji.
  • unicode (string) - Either the emoji or text presentation Unicode character. Only available in the compact dataset.
  • version (number) - The version in which the emoji character was released.

Properties without an applicable value are omitted, so not every property will be found on every emoji object.
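The property list above can be sketched as a TypeScript interface in which every property that may be omitted is optional. This is a hand-written approximation for illustration, not the type shipped by the library. Treating only annotation and hexcode as always present is an assumption, and withSkins is a hypothetical helper.

```typescript
type SkinTone = 1 | 2 | 3 | 4 | 5;

// Approximation of the emoji object described in the property list.
interface Emoji {
  annotation: string;
  emoji?: string;
  emoticon?: string;
  gender?: 0 | 1; // 0 for female, 1 for male
  group?: number;
  hexcode: string;
  order?: number;
  shortcodes?: string[];
  skins?: Emoji[];
  subgroup?: number;
  tags?: string[];
  text?: string;
  tone?: SkinTone | SkinTone[];
  type?: 0 | 1; // 0 for text, 1 for emoji
  unicode?: string; // compact dataset only
  version?: number;
}

// Return the emoji followed by its skin tone variations, if it has any.
function withSkins(emoji: Emoji): Emoji[] {
  return [emoji, ...(emoji.skins ?? [])];
}

const lifter: Emoji = {
  annotation: 'man lifting weights',
  gender: 1,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  type: 1,
  version: 4,
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      tone: 1,
    },
  ],
};

const lifterHexcodes = withSkins(lifter).map((entry) => entry.hexcode);
```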

{
  annotation: 'man lifting weights',
  emoji: '🏋️‍♂️',
  gender: 1,
  group: 0,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  order: 1518,
  shortcodes: [
    'man_lifting_weights',
  ],
  subgroup: 0,
  tags: [
    'weight lifter',
    'man',
  ],
  type: 1,
  version: 4,
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      emoji: '🏋🏻‍♂️',
      gender: 1,
      group: 0,
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      order: 1522,
      shortcodes: [
        'man_lifting_weights_tone1',
      ],
      subgroup: 0,
      type: 1,
      tone: 1,
      version: 4,
    },
    // ...
  ],
},
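A hexcode is the list of codepoints in hexadecimal, joined by hyphens, so it can be converted to and from a string with standard JavaScript APIs. The helpers below are hypothetical names for illustration. Note that toHexcode keeps every codepoint, so for emojis whose dataset hexcode omits the variation selector, the result may differ from the hexcode stored in the dataset.

```typescript
// Convert a hyphen-separated hexcode into the corresponding string.
function fromHexcode(hexcode: string): string {
  const codePoints = hexcode.split('-').map((part) => {
    if (!/^[0-9A-Fa-f]+$/.test(part)) {
      throw new RangeError(`Invalid hexcode segment: "${part}"`);
    }
    return Number.parseInt(part, 16);
  });

  return String.fromCodePoint(...codePoints);
}

// Convert a string into an uppercase, hyphen-separated list of codepoints.
function toHexcode(value: string): string {
  // Array.from iterates by codepoint, so surrogate pairs stay intact.
  return Array.from(value, (char) =>
    (char.codePointAt(0) ?? 0).toString(16).toUpperCase().padStart(4, '0'),
  ).join('-');
}

const lifterCharacter = fromHexcode('1F3CB-FE0F-200D-2642-FE0F');
const roundTripped = toHexcode(lifterCharacter);
```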

Compact format

The full emoji data is thorough, but not all of it is always required, so a compact dataset is also provided. The compact dataset contains only the following properties: annotation, emoticon, group, hexcode, order, shortcodes, skins, tags, and unicode.

To use a compact dataset, replace data.json with compact.json.

import data from 'emojibase-data/en/compact.json';

{
  annotation: 'man lifting weights',
  group: 0,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  order: 1518,
  shortcodes: [
    'man_lifting_weights',
  ],
  tags: [
    'weight lifter',
    'man',
  ],
  unicode: '🏋️‍♂️',
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      group: 0,
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      order: 1522,
      shortcodes: [
        'man_lifting_weights_tone1',
      ],
      unicode: '🏋🏻‍♂️',
    },
    // ...
  ],
},
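The relationship between the two formats can be illustrated by deriving a compact object from a full one. This is a sketch and not how the published datasets are generated. It assumes that unicode is the text character when type is 0 and the emoji character otherwise, and toCompact is a hypothetical helper.

```typescript
interface FullEmoji {
  annotation: string;
  emoji?: string;
  emoticon?: string;
  gender?: number;
  group?: number;
  hexcode: string;
  order?: number;
  shortcodes?: string[];
  skins?: FullEmoji[];
  subgroup?: number;
  tags?: string[];
  text?: string;
  tone?: number | number[];
  type?: number;
  version?: number;
}

interface CompactEmoji {
  annotation: string;
  emoticon?: string;
  group?: number;
  hexcode: string;
  order?: number;
  shortcodes?: string[];
  skins?: CompactEmoji[];
  tags?: string[];
  unicode: string;
}

// Keep only the compact properties, recursing into skin tone variations.
function toCompact(emoji: FullEmoji): CompactEmoji {
  // Assumption: the default presentation (type) decides which character is kept.
  const unicode = (emoji.type === 0 ? emoji.text : emoji.emoji) ?? emoji.emoji ?? emoji.text ?? '';
  const compact: CompactEmoji = { annotation: emoji.annotation, hexcode: emoji.hexcode, unicode };

  if (emoji.emoticon !== undefined) compact.emoticon = emoji.emoticon;
  if (emoji.group !== undefined) compact.group = emoji.group;
  if (emoji.order !== undefined) compact.order = emoji.order;
  if (emoji.shortcodes !== undefined) compact.shortcodes = emoji.shortcodes;
  if (emoji.tags !== undefined) compact.tags = emoji.tags;
  if (emoji.skins !== undefined) compact.skins = emoji.skins.map((skin) => toCompact(skin));

  return compact;
}
```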

Metadata format

The metadata format is a special localized dataset that provides translations for groups, subgroups, and other related emoji metadata.

import data from 'emojibase-data/en/meta.json';

{
  groups: [
    {
      key: 'smileys-emotion',
      message: 'smileys & emotion',
      order: 0,
    },
    // ...
  ],
  subgroups: [
    {
      key: 'face-smiling',
      message: 'smiling',
      order: 0,
    },
    // ...
  ],
};
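A common use of this dataset is turning an emoji's numeric group into a translated label. The sketch below assumes that the group number on an emoji object corresponds to the order field of a metadata entry. That correspondence is an assumption, and labelsByOrder is a hypothetical helper.

```typescript
interface MetaMessage {
  key: string;
  message: string;
  order: number;
}

interface LocaleMetadata {
  groups: MetaMessage[];
  subgroups: MetaMessage[];
}

// Index translated messages by their order value for quick lookup.
function labelsByOrder(entries: MetaMessage[]): Map<number, string> {
  return new Map(entries.map((entry): [number, string] => [entry.order, entry.message]));
}

// Inline stand-in for `import data from 'emojibase-data/en/meta.json'`.
const sampleMetadata: LocaleMetadata = {
  groups: [{ key: 'smileys-emotion', message: 'smileys & emotion', order: 0 }],
  subgroups: [{ key: 'face-smiling', message: 'smiling', order: 0 }],
};

const groupLabels = labelsByOrder(sampleMetadata.groups);
const firstGroupLabel = groupLabels.get(0);
```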

Fetching from a CDN

If you prefer not to inflate your bundle size with these large JSON datasets, you can fetch them from our CDN (provided by jsdelivr.com) using fetchFromCDN(), fetchEmojis(), fetchMetadata(), or fetchShortcodes().

import { fetchFromCDN, fetchEmojis, fetchMetadata, fetchShortcodes } from 'emojibase';
const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'] });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr');
const chineseTranslations = await fetchMetadata('zh');

Learn more about these functions in the API.
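For readers who want to see what a CDN request involves, the sketch below builds a jsDelivr URL for the emojibase-data npm package and loads it through an injected fetch-like function. It is not the implementation of fetchFromCDN(). buildCdnUrl, loadDataset, and the JsonFetcher type are hypothetical, and defaulting to the latest version is an assumption.

```typescript
// Minimal fetch-like contract, so the sketch does not depend on a global fetch.
type JsonFetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Build a jsDelivr URL of the form /npm/<package>@<version>/<path>.
function buildCdnUrl(path: string, version = 'latest'): string {
  const cleanPath = path.replace(/^\/+/, '');
  return `https://cdn.jsdelivr.net/npm/emojibase-data@${version}/${cleanPath}`;
}

// Request a dataset and return the parsed JSON, failing on non-2xx responses.
async function loadDataset(path: string, fetcher: JsonFetcher, version = 'latest'): Promise<unknown> {
  const response = await fetcher(buildCdnUrl(path, version));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

const englishDataUrl = buildCdnUrl('en/data.json');
```

In a browser, or a Node.js version that provides a global fetch, the global function can be passed as the fetcher, for example loadDataset('en/data.json', fetch).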

Supported locales

The following locales are supported for both full and compact datasets.

  • Chinese (zh)
  • Chinese, Traditional (zh-hant)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • English, Great Britain (en-gb)
  • Estonian (et)
  • Finnish (fi)
  • French (fr)
  • German (de)
  • Hungarian (hu)
  • Italian (it)
  • Japanese (ja)
  • Korean (ko)
  • Lithuanian (lt)
  • Malay (ms)
  • Norwegian (nb)
  • Polish (pl)
  • Portuguese (pt)
  • Russian (ru)
  • Spanish (es)
  • Spanish, Mexico (es-mx)
  • Swedish (sv)
  • Thai (th)
  • Ukrainian (uk)
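When the locale comes from user input or a browser setting, it helps to map it onto one of the supported codes before building a dataset path. The following is a sketch under stated assumptions: resolveLocale and isLocale are hypothetical helpers, falling back to en is an arbitrary choice, and the fallback to the bare language code is a simplification (for example, zh-Hant-TW would resolve to zh rather than zh-hant).

```typescript
// Locale codes taken from the list above.
const SUPPORTED_LOCALES = [
  'zh', 'zh-hant', 'da', 'nl', 'en', 'en-gb', 'et', 'fi', 'fr', 'de', 'hu', 'it', 'ja',
  'ko', 'lt', 'ms', 'nb', 'pl', 'pt', 'ru', 'es', 'es-mx', 'sv', 'th', 'uk',
] as const;

type Locale = (typeof SUPPORTED_LOCALES)[number];

// Type guard for supported locale codes.
function isLocale(value: string): value is Locale {
  return (SUPPORTED_LOCALES as readonly string[]).includes(value);
}

// Normalize a language tag and fall back to the bare language, then to a default.
function resolveLocale(tag: string, fallback: Locale = 'en'): Locale {
  const normalized = tag.trim().toLowerCase().replace(/_/g, '-');
  if (isLocale(normalized)) {
    return normalized;
  }

  const language = normalized.split('-')[0] ?? '';
  return isLocale(language) ? language : fallback;
}

const datasetPath = `${resolveLocale('es_MX')}/data.json`;
```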

Filesizes

Sorted by original size in ascending order.

File                 Size         Gzipped
zh-hant/data.json    602.68 KB    66.62 KB
zh/data.json         627.42 KB    73.26 KB
sv/data.json         637.91 KB    72.06 KB
nb/data.json         639.03 KB    72.68 KB
da/data.json         644.1 KB     72.36 KB
en/data.json         645.5 KB     71.05 KB
en-gb/data.json      645.99 KB    71.3 KB
et/data.json         650.27 KB    71.69 KB
fi/data.json         653.2 KB     75.16 KB
fr/data.json         656.8 KB     71.71 KB
ko/data.json         657.79 KB    75.58 KB
nl/data.json         658.55 KB    72.71 KB
lt/data.json         658.8 KB     75.03 KB
pt/data.json         659.54 KB    74.77 KB
ja/data.json         664.15 KB    76.22 KB
ms/data.json         668.67 KB    72.51 KB
hu/data.json         669.67 KB    74.9 KB
es/data.json         676.5 KB     74.85 KB
es-mx/data.json      676.83 KB    75.15 KB
pl/data.json         677.04 KB    78.64 KB
it/data.json         679.61 KB    76.61 KB
de/data.json         681.15 KB    79.59 KB
ru/data.json         787 KB       85.28 KB
th/data.json         801.72 KB    76.32 KB
uk/data.json         804.46 KB    84.23 KB