Datasets

Emoji's are generated into JSON files called datasets, with each dataset being grouped into one of the following: localized data, versioned data, and metadata. These datasets can be found within the emojibase-data package, or loaded from a CDN.

Yarn
NPM

yarn add emojibase-data

npm install emojibase-data

JSON files will need to be parsed manually unless handled by a build/bundle process.

Usage

As stated, there are 3 groups of datasets, each serving a specific purpose. The first group, localized data, is exactly that, datasets with localization provided by CLDR (view supported locales). These datasets return an array of emoji objects that adhere to the defined data structure.

import emojis from 'emojibase-data/<locale>/data.json';
import compactEmojis from 'emojibase-data/<locale>/compact.json';
import groupsSubgroups from 'emojibase-data/<locale>/messages.json';

The second group, versioned data, provides datasets for emoji and Unicode release versions. These datasets return a map, with the key being the version, and the value being an array of emoji hexcodes included in the associated release version.

emojibase-data/versions/emoji.json - Emoji characters grouped by emoji version.
emojibase-data/versions/unicode.json - Emoji characters grouped by Unicode version.

import unicodeVersions from 'emojibase-data/versions/unicode.json';

The third and last group, metadata, provides specialized datasets for unique use cases.

emojibase-data/meta/groups.json - A map of non-localized emoji groups (Smileys & People), subgroups (Sky & Weather), and hierarchy, according to the official Unicode data files.
emojibase-data/meta/hexcodes.json - A map of emoji hexcodes (hexadecimal codepoints) to an object of hexcodes with different qualified status: fully qualified, minimally qualified, and unqualified.
emojibase-data/meta/unicode.json - An array of all emoji unicode characters, including text and emoji presentation characters.
emojibase-data/meta/unicode-names.json - A map of hexcodes to official Unicode names for each emoji.

import { groups, subgroups, hierarchy } from 'emojibase-data/meta/groups.json';

Data structure

Each emoji character found within the pre-generated datasets are represented by an object composed of the properties listed below. In an effort to reduce the overall dataset filesize, most property values have been implemented using integers, with associated constants. View the Emoji object for a list of all available fields.

Not all properties will be found in the emoji object, as properties without an applicable value are omitted from the emoji object. This helps to reduce the filesize!

{
  annotation: 'man lifting weights',
  emoji: '🏋️‍♂️',
  gender: 1,
  group: 0,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  order: 1518,
  shortcodes: [
    'man_lifting_weights',
  ],
  subgroup: 0,
  tags: [
    'weight lifter',
    'man',
  ],
  type: 1,
  version: 4,
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      emoji: '🏋🏻‍♂️',
      gender: 1,
      group: 0,
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      order: 1522,
      shortcodes: [
        'man_lifting_weights_tone1',
      ],
      subgroup: 0,
      type: 1,
      tone: 1,
      version: 4,
    },
    // ...
  ],
},

Compact format

While the emoji data is pretty thorough, not all of it may be required, and as such, a compact dataset is supported. View the CompactEmoji object for a list of all available fields.

To use a compact dataset, replace data.json with compact.json.

import data from 'emojibase-data/en/compact.json';

{
  annotation: 'man lifting weights',
  group: 0,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  order: 1518,
  shortcodes: [
    'man_lifting_weights',
  ],
  tags: [
    'weight lifter',
    'man',
  ],
  unicode: '🏋️‍♂️',
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      group: 0,
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      order: 1522,
      shortcodes: [
        'man_lifting_weights_tone1',
      ],
      unicode: '🏋🏻‍♂️',
    },
    // ...
  ],
},

Messages format

The messages format is a special dataset that provides translations for groups, sub-groups, and any other related emoji metadata. The key in each message lines up with a defined TypeScript type alias.

import data from 'emojibase-data/en/messages.json';

{
  groups: [
    {
      key: 'smileys-emotion',
      message: 'smileys & emotion',
      order: 0,
    },
    // ...
  ],
  subgroups: [
    {
      key: 'face-smiling',
      message: 'smiling',
      order: 0,
    },
    // ...
  ],
  skinTones: [
    {
      key: 'light',
      message: 'light skin tone',
    },
    // ...
  ],
};

Fetching from a CDN

If you prefer to not inflate your bundle size with these large JSON datasets, you can fetch them from our CDN (provided by jsdelivr.com) using fetchFromCDN(), fetchEmojis(), or fetchShortcodes().

import { fetchFromCDN, fetchEmojis, fetchMessages, fetchShortcodes } from 'emojibase';

const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'] });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr');
const chineseTranslations = await fetchMessages('zh');

Fetching from your own CDN

If you want to load the JSON datasets from your own CDN, you can customize the cdnUrl using the options object.

When cdnUrl is a string, fetchFromCDN will append '/${path}' to the url. Make sure to include the version within the cdnUrl yourself, it's not added automatically to give you control over its placement.

import { fetchFromCDN, fetchEmojis, fetchMessages, fetchShortcodes } from 'emojibase';

const cdnUrl = 'https://example.com/cdn/emojidata/latest';

const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'], cdnUrl });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true, cdnUrl });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr', { cdnUrl });
const chineseTranslations = await fetchMessages('zh', { cdnUrl });

cdnUrl can also be a function, so you have complete control over the format of the url. This function receives path and version as parameters. Version will be what you pass in within the options object, or it will default to latest. Note that version is also used for the cache key, so it's advised to set the option and not hard-code it in the cdnUrl function.

import { fetchFromCDN, fetchEmojis, fetchMessages, fetchShortcodes } from 'emojibase';

function cdnUrl(path: string, version: string): string {
  return `https://example.com/cdn/emojidata/${version}/${path}`;
}

const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'], cdnUrl });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true, cdnUrl });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr', { cdnUrl });
const chineseTranslations = await fetchMessages('zh', { cdnUrl });

Supported locales

Follow locales are supported for both full and compact datasets.

Bengali (bu)
Chinese (zh)
Chinese, Traditional (zh-hant)
Danish (da)
Dutch (nl)
English (en)
English, Great Britain (en-gb)
Estonian (et)
Finnish (fi)
French (fr)
German (de)
Hindu (hi)
Hungarian (hu)
Italian (it)
Japanese (ja)
Korean (ko)
Lithuanian (lt)
Malay (ms)
Norwegian (nb)
Polish (pl)
Portuguese (pt)
Russian (ru)
Spanish (es)
Spanish, Mexico (es-mx)
Swedish (sv)
Thai (th)
Ukrainian (uk)

Filesizes

Sorted by original size in ascending order.

Emojis
Emojis (compact)
Shortcodes
Messages
Other

File	Size	Gzipped
zh-hant/data.json	674.07 kB	84.59 kB
sv/data.json	698.15 kB	82.79 kB
nb/data.json	701.89 kB	84.36 kB
zh/data.json	708.75 kB	93.53 kB
da/data.json	710.08 kB	86 kB
fi/data.json	715.13 kB	86.26 kB
et/data.json	722.12 kB	85.51 kB
es/data.json	730.87 kB	83 kB
es-mx/data.json	732.06 kB	83.39 kB
lt/data.json	735.13 kB	88.48 kB
en/data.json	735.59 kB	89.22 kB
en-gb/data.json	735.59 kB	89.22 kB
ja/data.json	736.62 kB	91.25 kB
ms/data.json	743.18 kB	86.71 kB
nl/data.json	748.35 kB	92.29 kB
pt/data.json	751.02 kB	94.34 kB
hu/data.json	761.02 kB	93.98 kB
ko/data.json	761.63 kB	100.11 kB
vi/data.json	762.17 kB	87.89 kB
fr/data.json	764.67 kB	95.29 kB
de/data.json	768.58 kB	97.99 kB
pl/data.json	772.79 kB	96.72 kB
it/data.json	799.19 kB	102.82 kB
ru/data.json	897.47 kB	104.02 kB
th/data.json	913.12 kB	92.7 kB
uk/data.json	931.48 kB	106 kB
hi/data.json	977.49 kB	103.19 kB
bn/data.json	1.02 MB	102.24 kB

File	Size	Gzipped
zh-hant/compact.json	479.62 kB	75.38 kB
sv/compact.json	503.7 kB	73.14 kB
nb/compact.json	507.44 kB	76.07 kB
zh/compact.json	514.3 kB	83.42 kB
da/compact.json	515.63 kB	76.4 kB
fi/compact.json	520.68 kB	76.38 kB
et/compact.json	527.67 kB	76.11 kB
es/compact.json	536.42 kB	73.51 kB
es-mx/compact.json	537.61 kB	73.8 kB
lt/compact.json	540.68 kB	78.71 kB
en/compact.json	541.14 kB	80.05 kB
en-gb/compact.json	541.14 kB	80.05 kB
ja/compact.json	542.17 kB	81.14 kB
ms/compact.json	548.73 kB	77.29 kB
nl/compact.json	553.9 kB	82.72 kB
pt/compact.json	556.57 kB	84.61 kB
hu/compact.json	566.57 kB	83.95 kB
ko/compact.json	567.17 kB	89.51 kB
vi/compact.json	567.72 kB	78.07 kB
fr/compact.json	570.22 kB	85.5 kB
de/compact.json	574.13 kB	87.44 kB
pl/compact.json	578.34 kB	86.66 kB
it/compact.json	604.74 kB	92.63 kB
ru/compact.json	703.02 kB	93.93 kB
th/compact.json	718.67 kB	82.66 kB
uk/compact.json	737.03 kB	95.63 kB
hi/compact.json	783.04 kB	92.86 kB
bn/compact.json	821.3 kB	92.12 kB

File	Size	Gzipped
fr/shortcodes/emojibase.json	42 B	62 B
en/shortcodes/cldr-native.json	258 B	184 B
en-gb/shortcodes/cldr-native.json	258 B	184 B
zh/shortcodes/emojibase-native.json	298 B	202 B
zh/shortcodes/emojibase.json	347 B	186 B
ja/shortcodes/emojibase.json	1.02 kB	472 B
ja/shortcodes/emojibase-native.json	1.09 kB	571 B
it/shortcodes/cldr-native.json	1.18 kB	496 B
nl/shortcodes/cldr-native.json	2.39 kB	725 B
ru/shortcodes/emojibase.json	19.23 kB	5.9 kB
ru/shortcodes/emojibase-native.json	25.23 kB	6.59 kB
da/shortcodes/emojibase-native.json	36.68 kB	6.7 kB
es/shortcodes/cldr-native.json	42.73 kB	8.38 kB
es-mx/shortcodes/cldr-native.json	42.76 kB	8.47 kB
de/shortcodes/cldr-native.json	43.33 kB	6.84 kB
nb/shortcodes/cldr-native.json	44.34 kB	7.4 kB
en/shortcodes/github.json	45.31 kB	15.6 kB
en/shortcodes/iamcal.json	47.83 kB	15.11 kB
pt/shortcodes/cldr-native.json	53.2 kB	10.53 kB
sv/shortcodes/emojibase-native.json	54.62 kB	10.14 kB
fr/shortcodes/cldr-native.json	55.89 kB	10.4 kB
da/shortcodes/cldr-native.json	56.77 kB	9.1 kB
fi/shortcodes/cldr-native.json	69.88 kB	10.7 kB
et/shortcodes/cldr-native.json	71.04 kB	12.06 kB
sv/shortcodes/cldr-native.json	72.39 kB	11.97 kB
lt/shortcodes/cldr-native.json	120.4 kB	20.59 kB
pl/shortcodes/cldr-native.json	124.71 kB	18.9 kB
en/shortcodes/emojibase-legacy.json	129.02 kB	24.32 kB
sv/shortcodes/emojibase.json	137.61 kB	27.25 kB
zh-hant/shortcodes/cldr-native.json	139.82 kB	27.14 kB
zh/shortcodes/cldr-native.json	144.03 kB	27.5 kB
hu/shortcodes/cldr.json	147.25 kB	27.16 kB
ja/shortcodes/cldr.json	147.6 kB	26.24 kB
hu/shortcodes/cldr-native.json	147.9 kB	25.67 kB
en/shortcodes/cldr.json	148.77 kB	26.83 kB
en-gb/shortcodes/cldr.json	148.77 kB	26.83 kB
zh-hant/shortcodes/cldr.json	149.27 kB	25.21 kB
da/shortcodes/cldr.json	150.32 kB	26.76 kB
da/shortcodes/emojibase.json	150.69 kB	29.99 kB
sv/shortcodes/cldr.json	150.92 kB	27.05 kB
th/shortcodes/cldr.json	150.97 kB	27.1 kB
nb/shortcodes/cldr.json	151.5 kB	26.79 kB
et/shortcodes/cldr.json	152.36 kB	27.01 kB
fi/shortcodes/cldr.json	153.47 kB	27.39 kB
nl/shortcodes/cldr.json	156.87 kB	27.49 kB
ja/shortcodes/cldr-native.json	156.96 kB	28.82 kB
vi/shortcodes/cldr.json	157.32 kB	26.44 kB
de/shortcodes/cldr.json	157.47 kB	27.62 kB
zh/shortcodes/cldr.json	157.57 kB	25.7 kB
en/shortcodes/emojibase.json	157.82 kB	30.03 kB
en-gb/shortcodes/emojibase.json	157.82 kB	30.03 kB
ru/shortcodes/cldr.json	158.47 kB	27.99 kB
pt/shortcodes/cldr.json	158.65 kB	27.69 kB
bn/shortcodes/cldr.json	161.54 kB	28.37 kB
ms/shortcodes/cldr.json	164.58 kB	27.8 kB
hi/shortcodes/cldr.json	164.82 kB	29.08 kB
pl/shortcodes/cldr.json	164.96 kB	28.54 kB
lt/shortcodes/cldr.json	165.08 kB	28.53 kB
it/shortcodes/cldr.json	165.37 kB	28.28 kB
ko/shortcodes/cldr-native.json	165.47 kB	29.22 kB
es-mx/shortcodes/cldr.json	165.53 kB	28.01 kB
fr/shortcodes/cldr.json	165.83 kB	28.03 kB
es/shortcodes/cldr.json	165.87 kB	27.99 kB
ko/shortcodes/cldr.json	166.01 kB	27.46 kB
uk/shortcodes/cldr.json	172.82 kB	29.26 kB
vi/shortcodes/cldr-native.json	178.58 kB	29.13 kB
ru/shortcodes/cldr-native.json	212.58 kB	31.13 kB
en/shortcodes/joypixels.json	223.11 kB	34.54 kB
th/shortcodes/cldr-native.json	235 kB	31.32 kB
uk/shortcodes/cldr-native.json	238.37 kB	32.93 kB
hi/shortcodes/cldr-native.json	269.67 kB	33.37 kB
bn/shortcodes/cldr-native.json	279.18 kB	32.56 kB

File	Size	Gzipped
zh/messages.json	6.2 kB	1.93 kB
zh-hant/messages.json	6.2 kB	1.93 kB
en/messages.json	6.5 kB	1.59 kB
en-gb/messages.json	6.5 kB	1.6 kB
da/messages.json	6.51 kB	1.79 kB
sv/messages.json	6.51 kB	1.8 kB
ms/messages.json	6.54 kB	1.81 kB
nb/messages.json	6.56 kB	1.8 kB
ko/messages.json	6.57 kB	2.09 kB
et/messages.json	6.61 kB	1.85 kB
nl/messages.json	6.64 kB	1.82 kB
de/messages.json	6.65 kB	1.91 kB
it/messages.json	6.65 kB	1.83 kB
fi/messages.json	6.66 kB	1.88 kB
pl/messages.json	6.66 kB	1.99 kB
pt/messages.json	6.74 kB	1.89 kB
es-mx/messages.json	6.76 kB	1.89 kB
es/messages.json	6.77 kB	1.9 kB
ja/messages.json	6.78 kB	2.24 kB
fr/messages.json	6.78 kB	1.91 kB
hu/messages.json	6.81 kB	2 kB
vi/messages.json	6.85 kB	2.12 kB
lt/messages.json	6.85 kB	1.95 kB
ru/messages.json	7.84 kB	2.33 kB
uk/messages.json	7.91 kB	2.38 kB
hi/messages.json	8.51 kB	2.31 kB
bn/messages.json	8.66 kB	2.32 kB
th/messages.json	9.1 kB	2.42 kB

File	Size	Gzipped
meta/groups.json	3.86 kB	1.24 kB
meta/unicode.json	72.55 kB	12.63 kB
versions/unicode.json	95.08 kB	11.98 kB
versions/emoji.json	95.12 kB	12.08 kB
meta/unicode-names.json	237.34 kB	28.48 kB
meta/hexcodes.json	258.7 kB	28.51 kB

Usage​

Data structure​

Compact format​

Messages format​

Fetching from a CDN​

Fetching from your own CDN​

Supported locales​

Filesizes​