Datasets

Emojis are generated into JSON files called datasets, and each dataset belongs to one of three groups: localized data, versioned data, or metadata. These datasets can be found within the emojibase-data package, or loaded from a CDN.

yarn add emojibase-data

JSON files will need to be parsed manually unless handled by a build/bundle process.
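
When no build step handles the import, the file contents arrive as plain text and have to be parsed by hand. The following is a minimal sketch, assuming the raw text has already been read from disk or fetched; parseDataset is a hypothetical helper and not part of emojibase.

```typescript
// Hypothetical helper (not part of emojibase): parse the raw JSON text of an
// emoji dataset and verify that the result is an array before returning it.
function parseDataset<T = unknown>(jsonText: string): T[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new TypeError('Expected the dataset to be a JSON array');
  }
  return parsed as T[];
}

// A tiny inline sample standing in for the contents of a dataset file.
const sampleText = '[{"hexcode":"1F600","annotation":"grinning face"}]';
const parsedEmojis = parseDataset<{ hexcode: string; annotation: string }>(sampleText);
```

The array check is there because JSON.parse accepts any valid JSON value, so a wrong file path or a truncated download would otherwise fail later and less clearly.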

Usage

As stated, there are three groups of datasets, each serving a specific purpose. The first group, localized data, contains datasets with localization provided by CLDR (see Supported locales). The emoji datasets in this group are arrays of emoji objects that adhere to the defined data structure.

import emojis from 'emojibase-data/<locale>/data.json';
import compactEmojis from 'emojibase-data/<locale>/compact.json';
import groupsSubgroups from 'emojibase-data/<locale>/messages.json';

The second group, versioned data, provides datasets for emoji and Unicode release versions. These datasets return a map, with the key being the version, and the value being an array of emoji hexcodes included in the associated release version.

  • emojibase-data/versions/emoji.json - Emoji characters grouped by emoji version.
  • emojibase-data/versions/unicode.json - Emoji characters grouped by Unicode version.

import unicodeVersions from 'emojibase-data/versions/unicode.json';
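
Because each versioned dataset maps a version to a list of hexcodes, finding the release that contains a given emoji is a simple scan. This is a sketch only: findVersion is a hypothetical helper, and the sample map is made up for illustration rather than copied from the real file.

```typescript
// Shape described above: version -> hexcodes included in that version.
type VersionMap = Record<string, string[]>;

// Hypothetical helper: return the first version whose list contains the hexcode.
function findVersion(versions: VersionMap, hexcode: string): string | undefined {
  for (const version of Object.keys(versions)) {
    if (versions[version].indexOf(hexcode) !== -1) {
      return version;
    }
  }
  return undefined;
}

// Made-up sample data; keys and contents are illustrative only.
const sampleVersions: VersionMap = {
  'version-a': ['1F600', '1F601'],
  'version-b': ['1F939'],
};
const introducedIn = findVersion(sampleVersions, '1F939');
```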

The third and last group, metadata, provides specialized datasets for unique use cases.

  • emojibase-data/meta/groups.json - A map of non-localized emoji groups (Smileys & People), subgroups (Sky & Weather), and hierarchy, according to the official Unicode data files.
  • emojibase-data/meta/hexcodes.json - A map of emoji hexcodes (hexadecimal codepoints) to an object of hexcodes with different qualified status: fully qualified, minimally qualified, and unqualified.
  • emojibase-data/meta/unicode.json - An array of all emoji unicode characters, including text and emoji presentation characters.
  • emojibase-data/meta/unicode-names.json - A map of hexcodes to official Unicode names for each emoji.

import { groups, subgroups, hierarchy } from 'emojibase-data/meta/groups.json';
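
The three exports of groups.json can be combined to list the subgroups that belong to a group. The exact shape of the file is not shown on this page, so the sketch below assumes that groups and subgroups map a numeric index to a name and that hierarchy maps a group index to a list of subgroup indices. GroupsMeta and subgroupsOf are hypothetical names, and the sample data is made up.

```typescript
// Assumed shape (not confirmed by the text above): index -> name maps, plus a
// hierarchy that maps a group index to the indices of its subgroups.
interface GroupsMeta {
  groups: Record<string, string>;
  subgroups: Record<string, string>;
  hierarchy: Record<string, number[]>;
}

// Hypothetical helper: list the subgroup names that belong to a group.
function subgroupsOf(meta: GroupsMeta, groupIndex: number): string[] {
  const indices = meta.hierarchy[String(groupIndex)] ?? [];
  return indices.map((index) => meta.subgroups[String(index)]);
}

// Made-up sample; the real file covers every group and subgroup.
const sampleMeta: GroupsMeta = {
  groups: { '0': 'smileys-emotion' },
  subgroups: { '0': 'face-smiling', '1': 'face-affection' },
  hierarchy: { '0': [0, 1] },
};
const smileySubgroups = subgroupsOf(sampleMeta, 0);
```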

Data structure

Each emoji character found within the pre-generated datasets is represented by an object composed of the properties listed below. To reduce the overall dataset filesize, most property values are stored as integers that map to associated constants. View the Emoji object for a list of all available fields.

Not all properties will be found in the emoji object, as properties without an applicable value are omitted from the emoji object. This helps to reduce the filesize!

{
  annotation: 'man lifting weights',
  emoji: '🏋️‍♂️',
  gender: 1,
  group: 0,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  order: 1518,
  shortcodes: [
    'man_lifting_weights',
  ],
  subgroup: 0,
  tags: [
    'weight lifter',
    'man',
  ],
  type: 1,
  version: 4,
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      emoji: '🏋🏻‍♂️',
      gender: 1,
      group: 0,
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      order: 1522,
      shortcodes: [
        'man_lifting_weights_tone1',
      ],
      subgroup: 0,
      type: 1,
      tone: 1,
      version: 4,
    },
    // ...
  ],
},
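
Since properties without a value are omitted, consuming code should treat fields such as tags and skins as optional. The sketch below assumes a partial shape based on the example object. EmojiLike and withSkins are hypothetical names, not the library's own types.

```typescript
// Partial shape based on the example above. Only a few fields are listed, and
// those that may be omitted from the data are marked optional.
interface EmojiLike {
  hexcode: string;
  annotation?: string;
  tags?: string[];
  skins?: EmojiLike[];
}

// Hypothetical helper: return the emoji followed by its skin tone variations,
// falling back to an empty list when the skins property is absent.
function withSkins(emoji: EmojiLike): EmojiLike[] {
  return [emoji, ...(emoji.skins ?? [])];
}

const lifter: EmojiLike = {
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  annotation: 'man lifting weights',
  tags: ['weight lifter', 'man'],
  skins: [{ hexcode: '1F3CB-1F3FB-200D-2642-FE0F' }],
};
const variantHexcodes = withSkins(lifter).map((item) => item.hexcode);
```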

Compact format

While the emoji data is pretty thorough, not all of it may be required, and as such, a compact dataset is supported. View the CompactEmoji object for a list of all available fields.

To use a compact dataset, replace data.json with compact.json.

import data from 'emojibase-data/en/compact.json';

{
  annotation: 'man lifting weights',
  group: 0,
  hexcode: '1F3CB-FE0F-200D-2642-FE0F',
  order: 1518,
  shortcodes: [
    'man_lifting_weights',
  ],
  tags: [
    'weight lifter',
    'man',
  ],
  unicode: '🏋️‍♂️',
  skins: [
    {
      annotation: 'man lifting weights: light skin tone',
      group: 0,
      hexcode: '1F3CB-1F3FB-200D-2642-FE0F',
      order: 1522,
      shortcodes: [
        'man_lifting_weights_tone1',
      ],
      unicode: '🏋🏻‍♂️',
    },
    // ...
  ],
},
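
A common use of the compact dataset is a shortcode lookup. The sketch below assumes the partial shape shown above and includes skin tone variations. CompactEmojiLike and indexByShortcode are hypothetical names, and the sample shortcodes are made up.

```typescript
// Partial shape based on the compact example above.
interface CompactEmojiLike {
  hexcode: string;
  unicode: string;
  shortcodes?: string[];
  skins?: CompactEmojiLike[];
}

// Hypothetical helper: map every shortcode to its emoji character, covering
// both the base emoji and any skin tone variations.
function indexByShortcode(emojis: CompactEmojiLike[]): Record<string, string> {
  const index: Record<string, string> = {};
  for (const emoji of emojis) {
    for (const variant of [emoji, ...(emoji.skins ?? [])]) {
      for (const shortcode of variant.shortcodes ?? []) {
        index[shortcode] = variant.unicode;
      }
    }
  }
  return index;
}

// Made-up sample entry; shortcodes are illustrative only.
const sampleCompact: CompactEmojiLike[] = [
  {
    hexcode: '1F44D',
    unicode: '👍',
    shortcodes: ['thumbsup'],
    skins: [{ hexcode: '1F44D-1F3FB', unicode: '👍🏻', shortcodes: ['thumbsup_tone1'] }],
  },
];
const shortcodeIndex = indexByShortcode(sampleCompact);
```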

Messages format

The messages format is a special dataset that provides translations for groups, subgroups, and any other related emoji metadata. The key in each message lines up with a defined TypeScript type alias.

import data from 'emojibase-data/en/messages.json';

{
  groups: [
    {
      key: 'smileys-emotion',
      message: 'smileys & emotion',
      order: 0,
    },
    // ...
  ],
  subgroups: [
    {
      key: 'face-smiling',
      message: 'smiling',
      order: 0,
    },
    // ...
  ],
  skinTones: [
    {
      key: 'light',
      message: 'light skin tone',
    },
    // ...
  ],
};
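
To display a translated label, the message lists can be turned into key-based lookups. This is a minimal sketch assuming the shape shown above. MessageLike and toMessageLookup are hypothetical names.

```typescript
// Shape of a single entry, based on the example above. The order field is
// absent on skin tone messages, so it is optional here.
interface MessageLike {
  key: string;
  message: string;
  order?: number;
}

// Hypothetical helper: build a key -> translated message lookup.
function toMessageLookup(messages: MessageLike[]): Record<string, string> {
  const lookup: Record<string, string> = {};
  for (const entry of messages) {
    lookup[entry.key] = entry.message;
  }
  return lookup;
}

// Sample entries taken from the example above.
const groupMessages: MessageLike[] = [
  { key: 'smileys-emotion', message: 'smileys & emotion', order: 0 },
];
const skinToneMessages: MessageLike[] = [{ key: 'light', message: 'light skin tone' }];

const groupLabels = toMessageLookup(groupMessages);
const skinToneLabels = toMessageLookup(skinToneMessages);
```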

Fetching from a CDN

If you prefer not to inflate your bundle size with these large JSON datasets, you can fetch them from our CDN (provided by jsdelivr.com) using fetchFromCDN(), fetchEmojis(), fetchMessages(), or fetchShortcodes().

import { fetchFromCDN, fetchEmojis, fetchMessages, fetchShortcodes } from 'emojibase';

const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'] });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr');
const chineseTranslations = await fetchMessages('zh');

Fetching from your own CDN

If you want to load the JSON datasets from your own CDN, you can customize the cdnUrl using the options object.

When cdnUrl is a string, fetchFromCDN appends '/${path}' to the URL. Include the version within cdnUrl yourself; it is not added automatically, which gives you control over its placement.

import { fetchFromCDN, fetchEmojis, fetchMessages, fetchShortcodes } from 'emojibase';

const cdnUrl = 'https://example.com/cdn/emojidata/latest';

const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'], cdnUrl });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true, cdnUrl });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr', { cdnUrl });
const chineseTranslations = await fetchMessages('zh', { cdnUrl });

cdnUrl can also be a function, which gives you complete control over the format of the URL. This function receives path and version as parameters. The version is whatever you pass in the options object, and defaults to latest. Note that version is also used for the cache key, so it is advisable to set the option rather than hard-code the version in the cdnUrl function.

import { fetchFromCDN, fetchEmojis, fetchMessages, fetchShortcodes } from 'emojibase';

function cdnUrl(path: string, version: string): string {
  return `https://example.com/cdn/emojidata/${version}/${path}`;
}

const englishEmojis = await fetchFromCDN('en/data.json', { shortcodes: ['github'], cdnUrl });
const japaneseCompactEmojis = await fetchEmojis('ja', { compact: true, cdnUrl });
const germanCldrShortcodes = await fetchShortcodes('de', 'cldr', { cdnUrl });
const chineseTranslations = await fetchMessages('zh', { cdnUrl });
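
The two cdnUrl forms described above can be summarized in a small sketch. It only mirrors the documented behavior (string plus '/${path}', or a function called with path and version, with version defaulting to latest); resolveCdnUrl is a hypothetical name and this is not the library's actual implementation.

```typescript
type CdnUrlFn = (path: string, version: string) => string;

// Hypothetical sketch of the documented behavior, not emojibase's own code:
// a string has '/' + path appended, a function decides the full URL itself.
function resolveCdnUrl(
  path: string,
  cdnUrl: string | CdnUrlFn,
  version: string = 'latest',
): string {
  return typeof cdnUrl === 'function' ? cdnUrl(path, version) : `${cdnUrl}/${path}`;
}

// String form: the version must already be part of the URL.
const fromString = resolveCdnUrl('en/data.json', 'https://example.com/cdn/emojidata/latest');

// Function form: the version is passed in and placed wherever it is needed.
const fromFunction = resolveCdnUrl(
  'en/data.json',
  (path, version) => `https://example.com/cdn/emojidata/${version}/${path}`,
  '1.2.3', // made-up version string for illustration
);
```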

Supported locales

The following locales are supported for both full and compact datasets.

  • Bengali (bn)
  • Chinese (zh)
  • Chinese, Traditional (zh-hant)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • English, Great Britain (en-gb)
  • Estonian (et)
  • Finnish (fi)
  • French (fr)
  • German (de)
  • Hindi (hi)
  • Hungarian (hu)
  • Italian (it)
  • Japanese (ja)
  • Korean (ko)
  • Lithuanian (lt)
  • Malay (ms)
  • Norwegian (nb)
  • Polish (pl)
  • Portuguese (pt)
  • Russian (ru)
  • Spanish (es)
  • Spanish, Mexico (es-mx)
  • Swedish (sv)
  • Thai (th)
  • Ukrainian (uk)

Filesizes

Sorted by original size in ascending order.

File               Size        Gzipped
zh-hant/data.json  652.4 kB    74.51 kB
zh/data.json       677.53 kB   81.19 kB
da/data.json       693.39 kB   80.1 kB
sv/data.json       694.83 kB   81.18 kB
nb/data.json       696.11 kB   81.41 kB
en/data.json       703.04 kB   79.88 kB
en-gb/data.json    703.06 kB   79.88 kB
et/data.json       708.59 kB   80.58 kB
ko/data.json       712.47 kB   83.49 kB
fi/data.json       713.38 kB   84.6 kB
ja/data.json       717.15 kB   84.29 kB
nl/data.json       717.6 kB    81.56 kB
fr/data.json       718.26 kB   80.97 kB
lt/data.json       720.83 kB   84.39 kB
pt/data.json       721.24 kB   83.97 kB
ms/data.json       729.53 kB   81.53 kB
hu/data.json       731.32 kB   84.24 kB
es/data.json       738.22 kB   84.2 kB
pl/data.json       739.01 kB   87.87 kB
es-mx/data.json    739.11 kB   84.44 kB
it/data.json       742.89 kB   86.02 kB
de/data.json       745.33 kB   89.42 kB
ru/data.json       867.72 kB   95.25 kB
th/data.json       876.41 kB   84.88 kB
uk/data.json       893.5 kB    95.08 kB
hi/data.json       922.76 kB   91.73 kB
bn/data.json       945.98 kB   89.5 kB