What X / Twitter actually counts as a character

The "280 characters" limit on an X (formerly Twitter) post is one of the most-quoted numbers on the internet, and almost nobody knows what it actually counts. Most external tools either count Latin characters one-for-one and miss the CJK weighting, or count UTF-16 code units and split modern emoji into pieces. Both produce the wrong number.

This article walks through how X actually scores a post, why it is the way it is, and what that means for any tool that claims to predict whether your post fits.

The short answer

X uses a weighted codepoint count:

Most characters count as 1: ASCII, Latin with diacritics (naïve), Greek, Cyrillic (Russian, Bulgarian, Serbian, Ukrainian), Hebrew, Arabic, and most punctuation. In practice this is everything up to U+10FF.
Most CJK characters count as 2: Chinese, Japanese (kanji and kana), Korean (hanja and hangul).
Every emoji counts as 2, whether it is a single codepoint or a compound sequence (family emoji, flag emoji, skin-tone variants). The underlying codepoint count does not matter.
URLs always count as 23, regardless of their actual length.

The free-tier limit is 280 weighted units. The premium ("X Premium") limit is far higher: 4,000 when long posts launched, and it has been raised since.

If you write English or Russian, this is roughly the same as counting characters. If you write Chinese, Japanese, or Korean, you have effectively half the room.

The CJK doubling, and why

A typical Chinese sentence packs more meaning per character than a typical English one. A 280-character English tweet might be 50 words; the same number of Chinese hanzi is closer to a paragraph. When Twitter doubled its post length from 140 to 280 in 2017, it kept Chinese, Japanese, and Korean at an effective 140 characters by counting their characters double, because users writing in those languages had not been constrained by the 140 limit in the same way English users had.

The exact rule is defined in the twitter-text library, an open-source reference implementation that X publishes for client developers. The algorithm:

Normalise the text to Unicode NFC (canonical composition).
Iterate over codepoints.
For each codepoint, look up its weight in a range table:
- Range 0x0000 – 0x10FF (Latin, Greek, Cyrillic, Hebrew, Arabic, other low-block scripts, and basic punctuation): weight 1
- Range 0x2000 – 0x200D (general punctuation, zero-width joiner): weight 1
- Range 0x2010 – 0x201F (dashes, quotes): weight 1
- Range 0x2032 – 0x2037 (prime marks): weight 1
- All other codepoints: weight 2
Sum the weights.
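
The steps above can be sketched in a few lines of Python, whose strings already iterate by codepoint. This is a minimal illustration rather than the library's code: the function names are made up for the example, the range table is the weight-1 list as understood from twitter-text's version 3 configuration (worth checking against the library itself), and the emoji and URL rules are deliberately left out.

```python
import unicodedata

# Weight-1 ranges (inclusive), as understood from twitter-text's v3 config.
# Everything outside them falls through to the default weight of 2.
WEIGHT_ONE_RANGES = [
    (0x0000, 0x10FF),  # Latin, Greek, Cyrillic (U+0400 to U+04FF), Hebrew, Arabic, ...
    (0x2000, 0x200D),  # general punctuation spaces, zero-width joiner
    (0x2010, 0x201F),  # dashes and quotation marks
    (0x2032, 0x2037),  # prime marks
]
DEFAULT_WEIGHT = 2


def codepoint_weight(ch: str) -> int:
    """Look up the weight of a single codepoint in the range table."""
    cp = ord(ch)
    for start, end in WEIGHT_ONE_RANGES:
        if start <= cp <= end:
            return 1
    return DEFAULT_WEIGHT


def weighted_length(text: str) -> int:
    """Normalise to NFC, then sum per-codepoint weights (no emoji/URL rules)."""
    normalized = unicodedata.normalize("NFC", text)
    return sum(codepoint_weight(ch) for ch in normalized)


print(weighted_length("Hello, world!"))  # 13
print(weighted_length("你好，世界！"))  # 12: six codepoints, all weight 2
print(weighted_length("Привет, мир!"))  # 12: Cyrillic is inside the first range
```

Because the lookup is a plain range check, the table can be swapped out if the configuration changes without touching the counting loop.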

The default-2 catch-all is what makes CJK count double: hanzi, kana, and hangul all sit outside the explicit weight-1 ranges. Cyrillic, by contrast, lives at U+0400 to U+04FF, inside the first range, and counts as 1. Most emoji also fall outside the weight-1 ranges, but they are handled by a separate emoji-grapheme rule that overrides the per-codepoint weight: a complete emoji grapheme counts as 2 units, regardless of how many codepoints make it up.

The emoji rule, in detail

A single emoji is one grapheme — what a reader sees as a single character — but can be composed of multiple codepoints joined by zero-width joiners or modified by skin-tone selectors. For X's purposes:

A simple emoji like 😀 (U+1F600) is one grapheme, counts as 2 units.
A skin-toned emoji like 👋🏽 (waving hand + medium skin tone) is one grapheme, counts as 2 units.
A compound emoji like 👨‍👩‍👧 (family: man, woman, girl) is one grapheme (3 emoji codepoints joined by 2 ZWJs, 5 codepoints in total) and counts as 2 units.
A flag emoji like 🇳🇱 (regional indicators NL) is one grapheme — 2 codepoints — counts as 2 units.

The rule is "one emoji grapheme = 2 units". A naive UTF-16 code-unit count would dramatically overcharge for compound emoji (the family emoji is 8 UTF-16 code units), and a naive codepoint count would charge based on internal ZWJs (also wrong). Neither matches X.

A reference character counter that uses Intl.Segmenter to find graphemes will agree with X on where each emoji starts and ends; it then has to charge 2 units per emoji rather than 1. A counter that uses string.length in JavaScript will not.
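
To make the two naive counts concrete, the following Python sketch measures the emoji discussed above both ways. The helper names are illustrative; `utf16_units` mimics what JavaScript's string.length would report, while Python's own `len` counts codepoints.

```python
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl
flag = "\U0001F1F3\U0001F1F1"  # regional indicators N + L
wave = "\U0001F44B\U0001F3FD"  # waving hand + medium skin tone


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, i.e. what string.length reports in JS."""
    return len(text.encode("utf-16-le")) // 2


def codepoints(text: str) -> int:
    """Python strings are sequences of codepoints, so len() is enough."""
    return len(text)


for name, emoji in [("family", family), ("flag", flag), ("wave", wave)]:
    print(name, utf16_units(emoji), codepoints(emoji))
# family 8 5
# flag 4 2
# wave 4 2
```

X charges 2 units for each of these, so the flag and the waving hand only match the codepoint count by coincidence; the family emoji shows the gap.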

URLs are always 23

This is the rule that surprises people most: every URL counts as 23 characters, no matter how long the actual URL is.

When you post a link, X automatically wraps it through the t.co URL shortener. A 200-character medium.com URL becomes a 23-character t.co link in the stored post. The composer counts it as 23 from the start, so your character budget reflects what will actually be posted.

The 23 figure is fixed in the twitter-text configuration (the transformedURLLength setting) and has been stable for years. Earlier versions of t.co produced shorter links, and for a time HTTPS links cost one character more than HTTP links, before both settled on 23. Any tool that counts the raw URL characters instead can be off by 100 or more characters on a link-heavy post.

The rule applies to:

Any URL with http:// or https://
Most bare domains that look like URLs (example.com/path)

What it does not apply to:

Email addresses and mailto: links (mailto:user@example.com), which the twitter-text URL matcher does not treat as links
Domain mentions in surrounding text that the parser does not consider URL-like
Code blocks, if any client format applies them (X does not have official code blocks)
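
A rough Python sketch of the fixed-cost idea follows. The regular expression is a deliberately simplified assumption that only recognises links starting with http:// or https://; the real twitter-text matcher also accepts bare domains with a valid TLD and handles trailing punctuation more carefully. The function names are illustrative.

```python
import re
from typing import Tuple

URL_COST = 23  # fixed cost per link, per the rule described above

# Simplified pattern (an assumption for this sketch): protocol-prefixed links
# only, running up to the next whitespace. It would also swallow a trailing
# full stop, which a production matcher would not.
SIMPLE_URL = re.compile(r"https?://\S+", re.IGNORECASE)


def split_urls(text: str) -> Tuple[str, int]:
    """Return the text with links removed, plus the number of links found."""
    remainder, count = SIMPLE_URL.subn("", text)
    return remainder, count


def length_with_urls(text: str) -> int:
    """Charge 23 per link; count everything else one per codepoint.

    The per-codepoint weighting is left out here to keep the focus on links.
    """
    remainder, url_count = split_urls(text)
    return len(remainder) + url_count * URL_COST


print(length_with_urls(
    "https://medium.com/some-very-long-article-slug-that-keeps-going"))  # 23
print(length_with_urls(
    "See https://example.com/a and https://example.com/b"))  # 9 + 2 * 23 = 55
```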

How emoji vs CJK vs URLs interact

A few worked examples on the free-tier 280 limit:

"Hello, world!": 13 weight-1 characters → 13 units, leaving 267.
"你好，世界！": 6 CJK characters → 12 units (each weight 2), leaving 268.
"Привет, мир!": 12 codepoints, all weight 1 because Cyrillic sits below U+10FF → 12 units, leaving 268.
"https://medium.com/some-very-long-article-slug-that-keeps-going": 1 URL → 23 units, leaving 257.
"Read this 👀: https://example.com/post": 12 weight-1 chars (including the colon and the spaces) + 1 emoji (2 units) + 1 URL (23 units) → 37 units, leaving 243.

The pattern: English or Russian text is counted as typed, with each emoji costing 2. CJK effectively halves your budget. URLs always cost 23 even if they look like 30 or 200.

Other platforms count differently

For comparison:

Bluesky counts grapheme clusters, with a limit of 300 per post and no doubling. CJK takes the same space as English.
Mastodon counts grapheme clusters — typically 500 — with no doubling. Closer to "what a reader sees".
SMS is the strangest: 160 septets in GSM-7 encoding, but as soon as a single character outside GSM-7 appears (most emoji, smart quotes, em-dashes), the entire message switches to UCS-2 with a 70-character-per-segment limit.
LinkedIn counts UTF-16 code units, which means non-BMP emoji eat 2 of your 3,000 characters each.
HTML <title> has no formal limit, but Google truncates at around 50–60 characters in search results.

If you write a multi-platform post composer, you cannot use a single character count. You need a per-platform count, and the per-platform rules are not interchangeable.
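
As a small illustration of how far the raw measurements diverge, the Python sketch below reports codepoints, UTF-16 code units, and the SMS encoding a message would need. The GSM-7 character set is transcribed from the GSM 03.38 default alphabet and its extension table as best recalled, so treat it as an assumption to verify. The function names are illustrative, and grapheme counting is omitted because the Python standard library has no segmenter.

```python
# GSM 03.38 default alphabet plus the extension table (assumed transcription).
GSM7_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€"  # extension table
)


def utf16_units(text: str) -> int:
    """UTF-16 code units, the measure the article attributes to LinkedIn."""
    return len(text.encode("utf-16-le")) // 2


def sms_encoding(text: str) -> str:
    """One character outside GSM-7 switches the whole message to UCS-2."""
    return "GSM-7" if all(ch in GSM7_CHARS for ch in text) else "UCS-2"


def measure(text: str) -> dict:
    """Collect the raw measurements that different platforms build on."""
    return {
        "codepoints": len(text),
        "utf16_units": utf16_units(text),
        "sms_encoding": sms_encoding(text),
    }


print(measure("See you at 5"))
# {'codepoints': 12, 'utf16_units': 12, 'sms_encoding': 'GSM-7'}
print(measure("See you at 5 \U0001F440"))
# {'codepoints': 14, 'utf16_units': 15, 'sms_encoding': 'UCS-2'}
```

Adding one emoji changes all three numbers in different ways, which is why a composer needs a separate counting function per platform.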

What this means for tools

A character-counter tool that wants to be accurate for X has to:

Detect URLs and replace each one with a fixed-23-unit cost in the count.
Iterate codepoints (not UTF-16 code units) to apply the weight-1-vs-weight-2 rule correctly.
Use grapheme segmentation (Intl.Segmenter) to treat each compound emoji as a single emoji costing 2 units, not as many separate codepoints.
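
The three steps can be combined into one estimator. The Python sketch below is an approximation under stated assumptions, not a port of twitter-text: the URL pattern only recognises http:// and https:// links, the emoji pattern is a hand-rolled stand-in for real grapheme segmentation (it covers flags, skin tones, and ZWJ chains but not keycap or tag sequences), and every name in it is illustrative.

```python
import re
import unicodedata

URL_COST = 23
EMOJI_COST = 2
WEIGHT_ONE_RANGES = [(0x0000, 0x10FF), (0x2000, 0x200D),
                     (0x2010, 0x201F), (0x2032, 0x2037)]

# Simplified matchers; both are assumptions made for this sketch.
URL = re.compile(r"https?://\S+")
_PICTO = r"[\u2600-\u27BF\U0001F300-\U0001FAFF]"
# One pictograph, optionally followed by a variation selector or skin tone.
_UNIT = _PICTO + r"[\uFE0F\U0001F3FB-\U0001F3FF]?"
EMOJI = re.compile(
    r"[\U0001F1E6-\U0001F1FF]{2}"  # flag: a pair of regional indicators
    + "|" + _UNIT + r"(?:\u200D" + _UNIT + ")*"  # single emoji or ZWJ chain
)


def codepoint_weight(ch: str) -> int:
    cp = ord(ch)
    return 1 if any(lo <= cp <= hi for lo, hi in WEIGHT_ONE_RANGES) else 2


def estimate_x_length(text: str) -> int:
    """Estimate the weighted length of a post under the rules in this article."""
    text = unicodedata.normalize("NFC", text)
    text, url_count = URL.subn("", text)  # step 1: fixed cost per link
    text, emoji_count = EMOJI.subn("", text)  # step 3: fixed cost per emoji
    rest = sum(codepoint_weight(ch) for ch in text)  # step 2: weights
    return url_count * URL_COST + emoji_count * EMOJI_COST + rest


print(estimate_x_length("Launch day \U0001F680 https://example.com/very/long/path"))
# 12 weight-1 characters + 2 (emoji) + 23 (URL) = 37
print(estimate_x_length("\U0001F468\u200D\U0001F469\u200D\U0001F467"))  # 2
```

Links and emoji are removed before the per-codepoint pass so that their internal codepoints are never charged twice. For anything that has to match X exactly, the library remains the safer choice.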

Most "tweet length" counters online do step 2 but not step 1 or 3, so they over-count posts with links (which is most posts) and miscount posts with compound emoji (which is many posts). The character counter on this site uses grapheme segmentation, which gets emoji right, and reports a plain codepoint count — useful if you want to know "how many characters did I type?" but not the number X will use to decide if your post fits.

For platform-aware counting, the canonical reference is the twitter-text library, which X publishes in JavaScript, Java, Ruby, and Objective-C. Most production tools that need an accurate count import it directly rather than reimplementing the rules.

A few historical curiosities

A handful of edge cases that come up in technical writing about X's character count:

Full-width Chinese punctuation is weight-2, not weight-1. The full-width comma (，, U+FF0C) and the ideographic full stop (。, U+3002) sit in the catch-all default-2 region. Their ASCII counterparts are weight-1. Most Chinese writers use full-width punctuation, so a "Chinese sentence" really does count as roughly twice its character count.
Combining marks count separately, but only after normalisation. A precomposed é (U+00E9) is one weight-1 codepoint = 1 unit. The decomposed form e + combining acute (U+0065 + U+0301) is two codepoints, but NFC normalisation collapses it to the precomposed form before counting, so it also costs 1 unit. Sequences with no precomposed equivalent keep their combining marks and cost 1 unit per mark. A counter that skips the NFC step will over-count text pasted from NFD sources.
Math and symbol blocks are weight-2. Mathematical italic letters, blackboard-bold letters, and obscure symbol-block characters all default to weight 2. Posts that show off 𝑥 or ℝ notation cost more than they look.
Variation selectors are usually free. The variation-selector codepoints that switch some emoji between text-style and emoji-style presentation (such as U+FE0F) sit outside the weight-1 ranges, so on their own they would cost 2. In practice they are usually subsumed into the emoji rule and add nothing beyond the emoji's own 2 units.
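
The combining-mark case is easy to check with Python's standard unicodedata module. This short sketch only demonstrates the effect of the NFC step from the algorithm; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00E9"  # é as a single codepoint
decomposed = "e\u0301"  # e followed by a combining acute accent
no_precomposed = "x\u0301"  # x + combining acute: no single-codepoint form

print(len(precomposed), len(decomposed))  # 1 2

# NFC folds the decomposed pair into the precomposed codepoint.
nfc = unicodedata.normalize("NFC", decomposed)
print(len(nfc), nfc == precomposed)  # 1 True

# With nothing to compose into, the combining mark survives normalisation.
print(len(unicodedata.normalize("NFC", no_precomposed)))  # 2
```

A counter that calls `len` on the raw pasted text would report 2 for the decomposed é, which is the over-count that normalising first avoids.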

The summary

X counts a weighted codepoint score: most scripts (Latin, Greek, Cyrillic, Hebrew, Arabic) = 1, and CJK plus anything else outside the weight-1 ranges = 2.
Each emoji grapheme = 2, regardless of internal complexity.
Each URL = 23, regardless of actual length.
The rules are codified in the open-source twitter-text library; clients import that library rather than reimplementing.
Other platforms count differently — Bluesky and Mastodon do not double CJK; LinkedIn counts UTF-16 code units; SMS switches encodings under your feet.

If you need to predict whether a specific post fits X's limit, count via twitter-text. If you need a general-purpose character count for any other purpose, count graphemes. Almost nobody actually wants to count UTF-16 code units, even though that is what most string-length functions report by default.