Every named HTML character entity defined by the WHATWG Living Standard, from &amp; for ampersand to &zwnj; for zero-width non-joiner. Each row gives the entity name, the full entity with leading ampersand and trailing semicolon, the actual rendered character, the decimal and hex Unicode code points, and the count of code points (most entities are 1 code point; a few are 2).

Pro tip: You only need to escape five characters in HTML: &, <, >, and inside attribute values " and '. Everything else is optional. Using &copy; over the literal © is taste, not necessity, in a UTF-8 document.

About the HTML Entities Dataset

HTML defines a set of named character references: convenient aliases for Unicode characters that can be used anywhere in HTML markup. Some are essential (the five that need escaping in markup); most are conveniences for special characters in mathematics, classical alphabets, technical symbols, and arrows. This dataset comes from the WHATWG HTML Living Standard, which is the canonical source for browsers.

Common Use Cases

HTML encoders and decoders, content sanitization pipelines, code-editor entity autocomplete dictionaries, math equation editors, security scanners that look for unencoded HTML in user input, plain-text-to-HTML converters, content migration tools, and educational reference apps.

Column Reference

name — bare entity name without the leading & or trailing ;.
entity — the full reference, with the leading & and trailing ;.
character — the actual rendered Unicode character.
decimal — Unicode code point(s) in decimal, space-separated for multi-code-point entities.
hex — Unicode code point(s) in hex (U+XXXX form).
codepoint_count — most entities are 1; a few are 2 (combined characters like &NotEqualTilde;).
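
As a rough illustration of how these columns relate to one another, the sketch below rebuilds a row from a bare entity name using Python's standard html module. The function name build_row is hypothetical, the entity field follows the description at the top of this page, and the space-separated hex format for multi-code-point entities is an assumption rather than a confirmed detail of the download.

```python
import html

def build_row(name: str) -> dict:
    """Rebuild one dataset-style row from a bare entity name (illustrative only)."""
    entity = f"&{name};"                 # full reference with & and ;
    character = html.unescape(entity)    # decoded via Python's HTML5 entity table
    cps = [ord(c) for c in character]    # one int per Unicode code point
    return {
        "name": name,
        "entity": entity,
        "character": character,
        "decimal": " ".join(str(cp) for cp in cps),
        # Assumption: multi-code-point hex values are space-separated too.
        "hex": " ".join(f"U+{cp:04X}" for cp in cps),
        "codepoint_count": len(cps),
    }

row = build_row("copy")
print(row["decimal"], row["hex"], row["codepoint_count"])
```

An unknown name is left undecoded by html.unescape, so a real pipeline would want to validate names against a known table first.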

Required vs Optional Entities

In an HTML document with UTF-8 encoding declared, only five characters must be entity-encoded: & (always), < (always), > (in text content), " (inside double-quoted attributes), and ' (inside single-quoted attributes). Every other named entity is a convenience — you can write the literal character instead. Older Latin-1 documents needed entities for accented letters, but UTF-8 makes those optional.
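
A minimal sketch of the five-character rule, using html.escape from Python's standard library. The helper names escape_text and escape_attr are hypothetical. Note that html.escape emits the numeric reference &#x27; for the apostrophe rather than a named entity.

```python
import html

def escape_text(s: str) -> str:
    # For text content: escapes only &, < and >.
    return html.escape(s, quote=False)

def escape_attr(s: str) -> str:
    # For attribute values: also escapes " and '.
    return html.escape(s, quote=True)

print(escape_text("a < b & c"))     # a &lt; b &amp; c
print(escape_attr('say "hi"'))      # say &quot;hi&quot;
print(escape_text("café ©"))        # non-ASCII characters pass through unchanged
```

Everything outside those five characters is left alone, which matches the point above: in a UTF-8 document, literal characters such as © need no encoding.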

Numeric Character References

Beyond named entities, HTML supports numeric character references: &#169; (decimal) or &#xA9; (hex) for ©. These work for every Unicode code point regardless of whether it has a named entity. Numeric references are how you encode characters with no named alias.
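
To make the two numeric forms concrete, here is a small sketch that derives both from a character's code point and decodes them again. The function name numeric_refs is hypothetical; ord and html.unescape are standard Python.

```python
import html

def numeric_refs(ch: str) -> tuple:
    """Return the (decimal, hex) numeric character references for one character."""
    cp = ord(ch)                         # Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:X};"

dec_ref, hex_ref = numeric_refs("©")
print(dec_ref, hex_ref)                  # &#169; &#xA9;

# Both forms decode back to the same character.
print(html.unescape(dec_ref) == html.unescape(hex_ref) == "©")
```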

Multi-Code-Point Entities

A few entities produce two code points. &NotEqualTilde; renders as ≂̸ (U+2242 followed by the combining slash U+0338), and &nvgt; renders as >⃒ (U+003E followed by the combining vertical line overlay U+20D2). These are mathematical and physics symbols that the spec ships as composites for cleaner authoring. Filter codepoint_count = 2 to find them all.
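
One way to inspect such composites without the dataset is to decode an entity and list its code points. The sketch below assumes Python's built-in entity table (html.entities.html5) mirrors the WHATWG list; the function name code_points is hypothetical.

```python
import html
import html.entities

def code_points(entity: str) -> list:
    """Decode an entity reference and list its code points in U+XXXX form."""
    return [f"U+{ord(c):04X}" for c in html.unescape(entity)]

print(code_points("&NotEqualTilde;"))    # ['U+2242', 'U+0338']
print(code_points("&amp;"))              # ['U+0026']

# Rough equivalent of filtering codepoint_count = 2, applied to Python's table.
two_cp = sorted(
    name for name, value in html.entities.html5.items()
    if name.endswith(";") and len(value) == 2
)
print(len(two_cp), two_cp[:3])
```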

Trailing Semicolon

Strictly speaking, HTML requires the trailing ; on every entity reference. Browsers tolerate missing semicolons for a set of legacy entities (for example &amp, &lt, &gt, &copy, &reg) for backwards compatibility, but you should always write the semicolon. This dataset lists only the with-semicolon variants, which is the form your authoring tools should produce.
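
The legacy behavior can be observed in Python's own entity table, which keeps both forms as separate keys. This is a sketch under the assumption that html.entities.html5 tracks the WHATWG list; the names with_semicolon and legacy are hypothetical.

```python
import html
import html.entities

table = html.entities.html5

# Keys ending in ";" are the canonical forms this dataset lists.
with_semicolon = {k[:-1] for k in table if k.endswith(";")}
# Keys without ";" are the legacy names parsers still accept bare.
legacy = {k for k in table if not k.endswith(";")}

print("copy" in legacy)      # True: &copy is tolerated without a semicolon
print("zwnj" in legacy)      # False: &zwnj must be written as &zwnj;

# A lenient decoder accepts the legacy form, but authoring tools should not emit it.
print(html.unescape("&copy 2024"))
```

Every legacy name also exists in semicolon form, so normalizing authored output to always include the ; loses nothing.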

HTML Entities Reference — All 2,125 Named Character References | EvvyTools
