If you have ever stared at %20 in a URL and wondered why the space got replaced with a percent sign, or squinted at a long string of letters and slashes that turns out to be an image, you have already encountered encoding. It is woven through every layer of the web, from HTTP headers to JSON payloads to database storage. Getting it wrong produces bugs that are genuinely hard to trace.
Encoding is not encryption. Encryption transforms data so only authorized parties can read it back. Encoding transforms data so it survives a particular transport medium or storage format intact. The distinction matters in practice: Base64-encoded data is not secret. It is just reformatted. Anyone with a decoder can reverse it in under a second. Mixing up the two leads to security decisions based on false assumptions.
Why Encoding Exists
Different systems handle different character sets differently. A URL can only contain a limited set of ASCII characters, so spaces, accented letters, and symbols have to be represented in a format that URL parsers understand. HTML treats < and > as markup, so displaying those characters literally in a browser requires entity references. Email servers historically mangled binary attachments, which is why Base64 became the standard for sending non-text content over mail protocols.
Every encoding format is a solution to a specific transport constraint. Knowing which constraint you are working around makes the choice obvious. Getting it wrong means your data gets corrupted somewhere between systems, often silently.
Base64 Encoding
Base64 represents binary data as a sequence of 64 printable ASCII characters: 26 uppercase letters, 26 lowercase letters, digits 0-9, plus + and /. A = character pads the output to a multiple of four characters.
The math: every three bytes of input (24 bits) produce exactly four Base64 characters. Each character encodes a 6-bit group. This means Base64 output is always roughly one-third larger than the original data. A 100-byte input becomes about 136 Base64 characters.
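The 3-bytes-in, 4-characters-out arithmetic is easy to confirm with Python's standard base64 module; this is a quick sketch, not part of any particular tool:

```python
import base64

data = b"\x00" * 100               # 100 bytes of input
encoded = base64.b64encode(data)

# Every 3 input bytes become 4 output characters; a partial final
# group is padded with '=' so the length is a multiple of four.
print(len(encoded))                # 136, i.e. ceil(100 / 3) * 4

# Decoding is lossless: the original bytes come back exactly.
print(base64.b64decode(encoded) == data)
```

The 36% overhead here matches the "roughly one-third larger" rule of thumb.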
Common applications include embedding images directly in HTML or CSS as data: URIs, encoding binary file attachments in email, and passing binary data through REST APIs that expect text bodies. If you see a string starting with data:image/png;base64, everything after the comma is Base64 encoded pixel data.
Decoding is the exact reverse. Four characters map back to three bytes. Most languages include Base64 support in their standard libraries, but having a browser-based tool is faster when you just need to decode a single value during debugging without writing a throwaway script.
URL Encoding (Percent-Encoding)
URL encoding, formally defined as percent-encoding in RFC 3986, represents characters that are not safe in URLs as a percent sign followed by two hexadecimal digits. A space becomes %20, # becomes %23, & becomes %26, and + becomes %2B.
The "unsafe" characters are those with special meaning in URL syntax. The ? character starts a query string. The & separates query parameters. The # introduces a fragment identifier. The / separates path segments. If your data contains any of these, they must be percent-encoded or the URL parser interprets them as structure rather than content.
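In Python, urllib.parse.quote performs percent-encoding; passing safe="" tells it to encode even characters it would normally leave alone, which is what you want for data embedded inside a URL component:

```python
from urllib.parse import quote, unquote

# safe="" forces structural characters like & and # to be encoded too
print(quote("a b&c#d", safe=""))   # a%20b%26c%23d
print(unquote("a%20b%26c%23d"))    # a b&c#d
```

Without safe="", quote() leaves / unencoded by default, which is correct for paths but wrong for a single query-parameter value.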
A frequent bug is double-encoding. If you encode a string twice, %20 becomes %2520 on the second pass because % itself encodes to %25. This looks fine in development but breaks when an intermediate layer (a proxy, a framework router, a CDN rewrite rule) does its own encoding pass on top of yours.
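The double-encoding failure mode is easy to reproduce, and seeing it once makes it much easier to recognize in a proxy log:

```python
from urllib.parse import quote, unquote

once = quote("hello world", safe="")   # hello%20world
twice = quote(once, safe="")           # hello%2520world, since % encodes to %25

# One decode pass no longer recovers the original; each layer
# must be unwrapped separately, in order.
print(unquote(twice))                  # hello%20world
print(unquote(unquote(twice)))         # hello world
```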
Form data submitted with the application/x-www-form-urlencoded content type uses a variant where spaces become + instead of %20. The difference matters when constructing query strings manually. A library handles this correctly; doing it by hand requires knowing which variant to use.
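Python's standard library exposes both variants side by side, which makes the difference concrete:

```python
from urllib.parse import quote, quote_plus, urlencode

print(quote("tomato red"))        # tomato%20red  (RFC 3986 style)
print(quote_plus("tomato red"))   # tomato+red    (form-urlencoded style)

# urlencode builds a full query string using the form variant:
print(urlencode({"q": "tomato red", "n": 3}))   # q=tomato+red&n=3
```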
HTML Entities
HTML reserves five characters as syntax: <, >, &, ", and '. To display these characters literally in a browser rather than have the parser interpret them as markup, you use entity references: &lt;, &gt;, &amp;, &quot;, and &#39;.
Skipping this step is the root cause of most cross-site scripting (XSS) vulnerabilities. When user-supplied text gets inserted into an HTML document without encoding, an attacker can inject <script> tags that execute in the context of anyone who loads the page. The OWASP Cheat Sheet Series covers output encoding and XSS prevention thoroughly and is a practical reference for any developer building user-facing applications.
Named entities like &copy; for the copyright symbol and &reg; for registered trademark are convenient for editorial content. Numeric references like &#169; and &#174; are more portable across HTML parsers and email clients that may not recognize all named entities. Numeric hex references use the &#x prefix followed by the Unicode code point in hexadecimal, for example &#xA9; for copyright.
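Python's html module handles the escaping described above; this sketch shows the round trip for a typical injection payload:

```python
import html

user_input = '<script>alert("xss")</script>'

# html.escape converts syntax characters to entity references;
# by default it also escapes double and single quotes.
safe = html.escape(user_input)
print(safe)   # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

# unescape reverses the transformation exactly.
print(html.unescape(safe) == user_input)
```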
Hex Encoding and Binary
Hex encoding represents each byte as two hexadecimal digits. The byte value 65 (ASCII for capital A) becomes 41 in hex. Every byte is exactly two characters, which makes hex strings easy to parse programmatically.
Hex strings appear in cryptographic output (SHA-256 hashes produce 64 hex characters, HMAC-SHA256 produces the same), CSS color values (#ff6347 for tomato red), and protocol debugging. When you are examining a raw network packet, a binary file header, or a database column stored as varbinary, hex encoding is what makes the raw bytes human-readable.
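Python's bytes type has hex conversion built in, which covers both directions:

```python
value = b"A"                     # the byte value 65
print(value.hex())               # 41: two hex digits per byte

digest_like = bytes(range(4))
print(digest_like.hex())         # 00010203

# fromhex parses a hex string back into raw bytes,
# e.g. the three RGB bytes of the CSS color #ff6347.
print(bytes.fromhex("ff6347"))   # b'\xffcG'
```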
Binary encoding goes further, representing each byte as eight zeros and ones. It is rarely useful at the application layer but comes up in bit-manipulation work, low-level protocol design, and situations where you need to inspect the exact bit pattern of a value, such as understanding bitmask fields in a flags register.
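A short sketch of inspecting a bit pattern and testing a flag bit, using an arbitrary byte value chosen for illustration:

```python
byte = 0b10100101                 # decimal 165

# '08b' formats the value as exactly eight binary digits
print(format(byte, "08b"))        # 10100101

# Checking a bitmask: is flag bit 5 (value 32) set?
FLAG = 1 << 5
print(bool(byte & FLAG))          # True
```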
Unicode Escape Sequences
JavaScript and many other languages use Unicode escape sequences to represent characters as their code points. The four-digit form \uXXXX handles the Basic Multilingual Plane (U+0000 through U+FFFF). The extended form \u{XXXXXX} covers code points above U+FFFF, including emoji.
The authoritative reference for Unicode lives at unicode.org, which also hosts the full character tables if you need to look up a specific code point.
Unicode escapes are useful when you need to embed non-ASCII characters in source code or configuration files that must remain strictly ASCII-safe. They also appear in serialized JSON when an encoder escapes non-ASCII characters for transport compatibility, particularly in environments that cannot guarantee UTF-8 end to end.
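Python string literals accept the same \uXXXX escapes, and the json module demonstrates the transport-compatibility behavior: by default it escapes everything outside ASCII.

```python
import json

s = "caf\u00e9"                  # 'café' written with a BMP escape
print(s)

# ensure_ascii=True (the default) escapes non-ASCII on output
print(json.dumps({"name": s}))                      # {"name": "caf\u00e9"}
print(json.dumps({"name": s}, ensure_ascii=False))  # {"name": "café"}
```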
JWT Decoding
A JSON Web Token (JWT) consists of three Base64URL-encoded parts separated by dots: a header, a payload, and a signature. Base64URL is a URL-safe variant of Base64 that substitutes + with - and / with _ and omits the = padding character, making the entire token safe to include in a URL without further encoding.
Decoding a JWT means splitting on the dots and Base64URL-decoding the header and payload sections. The result is readable JSON containing the token's claims: subject, expiration, issued-at time, and any custom claims the issuer added. The signature portion verifies authenticity and requires the server's secret key to validate; browser-side decoding only reads the claims.
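The split-and-decode procedure fits in a few lines of Python. The token below is a hypothetical one constructed for the example, not a real credential, and the sketch deliberately skips signature verification:

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Read a JWT's claims without verifying its signature."""
    header_b64, payload_b64, _signature = token.split(".")
    # Base64URL omits '=' padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Build an illustrative token with made-up claims:
claims = {"sub": "user-42", "exp": 1735689600}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = f"eyJhbGciOiJIUzI1NiJ9.{payload}.fake-signature"

print(decode_jwt_payload(token))   # {'sub': 'user-42', 'exp': 1735689600}
```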
Quick JWT inspection is a development and debugging tool. If you are verifying that an API issues the right subject claim, checking an expiration timestamp, or confirming that a custom claim is present, decoding the payload is exactly what the job requires. The MDN documentation on Web APIs covers the Web Crypto API if you need to validate signatures client-side.
ROT13
ROT13 shifts each letter 13 positions forward in the alphabet. A becomes N, B becomes O, Z wraps around to M. Because there are 26 letters, applying ROT13 twice returns the original text exactly.
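Python ships a rot_13 codec, so the self-inverse property is a one-liner to verify:

```python
import codecs

spoiler = "The butler did it"
encoded = codecs.encode(spoiler, "rot_13")
print(encoded)                            # Gur ohgyre qvq vg

# Applying ROT13 again restores the original text.
print(codecs.decode(encoded, "rot_13"))   # The butler did it
```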
It is not a cipher in any meaningful security sense, but it serves its original purpose well: lightly obscuring text so it is not immediately readable on a quick scan. Usenet used ROT13 for spoilers in discussion threads so readers would not accidentally see the ending of a film while scrolling. Some forums and content communities still use it for the same reason.
"When I build developer tools, I think about the workflow interruption cost. Switching to a terminal or writing a throwaway script to decode a single value breaks focus. A unified encoder that handles the whole encoding surface without context-switching is one of those genuinely practical things." - Dennis Traina, founder of 137Foundry
Encoding Chains and Auto-Detection
Real data often passes through multiple encoding layers before it reaches you. A value might be JSON-serialized, then Base64-encoded, then URL-encoded as a query parameter. Unwrapping it requires reversing each layer in the right order, and if you are not sure what encoding was applied, you have to guess.
Auto-detection shortens the guessing phase. Base64 has a recognizable character set and length properties. Percent-encoded strings contain %XX patterns. HTML entities start with & and end with ;. A tool that identifies the encoding from the content can apply the right decoder without you having to test each one manually.
Encoding chains let you compose transformations in sequence. URL-encoding a value and then Base64-encoding the result, or reversing that order for decoding, can be done as a single operation when the transformations are configurable.
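A minimal sketch of a two-layer chain (URL-encode, then Base64) and its inverse; the key point is that decoding reverses the layers in the opposite order:

```python
import base64
from urllib.parse import quote, unquote

def encode_chain(value: str) -> str:
    # Layer 1: percent-encode; layer 2: Base64-encode the result
    return base64.b64encode(quote(value, safe="").encode()).decode()

def decode_chain(token: str) -> str:
    # Unwrap in reverse order: Base64 first, then percent-decode
    return unquote(base64.b64decode(token).decode())

wrapped = encode_chain("key=a value&x=1")
print(decode_chain(wrapped))   # key=a value&x=1
```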
Batch Processing
Encoding a hundred values one at a time is the kind of work that leads to mistakes. A batch mode accepts a newline-separated column of inputs and returns a column of encoded or decoded outputs in the same order. This is practical for data migration (encoding a field across an entire CSV before import), test fixture preparation, and auditing an existing dataset for unescaped values.
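The newline-in, newline-out contract is simple to sketch; here with Base64 as the transformation, though any encoder slots in the same way:

```python
import base64

def batch_encode(text: str) -> str:
    """Base64-encode each newline-separated value, preserving order."""
    return "\n".join(
        base64.b64encode(line.encode()).decode()
        for line in text.splitlines()
    )

print(batch_encode("alpha\nbeta\ngamma"))
# YWxwaGE=
# YmV0YQ==
# Z2FtbWE=
```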
The free encoding toolkit by EvvyTools covers all of the formats above with auto-detection, encoding chains, and batch processing in one place. The broader EvvyTools directory includes related utilities for development and content work, and the EvvyTools blog has additional deep-dives into developer tooling.
Choosing the Right Encoding
A few rules of thumb that hold up across most situations:
Use Base64 when transporting binary data over a text channel, whether that is email, a JSON body, or a CSS data: URI.
Use URL encoding when constructing query strings or embedding dynamic content in URL paths.
Use HTML entities when inserting dynamic content into any HTML document, without exception.
Use Hex when you need a human-readable, byte-exact representation of binary data, such as cryptographic output or protocol byte sequences.
Use Unicode escapes when source files must remain ASCII-safe but the data includes non-ASCII characters.
Use JWT decoding when you need to inspect token claims during development or debugging.
Use ROT13 when you want to obscure text without protecting it, specifically in editorial or community contexts.
The Wikipedia articles on Base64 and percent-encoding are solid starting points for the full technical specifications. RFC 4648 at the IETF Datatracker is the authoritative reference for Base64, Base32, and Base16 encoding definitions.
Encoding problems are almost always invisible when they go right. When they go wrong, the error messages are often misleading because the failure happens in the handoff between two systems that each expected the other to handle the conversion. Knowing what the formats are, and having a fast way to test transformations, is the difference between a five-minute fix and an hour of confused debugging.