Invisible Character Remover

Detect and remove hidden Unicode characters from text

About the Invisible Character Remover

The Invisible Character Remover scans pasted text for the dozens of Unicode code points that occupy a string position without producing a visible glyph: zero-width spaces (U+200B), zero-width non-joiners and joiners (U+200C / U+200D), byte order marks (U+FEFF), soft hyphens (U+00AD), non-breaking spaces (U+00A0), the right-to-left override (U+202E), and related characters. It counts each occurrence by type, lets you keep legitimate ones, and strips the rest in a single click.
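
The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code (which runs as JavaScript in the browser): the table `INVISIBLE_NAMES` and the function `count_invisible` are hypothetical names, and the table covers only the code points listed above.

```python
from collections import Counter

# Illustrative subset: only the code points named in the paragraph above.
# A real scanner would cover many more.
INVISIBLE_NAMES = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK",
    "\u00ad": "SOFT HYPHEN",
    "\u00a0": "NO-BREAK SPACE",
    "\u202e": "RIGHT-TO-LEFT OVERRIDE",
}


def count_invisible(text: str) -> dict[str, int]:
    """Return {"U+XXXX NAME": occurrences} for each listed character found."""
    counts = Counter(ch for ch in text if ch in INVISIBLE_NAMES)
    return {f"U+{ord(ch):04X} {INVISIBLE_NAMES[ch]}": n for ch, n in counts.items()}


# Sample with a leading BOM, one no-break space, and two zero-width spaces.
sample = "\ufeffprice:\u00a010\u200b\u200b km"
report = count_invisible(sample)
```

Writing the characters as `\u` escapes keeps the source itself free of invisible characters, which makes the table reviewable.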

It is built for developers chasing “syntax error on line 1” when line 1 looks fine, content editors cleaning text pasted from Word or Google Docs, AI users stripping watermark characters from generated content, security engineers hardening user-input sanitation, and anyone whose code diff shows changes the editor refuses to display.

Detection and removal happen entirely in JavaScript on your device. Whatever you paste — proprietary source code, NDA-bound documents, customer-data exports — never leaves your browser. The page makes no network call after first load. This matters: invisible-character bugs frequently involve credentials, API keys, or schema fields you would rather not upload to a random web service.

Not every invisible character is a bug. Non-breaking spaces hold typographic phrases together (10 km, © 2026), zero-width joiners build composite emoji (the family glyph 👨‍👩‍👧‍👦 is a sequence of base characters joined by ZWJ), and soft hyphens give browsers permission to break long words at acceptable points. The tool lets you opt in or out per character class, because aggressive stripping can break valid Unicode-rich content as easily as it fixes a parser bug.
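
Per-class opt-out might look like the following Python sketch. The class names in `CLASSES` and the `strip_invisible` function are assumptions made for illustration; the tool's real groupings are not documented here.

```python
# Hypothetical character classes; names and groupings are assumptions.
CLASSES = {
    "zero_width_space": {"\u200b"},
    "joiners": {"\u200c", "\u200d"},
    "bom": {"\ufeff"},
    "soft_hyphen": {"\u00ad"},
    "nbsp": {"\u00a0"},
}


def strip_invisible(text: str, keep: frozenset[str] = frozenset()) -> str:
    """Remove every character in CLASSES except those in classes listed in `keep`."""
    to_remove: set[str] = set()
    for name, chars in CLASSES.items():
        if name not in keep:
            to_remove |= chars
    return "".join(ch for ch in text if ch not in to_remove)


# man + ZWJ + woman + ZWJ + girl: one emoji glyph built from three base characters
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
text = "\ufeff10\u00a0km " + family + "\u200b"

# Keep joiners (emoji) and no-break spaces (typography); strip everything else.
cleaned = strip_invisible(text, keep=frozenset({"joiners", "nbsp"}))
```

Calling `strip_invisible(family)` with nothing kept removes the joiners and splits the emoji into three separate glyphs, which is the failure mode described above. Note also that this sketch deletes a no-break space outright; replacing it with a regular space is often the safer choice.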

Privacy: 100% client-side · pasted text never transmitted
Scope: Unicode 15.1 zero-width & format chars
Last reviewed: 2026-05-14 by Dennis Traina
About detection modes: “Common Only” catches the most frequent offenders (zero-width spaces, zero-width joiners and non-joiners, byte order marks, and word joiners). “All Invisible” adds soft hyphens, non-breaking spaces, directional marks, variation selectors, invisible separators, and other rare Unicode control characters. Use “All Invisible” when debugging stubborn formatting issues.

Paste multiple text blocks separated by a line containing only --- to clean them all at once.

Upload a .txt or .csv file to clean without pasting.

Keep certain characters that are legitimate in specific contexts.

What Are Invisible Unicode Characters?

Invisible Unicode characters are code points that occupy a position in a string but produce no visible glyph when rendered. They were designed for legitimate purposes: controlling text direction in right-to-left languages, providing line-break hints to rendering engines, or joining emoji sequences into compound glyphs. The problem is that they are genuinely invisible: most text editors do not display them by default, they do not show up in most search-and-replace dialogs, and they survive copy-and-paste intact. When they end up where they do not belong (inside source code, database fields, API payloads, or published content), they cause errors that are hard to diagnose because the text looks correct to the human eye.

Common Sources of Hidden Characters in Text

Hidden characters enter text through several common pathways. PDF extraction is one of the most frequent: text copied from a PDF can carry along zero-width spaces, soft hyphens, and directional marks that were used for layout. Microsoft Word and Google Docs insert non-breaking spaces, byte order marks, and special whitespace characters for formatting control, and these persist when content is pasted into other applications. Web scraping often captures HTML entities and Unicode control characters embedded in the page source. Messaging apps like Slack, WhatsApp, and Telegram use zero-width joiners internally for emoji rendering and occasionally leak them into plain-text exports. Even code editors can be culprits: Windows Notepad historically saved files with a UTF-8 BOM, and some IDE auto-formatters insert non-breaking spaces in place of regular spaces under certain locale settings.

How Invisible Characters Break Code and Formatting

In source code, a single zero-width space inside a variable name produces something that looks like a valid identifier but is not the name you meant. Depending on the language, the compiler or interpreter either rejects the character with a syntax error pointing at a line that looks flawless, or treats the name as a different, undefined identifier. A UTF-8 BOM at the start of a PHP file causes “headers already sent” errors because its three bytes (EF BB BF) are output before any header calls. In HTML and CSS, invisible characters inside class names or selectors silently break style matching without any visible indication in the markup. In databases, invisible characters in primary keys or indexed columns prevent exact-match queries from returning results even when the visible text matches perfectly. JSON and XML parsers may reject payloads containing unexpected control characters, producing cryptic parse errors. Email deliverability suffers when invisible characters appear in subject lines or headers, triggering spam filters tuned to detect obfuscation techniques.
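
The BOM case is easy to check at the byte level. The sketch below uses Python's standard `codecs.BOM_UTF8` constant; the helper name `strip_utf8_bom` and the sample PHP bytes are illustrative assumptions.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A PHP file saved with a BOM: the three bytes precede "<?php" and would be
# sent to the client as output before header() runs.
php_source = b"\xef\xbb\xbf<?php header('Location: /');"
fixed = strip_utf8_bom(php_source)

# When decoding to text anyway, the "utf-8-sig" codec drops a leading BOM.
decoded = php_source.decode("utf-8-sig")
```

Working on bytes rather than decoded text avoids re-encoding the file and leaves every other byte untouched.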

Zero-Width Space: The Most Common Culprit

The zero-width space (U+200B) is the invisible character you are most likely to encounter. Its legitimate purpose is to indicate optional line-break positions in scripts that do not use spaces between words, such as Thai, Khmer, and Chinese. Web browsers and word processors use it to suggest wrapping points in long URLs or unbroken strings. The problem is that it behaves like a real character in every other context: it counts toward string length, it affects equality comparisons, and it passes through most validation routines undetected. Two strings that look identical to a human, such as “hello” and “hel<U+200B>lo” (shown here with the hidden character spelled out), will fail a strict equality check. This character is the single most common cause of “it works when I retype it manually” debugging sessions.
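
A short Python illustration of that behavior follows. The variable names are arbitrary, and the zero-width space is written as an escape so it stays visible in the source.

```python
clean = "hello"
tainted = "hel\u200blo"  # zero-width space between "hel" and "lo"

# The hidden character counts toward string length.
lengths = (len(clean), len(tainted))

# It also breaks strict equality until it is removed.
equal_before = clean == tainted
equal_after = clean == tainted.replace("\u200b", "")

# ascii() escapes every non-ASCII character, which exposes the culprit.
escaped = ascii(tainted)
```

Printing `escaped` is a quick way to confirm a suspicion while debugging, because the escape sequence appears in the output where the eye sees nothing.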

How to Prevent Invisible Characters in Your Workflow

Prevention starts with your tools. Configure your code editor to display whitespace characters and use a font that renders zero-width characters with a visible placeholder glyph (JetBrains Mono and Fira Code both do this). When copying text from PDFs or documents, paste into a plain-text intermediary first: a plain-text round-trip removes rich-text formatting and some invisible characters, but many survive it, so scan the result as well. Set up pre-commit hooks or CI pipeline steps that scan source files for unexpected Unicode code points; a simple regex check for characters in the U+200B–U+200F and U+2028–U+202F ranges catches zero-width characters, directional marks, and bidirectional overrides, and U+FEFF, U+00AD, and U+00A0 can be added if you want them covered. For database inputs, add a sanitization layer that strips known invisible characters before storage. When working with external APIs, validate response bodies for control characters before parsing. These measures cost almost nothing in performance but save hours of debugging time over the life of a project.
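
The regex check for those two ranges could be sketched as below. The function name `find_suspects` and the sample source are hypothetical; a real pre-commit hook would read the staged files and exit non-zero when the list of hits is not empty.

```python
import re

# The two ranges named above: U+200B-U+200F and U+2028-U+202F.
SUSPECT = re.compile("[\u200b-\u200f\u2028-\u202f]")


def find_suspects(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each match; line and column are 1-based."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on U+2028/U+2029,
    # which are among the characters being searched for.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((line_no, match.start() + 1, f"U+{ord(match.group()):04X}"))
    return hits


source = "total = 1\nto\u200btal = 2\nprint(total)\u202e"
hits = find_suspects(source)
```

Reporting the line, column, and code point gives a reviewer enough to find and delete the character without needing an editor that can display it.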

Frequently Asked Questions

What are invisible Unicode characters?

They are code points that occupy space in a string but produce no visible glyph. Common examples include U+200B zero-width space, U+FEFF byte order mark, U+00A0 non-breaking space, and U+00AD soft hyphen. They exist for legitimate purposes like text direction and emoji joining but cause bugs when misplaced.

How do hidden characters end up in text?

The most common sources are PDF text extraction, copy-paste from Microsoft Word or Google Docs, AI-generated content, rich text editors that preserve formatting hints, and files saved by older Windows tools that add a BOM. Copy-paste operations preserve these characters invisibly.

Why does a syntax error say line 1 when line 1 looks fine?

Often because a byte order mark (U+FEFF) sits at the start of the file. The Unicode Standard permits a BOM in UTF-8 but neither requires nor recommends it, and older versions of Windows Notepad and some exporters add one by default. A parser that does not expect it sees an unexpected character even though the editor hides it.

Are invisible characters a security risk?

They can be. Attackers use zero-width joiners and right-to-left override characters to spoof filenames, create lookalike domains, and smuggle payloads past code review. Stripping them during input sanitation is a standard hardening practice for source control, CMS fields, and user-generated content.
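
As one hedged example of such hardening, the Python sketch below flags bidirectional control characters in a string. The set `BIDI_CONTROLS` lists the Unicode embedding, override, and isolate controls; the function name and the sample filename are illustrative.

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls:
# U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def bidi_controls_in(text: str) -> list[str]:
    """Return the Unicode names of bidi control characters in `text`, in order."""
    return [unicodedata.name(ch) for ch in text if ch in BIDI_CONTROLS]


# The override reverses the display order of what follows it, so this name
# can be shown as "invoiceexe.jpg" while the real extension is ".exe".
filename = "invoice\u202egpj.exe"
found = bidi_controls_in(filename)
```

Whether to reject, strip, or merely flag such input depends on the field: a filename or identifier rarely needs these characters, while free-form text in right-to-left languages may use them legitimately.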

Does the tool run in the browser?

Yes. Detection and removal happen entirely in JavaScript on the device, so pasted code and confidential text never leave the browser. That matters for internal source code, proprietary data, and anything under NDA.
