txtkit

Character Counter (Dev)

Developer-focused character analysis: Unicode character count, UTF-8 byte count, JSON validation, CSV structure detection, and non-printable character scanning.

How to use the dev character counter

Paste any text, JSON, or CSV into the editor. The six stat rows update immediately. An amber highlight on the bytes row signals that multi-byte characters are present — meaning byte count exceeds character count. Green and blue badges appear automatically when valid JSON or CSV structure is detected. A red highlight on the non-printable row flags invisible characters that might be causing issues.

How byte count is calculated

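Byte count uses the browser's built-in TextEncoder API, which always encodes to UTF-8. new TextEncoder().encode(text).length returns the exact byte length of the string as it would be stored or transmitted in UTF-8. This matches the size reported by any system that handles the text as UTF-8, which covers most databases, HTTP stacks, and file systems. Note that JavaScript's text.length is a different number: it counts UTF-16 code units, not bytes.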
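As a minimal sketch of the calculation described above, the snippet below wraps the TextEncoder call in a helper. The names utf8ByteLength and codePointCount are illustrative, and counting "characters" as Unicode code points is an assumption; the tool may count them differently.

```javascript
// UTF-8 byte length via TextEncoder (built into browsers and Node.js).
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Assumption: a "character" is one Unicode code point.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePointCount(text) {
  return Array.from(text).length;
}

const sample = "na\u00EFve \u{1F600}"; // "naïve" plus a grinning-face emoji

console.log(utf8ByteLength(sample)); // 11
console.log(codePointCount(sample)); // 7
console.log(sample.length);          // 8 (UTF-16 code units)
```

The three numbers differ because ï takes 2 bytes and the emoji takes 4 bytes in UTF-8, while the emoji occupies 2 UTF-16 code units.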

Why developers need character vs byte awareness

Bugs from character/byte confusion are common. A field capped at 255 bytes holds only 63 four-byte emoji, not 255. In MySQL, VARCHAR(255) counts characters rather than bytes, but under utf8mb4 each character can take up to 4 bytes, so the same column may need up to 1,020 bytes. A cookie limited to 4096 bytes holds far fewer than 4096 CJK characters, since each typically takes 3 bytes in UTF-8. An API with a “160 character” limit might actually mean 160 bytes. This tool makes the distinction visible immediately.
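One way to respect a byte limit in code is to truncate on code point boundaries so no character is cut in half. The helper below, truncateToUtf8Bytes, is a hypothetical sketch and not part of this tool; it does not keep multi-code-point emoji sequences together.

```javascript
// Hypothetical helper: keep as many whole code points as fit in maxBytes of UTF-8.
function truncateToUtf8Bytes(text, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let result = "";
  for (const ch of text) { // for...of walks the string by code point
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

// 255 four-byte emoji against a 255-byte limit: only 63 fit (252 bytes).
const emoji = "\u{1F600}".repeat(255);
const kept = truncateToUtf8Bytes(emoji, 255);
console.log(Array.from(kept).length); // 63
```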

Frequently asked questions

What is the difference between character count and byte count?

Characters and bytes are not always the same. ASCII characters (A–Z, 0–9, basic punctuation) are 1 byte each in UTF-8. Most accented Latin characters (é, ü, ñ) are 2 bytes. Chinese/Japanese/Korean characters are typically 3 bytes. Many emoji are 4 bytes, and emoji built from several code points (flags, skin-tone variants, family sequences) take more. If your system has a byte limit (database column size, API request limit, file size constraint), use the byte count, not the character count.

Why does my string have more bytes than characters?

Because one or more characters in your text require multiple bytes to encode in UTF-8. This is normal for any text containing emoji, accented characters, or non-Latin scripts. The byte/character ratio row shows the average bytes per character; a ratio above 1.0 means multi-byte characters are present. The tool highlights this in amber as a heads-up.
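The ratio can be sketched as bytes divided by characters. The function name bytesPerCharacter, the choice to count characters as code points, and returning 0 for empty input are all assumptions made for illustration.

```javascript
// Average UTF-8 bytes per character (character = code point, by assumption).
function bytesPerCharacter(text) {
  const chars = Array.from(text).length;
  if (chars === 0) return 0; // assumption: report 0 for empty input
  const bytes = new TextEncoder().encode(text).length;
  return bytes / chars;
}

console.log(bytesPerCharacter("hello"));      // 1
console.log(bytesPerCharacter("h\u00E9llo")); // 1.2 (6 bytes / 5 characters)
```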

What does the JSON validation check?

It runs your text through JSON.parse() and reports whether it's valid JSON. If valid, it shows the number of top-level keys (for objects) or items (for arrays). It does not validate against a schema — just that the syntax is correct. Useful for verifying API responses, config files, or data exports before piping them into another tool.
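A check along these lines could look like the sketch below. The function name checkJson and the shape of its return value are hypothetical. Note that JSON.parse also accepts bare values such as 42 or "text"; how the tool reports those is not stated here.

```javascript
// Sketch of a JSON.parse-based syntax check; the result shape is illustrative.
function checkJson(text) {
  let value;
  try {
    value = JSON.parse(text);
  } catch (err) {
    return { valid: false, error: err.message };
  }
  if (Array.isArray(value)) {
    return { valid: true, kind: "array", count: value.length };
  }
  if (value !== null && typeof value === "object") {
    // Top-level keys only; nested objects are not counted.
    return { valid: true, kind: "object", count: Object.keys(value).length };
  }
  // Bare values (numbers, strings, booleans, null) are valid JSON too.
  const kind = value === null ? "null" : typeof value;
  return { valid: true, kind, count: null };
}

console.log(checkJson('{"a":1,"b":[1,2]}').count); // 2
console.log(checkJson("[1,2,3]").count);           // 3
console.log(checkJson("{a:1}").valid);             // false
```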

What does the CSV detection check?

It checks whether the input looks like CSV by confirming: every line has the same number of comma-separated fields, and there are at least 2 rows and 2 columns. If detected, it shows the row count and column count. It assumes comma delimiter — it does not currently detect tab-separated or semicolon-separated files. For a more robust CSV audit, use a dedicated CSV linter.
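The heuristic described above can be sketched as follows. The function name detectCsv is hypothetical, and two details are assumptions: blank lines are ignored, and fields are split naively on commas, so quoted fields containing commas will be miscounted.

```javascript
// Sketch: same field count on every line, at least 2 rows and 2 columns.
function detectCsv(text) {
  // Assumption: blank lines (such as a trailing newline) are ignored.
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length < 2) return { detected: false };

  // Naive split: does not handle quoted fields that contain commas.
  const columns = lines[0].split(",").length;
  if (columns < 2) return { detected: false };

  const consistent = lines.every((line) => line.split(",").length === columns);
  return consistent
    ? { detected: true, rows: lines.length, columns }
    : { detected: false };
}

console.log(detectCsv("name,age\nAda,36\nAlan,41")); // { detected: true, rows: 3, columns: 2 }
console.log(detectCsv("a,b\n1,2,3").detected);       // false
```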

What are non-printable characters and why do they matter?

Non-printable characters are Unicode code points that have no visible glyph: null bytes (\x00), carriage returns without matching newlines, zero-width spaces (U+200B), bidirectional control characters such as the right-to-left mark (U+200F) and the right-to-left override (U+202E), and others. They're invisible in most text editors but can break databases, APIs, and parsers. They can also be used to obfuscate malicious content in strings that appear safe. The non-printable count turns red when any are detected.
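A scan of this kind can be sketched with a regular expression. The character set below is an assumption chosen for illustration (control characters other than tab, LF, and CR, zero-width and bidirectional controls, the byte order mark, and carriage returns not followed by a newline); the tool's actual list is not documented here, and findNonPrintable is a hypothetical name.

```javascript
// Assumed set of "non-printable" characters; adjust to your own needs.
const NON_PRINTABLE =
  /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]|\r(?!\n)/g;

// Returns the position and code point of every match.
function findNonPrintable(text) {
  const hits = [];
  for (const match of text.matchAll(NON_PRINTABLE)) {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    hits.push({ index: match.index, codePoint: "U+" + hex });
  }
  return hits;
}

console.log(findNonPrintable("safe\u200Btext\u202E"));
// [ { index: 4, codePoint: 'U+200B' }, { index: 9, codePoint: 'U+202E' } ]
console.log(findNonPrintable("line one\r\nline two").length); // 0
```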
