Text Cleaner
Select any combination of cleaning operations — remove extra spaces, fix line endings, convert smart quotes, strip HTML — and see a diff of every change before accepting.
How to use the text cleaner
Paste your text into the editor, then check the operations you want to apply. The output and diff panel update live — no Clean button required. When you're happy with the result, click Accept to replace the editor content. Undo is available immediately after accepting.
What each cleaning operation does
Remove extra spaces: collapses multiple consecutive spaces to one.
Trim lines: strips leading and trailing spaces from each line.
Remove blank lines: deletes lines that contain only whitespace.
Remove line breaks: joins all lines into one paragraph.
Fix paragraph spacing: ensures exactly one blank line between paragraphs.
Smart → straight quotes: converts curly quotes to ASCII.
Em dashes → hyphens: converts — to -.
Remove HTML: strips all <tags> from the text.
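As a rough illustration, several of these operations can be approximated with a few lines of Python. This is a minimal sketch, not the Text Cleaner's actual implementation: the function names are hypothetical, and the HTML stripper is a naive regex that assumes simple, well-formed tags rather than a full HTML parser.

```python
import re


def trim_lines(text: str) -> str:
    # Strip leading and trailing spaces and tabs from each line.
    return "\n".join(line.strip(" \t") for line in text.split("\n"))


def remove_blank_lines(text: str) -> str:
    # Drop lines that are empty or contain only whitespace.
    return "\n".join(line for line in text.split("\n") if line.strip())


def remove_line_breaks(text: str) -> str:
    # Join all lines into one paragraph, separated by single spaces.
    return re.sub(r"\s*\n\s*", " ", text).strip()


def fix_paragraph_spacing(text: str) -> str:
    # Collapse any run of blank (whitespace-only) lines to one blank line.
    return re.sub(r"\n(?:[ \t]*\n)+", "\n\n", text)


def strip_html(text: str) -> str:
    # Naive assumption: anything between < and > is a tag.
    return re.sub(r"<[^>]+>", "", text)
```

Each function takes and returns a plain string, so the checked operations could be applied one after another in whatever order suits the text.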
Why text cleaning is essential for content work
Text copied from PDFs, presentations, email clients, and websites arrives with invisible formatting artifacts. Double spaces, Windows CRLF line endings, smart quotes, non-breaking spaces, and stray HTML all cause problems downstream — in CMSes, code editors, and publishing tools. Cleaning text before pasting it into your workflow prevents hours of debugging and reformatting later.
Frequently asked questions
What are smart quotes and why do they cause problems?
Smart quotes are the curly typographic quotation marks (“ ”) as opposed to straight quotes (" and '). Word processors, Google Docs, and iOS keyboards insert smart quotes automatically. They look better in print, but they cause problems in code editors, CSV files, JSON, command-line tools, and any system that expects ASCII. The smart-to-straight conversion in the Text Cleaner replaces “ and ” with " and replaces ‘ and ’ with '.
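The conversion described above can be sketched in Python with a translation table. The names SMART_TO_STRAIGHT and straighten_quotes are hypothetical, and the table covers only the four characters mentioned here; the real tool may handle more. The JSON snippet shows one way smart quotes fail in a system that expects ASCII quotes.

```python
import json

# Map the four curly quote characters to their ASCII equivalents.
SMART_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
})


def straighten_quotes(text: str) -> str:
    return text.translate(SMART_TO_STRAIGHT)


# A JSON object typed in a word processor, with curly quotes as delimiters.
raw = "{\u201cname\u201d: \u201cAda\u201d}"

try:
    json.loads(raw)
except json.JSONDecodeError:
    print("curly quotes are not valid JSON string delimiters")

print(json.loads(straighten_quotes(raw)))
```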
What is the difference between an em dash and a hyphen?
A hyphen (-) is ASCII character 45, used for compound words and line breaks. An en dash (–) is U+2013, used for ranges (pages 10–20). An em dash (—) is U+2014, the long dash used for parenthetical clauses — like this. Because the em dash is not an ASCII character, it can cause problems in plain-text systems that assume ASCII input: CSV parsers, email clients, command-line tools, and older databases. The 'Em dashes → hyphens' option replaces — with - so the text is safe to paste into technical contexts.
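The code points above are easy to check in Python, and the replacement itself is a single string substitution. This is a sketch with hypothetical names; it leaves en dashes alone because the option as described only targets em dashes.

```python
HYPHEN = "-"        # ASCII 45
EN_DASH = "\u2013"  # U+2013
EM_DASH = "\u2014"  # U+2014


def em_dashes_to_hyphens(text: str) -> str:
    # Replace every em dash with an ASCII hyphen; en dashes are untouched.
    return text.replace(EM_DASH, HYPHEN)


sample = "clauses" + EM_DASH + "like this"
print(ord(HYPHEN), hex(ord(EN_DASH)), hex(ord(EM_DASH)))
print(sample.isascii())                        # False before conversion
print(em_dashes_to_hyphens(sample).isascii())  # True after conversion
```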
What does 'remove extra spaces' actually do?
It collapses any run of two or more consecutive spaces into a single space. It doesn't touch line breaks or indentation — only horizontal spaces within a line. This is useful for text copied from PDFs (which frequently insert random double-spaces) or content pasted from HTML (where the browser may have collapsed spaces visually but the underlying text still has them).
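One way to get this behavior is a regular expression that only matches runs of spaces preceded by a non-whitespace character, which leaves leading indentation and line breaks alone. This is a sketch under that assumption; the function name is hypothetical and the tool's own rule may differ in edge cases such as tabs.

```python
import re


def remove_extra_spaces(text: str) -> str:
    # (?<=\S) requires a non-whitespace character just before the run,
    # so spaces at the start of a line (indentation) never match.
    return re.sub(r"(?<=\S) {2,}", " ", text)


print(remove_extra_spaces("    indented  line   here\nnext  one"))
```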
What is the diff view showing me?
The diff view shows a character-level comparison between your original text and the cleaned output. Removed characters appear in red with a strikethrough. Added characters appear in green. Lines with no changes are dimmed. This lets you verify that the cleaning operations did exactly what you intended before you click Accept to overwrite your working text.
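A character-level comparison like this can be sketched with Python's standard difflib module. This only illustrates the idea; the function name and the "removed"/"added" labels are hypothetical, and the tool's own diff algorithm is not documented here.

```python
from difflib import SequenceMatcher


def char_diff(original: str, cleaned: str) -> list[tuple[str, str]]:
    # Return (kind, characters) pairs for every change between the two
    # strings; unchanged stretches are skipped.
    changes = []
    matcher = SequenceMatcher(None, original, cleaned, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.append(("removed", original[i1:i2]))
        if tag in ("insert", "replace"):
            changes.append(("added", cleaned[j1:j2]))
    return changes


print(char_diff("a  b", "a b"))
```

A renderer would then style the "removed" pieces in red with a strikethrough and the "added" pieces in green.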
I pasted text from a PDF and there are weird line breaks everywhere. How do I fix it?
PDFs hardcode line breaks at every visual line ending, so pasted text looks like a poem with a line break every 60–80 characters. Check 'Remove line breaks' to join the lines into continuous text. If the PDF had clear paragraph separations (a blank line between paragraphs), check 'Fix paragraph spacing' before removing line breaks, so the paragraph structure is preserved.
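The intended result, with wrapped lines joined but paragraphs kept apart, can be sketched as follows. This assumes paragraphs are separated by at least one blank line in the pasted text; the function name is hypothetical and the code describes the outcome, not how the tool combines the two options internally.

```python
import re


def unwrap_pdf_text(text: str) -> str:
    # First normalize paragraph gaps to exactly one blank line.
    text = re.sub(r"\n(?:[ \t]*\n)+", "\n\n", text)
    paragraphs = []
    for block in text.split("\n\n"):
        # Join the hard-wrapped lines of each paragraph with single spaces.
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return "\n\n".join(paragraphs)


print(unwrap_pdf_text("one\ntwo\n\n\nthree\nfour"))
```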