Funny: How to Break Windows Notepad

It's not a bug! It's unintended irony!

Thursday, May 18, 2006 by Zoomba | Discussion: Windows Software

Here's a really funny way to break Notepad that a coworker showed me this morning. I bet this is one of those jokes that's been around for ages, but this was the first I ever heard of it, so it's new to me.

This actually works. It will not crash your computer; it just breaks Notepad in that it causes it to display very oddly. No permanent damage comes of the following steps.

Here's how to do it:
1. Open up Notepad (not Wordpad, not Word or any other word processor)
2. Type in this sentence exactly (without quotes): "this app can break"
3. Save the file to your hard drive.
4. Close Notepad
5. Open the saved file by double clicking it.

Instead of seeing your sentence, you should see a series of squares. For whatever reason, Notepad can't figure out what to do with that series of characters and breaks.

Again, it doesn't crash the app or anything, it's just a funny little twist of fate/unintended feature.
synfin80
Reply #21 Wednesday, June 14, 2006 11:39 AM
If anyone is interested in what it is doing... I saved a copy of the file after opening the corrupted version. If you open it in a hex editor, you will see that two bytes have been prefixed to the file: FF FE. The FF FE characters at the start of the file signify that the file is a UTF-16 little-endian encoded file.

http://en.wikipedia.org/wiki/UTF-16

UTF-16 uses 2 bytes per character here instead of 1 byte like ASCII. The phrase "this app can break" is 18 characters long (including the spaces), which explains why people are seeing 9 Asian or 9 block characters.

That doesn't explain why it happens, but it does explain what is happening.
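
The 18-to-9 arithmetic above can be checked with a short Python sketch. It assumes the saved file holds exactly the 18 ASCII bytes of the phrase (no BOM, no trailing newline) and that the reader decodes them as UTF-16 little-endian; the variable names are illustrative only.

```python
# Assumption: the file on disk holds exactly these 18 ASCII bytes.
phrase = "this app can break"
raw = phrase.encode("ascii")

# Reinterpret the same bytes as UTF-16 little-endian: 2 bytes per character.
misread = raw.decode("utf-16-le")

print(len(raw), "bytes ->", len(misread), "characters")
print([f"U+{ord(ch):04X}" for ch in misread])
```

Under that assumption every resulting code point falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which would fit both the "Asian characters" reports and the squares seen when no suitable font is installed.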
neoposix2
Reply #22 Wednesday, June 14, 2006 11:41 AM
All ya'll crack me up. Trying to figure out if it's Chinese, Japanese, "Simplified Chinese".

But did any of you try simply changing the letters? Such as:

"this app can brake", or
"this cat can split", or
"xxxx xxx xxx xxxxx", or
"abcd efg hij klmno"?

They all work, and produce different combinations of your "Chinese characters". My guess is, it's a bitshifting bug (or possibly an egg) which is simply a bitshift of the original characters.
Logomachist
Reply #23 Wednesday, June 14, 2006 12:30 PM
Now can anyone create an 18 character phrase that means something both in English and Chinese?
Wei Zhong Goh
Reply #24 Wednesday, June 14, 2006 1:01 PM
I've been learning Chinese for 9.5 years and Japanese for 3.5 years. The characters are CJK (Chinese/Japanese/Korean) characters, but have no meaning when read as a phrase in Chinese or Japanese. Individual characters hold meaning in Chinese and Japanese; the fourth character means "ash" and the last "joy". CJK characters ultimately originate from Chinese, and all the characters probably have meanings, at least in the past.

In short, the bunch of characters are random CJK characters.
arantius
Reply #25 Wednesday, June 14, 2006 9:43 PM
Synfin80 is correct, this is a UTF related bug. I, however, do not see any "FEFF" characters, and that would seem to be the real issue to me; UTF encoded file without the BOM.

If I view the raw data of the file created in the "save" step, I get:

$ hexdump break.txt
0000000 6874 7369 6120 7070 6320 6e61 6220 6572
0000010 6b61

Translate those hex values into ascii characters, and you get:

htsia ppc nab erka

Or, as he said, little-endian two-byte groups of the value we started with.
(That means, look, transpose each pair of characters, it's the original!)
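
A rough Python sketch of the hexdump output above, assuming the file holds only the 18 ASCII bytes of the phrase. It groups the bytes into 16-bit little-endian words, which is how hexdump's default display shows them on a little-endian machine, so the transposed pairs may come from the display rather than from the file itself.

```python
import struct

raw = b"this app can break"  # assumed file contents: 18 ASCII bytes

# Read the bytes as nine unsigned 16-bit little-endian words.
words = struct.unpack("<9H", raw)
print(" ".join(f"{w:04x}" for w in words))

# Writing each word out high byte first swaps every pair of characters.
swapped = b"".join(struct.pack(">H", w) for w in words).decode("ascii")
print(swapped)
```

If that assumption holds, the bytes on disk are still in the order they were typed, and only the 16-bit grouping makes them look swapped.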

Something turns it into 16-bit little-endian runs, but skips the bit that would make it a valid UTF-encoded file.

This also happens with "this abc xyz break" and "efgh abc xyz break". But it seems, only when you type it in first thing after starting the program.

I don't know how, but at
xiaoyug
Reply #26 Thursday, June 15, 2006 10:06 PM
after a few tries, i got the following pattern:

xxxx xxxxxxx yyy....

the positions of the 2 blanks mattered, just letters -- no digits or other symbols, and as long as the total number of characters is even then Notepad will break.

looks like a bug rather than an egg. but i'm curious what's the logic behind this...
hmmm, what if some of the x's are replaced with real Unicode characters?...
xiaoyug
Reply #27 Thursday, June 15, 2006 10:19 PM
nope, can't insert Unicode into the file.
if you do, Notepad saves the file in Unicode format (even if you specify ASCII), which doubles the filesize and tags FFFE at the beginning.
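
What such a save looks like in bytes can be sketched in Python. This assumes "Unicode format" here means UTF-16 little-endian with a leading BOM, written on disk as the bytes FF FE; it does not model Notepad's own save logic.

```python
import codecs

text = "this app can break"

ansi_bytes = text.encode("ascii")  # one byte per character
# BOM (FF FE) followed by two bytes per character.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(len(ansi_bytes), len(utf16_bytes))
print(utf16_bytes[:6].hex(" "))
```

The second length is double the first plus two bytes for the BOM.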
Mavedrive
Reply #28 Friday, June 16, 2006 10:59 AM
this is what we call BUG.

try adding a carriage return at the end of the line.
notepad will then properly handle it...
Adam Louis
Reply #29 Saturday, June 17, 2006 12:47 AM
Not just Notepad -- saving and opening the same string in Metapad produces the same results.

Looking at the hex, it seems to be saving the bytes fine. Changing anything other than the letters in "break" (said other including adding or deleting any of it), or substituting any character other than the lower-case alphabet (hex 61 to 7A) for any letter of those last five, makes it display properly, again, in both of the -pads I tried it in.

e.g.

Displays properly:
this app can breaka
this app can brea
this app can breaK
qhis app can break
this app canbreak
this app can break[carriage return]

Still broken:
this app can qwert

My guess is that this collection of characters is somehow interpreted as malformed UTF-16 by whatever common sense text editors apply to distinguishing between the various character encodings -- it's a bug, but on a much more abstract scale than Notepad being a shitty app.
thoreaulylazy
Reply #30 Wednesday, June 21, 2006 3:16 AM
It's not a bug in the core logic or in the translator, nor is it an easter-egg; it's simply a matter of what the defaults for notepad are. When you save a file in notepad, the default is "ANSI" (it's in a dropdown with other choices). When you open a file in notepad (File->Open), apart from the file path, you specify the file format, the default is "Unicode" (UTF-16), with "ANSI" and "UTF-8" as other choices. When you double-click a file, all notepad has is the filepath, it wasn't given the format, so it just uses the default Unicode when auto-detecting fails. Auto-detecting fails when CRLF is missing under all possible attempted formats. If you hit a newline after the sentence, the file would open correctly under the ANSI format.

If the default format to open were the same as the default format to save, this wouldn't confuse end-users.

No, I have never worked for Microsoft. I used to work for Oracle, and I have seen similar auto-detection failures when trying to simplify the user experience by not asking for the values of unknown variables.

Cheers,
thoreaulylazy
kulman
Reply #31 Wednesday, June 21, 2006 2:47 PM
My observations on this bug.
1) It happens only when the phrase is entered first thing after opening Notepad.
2) When the file is saved in any encoding other than ANSI, it won't happen.
3) It happens only when the first 18 characters are a four-letter word, a space, two three-letter words with a space between them, another space, and then a five-letter word. It happens with digits too. Beyond that length, it happens if the total number of characters is even.
4) If the phrase contains any upper-case letters other than as the first or last character, it won't happen.
The explanation I have heard is: "
1) You are saving to 8-bit Extended ASCII (Look at the Save As / Encoding format)

2) You are reading as 16-bit UNICODE (You guessed it, look at the Save As / Encoding format)

This is why the 18 8-bit characters are being displayed as 9 (obviously not supported by your codepage) 16-bit UNICODE characters"
I would still like to know why it happens only for a short piece of text.
2) If we erase the junk characters and retype the phrase "Bush hid the facts" or "aaaa aaa aaa aaaaa" or "1111 111 111 1111", it appears fine.

The explanation I have heard for this is: "Text files containing UTF-16 are supposed to start with a BOM, so you can read those two chars and the application will know it is UTF-16. But many applications do not do that (you have probably noticed the two small chars Notepad adds to the start of a file sometimes).

So what should a poor Notepad do? Well, it can always use the IsTextUnicode() Win32 API. You pass it some text, and it tries to guess if it is Unicode or not. But what if you give it just a little text, and maybe even just lowercase? Well, then it isn't too easy to tell if it really was Unicode you gave it. And with these small strings you guys have found, it does indeed break. Poor Notepad gets the blame for a bad API, and other faulty apps.."
more interesting links
http://blogs.msdn.com/michkap/archive/2006/06/14/631016.aspx
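
The BOM point quoted above can be sketched in Python. The helper name `sniff_bom` is hypothetical and this is not how Notepad or IsTextUnicode() is implemented; it only shows that a file with a BOM announces its encoding, while the 18-byte file leaves the reader guessing.

```python
import codecs

# Known byte order marks. Simplification: UTF-32 is ignored, even though
# its little-endian BOM also begins with FF FE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

with_bom = codecs.BOM_UTF16_LE + "this app can break".encode("utf-16-le")
without_bom = b"this app can break"

print(sniff_bom(with_bom))
print(sniff_bom(without_bom))
```

When the helper returns None, the reader has to fall back on some heuristic, and that is where a short all-lowercase string can apparently be misjudged.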
GreenReaper
Reply #32 Thursday, June 22, 2006 7:57 PM
Here is another explanation of this, which I happened to read after reading this here: http://blogs.msdn.com/oldnewthing/archive/2006/06/14/630864.aspx
LadyJinx
Reply #33 Monday, July 17, 2006 11:27 PM
Didn't work for me...
Je$$e
Reply #34 Wednesday, July 19, 2006 2:23 PM
not work!!
Tominated
Reply #35 Thursday, July 20, 2006 3:44 AM
pc world (in oz) did a section about it in the back of the magazine this month. apparently "pies are the shizz" does not work.
DesignCaddy
Reply #36 Saturday, July 22, 2006 1:48 PM
thoreaulylazy just completely killed the ride

thanks, guys, i found this post thoroughly amusing
