My whitespace binary coder: whitecoder

Imagine this situation: you’re supposed to give your supervisor an important report he’s expecting on a specific day, and on that day you give him a bunch of blank papers. He grabs the papers and goes through them as his anger increases. He awaits an explanation and you say: … You see, it’s whitespace encoded.

You know, that situation is not far from the one that brought up this topic! The other night, sigtermer and I were talking about our programming class professors and how one can effectively have his revenge on them for being – let’s just say: not so open-minded – professors. One thought led to another, then this idea kicked in: giving a professor the requested code – whitespace encoded. Handing him the blank papers would yield the best expression ever. Explaining it would leave him unsure whether to be angry at the joke or interested enough to ask about the codec. Doing that to dino-professors would probably result in deducted marks instead of a few extra ones! Though I think it would be worth it.

Putting those situations aside, how does the idea of “whitespace encoding” sound to you? Sounds fantastic to me!

The whitespace codec

To me, it was a joke, but to sigtermer, it was a topic to investigate. Apparently, he liked the thought of it; he came up with a whitespace codec.

We all know – at least I hope – that each byte consists of 8 bits, and each bit is either 0 or 1. The idea behind the codec is fairly simple: get down to the level of bits and bytes, then encode each bit. 0s become one whitespace character and 1s become a different whitespace character.

Three whitespace characters could’ve been useful in this case: tab (‘\t’), space (‘ ’), and newline (‘\n’). But because binary only has two values, one had to be left out, … it was the newline. So, 0 becomes ‘ ’; 1 becomes ‘\t’.

The only question that remained was: Which digit should be encoded first? … sigtermer covered that and a bunch of other stuff in an Arabic “specification” he wrote (English pseudo-specification by me). Encoding begins with the least significant bit of each byte and moves from right to left.
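To make the rule concrete, here is a minimal sketch in Python 3. It is not the actual whitecoder source; the names encode_byte and encode are made up for illustration, and it only assumes what is described above: 0 becomes a space, 1 becomes a tab, least significant bit first.

```python
SPACE, TAB = " ", "\t"  # 0 -> space, 1 -> tab, as described above


def encode_byte(value):
    """Return the 8 whitespace characters for one byte, least significant bit first."""
    # (value >> i) & 1 picks bit i; i = 0 is the rightmost (least significant) bit.
    return "".join(TAB if (value >> i) & 1 else SPACE for i in range(8))


def encode(data):
    """Whitespace-encode a bytes object (hypothetical helper, not the real API)."""
    return "".join(encode_byte(b) for b in data)


# 'A' is 0x41 = 0b01000001; read from right to left the bits are 1,0,0,0,0,0,1,0.
print(repr(encode(b"A")))
```

Every input byte turns into exactly 8 output characters, so the encoded text is always 8 times the size of the input.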

Here’s an example that demonstrates the encoding part.

... yes, I did add an eighth color to the rainbow spectrum!

Decoding is the exact opposite: for every 8 whitespace characters of the whitespace encoded binary (WEB), the first becomes the least significant bit of a byte, the second becomes the next bit, and so on. Of course, that’s all after turning each character back to its original form (0 or 1). Staring a few more seconds at the image and tracing it forwards and backwards would probably make it simpler.
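The decoding direction can be sketched the same way in Python 3. Again, this is an illustration rather than the real whitecoder code: the name decode is hypothetical, and two behaviours are my own assumptions that the text above does not specify: characters other than space and tab are skipped, and a trailing group of fewer than 8 characters is dropped.

```python
SPACE, TAB = " ", "\t"  # space -> 0, tab -> 1


def decode(text):
    """Turn whitespace-encoded text back into bytes; the first character is the LSB."""
    # Assumption: anything that is not a space or a tab (e.g. a stray newline) is ignored.
    symbols = [ch for ch in text if ch in (SPACE, TAB)]
    out = bytearray()
    # Assumption: an incomplete final group of fewer than 8 symbols is dropped.
    for start in range(0, len(symbols) - len(symbols) % 8, 8):
        value = 0
        for i, ch in enumerate(symbols[start:start + 8]):
            if ch == TAB:
                value |= 1 << i  # the i-th character read sets bit i
        out.append(value)
    return bytes(out)


print(decode("\t     \t "))
```

The first character read lands in bit 0, so the bits never need to be reversed; each one is simply placed one position further left than the previous one.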

Coding the coding codec

I was all okay until sigtermer made his next move. It’s when he had finished his whitespace codec – coded in C. He gave me the sourcecode, but it was WEB encoded!!

In order to be able to read it and/or try it, I had to write my own compatible decoder; I had to follow the specification.

He encouraged me to write my own coder – in my preferred language, Python – for fun and for experience. … Challenge accepted. My goal was not only to have fun, it was also to see how low Python can go. Though, my main goal was to decode the encoded sourcecode: white.web.

I coded mine in a few hours one morning. It had a serious bug, though, so serious that it rendered the codec incompatible! Decoding white.web gave me garbage instead of sourcecode! The bug was a bit hard to squash since decoding data that had been encoded by my own encoder seemed to work!


That bit was driving me crazy! I later realized it was literal; the bit was driving me crazy. Instead of making each byte be encoded from right to left, I made each bit be encoded from right to left!! A BIT IS ONLY ONE DIGIT; writing it RTL or LTR does not change anything! I had a hard time figuring this one out because I was positive that the direction of coding was not the problem, for I had checked the implementation and it was correct. Indeed, it was correct, but it wasn’t in the right code block! … That bit.
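This isn’t the actual buggy code, just a stripped-down Python 3 sketch of that kind of mistake, with made-up function names. It assumes the bits of a byte are first written out as a string, most significant bit on the left, and then reversed.

```python
def buggy_order(value):
    """Reverse each bit on its own: a no-op, so the byte stays MSB-first."""
    # bit[::-1] reverses a one-character string, which changes nothing.
    return "".join(bit[::-1] for bit in format(value, "08b"))


def fixed_order(value):
    """Reverse the whole 8-bit string so the least significant bit comes first."""
    return format(value, "08b")[::-1]


print(buggy_order(0x41))  # 01000001
print(fixed_order(0x41))  # 10000010
```

If the encoder and the decoder share the same wrong order, a round trip through both still returns the original data, which would explain why self-encoded data seemed fine while white.web came out as garbage.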

Putting that bug out of its misery was like stomping on an ant! I had a fully functional and compliant whitespace coder, … named whitecoder. It successfully decoded sigtermer’s encoded sourcecode.

It was time to write this blog-post. But first, I thought of sharing it with the one who came up with the idea – I shouldn’t have done that. He pointed out that the deal was to create the most compact whitespace coder.

I wrote my code with the “divide and conquer” principle in mind. It was fat, as fat as 110 pure lines of code – containing about 10 functions! I populated the sourcecode with functions for the sake of reusability, so that others would be able to import it as a module if ever needed – which is most unlikely. The functions were even documented. I also added a neat looking help message.

I shall confess: I ripped off a command's man page. I dare you to tell which!

At first, I refused to leave the fully armored whitecoder for another, but then took it as another challenge. I finished it in no time; it was easy. whitecoder-compact was born: only 19 pure lines of Python code. I was happy, but again, upon sharing, another issue was pointed out.

It was not an issue yet, but it could become one. Here’s how whitecoder functions: it reads the input, stores it in memory, then processes it. This would work in most cases, but if one is to encode a large file, let’s say a 512MB file, the resulting WEB file would be 4GB, since every bit becomes a whole byte! My PC, and most others’, would run out of memory. The same would happen in sigtermer’s example: an insane guy wants to encode a whole partition! The only solution was to process data on the fly. … Challenge accepted.

As time flew by, I became a granddad: whitecoder-compact’s son was born, whitecoder-compact-stream (what a lovely full name! XD). Basically, I forked the father, whitecoder-compact. However, I couldn’t keep him as slim as his parent; he had 2 additional lines of code. But hey, it encodes and decodes on the fly! The insane person who wants to encode his HDD will be insanely happy now!
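For the curious, processing on the fly can look roughly like this in Python 3. This is a sketch under my own assumptions, not the 21 lines of whitecoder-compact-stream: the names TABLE, encode_stream and decode_stream, the chunk size, and the decision to treat every byte that is not a tab as a 0 are all made up for illustration.

```python
import io

# Lookup table: byte value -> its 8 encoded characters, least significant bit first.
# 0x20 is a space (bit 0) and 0x09 is a tab (bit 1).
TABLE = [bytes(0x09 if (v >> i) & 1 else 0x20 for i in range(8)) for v in range(256)]


def encode_stream(src, dst, chunk_size=64 * 1024):
    """Encode src to dst one chunk at a time instead of loading everything."""
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(b"".join(TABLE[b] for b in block))


def decode_stream(src, dst, chunk_size=64 * 1024):
    """Decode src to dst one chunk at a time, carrying over incomplete groups."""
    pending = b""
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        pending += block
        usable = len(pending) - len(pending) % 8  # only whole groups of 8
        out = bytearray()
        for start in range(0, usable, 8):
            value = 0
            for i in range(8):
                if pending[start + i] == 0x09:  # tab -> bit i is 1
                    value |= 1 << i
            out.append(value)
        dst.write(bytes(out))
        pending = pending[usable:]  # keep the leftover for the next chunk


# Tiny in-memory demo with deliberately awkward chunk sizes.
encoded = io.BytesIO()
encode_stream(io.BytesIO(b"Works\n"), encoded, chunk_size=4)
encoded.seek(0)
decoded = io.BytesIO()
decode_stream(encoded, decoded, chunk_size=5)
print(decoded.getvalue())
```

Memory use stays around the chunk size no matter how big the input is. To use it in a pipe, sys.stdin.buffer and sys.stdout.buffer can be passed as src and dst.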

Just to be safe, I decided to try them all together and see if they all do their jobs correctly. PIPES!

$ echo "Works" | ./whitecoder.py -e | ./whitecoder-compact.py -d | ./whitecoder-compact-stream.py -e | ./whitecoder.py -d | ./whitecoder-compact.py -e | ./whitecoder-compact-stream.py -d
Works

Works!!

The battle of the whitecoder clan

So I got 3; do you honestly think they're all equally good? I didn't think so! It was time to check which was the best; which was the most efficient; which was the fastest.

Arena: AnxiousBox (Pentium 4)
Objective: Beat all at decoding encoded data
Target: top
Command: cat `which top` | script -e | script -d > ./top

$ time cat `which top` | ./whitecoder.py -e |./whitecoder.py -d > top0
real    0m3.173s
user    0m3.008s
sys    0m0.052s
$ diff `which top` top0 # compares original top with decoded encoded top, top0
$ #binary files match
$ time cat `which top` | ./whitecoder-compact.py -e |./whitecoder-compact.py -d > top1
real	0m2.663s
user	0m2.452s
sys	0m0.044s
$ diff `which top` top1 # compares original top with top1
$ #binary files match
$ time cat `which top` | ./whitecoder-compact-stream.py -e |./whitecoder-compact-stream.py -d > top2
real	0m3.872s
user	0m3.716s
sys	0m0.036s
$ diff `which top` top2 # compares original top with top2
$ #binary files match

The compact, as expected, was the fastest. The original came in second place. And the last one, compact-stream, was the most memory-efficient. "You can't have both, it's either speed or efficiency." - a dino-professor.

Because of that, I decided to keep all three of them and not to drop any. All are to be provided, and it's up to the reader which to download. The choice should be based on the need: if development or hacking, I'd say the original; if huge data, I'd say compact-stream; if fun, I'd say compact.

Oh, and if you're curious, yes, the decoded encoded top files worked in all cases. Try it yourself, but just don't forget to $chmod +x them.

Mine and yours

The reason behind sigtermer's act of not giving me his executable or his sourcecode is obvious: he wanted this. He wanted me to create one in another language. I have to say, it was a smart decision. I not only learned new interesting stuff, I also got something awesome to brag about when I hit my 60's. On top of all that, I am ready to try it on one of my next programming professors!

Although I liked sigtermer's method of forcing, I'm not going to do the same. Nope, I'm not going to leave you with blank documents, I'm going to link directly to the scripts. Mainly because there are some people who aren't coders, but like to mess around (like me years ago). Another reason is that I'm sure if there's a good programmer around, or at least one who's willing to learn for a change, he'd take the challenge without me asking for it.

... I won't mention names, but I'm expecting a .NET and a Java implementation - though I'm not sure if it'll happen. But you know what would rock, what would get the grand non-existing prize? A whitespace binary coder coded in the Whitespace programming language!

In case you're interested in my project, I've hosted it on GoogleCode. I've also set up the wiki, explaining how to use it (with examples) and how to import it in Python and use the module.

Hope you got pumped and ready to take part.


~ by AnxiousNut on September 11, 2011.

3 Responses to “My whitespace binary coder: whitecoder”

  1. I am ready to try it on one of my next programming professors!

    No, for that you must compress, encrypt (and lose the key in an unfortunate repartitioning accident), encode it in whitespace, print it on paper, then hand it over.

    On a more technical note,

    Decoding would be the exact opposite: the first whitespace character of the whitespace encoded binary (WEB) becomes the most significant bit of a byte – of course, after turning it back to its original form (0 or 1).

    I think you’ve confused it a bit. the LSB of the first byte gets encoded and printed first. When it is decoded, it is read first, but is still the LSB (held until you have enough bits to form a byte/or shifted in from the left).

    That, or I have things confused myself.

    • I think you’ve confused it a bit. the LSB of the first byte gets encoded and printed first.

      You’re right, it shouldn’t have been “the most significant “, rather “the least significant”. Thanks for correction. Updated.

      No, for that you must compress, encrypt (and lose the key in an unfortunate repartitioning accident), encode it in whitespace, print it on paper, then hand it over.

      Quite a nice strategy for one getting himself killed! XD

  2. [...] If you’d like to read an explanation of the specification implementation, you can read this (The whitespace codec [...]
