Thought of buying a new phone line number; wrote a script

Original screenshot by Kai

Have you ever wondered what phone number you should get? Have you ever wondered what your phone number might spell? … Well, I had to dive into this since I’ve been thinking lately of ditching my current Telecommunication Service Provider for another one – seeing that another TSP charges less and provides longer validity.

Now, when people here in Kuwait decide to get a new phone number, the majority goes for what they call “golden lines”: phone numbers that contain as few different digits as possible and as structured as possible (e.g. xxxyxxxy). The only problem with those numbers is that they are extremely expensive – some reached above 1000KWD! But honestly, even if they were not as expensive, let’s say as cheap as the regular phone numbers, I wouldn’t be that interested.

However, what interests me is phone numbers that actually spell words; phonewords; vanity numbers (e.g. xyz-NUT). This is uncommon in here, in fact, I’ve yet to meet someone that spells his phone number! Objectively, I dislike this fact, but subjectively, I love it; it means we, those who seek phonewords, have a greater chance to get what we want! … That’s why I had to know what numbers some great names would correspond to on dialpads.

At first, I grabbed my phone and manually translated the words I had in mind. All one has to do is looking up the corresponding number of a letter on a dialpad. GNU:468; linux:54689; debian:332426; ubuntu:828688; python:798466. However, upon finishing that, a bunch of other names came to my mind. I couldn’t take it, so much unnecessary effort has been put. I decided to write a script.

If you wish to skip development story, click this link!

Development funs and woes

A simple script was intended, just a couple of lines that does the work on my behalf. It’s been written within a minute or so.

  • Anxious-Nut: 2694687-688
  • Sig: 744
  • Looly-kinns: 56659-54667
  • JiMMaR: 546627
  • MBH: 624

Those are some nicknames that crossed my mind. Apparently the shorter the nickname is, the greater the chance of finding an available line number. If you have a long one, like mine, you’ll have to take one part of it. :/ … but at least I knew what numbers I should be going for.

The birth of the reversed code

Two thing I have no control over are tempting thoughts and writing long posts – but let’s leave long posts aside. I wanted to reverse the functionality of the previous code; I wanted to know what digits might correspond to. Perhaps, something catchy might be found, but the main reason was that I wanted to know what my number spells.

The problem with this functionality is that the implementation is somewhat complex. Translating letters to numbers is quite straightforward, each letter has the only one possible corresponding number. However, translating numbers to letters is quite different, each number has a number of corresponding possibilities.

Three different letters point to a single number.

One number may point to three different letters.

Assuming that all numbers have 3 corresponding letters: a number of only 1 digit would have 3 (31) possibilities; 2 digits would have 9 (32) possibilities; 8 digits (length of phone numbers in Kuwait) would have 6561 (38) possibilities! However, because that assumption is incorrect (numbers may correspond from 0 letters upto 4 letters), the number of possibilities might vary depending on the digits. For instance, a number that consists of 5 occurrences of the number 1 or 0 would give at most 64 (43) possibilities (since 1 or 0 corresponds nothing, they have no possibilities which leaves us with the possibilities of the remaining 3 numbers that corresponds to 4 letters). Whereas a number that consists of 8 occurrences of the number 9 or 7 (corresponds to 4 letters) would give 65536 (48). However, because each number has a different number of corresponding letters, there’s no static number of possibilities for an 8 digits number. However, if you really wish to calculate it, you can do it by getting the product of a sequence:

Mathematics is not the only perquisite, apparently! :P

So after analyzing the theoretical part and facing a blank paper for a couple of minutes, I managed to come up with two pseudo-algorithms, one recursive and the other iterative. However, I ditched the iterative one although iteration is faster than recursion in general. The reason behind that decision was the readability and the easiness of the implementation.

To be honest, the implementation was a bit of a hassle, but I managed to get it to function plusgood. Also, the thing that I feared the most – execution speed, was no longer feared: although it was recursive, it was fast.

time ./dialpadTrans.py d2l -l en 99779977 > /dev/null
real	0m0.280s
user	0m0.224s
sys	0m0.020s

After making sure that it’s the function was not flawed, I tried my phone number. I gotta say: the result was doubleplusgood! It told me that one possibility is: “XP 1 cry 11″ which can be interpreted and remembered as: Buying one copy of WindowsXP cause 11 people to cry. Or maybe as: One bully sticking out his tongue (emoticon XP) made 11 kids cry. … okay, maybe those aren’t best interpretations. :/

My phone number was a fail because of the occurrences of the number 1. Same will be the fate of numbers that are stabbed with 1’s and 0’s. However, people with numbers that does not contain such party poopers would probably have their numbers spelling something better.

The birth of the filtering code

65536 permutations are a lot to go through and mentally processing each entry, don’t you think? Well, that’s what I realized. I also realized that most doesn’t make sense (e.g. zs1bpz11). That’s why I have implement a wordlist dictionary filter: It filters the entries and displays only the permutations that has words within the wordlist. If you’re a regular reader, you might realize that I’ve implemented something similar in scrambler. However, because scrambler’s is different, I cannot reuse it.

Implementation was straightforward, but inefficient – though it was the most efficient of all the algorithms I managed to get: looping into a wordlist and check for the occurrence of each word within each permutation. Yes, that means that if the permutations were 65536 and the wordlist contained 98569 (linux wordlist file), then 6459817984 comparison operations will be made!

I was sure this function will lack speed and there was nothing I can do about it. I tried it. … I was right. Filtering the possibilities of my number with linux dictionary wordlist took 13 seconds. But, hey, it works!

… Though, I was bugged by the execution speed. I decided to try “optimizing” the function. One thing I did was dumping the whole wordlist into the memory – taking advantage of memory speed. Another thing I did was lessening variable assignments by fusing statements – avoiding rewriting to memory. Those two actually made a difference: the filter became 23% faster (dropped from 13 seconds to 10).

Although it was still unsatisfying, I knew I had to leave it as is. Continuing optimization attempts would probably end up in parsing the wordlist and storing it in a sqlite database. However, this is not intended to be an every-day use application to require as much speed as possible. Also, don’t forget this is run on an 7 y/o computer. Besides, readers might use it once and that’s it! … ne?

The birth of a UTF-8 supportive code

I was about to close my development environment, but I remembered an important thing: encodings. … At that point, the script can only deal with ASCII. However, dialpad of other languages do exist in reality, including an Arabic one. UTF-8 support was called for!

Most of my python scripts I used to write supported UTF-8, so supporting it in this one was not a big hassle – thought not saying it was a piece of cake. Translating digits to letters worked from the first time, however, translating letters to digits, failed! After a long time of debugging, I noticed that the length of the unicode string has become twice as long as the one entered. The bug was easy to squash, but was quite interesting: because I forgot to decode the Arabic string, the values were raw binary values represented as hex values.

… UTF-8 encoding uses multiple bytes for encoding if needed: the first byte is backwards compatible with ASCII, however, if a character is not within the range of ASCII characters, multiple bytes (upto 6) may be used for a the character.

A single Arabic UTF-8 character value

Because of that fact and the fact that Arabic UTF-8 characters are of 2 bytes, the length of a UTF-8 Arabic character represented in hex would be 2. So if we have an Arabic string, then it’ll be twice as long! And if you recall from earlier, I mentioned that English did work that time, it’s because English characters uses 1 byte; backwards compatible with ASCII, remember?

Again, it was just a matter of decoding the string before processing it. I did it that and it was a success; I tried a simple input, it was working. Just to be sure I decided to try my phone number since I know how much it takes and which permutations should be displayed, however, … what once used to take 10 seconds was now taking 5 minutes. I literally froze.

What kind of optimization does this require to get it faster? Is it even possible to make it faster? Did I do something wrong? Is it because of python? Should I ditch this script and write a new one? SHOULD I USE C INSTEAD OF PYTHON? I decided to give it a rest; I left my PC for the night. It was depressing.

LOL, that night, I had a funny dream. … I was a new employee in a software solution company along with sig, Sig was doing okay, but I on the other hand was not able to write a single line of code. I do not remember what my assignment was though. It ended with me realizing that it was a unicode issue! XD

Once woke up in bed, I thought of what might cause the lag. I realized that all I did the previous night was adding the UTF-8 support, only a couple of words. I decided to take off the UTF-8 support. I did, and the execution was back to 10 seconds. If you’re a python coder, you’d probably know that UTF-8 encoding and decoding is only two words appended to a string: input.decode('utf-8') or input.encode('utf-8'). Since UTF-8 support meant the inclusion of these and 3000% slower execution, I decided to split up the filtering function into two, one for ASCII encoded text and another for utf-8 encoded text. Although I could have set up conditions instead of making two functions, but I was hoping that I could find a solution for the UTF-8 function.

It’s quite obvious that the lag was occurring from the decode() and encode() class constructors – as if they were poorly coded. But I’m not gonna say that, I’m pretty sure there’s a reason behind this lag. I couldn’t do anything about the class constructors lag, but I knew I had to find a way to over come this issue. I had to go deeper, to think as a C programmer rather than an independent python programmer. … I found it!

Yes, I found a possible solution. Since the lag was being cause at decoding, why not stick with ASCII and hex values for UTF-8 instead of unicode values? Why not take advantage of the interesting bug I faced the previous day? All I had to know is how many bytes that a single UTF-8 character is represented by. Knowing that would enable to use it as a factor; comparing $factor values at once whether they’re text ot hex values. If the text is in ASCII, the factor is going to be 1 (1 byte) which will process an element at a time. If the text is 2 bytes UTF-8, it’s going to process 2 at a time. If it’s 4 bytes UTF-8, then 4 at a time. It was a valid solution.

At first, I implemented a dictionary of languages and their encodings. ┬áBut then I realized there’s a more flexible way of doing it: using the .decode() class constructor once. Since it’s going to be used excessively, it’s not going to effect the speed. The equation for the factor was simple: the length of the original string divided by the length of the decoded string.

factor=len(word)/len(word.decode('utf-8'))

It was time to try it out.

10 seconds; victory!! … lol, looking at the whole thing now makes me want to the word “victory” back; 10 seconds is still a lot! … but at least not 5 minutes.

Dialpad Translator v1.0

So, in the end, I managed to get myself a script that translates letters to dialpad numbers and vice versa, supporting both Arabic and English languages by default, and has a wordlist dictionary filtering system. All of that in 180 lines of code, … well, 70 without the dictionaries, argument parser, and help messages.

Usage

Well, since I wrote my own argument parser, I had the chance of kicking in a nice looking man-page-like help message.

Here you go, the execution of the examples in the image above:

Dialpad language support/hacking

As shown in the “man page” above, you can choose a language: English or Arabic. Both dialpads are hard-coded in the sourcecode. However, if you wish to show some love to your local language and it’s neither of the supported ones, contact me and I’ll add it – and credit you of course. Though, if you wish not to cooperate, you can modify and add it on your own! All you have to do is 3 simple steps. Let’s do an example, adding a support to Wacko dialpad.

1. Adding the new language to the language dictionary

language = {'en': 'English',
			'ar': 'Arabic',
			'wacko': 'Wacko'}

2. adding a new dialpad python dictionary (having an unused name)

keypadWacko = {'1': ['t', 'u', 'v'],
			'2': ['p', 'q', 'r', 's'],
			'3': ['g', 'h', 'i'],
			'4': ['a', 'b', 'c'],
			'5': ['w', 'x', 'y', 'z'],
			'6': ['6'],
			'7': ['7'],
			'8': ['j', 'k', 'l'],
			'9': ['m', 'n', 'o'],
			'0': ['a', 'b', 'c']
			}

3. Adding a line to the last portion of the code – under if __name__=="__main__". Add a new elif statement

	if (args['keypad']=='en'): keypad=keypadEn
	elif (args['keypad']=='ar'): keypad=keypadAr
	elif (args['keypad']=='wacko'): keypad=keypadWacko

and you’re ready to go.

Wordlists

As long as the text within a wordlist file is either UTF-8 or ASCII, there shouldn’t be any problem using it. You can even write your own wordlist seeing that most dictionaries don’t include all words you know. For example, “linux”, “noob”, and “href” are not included in the default linux dictionary!

In case you don’t know the location of the linux words (English) wordlist on your system, SHAME ON YOU! LOL, it’s under /usr/share/dict/words (Ubuntu). And if you ever forgot, it’s provided withing the help message. As for the Arabic wordlist, you can get it from my old blog-post.

Download

Of course, a link the script: here! Enjoy! :D

… I’ve been working on this post (writing, drawing, and formatting) for two days. So if you’ve read the whole post, you truly have my respect. :)

About these ads

~ by AnxiousNut on August 27, 2011.

4 Responses to “Thought of buying a new phone line number; wrote a script”

  1. I did actually read the whole thing
    but say .. did you decide which number to go for in the end ?
    =3

    • Wow, I really appreciate it!

      … a couple of numbers are on my mind atm, but I wont tell before getting one; putting them in here might mean me not getting some of them!

  2. I have to say Great stuff,
    Although I didn’t fully understand some of the implementation stuff since I’m not into phython ,,, but I liked the idea .
    Good job :)

    • You don’t really have to understand the python implementation as long as you understand the concept of the implementation – which was demonstrated generally.

      … thanks for the feedback, Ahmed! :D

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

%d bloggers like this: