The concept of diceware is pretty awesome! You can read the nasty details here. It requires you to have a die… lame! Who brings dice to work to generate passwords!? Come on! The principle is really cool, regardless.
We need some code to do it. This lady did it! The point is to use words that you will remember without losing the security aspect, and to keep you from using “abc123”.
What is the composition of a diceware password?
The recommended size is 4 words, separated by spaces. The words can vary from 1 to 5 characters each.
At work I wanted to make it simple and standardized, so I chose a set of 4 words with 4 letters each. You can build a long list of words that fit that pattern, and you can use the list however you want. The more words, the better. The idea is that the words are easy to remember, so they have to come from your own language and fit your pattern. In my case that is Portuguese… check this one:
So the password would be something like – “volt come giro poti”
How strong is it?
Considering the 4 sets of 4-letter words (no pun intended) I’m using, the password takes up 152 bits: 16 letters plus 3 spaces, at one byte (8 bits) per character. That many bits can hold a gigantic number of different values, 2^152, or around 5.708990770823839524233143877798e+45. That’s right, a 46-digit number. Keep in mind that this is the raw size of the string, not the number of passwords you can actually generate.
If we only count random letters with no meaning, plus the spaces, the number of combinations is smaller but still very big. But 4 groups of random letters would be hard to remember, and we would be back to something like “autg xdrv gvcn xmg”, right? That’s not an option. What we need then is a list of words that make sense in our language. So let’s get one. You can generate yours (good idea), look on the internet, or grab one from a book.
Say you have finished editing your list and ended up with 1000 words in it. That would give you 1000 * 1000 * 1000 * 1000 = 1,000,000,000,000. Yep, that is a trillion. With your crappy list of 1000 random 4-letter words, you would get a trillion different passwords that are actually easy to remember.
So basically what you have to do is process that list and pick words from it at random to compose your password. The code rolls the dice for you.
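To put the trillion in the same unit as the 152 bits above, here is a small sketch of the arithmetic. The function names are made up for illustration, and it assumes every word is picked independently and with equal probability.

```python
import math


def combinations(list_size: int, words: int = 4) -> int:
    """Number of equally likely passwords when each word is drawn independently."""
    return list_size ** words


def entropy_bits(list_size: int, words: int = 4) -> float:
    """Entropy in bits of `words` words drawn uniformly from a list of `list_size`."""
    return words * math.log2(list_size)


size = 1000  # the example list size used in the text
print(f"{size} words: {combinations(size):,} passwords, "
      f"about {entropy_bits(size):.1f} bits")
```

For the 1000-word list this comes out at roughly 39.9 bits, so the real strength depends on the size of the list and the number of words, not on the 152 bits the string occupies.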
Here is the code for your reference, the list is here.
    #!/usr/bin/env python3
    import random

    # Load the word list, one word per line, skipping blank lines
    with open('words.txt', 'r') as f:
        mywords = [line.strip() for line in f if line.strip()]

    # Pick four words independently and join them with spaces
    print('New Password: %s' % ' '.join(random.choice(mywords) for _ in range(4)))

But this is not diceware per se. The real diceware requires you to roll a die 5 times to get each word. Each word in your dictionary is assigned a five-digit number from 11111 to 66666, using only the digits 1 to 6, which gives you a list of 6^5 = 7776 unique words. Then our calculation becomes even more interesting: 7776 * 7776 * 7776 * 7776 = 3,656,158,440,062,976. I don’t know how to say that number in English! This is where true randomness in Python needs to be explained, because I’m not rolling no freaking dice 5 times!
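For the curious, the dice-to-word mapping can be sketched like this. The helper names are hypothetical, and the sketch swaps the physical die for Python's standard “secrets” module, which is my assumption and not part of the original diceware method. Five rolls are treated as five base-6 digits, which gives an index from 0 to 7775 into a 7776-word list.

```python
import secrets


def roll_die() -> int:
    """Simulate one fair six-sided die (1-6) using the OS random source."""
    return secrets.randbelow(6) + 1


def rolls_to_index(rolls) -> int:
    """Map five rolls (each 1-6) to an index in 0..7775, reading them as base-6 digits."""
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index


rolls = [roll_die() for _ in range(5)]
key = "".join(str(r) for r in rolls)  # e.g. the five-digit number printed next to a word
print(key, "->", rolls_to_index(rolls))
```

With a list sorted in diceware order, the word for a roll would simply be the entry at that index.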
True or Pseudo Randomness
In computer systems, true randomness (rolling the dice) is hard to achieve. Randomness is described as follows:
Randomness is the lack of pattern or predictability in events. A random sequence of events, symbols or steps has no order and does not follow an intelligible pattern or combination. Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events (or “trials”) is predictable. For example, when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will occur twice as often as 4. In this view, randomness is a measure of uncertainty of an outcome, rather than haphazardness, and applies to concepts of chance, probability, and information entropy.
Python, for example, uses pseudorandomness based on an algorithm called the Mersenne Twister. The “random” module generates a sequence of numbers starting from a “seed”, which makes it a deterministic way of generating numbers. You can choose the seed yourself. If you don’t, Python seeds the generator from the operating system’s randomness source when one is available, and falls back to the current system time otherwise. Let me give you an example.
    #!/usr/bin/python
    from random import seed
    from random import random

    # seed random number generator
    seed(1)
    # generate some random numbers
    print(random(), random(), random())
    # resetting the seed to 1 again
    seed(1)
    # see the pseudo thing happening
    print(random(), random(), random())
You get two sets of random-looking but predictable numbers: both print statements output exactly the same three values.
You can see that after resetting the seed value, the randomness started off again from the same point, and the “randomness” is the same from that point onwards. Hence the terms pseudorandomness and deterministic.
Note that the values falling between 0 and 1 has nothing to do with the seed being 1: random() always returns a number between 0 and 1, whatever the seed. Being able to reproduce the “random” sequence can be useful in production financial, engineering or machine learning systems, for example to rerun a simulation or an experiment with exactly the same inputs.
If we use Python’s pseudorandom choice function on a list, without setting a seed value (there is no point in it anyway), each item is picked with uniform likelihood. In other words, the choices are distributed evenly. In a list of 1000 words like the one I used, the likelihood of a given word coming up is 1/1000, or 0.1%.
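You can check the even distribution yourself with a quick experiment. This is only a sketch: it uses a tiny stand-in list of four words instead of the real 1000-word list, so each word should show up about a quarter of the time.

```python
import random
from collections import Counter

words = ["volt", "come", "giro", "poti"]  # stand-in list, not the real one
draws = 100_000

# Count how often each word is picked over many draws
counts = Counter(random.choice(words) for _ in range(draws))

for word in words:
    # Each share should be close to 1 / len(words)
    print(f"{word}: {counts[word] / draws:.3f}")
```

The shares will not be exactly equal on any given run, but they get closer to 1/len(words) as the number of draws grows.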
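All that to say that we don’t really need to roll the dice five times, since a uniform random pick from the list does the same job. One caveat: the Python documentation warns that the “random” module should not be used for security purposes and points to the “secrets” module for things like passwords.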
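If you want the picks to come from the operating system’s random source instead of the Mersenne Twister, a variant using the standard “secrets” module could look like the sketch below. The function name and the sample words are made up for illustration. In practice you would load the list from words.txt as in the script above.

```python
import secrets


def generate_password(words, count=4, sep=" "):
    """Join `count` words chosen independently and uniformly from `words`."""
    if not words:
        raise ValueError("word list is empty")
    return sep.join(secrets.choice(words) for _ in range(count))


# Stand-in list for illustration only
sample_words = ["volt", "come", "giro", "poti", "lado", "mesa"]
print("New Password:", generate_password(sample_words))
```

The output has the same shape as before, four words separated by spaces, but it cannot be replayed by resetting a seed.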
If none of that is sufficient for you, you can order a true diceware password (made on paper) for 2 dollars.