Had a red flag on my ElasticSearch cluster these days and found that the reason was related to an unassigned shards between the nodes.
As the data I collect is not that sensitive I could easy delete it and recreate in case I need in the future. But first we need to find it. There is many articles on the internet to help one to understand the shards allocation but I offer here a simple solution which is – simply delete the bastard.
First, we check on the cluster health and get the count of unassigned shards.
The concept of diceware is pretty awesome! You can read the nasty details here. It requires you to have a dice… lame! Who will bring a dice to work to generate passwords!? Come on?! The principle is really cool, regardless.
We need some code to do it. This lady did it! It is meant for you to use words that you will remember without loosing the security aspect and prevent you from using “abc123”.
What is the composition of a diceware password?
The recommended size is 4 sets of words – separated by space. The size of the words can vary from 1 to 5 characters each.
At work I wanted to make it simple and standardized, so I choose a set of 4 words with 4 letters each. You can have a list of a lot of words within that pattern. But you can use the list however you want.. The more the better. The idea is that the words are easy to remember, so it has to be within your language dictionary and composed within your pattern. In my case it is Portuguese… check this one:
volt come rena xepa giro poti roxa …
So the password would be something like – “volt come giro poti”
How strong is it?
Considering the 4 sets of 4 letters words (no pun intended) I’m using, the size of it is of 152 bits. And I’m counting the space bits as well that actually has one byte ( we have 3 spaces). It basically yields a gigantic number of possibilities.. something around the 5,708990770823839524233143877798e+45. That’s right.. 45 other digits after the last one seen.
If we count the characters only without meaning + plus space. The number of combinations would be smaller but still very big. But would be hard to remember a 4 set of random letters and we would be back to this “autg xdrv gvcn xmg”, right? That’s not a choice. What we need then is a list with words that make sense in our language. So let’s get one. You can generate yours (good idea), look on the internet, or grab from a book.
Say you have finished editing your list and ended up with 1000 words in it. That would give you a 1000 * 1000 * 1000 * 1000 = 1.000.000.000.000. Yep, that is a trillion. With your crappy list of a 1000 random 4 letter words, you would get a trillion different passwords, that would be actually easy to remember.
So basically what you have to do is process that list and spit each word randomly to compose your password. It would be rolling the dice for you.
with open('words.txt','r') as f:
mywords = [line.strip() for line in f]
print 'New Password: %s %s %s %s' %(random.choice(mywords),random.choice(mywords),random.choice(mywords),random.choice(mywords))
But this is not the diceware per se. The real diceware requires you to roll the dice 5 times to get each word. So each word of your dictionary would be assigned a number that goes from 11111 – 66666 getting you a list of 7776 unique words. Than our calculations becomes even more interesting now. Resulting in 7776 * 7776 * 7776 * 7776 = 3.656.158.440.062.976. I don’t know how to say that number in English! This is where the trues randomness in python needs to be explained, because I’m not rolling no freaking dice 5 times!
True or Pseudo Randomness
In computers system true randomness (rolling the dice) is hard to be achieved. Randomness is described as follow:
Randomness is the lack of pattern or predictability in events. A random sequence of events, symbols or steps has no order and does not follow an intelligible pattern or combination. Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events (or “trials”) is predictable. For example, when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will occur twice as often as 4. In this view, randomness is a measure of uncertainty of an outcome, rather than haphazardness, and applies to concepts of chance, probability, and information entropy.
In python for example the pseudorandomness method is used and is based on a set of mathematical functions called Mersenne Twister. In python the function “random” is used to generate a sequence of numbers and it takes a “seed” to start off. That is a deterministic way of generating numbers. You can choose that seed but generally the time of the system in milliseconds from epoch (1970) is used. Let me give you an example.
from random import seed
from random import random
# seed random number generator
# generate some random numbers
print(random(), random(), random())
# resetting the seed to 1 again
# see the pseudo thing happening
print(random(), random(), random())
You get two sets of random, but predictable numbers like the following.
You can see that after resetting the seed value, the randomness started off again from the same point, and the “randomness” is the same from the that point onwards.. hence the term pseudorandomness and deterministic.
As we set the seed number to 1, the random numbers will be given within the interval 0 and 1. Predicting the randomness can be useful to be used in production financial, engineering or machine learning systems.
If we use the python pseudo random function in a list, without setting a seed value (there is no point in it anyway) the result will be given based on a uniform likelihood or in other words, the choices are distributed evenly. In a list of 1000 words like the one I used, the likelihood of a given word to be given as a results is 1/1000 or 0,1%.
All that to say, that we don’t really need to roll the dice five times since the “entropy” is embedded in the python function.
If none of that is sufficient for you, you can order a true diceware password (made on paper) for 2 dollars.
This Feb 28th is the so called “Thesis Defense” day. It is where me, myself and I, after submitting the theses papers, put myself at the disposal of the thesis committee. In this case, “defend” does not imply that a I will have to argue aggressively about my work (although I see myself doing it).
Rather, the thesis defense is designed so that faculty members can ask questions and make sure that students actually understand their field and focus area. It serves as a formality because the paper will already have been evaluated ( have been… it is called Qualification Process). During a defense, a student will be asked questions by members of the thesis committee. Questions are usually open-ended and require that the student think critically about his or her work. The event is supposed to last from one to 3 hours, I have heard it could take more.. geees!
The Defense, is the crowning event of at least 2 years of hard study, dedication and sacrifice. And I want to tell you what I have learned with it.
It is not as hard and mystic as it seems
At least here in Brazil, masters, or as we call it “Mestrado” is not as common as it should be. It has some sort of mysticism around it, like if it was reserved for a certain “class” of student and society. People really tend to go for an MBA. MBA in Brazil, although stands for – Masters in Business Administration, has nothing to do with masters Strict Sensu and it just a Lato-Sensu course, or a specialization. I’m not taking out the credit of those that chooses the MBA, but it is different. The MBA, in the country has a more commercial practical focus. Also it is offered during the night, or weekends which helps a lot those that actually have a job to attend. I guess that is the main reason people tend to go that way. In the other hand a full blown masters course, is considered too academical or meant for those that want to pursue a professor, researcher or academic career. This is not entirely true! You can enroll to a master course, continue to work in the industry and solve a real world problem.
This is part of the mysticism that goes around a Masters course. It is not only meant for academics, it is not meant only for super nerds researchers and you can do it while you keep your actual job! It is true, that the work you have must give you certain flexibility and freedom to cope with the crazy schedules that some schools push, but it is possible.
You can have a job that is not related to universities and academic world. You can work anywhere you want. In fact, big tech companies are the ones that employs masters graduates the most. And, there is a probability that your salary increases by up to 80% if you have a master degree.
The professors and board of teachers are pretty much regular people, with experience on some topics and areas of research.. but are NOT the owners of the entire knowledge. It is pretty common that the student knows more about a given topic than the professor. He is there, to help you to adjust your thought process and writing your ideas within the accepted scientific methodology but he is not a God with omniscience. You can argue, you can defend a statement, heck, you can actually fight with your professor (although not a good idea) if you think X is equal Y.
The other aspect of the mysticism of the Master course is that you are there to learn and have classes.Myth! There is no way for the school to teach each student the specifics about his work. What you actually learn is how to organize your thought process, how to treat numbers you may collect from your research, how to use others work to build the fundamentals of your work. That is it. Don’t expect to be there and have classes of advanced math or signal processing or any in depth classes about your field. That’s not going to happen. Instead you have several classes of debate, several seminars to expose your ideas and have the other student to confront, challenge and disagree with you. You have some writing classes, you have basic statistics classes and a lot of seminars. That’s where the knowledge and ideas are born. That’s where you mature and learn how to “defend” your work and identify potential flaws in it, by discussing it with other people.
Since it depends on you, to understand, collect, test, treat and present the work.. it is not as hard as it seems. give it a try. You might surprise your self.
It is a victory in solitude
It is sad! I know. But it is the hard truth. There is a big chance that not a single soul around you will actually understand what you are doing. I mean, friends, family co-workers. None of them will, one – be interested about what you discovered, two – be willing to discuss it in detail. Nobody cares! If you are married, your wife will be interested about when you will finish it so she can have you back to regular life. Or when you will stop spending the night reading to give more attention to your kids, or – my case – When you will be finished to be able to request a raise at your current job!
There wont be any question about the inner details of your research, and if they initially show interest.. that is rapidly lost when you start going on and on about it.
Upon completion, your friends will be excited to know that you have finished it with success.. remember the myth that it is super hard and reserved for some people? Sure, they will be thrilled with the news. But don’t think they are really interested in bits and bytes of it. And it is not because the don’t like you, or have no interest on your stuff.. it is just because the don’t understand it.. and it is really hard to relate with something you have no idea about.. they are (as everybody else) afraid to look stupid.
Isn’t it sad? That you research, and successfully develop something that could be used by society and have the potential to put your name in the field history.. and nobody cares?! You can’t share it, or brag about it :)?! Come on!! So sad!
It is indeed a victory in solitude. Be glad you made it. The hours, the sacrifice is totally worth!!. Knowledge is one the few things you can actually keep and its value can’t be measured. It is however a satisfaction that only you will feel in its fullness. It is ok!
Are you curious to know what I have studied? Probably not, Maybe?!
My thesis is called – Image Perceptual Hash applied for Video Copy identification. and here is the abstract.
With the event of the Internet, video and image files are widely shared and consumed by users from all over the world. Thus, methods to identify these files have emerged as a way to preserve intelectual and commercial rights. Content based identification or perceptual hashing is the technique capable of generating a numeric identifier from the image characteristics. With this identifier, it is possible compare and decide if two images are equal, similar or different. This study has as objective discuss the application of image perceptual hashing to identify video copies. It proposes the usage of known and public methods such as the Average and Difference Hash that are based on statistics of the image also Phash and Wavelet hash that are based on the image frequency.
An identification technique was applied using a Hamming distance similarity threshold and the combination of perceptual hash algorithms for video copy identification. The method was tested by applying several attacks to the candidate video and the results were properly detailed. It is possible to use perceptual hash algorithms for video copy identification, and there are benefits when there is a combination of more than one of them filling performance gaps and vulnerabilities eliminating false positives
Eu trabalho com TI, com informática, com computação. Deu pra entender né? Pra ser mais especifico, eu trabalho com segurança da informação e desenvolvimento para automação de tarefas. Isto significa, na prática que a literatura das coisas relacionadas do ramo, raramente são em português.
Por isto, já que eu leio em inglês, por que não escrever também? Por que eu não sou nativo? Porque talvez o número de erros gramaticais serão grandes? Talvez. Porém uma análise mais fria da coisa toda, me faz perceber que mesmo escrevendo em português eu não estaria livre deste risco. Na era das redes sociais, escrever errado se tornou o padrão da comunicação, por que as pessoas valorizam mais o “fazer-se entender” do que o propriamente escrever corretamente. Neste paradigma, escrever correto, na verdade aumento o risco do locutor não ser compreendido…( 🙂 ). Então por que não arriscar?
Assim, os posts relacionados a coisas técnicas e coisas de nerd deste blog serão em inglês. Os outros, posts, relacionados a corrida, bike e outras aventuras seguirão em português.