This time I had an encounter with a rather unusual Santa Claus and he sent me a curious message along with a set corrupted files that I can’t open. He said that I had to read carefully the message below to be able to crack it… ingenious Santa!
The message goes like this:
“I come with many colors, so beautiful and bright, I turn so many houses into a beautiful sight. What am I?”
One of these days I was trying implement just the Nprobe module from the NTOP stack as I didn’t need the whole pack. Nprobe is netflow collector whiting Ntop. The idea was to ship all the netflow data to Logstash, then have it converted to ElasticSearch and viewed in Kibana.
The password sharing activity has happened to me at least in two different occasions, and I’m certain it has happened with you too. See if this story is familiar to you. A family that had recently purchased a smart tv asked to use the service, NETFLIX – they were not fond of using credit card transactions. In a under developed countries having people afraid of using their credit card is quite common. The second instance was during my master course, where I had purchased a software that would ease the transcript of documents and the process of writing the dissertation. A friend wanted me to share it to ease his burdens. In both occasions I didn’t feel bad. I don’t think anybody does. Continue…
Had a red flag on my ElasticSearch cluster these days and found that the reason was related to an unassigned shards between the nodes.
As the data I collect is not that sensitive I could easy delete it and recreate in case I need in the future. But first we need to find it. There is many articles on the internet to help one to understand the shards allocation but I offer here a simple solution which is – simply delete the bastard.
First, we check on the cluster health and get the count of unassigned shards.
There are several ways to protect the audiovisual content and watermarking is one of them. It is arguably the best solution against content distribution via streaming, simply because it allows one to identify the source of the media theft.
Watermarking, which originally was created for image protection has been intensively researched in the past decade and is now possible to be applied not only in static videos but also in live streaming. That can be done in the hardware or software level and the mark can be inserted in the frames, key frames, bits, video sample and many other ways. It is an amazing technology! It is offered by the specialized companies as the ultimate protection against piracy.. for a lot of money off course.
There are certain desirable characteristics in these type of forensic measures that make it useful to be implemented to prevent piracy and I would like to discuss them first, before getting to the real purpose of this article.
Can you image a soccer transmission where you see a giant logo or number in the screen? That would not be the best way to put a mark on the content. Yet, it needs to be there somewhere. Nobody cares if the owner of the content has to insert something in it as long it doesn’t impact the end-user experience. There should be no degradation in the quality of the video too. The only and best way to do it is by inserting the mark invisible to the human eyes. Or if not invisible, imperceptible.
Robustness means that it should be difficult (if not impossible) to remove the watermarking from the media. What about making it not only invisible but moving? or random? uhh.. what about having it injected in different intervals? or a mix of sound and video marks? So the essence of the term robustness applied to this type of technology is to make it resistant to actions such as resizing, cropping, compression, rotation, noise, and many other attacks that may be applied in the effort to remove the mark.
This is one is the easiest ! Pairwise independence refers to fact that there shouldn’t be two equal marks in the same media. Although you can carry multiple different marks in the same media (say from different distribution path) they should not be equal.
Ok. Now that I have covered what the watermarking algorithm should have to be good I want to discuss a little bit what can be done to break it. Recent watermarking solutions are resistant to the common attacks – resizing, cropping, noise, compression and image overlay. There is one attack, however that still remains a challenge for must companies and it is called – The Collusion attack. The attack consists in merging two sources of the same video to form a third one. That new product would be then without the watermark or in some cases it would have two marks and make it difficult for the source identification.
Colluders collect several watermarked documents and combine them to produce digital content without underlying watermarks.
There are two basic types of collusion attack
Type 1 – In this type of collusion attack, attacker obtains several copies of the same work, with different watermarks. Here, the attacker tries to ﬁnd out the video frames which are similar in nature. Hence, frames belonging to the same scene have a high degree of correlation. The attacker then separates various scenes of the video. Then statistical average of the neighboring frames is done to mix the different marks together and computes a new unmarked frame. Type-1 collusion attack can only be successful if successive frames are different enough.
Type 2 – In this type of attack, the attacker obtains several different copies that contain the same watermark and studies them to learn about the algorithm. Then several copies are averaged by the attacker. If all copies have the same reference pattern added to them, then this averaging operation would return something that is closed to the pattern. Then, the average pattern can be subtracted from the copies to generate an unmarked video.
It seems complicated but there are several encoders out there that are able to perform the collusion attack without you having to study all this stuff.
Collusion or Convolution?
I was caught in a curious discussion with a friend when the term collusion was first presented to me. Although the technique made sense and sounded reasonable I had never heard about it before. He on the other hand didn’t know about Convolution either. So which term is the correct one, when referring to merging two sources to produce a third? In the literature the term convolution is used to describe a math operation of two functions (f and g) to produce a third function that expresses how the shape of one is modified by the other. The term convolution refers to both the result function and to the process of computing it. It is defined as the integral of the product of the two functions after one is reversed and shifted. While collusion is about people getting together to defraud a system. Both terms are correct, in my humble opnion and context helps to employ them properly. If one would be talking about people getting together to remove watermark that would be Collusion (could be a single guy btw). if you are talking about the math process to merge to different signals and produce a third than it is Convolution.
The concept of diceware is pretty awesome! You can read the nasty details here. It requires you to have a dice… lame! Who will bring a dice to work to generate passwords!? Come on?! The principle is really cool, regardless.
We need some code to do it. This lady did it! It is meant for you to use words that you will remember without loosing the security aspect and prevent you from using “abc123”.
What is the composition of a diceware password?
The recommended size is 4 sets of words – separated by space. The size of the words can vary from 1 to 5 characters each.
At work I wanted to make it simple and standardized, so I choose a set of 4 words with 4 letters each. You can have a list of a lot of words within that pattern. But you can use the list however you want.. The more the better. The idea is that the words are easy to remember, so it has to be within your language dictionary and composed within your pattern. In my case it is Portuguese… check this one:
volt come rena xepa giro poti roxa …
So the password would be something like – “volt come giro poti”
How strong is it?
Considering the 4 sets of 4 letters words (no pun intended) I’m using, the size of it is of 152 bits. And I’m counting the space bits as well that actually has one byte ( we have 3 spaces). It basically yields a gigantic number of possibilities.. something around the 5,708990770823839524233143877798e+45. That’s right.. 45 other digits after the last one seen.
If we count the characters only without meaning + plus space. The number of combinations would be smaller but still very big. But would be hard to remember a 4 set of random letters and we would be back to this “autg xdrv gvcn xmg”, right? That’s not a choice. What we need then is a list with words that make sense in our language. So let’s get one. You can generate yours (good idea), look on the internet, or grab from a book.
Say you have finished editing your list and ended up with 1000 words in it. That would give you a 1000 * 1000 * 1000 * 1000 = 1.000.000.000.000. Yep, that is a trillion. With your crappy list of a 1000 random 4 letter words, you would get a trillion different passwords, that would be actually easy to remember.
So basically what you have to do is process that list and spit each word randomly to compose your password. It would be rolling the dice for you.
with open('words.txt','r') as f:
mywords = [line.strip() for line in f]
print 'New Password: %s %s %s %s' %(random.choice(mywords),random.choice(mywords),random.choice(mywords),random.choice(mywords))
But this is not the diceware per se. The real diceware requires you to roll the dice 5 times to get each word. So each word of your dictionary would be assigned a number that goes from 11111 – 66666 getting you a list of 7776 unique words. Than our calculations becomes even more interesting now. Resulting in 7776 * 7776 * 7776 * 7776 = 3.656.158.440.062.976. I don’t know how to say that number in English! This is where the trues randomness in python needs to be explained, because I’m not rolling no freaking dice 5 times!
True or Pseudo Randomness
In computers system true randomness (rolling the dice) is hard to be achieved. Randomness is described as follow:
Randomness is the lack of pattern or predictability in events. A random sequence of events, symbols or steps has no order and does not follow an intelligible pattern or combination. Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events (or “trials”) is predictable. For example, when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will occur twice as often as 4. In this view, randomness is a measure of uncertainty of an outcome, rather than haphazardness, and applies to concepts of chance, probability, and information entropy.
In python for example the pseudorandomness method is used and is based on a set of mathematical functions called Mersenne Twister. In python the function “random” is used to generate a sequence of numbers and it takes a “seed” to start off. That is a deterministic way of generating numbers. You can choose that seed but generally the time of the system in milliseconds from epoch (1970) is used. Let me give you an example.
from random import seed
from random import random
# seed random number generator
# generate some random numbers
print(random(), random(), random())
# resetting the seed to 1 again
# see the pseudo thing happening
print(random(), random(), random())
You get two sets of random, but predictable numbers like the following.
You can see that after resetting the seed value, the randomness started off again from the same point, and the “randomness” is the same from the that point onwards.. hence the term pseudorandomness and deterministic.
As we set the seed number to 1, the random numbers will be given within the interval 0 and 1. Predicting the randomness can be useful to be used in production financial, engineering or machine learning systems.
If we use the python pseudo random function in a list, without setting a seed value (there is no point in it anyway) the result will be given based on a uniform likelihood or in other words, the choices are distributed evenly. In a list of 1000 words like the one I used, the likelihood of a given word to be given as a results is 1/1000 or 0,1%.
All that to say, that we don’t really need to roll the dice five times since the “entropy” is embedded in the python function.
If none of that is sufficient for you, you can order a true diceware password (made on paper) for 2 dollars.
It is almost a year since I bought the Google Home device for my personal use. I bought the Google mini and a alternative version of the Google Home called Insignia just because it had a watch.
My kids just looooved the thing.. it was a great incentive for them to speak English and train their vocabulary.. you know ask for musics, small tasks, ask for address and such. But what I noticed was that after a while, they kept asking the same things from memory.. it wasn’t like they were actually learning anything new. And there was no way to change the language because back then it only spoke a few languages, like french and Spanish.. but no Portuguese.
But, I just noticed that after the latest update and the app revamp , Portuguese was available. It is just easier for the kids and wife to interact with the assistant now and we are really having fun!
Another upgrade is that for some reason the network driver of the Google home devices was not so good, it kept loosing the wifi signal and we often asked for something and it replied with something like “Sorry, there seems to have a problem with the network”. It was not showing up in Spotify as a possible device connection and a reboot would have to be done to have it available again.. now it is everywhere.. connection simply works!
At work we recently (not that recently) we purchased a HDMI encoder. The product comes with 16 HDMI inputs and outputs everything to the network to be consumed via RTSP, RTMP, HLS and more. However the bloody device doesn’t bring any embedded solution to merge all the outputs into a single screen. You have to treat it separately. Crazy, right?!
So, the solution then was to find something that would build the mosaic of the 16 outputs. I looked quickly for a opensource tool that would do that for you, but didn’t find anything useful. Then, had to go to old friend linux and its gizmos.
FFMPEG, I knew would do the job, but as always it would take some tinkering and lots or reading…I had no choice but go with it.. and to my surprise was quicker, simpler and funnier than I thought. Continue…