Choose your Ecosystem and love your choice

The phrase “Choose your love and Love your choice” was said in 2011 by Thomas S. Monson. It has to do with marriage, love and fidelity, but it made much more sense to me these last few days, when I decided to move away from iPhone/iOS and go to Android. So I paraphrased Thomas as follows: Choose your Ecosystem and love your choice.

Well, it is no news that Android, Apple and Microsoft each have their own set of tools and programs to deal with daily human activities. Contacts, messages, pictures, movies and everything else you produce on your phone or tablet has a way to get to the infamous CLOUD. No drama: for most stuff you can easily transfer your data from one place to another without too much hassle. Until you hit a bump that is impossible to get over (unless you pay), and then the sentence will make more sense. Stick to your damn operating system. If you are an Apple guy, be happy with it. Android guy? Lovely. Going back and forth will give you nothing but headaches.

Here’s my recent nightmare.

My wife has been an Apple user since day one. She had (still has, btw) an iPhone 6. But the old bastard’s battery life was getting shorter and shorter. Time to buy another one. Buying another iPhone at today’s prices is simply impossible for an average guy like me. No fu&%%g way! Time to get a cheaper one: a Samsung A-something with Android on it. Perfect solution, right!? Yes, for everything but WhatsApp.

Did you know that there is no way to take your WhatsApp backup file from iCloud and use it on Android, and vice versa? Horrible!!!! The only free way is to re-send (by email) to yourself each and every chat you are interested in. Crazy!! There is no easy way to get it into the same app if you migrate from one platform to another, unless you use a paid solution.

Why recovering was so important.

You could ask yourself: why the heck do you want to recover your messages so badly? Ohh, it could be for any reason: business, personal or sentimental purposes, or any other prosaic motivation. In my case, my father-in-law had recently passed away, and his voice messages, along with his conversations with my wife, were on her old phone. You can imagine how stressed she was when I gave her the newly configured phone without the old man’s messages.

Recovering it

After googling a little bit, I found the Wondershare Dr.Fone Toolkit iOS & Android application, which promises to do the job. But not for free. I had to pay US$21 for the license without knowing for sure it would work. A leap of faith was needed…

After some reading, I was able to recover all her messages and chats from the iPhone to the Android phone. The process was simple and straightforward. Besides the waiting time, it was no rocket science.

Only one minor detail to add. The messages and chats are all recovered to the Android phone, but they are not ordered by date. They are restored randomly across the timeline, which is a bit weird.

Christmas Challenge 2019

Hi there

Some of you might remember the Christmas Challenge 2018, right? It is about to become a tradition!!

This time I had an encounter with a rather unusual Santa Claus, and he sent me a curious message along with a set of corrupted files that I can’t open. He said I had to read the message below carefully to be able to crack it… ingenious Santa!

The message goes like this:


“I come with many colors, so beautiful and bright, I turn so many houses into a beautiful sight. What am I?”


The files Santa sent:

SantaClausImage.jpg, key.txt, checksum



Have fun and have an excellent Christmas!

HTTP and DNS Export to ElasticSearch using NPROBE

One of these days I was trying to implement just the nProbe module from the ntop stack, as I didn’t need the whole package. nProbe is the NetFlow collector within ntop. The idea was to ship all the NetFlow data to Logstash, then have it indexed into ElasticSearch and viewed in Kibana.

Logstash has a netflow module that can be consulted here. If you have an open-source NetFlow collector, that basic Logstash module will work just fine without much effort, and the free version of nProbe will also work, limited to a number of lines. None of that was useful to me without the DNS and URL data. After paying US$500 for the license, I was supposed to see the URL and DNS info in Kibana, but nothing was showing up. I had to dig a bit to discover the reason, and here is what I found.


Finding the problem

The first thing I did was email support. As you can see in the shopping table, I was entitled to 5 days of support from them, so I used it. Here is their answer:

Hi Anderson,
In order to export additional fields you need to specify the fields to the nProbe template. A basic template for your use case should contain %HTTP_HOST and %DNS_QUERY. The resulting template option is:

Please check out "nprobe -H" for more details.

It is curious, but you actually have to specify each and every field you want to export from nProbe, not just the extra ones… craaazy. So I added the line below to the configuration file:


The next logical step was to check whether nProbe was sending the data and whether Logstash was receiving it. On the nProbe node, a simple tcpdump on the port set in the configuration file (port 2055 in my case) would give us that answer.

All cool here.

The next step was to check for errors in the logs. Running journalctl -xf on the Logstash node, I found the errors below.

[2019-09-16T19:57:16,214][WARN ][logstash.codecs.netflow ] Unsupported field in template 259 {:type=>57678, :length=>2}
[2019-09-16T19:57:16,215][WARN ][logstash.codecs.netflow ] Unsupported field in template 260 {:type=>57678, :length=>2}
[2019-09-16T19:57:16,216][WARN ][logstash.codecs.netflow ] Unsupported field in template 261 {:type=>57652, :length=>128}
[2019-09-16T19:57:16,217][WARN ][logstash.codecs.netflow ] Unsupported field in template 262 {:type=>57652, :length=>128}

It basically says that a template arrived with some fields that are not recognized. Using Wireshark, I was able to spot exactly which fields Logstash was not recognizing. Ha! They were the HTTP and DNS fields I wanted.
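What Wireshark showed can also be reproduced with a few lines of Python. The sketch below is a minimal illustration. It assumes NetFlow v9 export as laid out in RFC 3954: a 20-byte header followed by FlowSets, where FlowSet ID 0 carries templates made of type/length pairs. nProbe can also export IPFIX, which this sketch does not handle. The function names and the hand-built packet are hypothetical. The value 57678 is simply the type number taken from the log above, with no claim here about which nProbe field it is.

```python
import struct

V9_HEADER = struct.Struct("!HHIIII")  # version, count, sys_uptime, unix_secs, sequence, source_id
PAIR = struct.Struct("!HH")           # two big-endian 16-bit values

def parse_v9_templates(packet: bytes) -> dict:
    """Return {template_id: [(field_type, field_length), ...]} for every
    template FlowSet (FlowSet ID 0) found in one NetFlow v9 export packet."""
    version = V9_HEADER.unpack_from(packet, 0)[0]
    if version != 9:
        raise ValueError(f"not a NetFlow v9 packet (version {version})")
    templates = {}
    offset = V9_HEADER.size  # FlowSets start right after the 20-byte header
    while offset + PAIR.size <= len(packet):
        flowset_id, flowset_len = PAIR.unpack_from(packet, offset)
        if flowset_len < PAIR.size:
            break  # malformed length; stop instead of looping forever
        if flowset_id == 0:  # template FlowSet
            pos, end = offset + PAIR.size, offset + flowset_len
            while pos + PAIR.size <= end:
                template_id, field_count = PAIR.unpack_from(packet, pos)
                if template_id < 256:
                    break  # IDs below 256 are reserved, so this is padding
                pos += PAIR.size
                fields = []
                for _ in range(field_count):
                    fields.append(PAIR.unpack_from(packet, pos))
                    pos += PAIR.size
                templates[template_id] = fields
        offset += flowset_len
    return templates

def unsupported_fields(templates: dict, known_types: set) -> dict:
    """Keep only the (type, length) pairs whose type the collector does not know."""
    result = {}
    for template_id, fields in templates.items():
        unknown = [f for f in fields if f[0] not in known_types]
        if unknown:
            result[template_id] = unknown
    return result

def build_example_packet() -> bytes:
    """Hand-built packet: template 259 with IPV4_SRC_ADDR (type 8) and type 57678."""
    fields = [(8, 4), (57678, 2)]
    record = PAIR.pack(259, len(fields)) + b"".join(PAIR.pack(t, l) for t, l in fields)
    flowset = PAIR.pack(0, PAIR.size + len(record)) + record
    return V9_HEADER.pack(9, 1, 0, 0, 1, 0) + flowset

templates = parse_v9_templates(build_example_packet())
print(unsupported_fields(templates, known_types={8}))  # {259: [(57678, 2)]}
```

In practice, the packet bytes would come from a capture of the export traffic rather than being built by hand.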

OK, I found the error. Cool. So the nProbe I purchased is sending the data to Logstash, but the netflow codec on the Logstash node is not ready to parse these fields. What to do next?

Fixing It

I found out that in the path usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-netflow-4.2.1/lib/logstash/codecs/netflow there is a netflow.yaml file, which holds the field definitions the warnings refer to. We then need to add the following lines to it, based on what was observed in the Wireshark analysis.

- :string
- :http_url
- :uint16
- :http_ret_code
- :string
- :http_referer
- :string
- :http_ua
- :string
- :http_mime
- :string
- :http_host
- :string
- :dns_query
- :uint16
- :dns_query_id
- :uint16
- :dns_query_type
- :uint16
- :dns_ret_code
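To make the :string and :uint16 entries concrete: each pair tells the codec how to decode a field’s bytes and what name to give it. The following sketch decodes one data record by hand. It assumes integers are big-endian (network byte order, as in NetFlow v9) and that fixed-length string fields are padded with NUL bytes. The layout and values are made up for illustration. In a real export, the lengths come from the template (the log above shows a 128-byte field, for instance). This is not how logstash-codec-netflow is implemented internally.

```python
import struct

def decode_record(data: bytes, layout: list) -> dict:
    """Decode one data record, given its layout as (name, type, length) triples
    in template order. Only the two types used in netflow.yaml above are handled."""
    values, pos = {}, 0
    for name, ftype, length in layout:
        raw = data[pos:pos + length]
        pos += length
        if ftype == "uint16":
            values[name] = struct.unpack("!H", raw)[0]  # big-endian (network order)
        elif ftype == "string":
            # fixed-length field; assume the text is padded with NUL bytes
            values[name] = raw.split(b"\x00", 1)[0].decode("utf-8", errors="replace")
        else:
            values[name] = raw.hex()  # unknown type: keep the raw bytes visible
    return values

# Hypothetical layout and record
layout = [("http_host", "string", 16), ("dns_query_id", "uint16", 2)]
record = b"example.com".ljust(16, b"\x00") + struct.pack("!H", 4242)
print(decode_record(record, layout))  # {'http_host': 'example.com', 'dns_query_id': 4242}
```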

Now we can see Logstash receiving the data and processing it properly. The next step is to build your nice dashboard in Kibana.

The Password Sharing Problem – A Paradox

Password sharing has happened to me on at least two different occasions, and I’m certain it has happened to you too. See if this story sounds familiar. A family that had recently bought a smart TV asked to use my NETFLIX account; they were not fond of credit card transactions. In underdeveloped countries, people being afraid of using their credit cards is quite common. The second instance was during my master’s course, when I had bought software that made it easier to transcribe documents and write the dissertation. A friend wanted me to share it to ease his burden. On both occasions I didn’t feel bad. I don’t think anybody does.

Recent research says that the younger the audience, the higher the probability that someone shares their password: 35% of millennials share their password, whereas 19% of Generation X and 13% of Baby Boomers do the same. Being a Generation Y dude, or in other words a millennial, I’m apparently genetically predisposed to share my password (sic).

Working in the Pay-TV industry, I’m led to wonder: is this really a problem? If it is, how can it be solved?

Determining the problem

Netflix is expected to close 2019 with over 155 million subscribers worldwide. The ever-increasing share of users who share their credentials is at 10%. In practical terms, with an average ticket of nearly $10 (15.5 million x $9.95), Netflix is losing about 155 million dollars in revenue per month, or roughly 1.85 billion dollars per year.
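Here is the calculation spelled out. It assumes the $9.95 ticket is a monthly subscription price and that every sharing user would otherwise pay, so it is an upper bound:

```latex
\begin{aligned}
155\,\text{M subscribers} \times 10\% &= 15.5\,\text{M sharing users} \\
15.5\,\text{M} \times \$9.95 &\approx \$154.2\,\text{M per month} \\
12 \times \$154.2\,\text{M} &\approx \$1.85\,\text{B per year}
\end{aligned}
```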

Netflix forced the TV industry to change itself to catch up with a new type of consumer. Amazon came next, followed by dozens of “Watch Everywhere” platforms that let people watch their favorite shows. In essence, if there is a channel, there is an app to watch it. But that is not all. The TV operators also have their own place for users to log in and watch movies and TV shows. The programmers (those who create the content) partnered with the TV/Internet operators (those who distribute the content) to bring entertainment closer to end users.

Each one of these “nodes” has its own access conditions: 3 simultaneous devices, 10 simultaneous sessions and so on. If a family account is shared with one of the children to use at college, that is not a problem. The problem seems to arise when you consider the number of concurrent sessions the platforms allow once they are stacked. If you go to Twitter, for example, and perform a simple search, you will find numerous cases where people request, and are given, each other’s passwords. That keeps circling until it reaches its limit (there is none). It is not uncommon to see a single credential used by 50 or more different people.

It is piracy! People are consuming a service they have not paid for, and it is causing the industry damages on the order of billions. According to Parks Associates, the TV industry’s losses from password sharing are expected to rise to $9.9 billion by 2021, from $3.5 billion this year.

Crap! Are the channels devaluing their product by allowing multiple logins or are they actually promoting it? It seems to be an interesting paradox.

Finding the Solution

If the operators and channels turn the login into a cumbersome, too-many-steps process, it will drive customers away. At some point, the executives of such companies have done the math between the losses from password sharing and the losses from stopping it, which would stimulate other forms of piracy. It is indeed risky! If it becomes crazy expensive and restrictive, nobody will sign up. How do you find the balance? Netflix, Amazon and a few others have accepted it as an occupational hazard of being in the new era. How can they grow despite this problem? Is it a matter of calculated risk? Or are the benefits bigger than the damages? Or, on the contrary, is it not that they are doing nothing, but that they are doing just enough to keep the balance?

It is clear to me that the issue is not feasibility but impact. It is more about user experience than security. Note that we have not even touched on the fact that a portion of these users use the same password for more than one service, including the important ones. Apparently nobody cares.

Then, for the sake of being practical, let’s assume that those CEOs have decided to fix the problem while keeping the number of concurrent sessions high. We also face the challenge of not turning the user experience into a boring multi-step authentication. What can be done? The most popular solutions out there revolve around the following:

  • Maximum number of concurrent streams
  • Device session control
  • Geoblocking with VPN detection
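The first two items are usually enforced on the server side. As a rough illustration only, here is a minimal in-memory sketch of a per-account cap on concurrent streams, keyed by device. The class and its names are hypothetical. A real service would also need session expiry (heartbeats) so that a crashed player does not hold a slot forever.

```python
from collections import defaultdict

class StreamLimiter:
    """Hypothetical per-account cap on concurrent streams, tracked by device."""

    def __init__(self, max_streams: int):
        self.max_streams = max_streams
        self.active = defaultdict(set)  # account id -> devices currently streaming

    def start(self, account: str, device: str) -> bool:
        devices = self.active[account]
        if device in devices:
            return True  # same device resuming, not a new stream
        if len(devices) >= self.max_streams:
            return False  # limit reached: refuse the new stream
        devices.add(device)
        return True

    def stop(self, account: str, device: str) -> None:
        self.active[account].discard(device)

limiter = StreamLimiter(max_streams=2)
print(limiter.start("acct-1", "tv"), limiter.start("acct-1", "phone"), limiter.start("acct-1", "laptop"))
# True True False
```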

Curiously, machine learning has not been widely used to prevent this. Machine learning and AI could be used, for instance, to identify a user’s consumption pattern. A regular user shows certain habits in media genre, length, time of day and so on. The absence of a pattern in a frequently used login could also indicate misuse. The device ID, together with geolocation calculations, could also be used to determine whether two sessions of the same account are way too far from each other, or in a location that does not match the identified profiles of that account.
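The geolocation idea can be illustrated with a hand-written “impossible travel” rule, the kind of signal a learned model could take as one of its inputs. This minimal sketch makes several assumptions. Logins are (time, latitude, longitude) tuples, and the position is already known; IP geolocation is approximate, and VPNs defeat it. The 900 km/h speed limit and 50 km tolerance are hypothetical thresholds.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def impossible_travel(login_a, login_b, max_kmh=900.0, min_km=50.0):
    """Each login is (datetime, lat, lon). True when two logins of one account
    are far apart and too close in time to be the same person travelling."""
    (t1, lat1, lon1), (t2, lat2, lon2) = sorted([login_a, login_b])
    km = haversine_km(lat1, lon1, lat2, lon2)
    if km < min_km:
        return False  # same city, or geolocation noise
    hours = (t2 - t1).total_seconds() / 3600
    return hours == 0 or km / hours > max_kmh

t0 = datetime(2019, 10, 1, 20, 0)
sao_paulo = (t0, -23.55, -46.63)
lisbon = (t0 + timedelta(hours=1), 38.72, -9.14)
print(impossible_travel(sao_paulo, lisbon))  # True: about 8,000 km in one hour
```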

The advantage of using machine learning to build user behavior patterns is that the patterns change automatically along with the user’s taste, instead of relying on a hard-coded algorithm that would have to be updated manually. Other data sources could also feed the learning database, such as ZIP code, payment address, device type (Android, iOS, Smart TV) and many others.
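A full machine learning model is out of scope here, but the idea of a profile that follows the user’s taste can be sketched with a simple statistic. The hypothetical class below keeps an exponentially weighted share of viewing per hour of day, so old habits fade as new ones appear. It is a stand-in to show the adaptive behavior, not a trained model, and the decay and threshold values are assumptions.

```python
class HourProfile:
    """Hypothetical viewing profile: share of viewing per hour of day, kept as an
    exponential moving average so that recent habits outweigh old ones."""

    def __init__(self, decay: float = 0.05):
        self.decay = decay
        self.share = [1 / 24] * 24  # no preference until some viewing is seen

    def update(self, hour: int) -> None:
        # Move the observed hour toward 1 and every other hour toward 0;
        # the shares keep summing to 1 after each update.
        for h in range(24):
            target = 1.0 if h == hour else 0.0
            self.share[h] += self.decay * (target - self.share[h])

    def is_unusual(self, hour: int, threshold: float = 0.01) -> bool:
        return self.share[hour] < threshold

profile = HourProfile()
for _ in range(200):
    profile.update(21)  # this account always watches at 9 p.m.
print(profile.is_unusual(21), profile.is_unusual(4))  # False True
```

If the account then starts watching mostly in the morning, the 9 p.m. share decays on its own and no rule has to be rewritten.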

Well, the technology is available. There is a cost in usability and also a cost in security, and the two have to be balanced to compose the best solution. Companies in the early stages of an OTT launch seem to be more lenient towards password sharing, as it is perceived as a way of promoting the new business. However, as the business and interest grow, the opinion about that 10% loss changes radically.

ELK: Deleting unassigned shards to restore cluster health

I had a red flag on my ElasticSearch cluster recently and found that the reason was unassigned shards between the nodes.

As the data I collect is not that sensitive, I could easily delete it and recreate it if I need it in the future. But first we need to find it. There are many articles on the internet that help you understand shard allocation, but I offer here a simple solution: simply delete the bastard.

First, we check the cluster health and get the count of unassigned shards.

curl -XGET "http://<elastichost>:9200/_cluster/health?pretty" | grep unassigned_shards

The indices holding unassigned shards can be listed with:

curl -s -XGET "http://<elastichost>:9200/_cat/shards" | grep UNASSIGNED | awk '{print $1}' | sort -u

If you want to delete every index in the list above, you can send the list to xargs and do a DELETE. Keep in mind that this deletes the whole index, not just the shard.

curl -s -XGET "http://<elastichost>:9200/_cat/shards" | grep UNASSIGNED | awk '{print $1}' | sort -u | xargs -I {} curl -XDELETE "http://<elastichost>:9200/{}"
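The same steps can be scripted in Python, which makes a dry run easy. This minimal sketch assumes the cluster answers plain, unauthenticated HTTP on port 9200, as in the curl commands. It asks _cat/shards for JSON via format=json and uses only the standard library. The function names are hypothetical. Deletion is off by default because, as with curl, it removes the whole index.

```python
import json
import urllib.request

def fetch_shards(host: str) -> list:
    """Rows of _cat/shards as dicts (index, shard, prirep, state, ...)."""
    with urllib.request.urlopen(f"http://{host}:9200/_cat/shards?format=json") as resp:
        return json.load(resp)

def unassigned_indices(shards: list) -> list:
    """Unique, sorted names of indices that have at least one UNASSIGNED shard."""
    return sorted({row["index"] for row in shards if row.get("state") == "UNASSIGNED"})

def delete_indices(host: str, names: list, dry_run: bool = True) -> None:
    """DELETE each index; only prints the requests unless dry_run is False."""
    for name in names:
        url = f"http://{host}:9200/{name}"
        if dry_run:
            print("would DELETE", url)
            continue
        req = urllib.request.Request(url, method="DELETE")
        with urllib.request.urlopen(req) as resp:
            print("DELETE", url, resp.status)
```

For example, `delete_indices("elastichost", unassigned_indices(fetch_shards("elastichost")))` prints what would be deleted. Pass `dry_run=False` to actually send the DELETE requests.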

In my case, I found the name of the index that was causing problems and deleted it:

curl -XDELETE "http://<elastichost>:9200/INDEXNAME"


Done. Cluster health is green again.


Delete ALL MySQL/MariaDB tables at once

I was playing with the WordPress tables and screwed up. I needed to delete all the existing tables in the database, but didn’t want to do it one by one.

Here is a snippet that may be helpful for such tasks.

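mysql -u USER -pPASSWORD -D DATABASE -e "show tables" -s |
grep -E "^wp_" | xargs -I "@@" echo mysql -u USER -pPASSWORD -D DATABASE -e "DROP TABLE @@"


Change USER, PASSWORD and DATABASE accordingly. Note that there is no space between -p and the password (uppercase -P is the port option). Also, grep keeps only the tables with the wp_ prefix, and as written, echo only prints the DROP commands, so this is a dry run. Remove echo to actually drop the tables.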
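An alternative is to issue a single statement, since MySQL and MariaDB accept a comma-separated list of tables in one DROP TABLE. The hypothetical helper below builds that statement from a list of table names. It quotes each name with backticks and wraps the statement in SET FOREIGN_KEY_CHECKS, so that foreign keys between the dropped tables do not get in the way. The wp_ prefix default mirrors the grep above.

```python
def drop_tables_sql(tables: list, prefix: str = "wp_") -> str:
    """Build one DROP TABLE statement for every table whose name starts with prefix."""
    selected = [t for t in tables if t.startswith(prefix)]
    if not selected:
        return ""
    # Backtick-quote each name, doubling any backtick inside it
    quoted = ", ".join("`" + t.replace("`", "``") + "`" for t in selected)
    return ("SET FOREIGN_KEY_CHECKS = 0; "
            f"DROP TABLE IF EXISTS {quoted}; "
            "SET FOREIGN_KEY_CHECKS = 1;")

print(drop_tables_sql(["wp_posts", "wp_users", "other_table"]))
```

The table names could come from `mysql -N -e "show tables"` (where -N drops the column header), and the resulting SQL can be piped back into mysql.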


Video Watermarking, Collusion or Convolution Attack

There are several ways to protect audiovisual content, and watermarking is one of them. It is arguably the best solution against illegal redistribution via streaming, simply because it allows you to identify the source of the media theft.

Watermarking, which was originally created for image protection, has been intensively researched over the past decade and can now be applied not only to stored (on-demand) videos but also to live streams. It can be done at the hardware or software level, and the mark can be inserted in frames, key frames, bits, video samples and many other ways. It is an amazing technology! Specialized companies offer it as the ultimate protection against piracy, for a lot of money of course.

There are certain desirable characteristics in this type of forensic measure that make it useful against piracy, and I would like to discuss them first, before getting to the real purpose of this article.


Imperceptibility

Can you imagine a soccer broadcast with a giant logo or number on the screen? That would not be the best way to put a mark on the content. Yet it needs to be there somewhere. Nobody cares if the owner of the content has to insert something in it, as long as it doesn’t impact the end-user experience. There should be no degradation in video quality either. The best way to do it is to insert the mark invisibly to the human eye, or, if not invisibly, imperceptibly.


Robustness

Robustness means that it should be difficult (if not impossible) to remove the watermark from the media. What about making it not only invisible but moving? Or random? Uhh.. what about injecting it at different intervals? Or a mix of sound and video marks? The essence of robustness, applied to this type of technology, is to make the mark resistant to actions such as resizing, cropping, compression, rotation, noise and many other attacks that may be applied in an effort to remove it.

Pairwise Independence

This one is the easiest! Pairwise independence refers to the fact that there shouldn’t be two equal marks in the same media. Although you can carry multiple different marks in the same media (say, from different distribution paths), they should not be equal.


Collusion Attack

OK. Now that I have covered what a watermarking algorithm needs in order to be good, I want to discuss a little bit what can be done to break it. Recent watermarking solutions are resistant to the common attacks: resizing, cropping, noise, compression and image overlay. There is one attack, however, that remains a challenge for most companies, and it is called the collusion attack. The attack consists of merging two sources of the same video to form a third one. The new product would then be without the watermark or, in some cases, it would carry two marks, making source identification difficult.

Colluders collect several watermarked documents and combine them to produce digital content without underlying watermarks.
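A toy simulation shows why averaging hurts a watermark. This is not any vendor’s scheme. It assumes a simple additive spread-spectrum mark (a pseudo-random ±1 pattern per customer, scaled by ALPHA), treats a frame as a flat list of numbers, and uses a non-blind detector that compares against the original. Averaging the copies of K colluders leaves each customer’s mark at roughly 1/K of its strength, so with enough colluders it sinks into the noise of a real channel.

```python
import random

N = 4096     # samples in one (flattened) frame
ALPHA = 2.0  # embedding strength, small compared to the content

def make_mark(seed: int) -> list:
    """Pseudo-random +1/-1 pattern; each customer's copy gets its own seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(N)]

def embed(frame: list, mark: list) -> list:
    return [x + ALPHA * w for x, w in zip(frame, mark)]

def score(copy: list, original: list, mark: list) -> float:
    """Non-blind detector: correlation of (copy - original) with a mark.
    A copy carrying the full mark scores ALPHA; an unrelated one scores about 0."""
    return sum((y - x) * w for y, x, w in zip(copy, original, mark)) / N

def average(copies: list) -> list:
    """The collusion step: sample-wise mean of several copies."""
    return [sum(col) / len(copies) for col in zip(*copies)]

rng = random.Random(0)
original = [rng.gauss(128, 30) for _ in range(N)]
marks = [make_mark(seed) for seed in range(4)]  # 4 colluding customers
copies = [embed(original, m) for m in marks]
colluded = average(copies)

print(f"one copy:   {score(copies[0], original, marks[0]):.2f}")  # ALPHA
print(f"averaged 4: {score(colluded, original, marks[0]):.2f}")   # roughly ALPHA / 4
```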

There are two basic types of collusion attack

Type 1 – In this type of collusion attack, the attacker obtains several copies of the same work with different watermarks. Here, the attacker looks for video frames that are similar in nature, since frames belonging to the same scene have a high degree of correlation. The attacker then separates the various scenes of the video and computes a statistical average of neighboring frames, which mixes the different marks together and produces a new, unmarked frame. A Type 1 collusion attack can only succeed if the watermarks in successive frames are different enough.


Type 2 – In this type of attack, the attacker obtains several different copies that contain the same watermark and studies them to learn about the algorithm. The attacker then averages several copies. If all copies have the same reference pattern added to them, this averaging operation returns something close to the pattern. The average pattern can then be subtracted from the copies to generate an unmarked video.
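The Type 2 idea can be simulated the same way. This sketch assumes the content behaves like zero-mean noise, which is roughly what remains after high-pass filtering the frames. The same pattern is hidden in K different frames. The average of the copies converges to ALPHA times the pattern because the content averages out, and subtracting that estimate removes the mark. All sizes and values are arbitrary.

```python
import random

N, K, ALPHA = 4096, 100, 2.0

rng = random.Random(1)
mark = [rng.choice((-1.0, 1.0)) for _ in range(N)]  # same reference pattern in every copy
# K different frames, content modelled as zero-mean noise
frames = [[rng.gauss(0, 10) for _ in range(N)] for _ in range(K)]
copies = [[x + ALPHA * w for x, w in zip(f, mark)] for f in frames]

def blind_score(frame: list, pattern: list) -> float:
    """Blind detector: correlation with the pattern, no original needed."""
    return sum(x * w for x, w in zip(frame, pattern)) / len(pattern)

# The attack: averaging K copies cancels the content but keeps ALPHA * mark
estimate = [sum(col) / K for col in zip(*copies)]
cleaned = [y - e for y, e in zip(copies[0], estimate)]

print(f"marked copy:  {blind_score(copies[0], mark):.2f}")  # close to ALPHA
print(f"cleaned copy: {blind_score(cleaned, mark):.2f}")    # close to 0
```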

It seems complicated but there are several encoders out there that are able to perform the collusion attack without you having to study all this stuff.

Collusion or Convolution?

I got caught in a curious discussion with a friend when the term collusion was first presented to me. Although the technique made sense and sounded reasonable, I had never heard of it before. He, on the other hand, didn’t know about convolution either. So which term is correct when referring to merging two sources to produce a third? In the literature, convolution describes a mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. The term refers both to the resulting function and to the process of computing it, and it is defined as the integral of the product of the two functions after one is reversed and shifted. Collusion, by contrast, is about people getting together to defraud a system. In my humble opinion both terms are correct, and context helps you use them properly. If you are talking about people getting together to remove a watermark, that is collusion (it could be a single guy, btw). If you are talking about the math process that merges two different signals to produce a third, then it is convolution.
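For reference, here is the definition paraphrased above, in its continuous and discrete forms. In each, g is reversed and shifted by t (or n), multiplied by f, and then integrated (or summed).

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
\qquad
(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]
```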

4 letter Portuguese diceware generator

The concept of diceware is pretty awesome! You can read the nasty details here. It requires you to have dice… lame! Who will bring dice to work to generate passwords!? Come on?! The principle is really cool, regardless.

We need some code to do it. This lady did it! The idea is to use words that you will remember without losing the security aspect, and to keep you from using “abc123”.

What is the composition of a diceware password?

The recommended size is 4 words, separated by spaces. Each word can be from 1 to 5 characters long.

At work I wanted to make it simple and standardized, so I chose a set of 4 words with 4 letters each. You can build a list with a lot of words that fit that pattern, but you can use the list however you want. The more, the better. The idea is that the words are easy to remember, so they have to come from your language’s dictionary and fit your pattern. In my case it is Portuguese… check this one:

roxa …

So the password would be something like – “volt come giro poti”

How strong is it?

Considering the 4 sets of 4-letter words (no pun intended) I’m using, the password is 19 characters long: 16 letters plus 3 spaces, one byte each, which gives 152 bits. If every one of those bits could vary freely, that would yield a gigantic number of possibilities, 2^152, something around 5.708990770823839524233143877798e+45. That’s right, a 46-digit number.

If we count only the characters, without meaning, plus the spaces, the number of combinations would be smaller but still very big. But 4 groups of random letters would be hard to remember, and we would be back to this: “autg xdrv gvcn xmg”, right? That’s not an option. What we need, then, is a list of words that make sense in our language. So let’s get one. You can generate your own (good idea), look on the internet, or grab words from a book.

Say you have finished editing your list and ended up with 1000 words in it. That would give you 1000 * 1000 * 1000 * 1000 = 1,000,000,000,000 combinations. Yep, that is a trillion. With your crappy list of 1000 random 4-letter words, you would get a trillion different passwords that would actually be easy to remember.

So basically, what you have to do is process that list and pick words at random to compose your password. It is like rolling the dice for you.

Here is the code for your reference, the list is here.


import random

# one word per line; utf-8 because Portuguese words may carry accents
with open('words.txt', 'r', encoding='utf-8') as f:
    mywords = [line.strip() for line in f if line.strip()]
print('New Password: %s %s %s %s' % tuple(random.choice(mywords) for _ in range(4)))

But this is not diceware per se. Real diceware requires you to roll the dice 5 times to get each word. So each word in your dictionary is assigned a number from 11111 to 66666, using only the digits 1 to 6, which gives you a list of 7776 unique words. Now our calculations become even more interesting: 7776 * 7776 * 7776 * 7776 = 3,656,158,440,062,976, roughly 3.66 quadrillion. This is where true randomness in Python needs to be explained, because I’m not rolling any freaking dice 5 times!
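These counts translate into bits of entropy, which is the usual way to compare password strength. The sketch assumes each word is drawn uniformly and independently from the list. Under that assumption the entropy is words × log2(list size). That is much lower than the 152 raw bits counted earlier, because an attacker guesses words from the list, not individual bits.

```python
import math

def passphrase_bits(list_size: int, words: int = 4) -> float:
    """Entropy in bits of `words` words picked uniformly and independently."""
    return words * math.log2(list_size)

for size in (1000, 7776):
    print(f"{size} words: {size ** 4:,} combinations, {passphrase_bits(size):.1f} bits")
# 1000 words: 1,000,000,000,000 combinations, 39.9 bits
# 7776 words: 3,656,158,440,062,976 combinations, 51.7 bits
```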

True or Pseudo Randomness

In computer systems, true randomness (rolling the dice) is hard to achieve. Randomness is described as follows:

Randomness is the lack of pattern or predictability in events. A random sequence of events, symbols or steps has no order and does not follow an intelligible pattern or combination. Individual random events are by definition unpredictable, but in many cases the frequency of different outcomes over a large number of events (or “trials”) is predictable. For example, when throwing two dice, the outcome of any particular roll is unpredictable, but a sum of 7 will occur twice as often as 4. In this view, randomness is a measure of uncertainty of an outcome, rather than haphazardness, and applies to concepts of chance, probability, and information entropy.

Python, for example, uses a pseudorandom number generator based on the Mersenne Twister algorithm. In Python, the function “random” generates a sequence of numbers, and it takes a “seed” to start off. That is a deterministic way of generating numbers. You can choose the seed yourself. If you don’t, Python seeds the generator from the operating system’s randomness source when one is available, falling back to the system time. Let me give you an example.

from random import seed
from random import random

# seed random number generator
seed(1)
# generate some random numbers
print(random(), random(), random())

# resetting the seed to 1 again
seed(1)
# see the pseudo thing happening
print(random(), random(), random())

You get two sets of random, but predictable numbers like the following.

You can see that after resetting the seed value, the sequence starts again from the same point, and the “randomness” is the same from that point onwards. Hence the terms pseudorandom and deterministic.

Note that random() always returns a float in the interval [0, 1), whatever the seed. Setting the seed to 1 only fixes which sequence you get. Being able to reproduce the sequence can be useful in production financial, engineering or machine learning systems, for example to make simulations repeatable.

If we use Python’s pseudorandom functions to pick from a list without setting a seed value (there is no point in it anyway), each item is picked with uniform likelihood. In other words, the choices are distributed evenly. In a list of 1000 words like the one I used, the likelihood of a given word being picked is 1/1000, or 0.1%.

All that to say that we don’t really need to roll the dice five times, since the “entropy” is built into the Python function.
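One caveat: Python’s own documentation warns that the random module should not be used for security purposes and points to the secrets module instead, because Mersenne Twister output can be predicted once enough of it has been observed. For generating passwords, a minimal variant of the generator above could use secrets.choice, which draws from the operating system’s secure source. It makes the same words.txt assumption, and the function names are my own.

```python
import secrets

def load_words(path: str) -> list:
    """Read one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def diceware(words: list, count: int = 4) -> str:
    """Pick `count` words with secrets.choice, which uses the OS's secure random source."""
    if not words:
        raise ValueError("the word list is empty")
    return " ".join(secrets.choice(words) for _ in range(count))
```

Usage: `print("New Password:", diceware(load_words("words.txt")))`.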

Not enough?

If none of that is sufficient for you, you can order a true diceware password (made on paper) for 2 dollars.

I just got my MASTERS!! Yeahhhh – And what have I learned from it?

This Feb 28th is the so-called “Thesis Defense” day. It is when me, myself and I, after submitting the thesis, put myself at the disposal of the thesis committee. In this case, “defend” does not imply that I will have to argue aggressively about my work (although I can see myself doing it).


Rather, the thesis defense is designed so that faculty members can ask questions and make sure that students actually understand their field and focus area. It serves as a formality, because the paper will already have been evaluated in what is called the Qualification Process. During a defense, members of the thesis committee ask the student questions. The questions are usually open-ended and require the student to think critically about his or her work. The event is supposed to last from one to three hours, though I have heard it can take longer.. geez!

The defense is the crowning event of at least 2 years of hard study, dedication and sacrifice. And I want to tell you what I have learned from it.

It is not as hard and mystic as it seems

At least here in Brazil, the master’s degree, or as we call it, “Mestrado”, is not as common as it should be. It has some sort of mysticism around it, as if it were reserved for a certain “class” of student and society. People really tend to go for an MBA. The MBA in Brazil, although it stands for Master of Business Administration, has nothing to do with a stricto sensu master’s. It is just a lato sensu course, or a specialization. I’m not taking credit away from those who choose the MBA, but it is different. In this country, the MBA has a more commercial, practical focus. It is also offered at night or on weekends, which helps a lot of people who actually have a job to go to. I guess that is the main reason people tend to go that way. On the other hand, a full-blown master’s course is considered too academic, or meant for those who want to pursue a career as a professor, researcher or academic. This is not entirely true! You can enroll in a master’s course, keep working in industry and solve a real-world problem.

This is part of the mysticism surrounding a master’s course. It is not only meant for academics, it is not only meant for super-nerd researchers, and you can do it while you keep your current job! It is true that your job must give you some flexibility and freedom to cope with the crazy schedules some schools impose, but it is possible.

You can have a job that has nothing to do with universities and the academic world. You can work anywhere you want. In fact, big tech companies are the ones that employ master’s graduates the most. And there is a chance that your salary will increase by up to 80% if you have a master’s degree.

The professors and teaching staff are pretty much regular people, with experience in some topics and areas of research, but they do NOT own all knowledge. It is pretty common for the student to know more about a given topic than the professor. The professor is there to help you adjust your thought process and write your ideas within the accepted scientific methodology, but he is not an omniscient God. You can argue, you can defend a statement, heck, you can actually fight with your professor (although that is not a good idea) if you think X equals Y.

The other part of the mysticism around the master’s course is the idea that you are there to learn and attend classes. Myth! There is no way for the school to teach each student the specifics of his work. What you actually learn is how to organize your thought process, how to handle the numbers you collect in your research, and how to use other people’s work to build the foundations of your own. That is it. Don’t expect classes on advanced math, signal processing or any other in-depth topic in your field. That’s not going to happen. Instead, you have several debate classes and seminars where you present your ideas and other students confront, challenge and disagree with you. You have some writing classes, some basic statistics classes and a lot of seminars. That’s where the knowledge and ideas are born. That’s where you mature and learn how to “defend” your work and identify potential flaws in it, by discussing it with other people.

Since it is up to you to understand, collect, test, process and present the work, it is not as hard as it seems. Give it a try. You might surprise yourself.

It is a victory in solitude


It is sad! I know. But it is the hard truth. There is a big chance that not a single soul around you will actually understand what you are doing. I mean friends, family and co-workers. None of them will, one, be interested in what you discovered, or two, be willing to discuss it in detail. Nobody cares! If you are married, your wife will be interested in when you will finish, so she can have you back in regular life. Or when you will stop spending the night reading and give more attention to your kids. Or, in my case, when you will be done so you can ask for a raise at your current job!

There won’t be any questions about the inner details of your research, and if people initially show interest, it is quickly lost when you start going on and on about it.

Upon completion, your friends will be excited to know that you have finished successfully. Remember the myth that it is super hard and reserved for some people? Sure, they will be thrilled with the news. But don’t think they are really interested in the bits and bytes of it. And it is not because they don’t like you or have no interest in your stuff. It is just because they don’t understand it, and it is really hard to relate to something you have no idea about. They are (like everybody else) afraid to look stupid.

Isn’t it sad? You research and successfully develop something that could be used by society and has the potential to put your name in the history of your field, and nobody cares?! You can’t share it or brag about it :)?! Come on!! So sad!

It is indeed a victory in solitude. Be glad you made it. The hours and the sacrifice are totally worth it!! Knowledge is one of the few things you can actually keep, and its value can’t be measured. It is, however, a satisfaction that only you will feel in its fullness. It is OK!

Are you curious to know what I studied? Probably not. Maybe?!

My thesis is called “Image Perceptual Hash applied for Video Copy identification”, and here is the abstract.

With the advent of the Internet, video and image files are widely shared and consumed by users all over the world. Thus, methods to identify these files have emerged as a way to preserve intellectual and commercial rights. Content-based identification, or perceptual hashing, is the technique capable of generating a numeric identifier from the characteristics of an image. With this identifier, it is possible to compare two images and decide whether they are equal, similar or different. The objective of this study is to discuss the application of image perceptual hashing to identify video copies. It proposes the use of known and public methods such as the Average and Difference Hash, which are based on image statistics, as well as pHash and Wavelet hash, which are based on the image frequency.

An identification technique was applied using a Hamming distance similarity threshold and a combination of perceptual hash algorithms for video copy identification. The method was tested by applying several attacks to the candidate video, and the results were properly detailed. It is possible to use perceptual hash algorithms for video copy identification, and there are benefits in combining more than one of them: the combination fills performance gaps and vulnerabilities and eliminates false positives.
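For readers curious about the two statistics-based hashes mentioned in the abstract, here is a minimal pure-Python sketch of the Average Hash, the Difference Hash and the Hamming distance used to compare them. It is an illustration, not the code from the thesis. It assumes the frame has already been converted to grayscale and shrunk (8x8 for aHash, 9 columns by 8 rows for dHash), a step usually done with an imaging library. pHash and Wavelet hash are left out.

```python
def average_hash(pixels: list) -> int:
    """aHash: one bit per pixel, set when the pixel is brighter than the mean.
    Expects a small grayscale image as rows of ints (e.g. 8x8 -> 64 bits)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits

def difference_hash(pixels: list) -> int:
    """dHash: one bit per horizontal neighbour pair, set when the left pixel is
    brighter than the right one. A 9x8 image (9 columns) gives 64 bits."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Synthetic 8x8 gradient, a brightened copy, and an inverted image
img = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[p + 20 for p in row] for row in img]
inverted = [[255 - p for p in row] for row in img]
print(hamming(average_hash(img), average_hash(brighter)))  # 0: brightness shift does not change aHash
print(hamming(average_hash(img), average_hash(inverted)))  # 64: every bit flips
```

Two frames would then be treated as copies when their distance falls below a chosen threshold. The threshold value is a tuning decision not given here.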