Research

With the advent of the Internet, video and image files are widely shared and consumed by users all over the world. Recent studies point out that one out of two internet users has engaged in activities classified as illicit. The unauthorized copying, distribution, or publishing of digital content without the rights holder's consent is commonly called piracy. Those who profit from digital piracy ignore intellectual property law and the copyrights of the owners, programmers, distributors, and many others who depend on the economic value of these assets. To protect intellectual and commercial rights, methods for identifying these files have emerged, such as content-based identification (also known as perceptual hashing) and watermarking.

In my master's thesis, I studied whether video copies can be identified using image perceptual hash algorithms, and presented the results obtained. The work relied on existing open-source tools: the OpenCV library, MySQL, Python, and ImageHash. I proposed an identification framework that combines several image perceptual hash algorithms and uses a similarity threshold and a similarity coefficient to find the video with the highest probability of being a copy. The conclusion is that image perceptual hash algorithms can be used for video copy identification, but that combining more than one of them fills the performance gaps and vulnerabilities of each individual algorithm. It is also worth stressing that fighting piracy plays a social role: it helps preserve individual and collective rights, return on investment, creative and production value, and even the sustainability of jobs.

Future work could use machine learning or artificial intelligence to identify key frames and repeated frames. Parallel computing could speed up the calculation of the results. Other attack-resistant perceptual hash algorithms could be added to make the system more robust. The number and characteristics of the attacks, as well as the weights used to generate the similarity coefficient, could be adjusted to cover as many vulnerabilities as possible and to improve efficiency.
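The idea of a similarity coefficient combined with a similarity threshold can be illustrated with a small sketch. This is not the implementation from the thesis: the function names, the weights, the 64-bit hash size, the threshold value, and the simplification of one representative frame per video are all assumptions made for illustration. Hashes are plain integers here, so the sketch runs without OpenCV or ImageHash.

```python
from typing import Dict, Optional, Tuple

HASH_BITS = 64  # assumed hash length; many image perceptual hashes use 64 bits


def hamming_similarity(h1: int, h2: int, bits: int = HASH_BITS) -> float:
    """Return 1.0 for identical hashes and 0.0 when every bit differs."""
    distance = bin(h1 ^ h2).count("1")  # number of differing bits
    return 1.0 - distance / bits


def similarity_coefficient(
    query: Dict[str, int],
    candidate: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-algorithm similarities (hypothetical weighting)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weight * hamming_similarity(query[name], candidate[name])
        for name, weight in weights.items()
    )
    return weighted_sum / total_weight


def best_match(
    query: Dict[str, int],
    database: Dict[str, Dict[str, int]],
    weights: Dict[str, float],
    threshold: float = 0.9,  # assumed similarity threshold
) -> Tuple[Optional[str], float]:
    """Return the candidate with the highest coefficient at or above the threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, hashes in database.items():
        score = similarity_coefficient(query, hashes, weights)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    # Made-up hash values for one frame, keyed by a hypothetical algorithm name.
    weights = {"ahash": 1.0, "phash": 2.0}
    query = {"ahash": 0xFFFF0000FFFF0000, "phash": 0x0F0F0F0F0F0F0F0F}
    database = {
        "original": {"ahash": 0xFFFF0000FFFF0001, "phash": 0x0F0F0F0F0F0F0F0F},
        "unrelated": {"ahash": 0x0000FFFF0000FFFF, "phash": 0xF0F0F0F0F0F0F0F0},
    }
    print(best_match(query, database, weights))
```

Giving each algorithm its own weight lets one that resists a particular attack compensate for another that does not, which is the motivation for combining several hashes. A real system would compare many frames per video rather than a single one.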

In general, I'm interested in image processing, image and content identification using hashes, watermarking, OpenCV, and other magic.