Digital Video Source Identification Based on Container’s Structure Analysis

The mobile device ecosystem has evolved dramatically over the last few years, as users have embraced the massive use of mobile phones for different purposes, both professional and personal. Digital videos can be used to establish legal responsibilities or as part of the evidence in trials. The forensic analysis of digital videos is therefore highly relevant for determining the origin and authenticity of a video in order to link an individual with a device, place or event. The field of forensic analysis of digital videos constantly faces new challenges. Even though the basic principles of this discipline remain unchanged, numerous issues appear every year that require new procedures and tools. Therefore, it is necessary to provide forensic analysts with techniques to identify the origin of multimedia content. This paper addresses source identification in open scenarios, in which analysts do not know in advance the set of cameras to which a video belongs and therefore find it difficult to identify its source. This approach resembles real-life situations, since in most cases analysts are unaware of the set of video cameras. This paper aims to create a technique that identifies the source of digital videos generated by digital devices through the use of unsupervised algorithms based on the analysis of the structure of multimedia containers.


I. INTRODUCTION
The field of mobile device forensics has emerged in recent years as one of the most important areas of investigation, for several reasons. Firstly, the capabilities of smart devices have improved substantially, and smartphones are now used more than laptops, since users have them at their fingertips at any time of the day. Moreover, these smart devices are constantly recording our activities and movements, which in turn gives a view of our behaviour [1].
The combination of smart mobile phones with social media platforms and cloud storage has allowed video to become a major source of information for many people and institutions. In turn, these digital videos can be made at any time and anywhere for different purposes. They can also be distributed on the internet in a short period of time, and sometimes they show illegal acts such as those related to terrorism, child pornography or industrial espionage. The presence of digital videos in judicial investigations is increasingly common. To address these problems, researchers have developed forensic algorithms that verify the authenticity and source of digital content [2]. Forensic techniques that identify information about the source of multimedia content (images or videos) serve two main purposes: on the one hand, they verify the origin of multimedia content and, on the other hand, they detect inconsistencies in the source within the content that could be signs of forgery [2], [3]. Numerous investigations have developed forensic algorithms to identify the source of an image, but research on digital videos is very scarce [3]. In [4] it is suggested that these algorithms utilise traces left by a wide variety of physical and algorithmic components in a camera's processing pipeline. Forensic camera model algorithms have been designed that leverage traces left by demosaicing (the Color Filter Array (CFA) method and demosaicing artefacts) [5]-[7] and Joint Photographic Experts Group (JPEG) header information [8]. The majority of existing work has focused on using sensor fingerprints to identify a video's specific source device; see, for example, [9]-[15].

The associate editor coordinating the review of this manuscript and approving it for publication was Tai-Hoon Kim.
The analysis of the source of video acquisition is one of the first problems that have emerged in forensic analysis techniques. Inside the identification of the source of acquisition there are two major approaches: closed scenarios and open scenarios. A closed scenario is one in which the identification of the source of the video is made on a set of specific and known cameras. For this approach, a set of videos from each device is normally used to train a classifier and subsequently the source of acquisition of the videos under investigation is predicted.
A digital video source identification scheme based on Photo Response Non Uniformity (PRNU) noise and a Support Vector Machine (SVM) is presented in [16]. Given an input video, the frames with the most significant scene changes are extracted using the colour histogram. A total of 81 features, which are the Wavelet components of the sensor, are used to train the SVM classifier with training videos. A total of 5 different devices from 5 different brands were used to train the SVM classifier. The results obtained show a success rate of 87% or 90%, depending on the resolution of the video.
In an open scenario, the forensic analyst does not initially know the set of devices to which the videos under investigation belong. The objective is not to identify the brand and model of the videos but to group them into disjoint sets in which all the videos belong to the same device. This last approach is more realistic, since in many cases the analyst has no knowledge of the set of devices to which a set of videos may belong. Identifying the device that generated digital content is very important in the context of a judicial process because it can incriminate a suspect or delimit his or her responsibilities in a criminal act.
This work proposes a technique that identifies the source of acquisition of digital videos generated by different devices. It has been shown that it is possible to identify the source of acquisition of a video through the characteristics of its internal elements and its metadata. These characteristics, which are acquired during the creation of the video and its subsequent processing, are part of what could be considered the DNA of a video and, when analyzed, can reveal decisive information about the digital content. The present investigation focuses on source identification techniques for video, since this field has been little studied in comparison with digital images. This paper is divided into 5 sections, the first being this introduction. Section II presents the main works developed in this context. The proposed solutions are presented in Section III. The experiments and their results are presented in Section IV. Finally, Section V presents the conclusions drawn from this work.

II. RELATED WORKS
Forensic video analysis techniques still raise many issues to investigate, due to the wide range of possible alterations that can be applied to them. In addition, forensic video analysis has proven to be more difficult with respect to image analysis since the data contained in the videos has higher compression formats than in the case of images.
A video is formed by a sequence of images called frames that vary over time, giving a sense of movement. Due to the large volume of information that a video contains, it is encoded and decoded using a mathematical algorithm known as a codec. In turn, these encoded frames are encapsulated along with the audio, metadata and subtitle tracks in a single file known as a multimedia container. Table 1 shows the different elements of which a multimedia container is composed. Multimedia containers, also called video formats, are file formats capable of storing audio and video and, in some cases, subtitles and other additional information.
The most used multimedia containers nowadays are:
• AVI (Audio Video Interleave): A Windows standard multimedia container.
• FLV (Flash Video): The format used to deliver MPEG video through Flash Player.
• MKV (Matroska): Another open specification container that appears in the download of animations.
In the most recent literature, most investigations analyze the internal structure of multimedia containers for the AVI format, whereas the study of MP4, 3GP and MOV containers is almost non-existent.
36364 VOLUME 8, 2020
One of the first works to analyze the structure of videos in detail is [19], where AVI and MP4-like (MOV, 3GP, MP4) video streams of mobile phones and digital cameras are studied. The authors use customized parsers to extract all file format structures of videos from a total of 19 digital camera models, 14 mobile phone models, and 6 video editing toolboxes. They report considerable differences in the choice of container formats, audio and video compression algorithms, acquisition parameters, and internal file structure. In combination, such characteristics can help to authenticate digital video files in forensic settings by distinguishing between original and post-processed videos, verifying the purported source of a file, or identifying the true acquisition device model or the software used for video processing. One of their main findings is that videos from digital cameras and mobile phones often employ different container formats and compression codecs. Mobile phones opt for sophisticated compression algorithms (MP4V, H.26x).
Most digital cameras in their test set prefer a combination of AVI containers and basic MJPEG compression. The structure of AVI and MP4-like containers is not strictly defined. They observed considerable differences both in the order and in the presence of individual data segments. AVI files often contain specific INFO lists or JUNK chunks. MP4-like files may employ various nonstandard Container atoms and different parametrizations of specific atom entries.
In [20] a method for unsupervised analysis of video file containers is introduced, and the authors present two main forensic applications of this method: the first deals with video integrity verification, based on the dissimilarity between a reference and a query file container; the second focuses on the identification and classification of the source device brand, based on the analysis of container structure and content. They tested the effectiveness of both applications on a dataset composed of 578 videos taken with modern smartphones from major brands and models. They conclude that the proposed solution has an extremely small computational cost compared with all available techniques based on video stream analysis or manual inspection of file containers.
In [21] the authors investigate video content stored in Video Event Data Recorders (VEDRs). VEDR recordings are used as important evidence when events such as vehicle collisions occur. The authors investigate the characteristics of the video file structure that each type of video editing software leaves behind when it processes a video. Because such traces are an inherent characteristic of each video editing software suite, they can detect the specific software used to manipulate the video, in addition to whether the video was indeed manipulated. To evaluate the accuracy of their technique, they examined 296 unmodified Audio Video Interleave (AVI) video files. They performed this study using popular versions of video editing software. As a result, they found that the AVI data structures in modified video files appear consistently according to each video editing software suite. Each resulting data structure is unaffected by the original video file structure.

III. TECHNIQUE DESCRIPTION
The main objective of this work is to propose a technique that identifies the source camcorder that generated a digital video. The technique uses clustering algorithms to group digital videos correctly by both brand and model.
To explain the details of the technique, this section is divided into two subsections: the first explains what a multimedia container is and which features are best suited to identifying the device; the second describes the clustering techniques used to obtain a correct grouping.

A. CONTAINER STRUCTURE ANALYSIS
The elementary structure of a video is the atom. The metadata, video and sound of a video are stored within atoms. Container atoms are hierarchical in nature; that is, an atom may contain other atoms, which may contain still others, and so on. The type of atom is specified by a 32-bit unsigned integer, typically interpreted as a four-character ASCII (American Standard Code for Information Interchange) code, usually in lowercase letters. It should be noted that there is no rule regarding which Container atoms must appear or their order; however, most videos follow a similar structure [22]. An atom extraction algorithm has been used to obtain information from Container atoms. It is capable of analyzing video formats such as MP4/H.264, MOV and 3GP and of extracting the information held in their Container atoms.
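As an illustration of the hierarchical atom layout described above, the following is a minimal sketch of an atom walker, not the extraction algorithm used in this work. It assumes the usual MP4-like header layout (a 4-byte big-endian size followed by the 4-byte type, with a size of 1 signalling a 64-bit extended size and a size of 0 meaning "up to the end of the enclosing range"). The names `walk_atoms`, `make_atom` and the short `CONTAINER_TYPES` set are hypothetical and chosen only for this example.

```python
import struct

# Atom types treated as pure containers (children only). This short set is an
# illustrative assumption, not an exhaustive list from any specification.
CONTAINER_TYPES = {"moov", "trak", "mdia", "minf", "stbl", "edts", "dinf"}


def walk_atoms(data, start=0, end=None, path=""):
    """Yield (path, offset, size) for every atom found in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit extended size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # the atom extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed atom: stop instead of guessing
        atom_type = raw_type.decode("latin-1")
        atom_path = f"{path}/{atom_type}"
        yield atom_path, pos, size
        if atom_type in CONTAINER_TYPES:
            # Recurse into the payload to recover the child atoms.
            yield from walk_atoms(data, pos + header, pos + size, atom_path)
        pos += size


def make_atom(atom_type, payload=b""):
    """Build a synthetic atom (hypothetical helper used only for the example)."""
    return struct.pack(">I4s", 8 + len(payload), atom_type.encode("latin-1")) + payload


# A tiny synthetic file: ftyp, then moov containing mvhd and trak/tkhd.
sample = make_atom("ftyp", b"isom") + make_atom(
    "moov", make_atom("mvhd", b"\x00" * 4) + make_atom("trak", make_atom("tkhd"))
)
paths = [p for p, _, _ in walk_atoms(sample)]
```

For the synthetic input, `paths` lists the atoms in order of appearance together with their position in the hierarchy. A real extractor would additionally decode the fields inside each leaf atom, which this sketch does not attempt.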
The extraction of Container atoms consists of storing the labels, the values and the hierarchy that exists between Container atoms. The process begins by obtaining the initial byte of the atom, its size, and the type of the Container atom, which has a maximum length of 4 bytes and is read as a string of characters (e.g., ftyp, moov, mdat, etc.). Next, the duplication of Container atoms and the existence of child Container atoms are checked. Finally, a dictionary of Container atoms and tags (Path-tag) with their respective values and orders of appearance is obtained. For a more in-depth study of Container atoms, see [19] and [22]. Table 2 shows the output that is obtained when the atom extraction algorithm is used. The first atom found is "ftyp", as indicated by the specification [19]. As the Container atoms are organized hierarchically (i.e., /moov/), they in turn have child Container atoms (i.e., /moov/trak) and labels (i.e., /moov/mvhd/tkhd/flags), and these tags also contain values (i.e., /moov/mvhd/tkhd/version, value: 0).
In this proposal, in order to cluster the videos, each video is treated as a set of elements with the following features: "PathField", "PathFieldValue", "PathOrderField" and "PathOrderFieldValue". "PathField" is defined as the union of the Path and Field tags separated by the '/' character. "PathFieldValue" is defined as the union of the Path and Field tags separated by the '/' character, followed by the Value tag separated by the '=' character. "PathOrderField" is defined as the union of the PathOrder and Field tags separated by the '/' character. "PathOrderFieldValue" is the union of the PathOrder and Field tags separated by the '/' character, followed by the Value tag separated by the '=' character. Table 3 shows an example of the values of these features for each row in Table 2.
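The four feature universes can be illustrated with a short sketch. It assumes that each extracted row carries a Path, a PathOrder, a Field and a Value, and that paths are written without a trailing '/'. The `Row` type, the `build_universes` function and the way the order of appearance is written in `path_order` (for example `/moov-2/mvhd-1`) are assumptions made only for this example; the source does not specify that encoding.

```python
from typing import NamedTuple


class Row(NamedTuple):
    path: str        # e.g. "/moov/mvhd"
    path_order: str  # path with order of appearance; this format is assumed
    field: str       # e.g. "version"
    value: str       # e.g. "0"


def build_universes(rows):
    """Return the four feature sets of one video, keyed by universe name."""
    return {
        "PathField": {f"{r.path}/{r.field}" for r in rows},
        "PathFieldValue": {f"{r.path}/{r.field}={r.value}" for r in rows},
        "PathOrderField": {f"{r.path_order}/{r.field}" for r in rows},
        "PathOrderFieldValue": {f"{r.path_order}/{r.field}={r.value}" for r in rows},
    }


rows = [
    Row("/moov/mvhd", "/moov-2/mvhd-1", "version", "0"),
    Row("/moov/mvhd", "/moov-2/mvhd-1", "flags", "0"),
]
universes = build_universes(rows)
```

Representing each universe as a set makes it straightforward to compare two videos by the elements they share, which is the kind of comparison a clustering step needs.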
The representation proposed in [20] only uses the PathOrderFieldValue universe; however, all the universes should be taken into account, since good results are also obtained with the others, as detailed later in the experimental section of this article.

B. CLUSTERING TECHNIQUES
Clustering is an initial and fundamental step in data analysis. It is an unsupervised classification of patterns into groups or clusters. Intuitively, patterns within a valid cluster are more similar to each other than they are to a pattern belonging to another cluster. Clustering is useful in several fields such as statistics, machine learning and pattern analysis, among many others. Clustering methods can be classified into five major types: Partitioning, Hierarchical, Density-Based, Grid-Based and Model-Based methods. In this work, two clustering algorithms have been used to classify videos: a density-based algorithm called OPTICS and a hierarchical algorithm. Both are detailed in the following subsections.

1) DENSITY-BASED METHOD
Density-based clustering was introduced to discover clusters of arbitrary shape. It is based on the fact that within each cluster there is a typical density of points, and this density is higher than outside the cluster. Outside points with lower density are recognized as noise points. One of the best-known algorithms in this category is OPTICS (Ordering Points To Identify the Clustering Structure) [23], presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander.
OPTICS computes an ordering of the points augmented by additional information, i.e. the reachability distance, representing the intrinsic hierarchical cluster structure. The result of OPTICS, i.e. the cluster ordering, is displayed by so-called reachability plots, which are 2D plots generated as follows: the clustered objects are ordered along the x-axis according to the cluster ordering computed by OPTICS, and the reachability assigned to each object is plotted along the y-axis [23]. The key idea of density-based clustering is that, for each object of a cluster, the neighborhood of a given radius ($\varepsilon$) has to contain at least a minimum number of objects (MinPts), i.e. the cardinality of the neighborhood has to exceed a threshold. The formal definitions for this notion of a clustering are introduced in [23] as:
• Definition 1 (Directly Density-Reachable): Object p is directly density-reachable from object q with respect to $\varepsilon$ and MinPts in a set of objects D if 1) $p \in N_\varepsilon(q)$ (where $N_\varepsilon(q)$ is the subset of D contained in the $\varepsilon$-neighborhood of q), and 2) $\mathrm{Card}(N_\varepsilon(q)) \geq \mathrm{MinPts}$ (where $\mathrm{Card}(N)$ denotes the cardinality of the set N). The condition $\mathrm{Card}(N_\varepsilon(q)) \geq \mathrm{MinPts}$ is called the core object condition. If this condition holds for an object p, then we call p a core object. Other objects can be directly density-reachable only from core objects.
• Definition 2 (Density-Reachable): An object p is density-reachable from an object q with respect to $\varepsilon$ and MinPts in the set of objects D if there is a chain of objects $p_1, \ldots, p_n$, with $p_1 = q$ and $p_n = p$, such that $p_i \in D$ and $p_{i+1}$ is directly density-reachable from $p_i$ with respect to $\varepsilon$ and MinPts. Density-reachability is the transitive hull of direct density-reachability. This relation is not symmetric in general. Only core objects can be mutually density-reachable.
• Definition 3 (Density-Connected): Object p is density-connected to object q with respect to $\varepsilon$ and MinPts in the set of objects D if there is an object o in D such that both p and q are density-reachable from o with respect to $\varepsilon$ and MinPts in D. Density-connectivity is a symmetric relation. A density-based cluster is now defined as a set of density-connected objects which is maximal with respect to density-reachability, and the noise is the set of objects not contained in any cluster [23].
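• Definition 4 (Core-Distance of an Object p): Let p be an object from a database D, let $\varepsilon$ be a distance value, let $N_\varepsilon(p)$ be the $\varepsilon$-neighborhood of p, let MinPts be a natural number and let MinPts-distance(p) be the distance from p to its MinPts-th neighbor. Then, the core-distance of p is defined as
$$\text{core-distance}_{\varepsilon,\mathrm{MinPts}}(p) = \begin{cases} \text{UNDEFINED}, & \text{if } \mathrm{Card}(N_\varepsilon(p)) < \mathrm{MinPts} \\ \text{MinPts-distance}(p), & \text{otherwise} \end{cases}$$
The core-distance of an object p is simply the smallest distance $\varepsilon'$ between p and an object in its $\varepsilon$-neighborhood such that p would be a core object with respect to $\varepsilon'$ if this neighbor is contained in $N_{\varepsilon'}(p)$. Otherwise, the core-distance is undefined.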
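The definitions above can be made concrete with a small sketch. It is a brute-force illustration of the $\varepsilon$-neighborhood, the core object condition, direct density-reachability and the core-distance, not an implementation of OPTICS itself. It assumes Euclidean distance and counts an object as a member of its own neighborhood (a common convention, stated here as an assumption); the function names are hypothetical.

```python
import math


def neighborhood(points, p, eps):
    """Indices of all points within distance eps of points[p] (p included)."""
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]


def is_core(points, p, eps, min_pts):
    """Core object condition: Card(N_eps(p)) >= MinPts."""
    return len(neighborhood(points, p, eps)) >= min_pts


def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p lies in the eps-neighborhood of q and q is a core object."""
    return p in neighborhood(points, q, eps) and is_core(points, q, eps, min_pts)


def core_distance(points, p, eps, min_pts):
    """MinPts-distance of p if p is a core object, otherwise None (undefined)."""
    nbrs = neighborhood(points, p, eps)
    if len(nbrs) < min_pts:
        return None
    dists = sorted(math.dist(points[p], points[i]) for i in nbrs)
    return dists[min_pts - 1]  # p itself contributes the distance 0.0


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
cd_dense = core_distance(points, 0, eps=1.5, min_pts=3)
cd_noise = core_distance(points, 3, eps=1.5, min_pts=3)
```

In this toy set the first three points form a dense group, so point 0 has a defined core-distance, while the isolated point 3 does not satisfy the core object condition and would be treated as noise.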

2) HIERARCHICAL METHOD
Hierarchical clustering techniques proceed by either a series of successive mergers or a series of successive divisions. In this way, these methods can be classified into two principal groups:
• Agglomerative hierarchical methods (bottom-up approach): They start from the individual elements and merge them into groups.
• Divisive hierarchical methods (top-down approach): They start from the whole set of elements and divide it successively until the individual elements are reached.
The agglomerative algorithms that are used always have the same structure and only differ in the way the distances between groups are calculated. Their structure is shown in Algorithm 1.

Algorithm 1 Agglomerative Algorithms
1) Start with N clusters, each containing a single entity, and an N × N symmetric matrix of distances (or similarities) D = (d_ik);
2) Select the two closest elements in the distance matrix and form a class with them;
3) Replace the two elements used in (2) to define the class by a new element that represents the built class. The distances between this new element and the remaining elements are calculated using one of the criteria discussed below;
4) Go back to (2) and repeat (2) and (3) until all the elements are grouped into a single class;

There are different criteria to calculate the distances between groups. The most common are single linkage, complete linkage and weighted average, among others [24]. In this work an agglomerative hierarchical clustering algorithm is used and the selected criterion is the weighted average. The results of both agglomerative and divisive methods can be displayed in a two-dimensional diagram known as a dendrogram, which shows the mergers or divisions that have been made at successive levels. Once the dendrogram has been obtained, clusters must be extracted.
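The agglomerative scheme with the weighted average criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the weighted average update takes the distance from a merged class to any other class as the plain mean of the two previous distances, and, instead of building the full dendrogram and cutting it afterwards, the sketch simply stops merging once the closest pair is farther apart than a `threshold`. The function name `agglomerate` and the toy distance matrix are hypothetical.

```python
def agglomerate(dist, threshold):
    """Weighted-average agglomerative clustering on a symmetric distance matrix.

    dist is an N x N list of lists; merging stops when the closest pair of
    classes is farther apart than threshold. Returns sorted lists of indices.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # step 1: one class per element
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), best = min(d.items(), key=lambda kv: kv[1])  # step 2
        if best > threshold:
            break
        merged = clusters.pop(a) + clusters.pop(b)
        # Step 3: weighted average update for every remaining class k.
        new_d = {}
        for k in clusters:
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            new_d[(k, next_id)] = (d_ak + d_bk) / 2
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1  # step 4: repeat with the reduced set of classes
    return sorted(sorted(c) for c in clusters.values())


# Four elements lying on a line at positions 0, 1, 10 and 11.
toy = [
    [0, 1, 10, 11],
    [1, 0, 9, 10],
    [10, 9, 0, 1],
    [11, 10, 1, 0],
]
two_groups = agglomerate(toy, threshold=2)
one_group = agglomerate(toy, threshold=100)
```

With a small threshold the two tight pairs stay separate, whereas a large threshold lets the procedure run until a single class remains, which corresponds to the root of the dendrogram.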

IV. EXPERIMENTS AND RESULTS

A. DATASETS
In order to carry out the experiments, two datasets have been used: the VISION dataset [25] and the ACID dataset [4]. They are the most complete and most recent datasets in the literature for the forensic analysis of videos. Together, they cover the main kinds of digital videos in use today, that is, videos from mobile devices, videos from digital cameras, and videos from the main messaging (WhatsApp) and video sharing (YouTube) platforms. The VISION dataset is currently composed of 34427 images and 1914 videos, both in the native format and in their social version (Facebook, YouTube, and WhatsApp are considered), from 35 portable devices of 11 major brands. The video-ACID database contains over 12000 videos from 46 physical devices representing 36 unique camera models. Videos in this database were hand collected in a diversity of real-world scenarios, are unedited, and have known and trusted provenance. For this work, a subset of these videos has been used, namely those belonging to the eval subset. In our experiments we have selected two samples, one from the VISION dataset (sample 1) and another from the ACID dataset (sample 2). Tables 4 and 5 describe the VISION and ACID dataset samples.

B. EXPERIMENTAL CONDITIONS
The following considerations have been taken into account to carry out the experiments. First, not every Field tag is valid for identifying the source, because some have values that depend on the video itself, as is the case for tags related to the creation date, duration, etc. The following fields have therefore been removed: modificationTime, creationTime, entryCount, sampleCount, freeSpace, duration. Secondly, all possible representations of the labels of Container atoms have been defined as universes, specifically: PathField, PathFieldValue, PathOrderField and PathOrderFieldValue. A summary of the experimental conditions is shown in Table 6.
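The removal of video-dependent fields can be sketched as a simple filter. The sketch assumes that every feature is a string of the form `path/field` or `path/field=value`, so that the field name is the last path component before any '=' sign. The names `VOLATILE_FIELDS` and `drop_volatile` are hypothetical; only the six field names come from the text.

```python
# Field names listed in the text as dependent on the individual video.
VOLATILE_FIELDS = {
    "modificationTime",
    "creationTime",
    "entryCount",
    "sampleCount",
    "freeSpace",
    "duration",
}


def drop_volatile(features):
    """Keep only the features whose field name is not video-dependent."""
    kept = set()
    for feat in features:
        # Strip an optional "=value" suffix, then take the last path component.
        field = feat.split("=", 1)[0].rsplit("/", 1)[-1]
        if field not in VOLATILE_FIELDS:
            kept.add(feat)
    return kept


features = {
    "/moov/mvhd/duration=1234",
    "/moov/mvhd/creationTime",
    "/moov/mvhd/version=0",
}
stable = drop_volatile(features)
```

After filtering, only features that describe how the device writes the container remain, which is what the clustering is meant to capture.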

C. RESULTS
In order to choose the best representation of the dataset in different clusters, and the best metric, the Silhouette Coefficient has been used. The representation and measure with the highest Silhouette Coefficient will be the most likely to be correctly separated. The Silhouette Coefficient has been widely used in other multimedia forensic analysis works [26]-[28]. For each point, the Silhouette Coefficient considers the average distance from that point to all other points in the same group (cohesion, $a(i)$) and the average distance from that point to the points of the nearest other group (separation, $b(i)$).
Definition 5: The Silhouette Coefficient is a measure of the consistency of the clusters. It measures both the cohesion and the separation of the clusters. Let $C_i$, $i = 1, \ldots, k$ be the set of clusters (or a cluster configuration). For a data point $i \in C_i$, let
$$a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j)$$
be the mean distance between i and all other data points in the same cluster, where $d(i, j)$ is the distance between i and j in the cluster $C_i$, and let
$$b(i) = \min_{k \neq i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j)$$
be the smallest mean distance of i to all points in any other cluster, of which i is not a member. The silhouette coefficient is defined as:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \quad \text{if } |C_i| > 1, \qquad s(i) = 0 \quad \text{if } |C_i| = 1.$$
Table 7 shows the maximum Silhouette Coefficient (for any metric) in each of the two samples under study. In addition, Tables 8 and 9 show the 4 metrics that gave the best results in both datasets according to the Silhouette Coefficient.
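A minimal sketch of the silhouette computation on a precomputed distance matrix follows. It averages the per-point value over all points, which is one common way to summarise a clustering and is stated here as an assumption; points in singleton clusters receive the value 0. The function name `mean_silhouette` and the toy data are hypothetical.

```python
def mean_silhouette(dist, labels):
    """Average silhouette value over all points, given pairwise distances."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # Separation: smallest mean distance to the members of another cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Four elements lying on a line at positions 0, 1, 10 and 11.
toy = [
    [0, 1, 10, 11],
    [1, 0, 9, 10],
    [10, 9, 0, 1],
    [11, 10, 1, 0],
]
good = mean_silhouette(toy, [0, 0, 1, 1])
bad = mean_silhouette(toy, [0, 1, 0, 1])
```

The grouping that keeps the two tight pairs together scores close to 1, while the grouping that mixes them scores below 0, which is the behaviour used to rank representations and metrics.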

D. EVALUATION FOR CLUSTERING PERFORMANCE
Clustering comparison measures play an important role in cluster analysis. Numerous measures for comparing clusterings have been proposed [29]. To measure the performance of the clustering, several metrics have been used to compare the predicted groups with the actual classes of the videos. In particular, the Homogeneity, Completeness and Rand Index (RI) scores have been used. A clustering result satisfies homogeneity if all of its clusters contain only data points which are members of a single class. A clustering result satisfies completeness if all the data points that are members of a given class are elements of the same cluster. Both scores take values between 0.0 and 1.0.
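The text states when homogeneity and completeness are fully satisfied; the graded scores are commonly computed from conditional entropies, and the sketch below follows that entropy-based formulation as an assumption (homogeneity as 1 - H(C|K)/H(C) and completeness as 1 - H(K|C)/H(K), with C the true classes and K the predicted clusters). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural logarithm) of a labelling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two labellings of the same points."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(c / n * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness(true_classes, pred_clusters):
    """Return (homogeneity, completeness), each in the range [0, 1]."""
    h_c = entropy(true_classes)
    h_k = entropy(pred_clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(true_classes, pred_clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(pred_clusters, true_classes) / h_k
    return homogeneity, completeness


# Class "A" is split across two clusters, class "B" is kept in one cluster.
true_classes = ["A", "A", "B", "B"]
pred_clusters = [0, 1, 2, 2]
hom, com = homogeneity_completeness(true_classes, pred_clusters)
```

In this example every cluster is pure, so homogeneity is 1, but one class is spread over two clusters, so completeness drops. This is the pattern to expect when a clustering produces more clusters than there are classes.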
The Rand Index (RI) computes a similarity measure between two clusterings by considering all pairs of samples and counting the pairs that are assigned to the same or to different clusters in both the predicted and the true clusterings. The Rand Index is related to accuracy: it measures how accurately the predicted partition places two points in the same cluster, or in different clusters, according to the real partition. The RI is normalized, with values in the range [0, 1]; however, it does not meet the "constant baseline" property.
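The pair-counting idea behind the Rand Index can be written in a few lines. The sketch below is the plain (unadjusted) index, computed by brute force over all pairs; the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(true_classes, pred_clusters):
    """Fraction of point pairs on which the two partitions agree."""
    if len(true_classes) < 2:
        raise ValueError("at least two points are required")
    agree = 0
    total = 0
    for i, j in combinations(range(len(true_classes)), 2):
        same_true = true_classes[i] == true_classes[j]
        same_pred = pred_clusters[i] == pred_clusters[j]
        # A pair counts as an agreement when both partitions put the two
        # points together, or both put them apart.
        agree += same_true == same_pred
        total += 1
    return agree / total


# Class "A" is split across two clusters, class "B" is kept in one cluster.
ri = rand_index(["A", "A", "B", "B"], [0, 1, 2, 2])
```

Of the six pairs in this example, only the pair formed by the two "A" points is treated differently by the two partitions, so the index is 5/6.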

E. RESULTS OF HIERARCHICAL CLUSTER ALGORITHM
All executions have been completed with the different configurations shown in Tables 8 and 9, for each of the datasets used in this work. Table 10 summarizes the experimental conditions used with the Hierarchical clustering algorithm.

1) RESULTS FOR THE SAMPLE OF VISION DATASET
The maximum RI value for the original dataset is obtained with the PathField universe, the Euclidean metric and Threshold = 1.132. The resulting clusters are shown in Figure 1 and Table 11, where it can be seen that 17 clusters have been identified for the 13 models available in the VISION dataset. Videos from the YouTube and WhatsApp platforms have each been grouped almost entirely into a single cluster. The Apple brand is spread over several clusters, behaving differently from the other brands. Devices from some other brands, on the contrary, show no difference under this representation: the Asus Zenfone produces videos exactly like those from Huawei's Honor 5c and P8, so it is impossible to distinguish an Asus video from a Huawei one with this representation of the data. The OnePlus brand is distinguishable, as are the LG and Wiko brands.
The detail for each of the models that make up the dataset can be seen in Figure 2 and Table 12, which show the result grouped by the model of each device in the dataset.
Finally, the RI value of this configuration is 0.8839. This configuration also has a homogeneity of 0.9195 and a completeness of 0.7970 for the VISION dataset. A summary of these results by brand is shown in Table 13. Table 14 shows the detail of the result grouped by model.

2) RESULTS FOR THE SAMPLE OF ACID DATASET
As can be seen in Figure 3 and Table 15, there are 11 clusters for the 11 brands that belong to the dataset. In this case, unlike the VISION dataset, the Apple brand is correctly classified into a single cluster. The LG and Moto brands cannot be distinguished with this representation. As for digital cameras, the Canon and Olympus brands are correctly classified; however, the same is not true of the Kodak brand, which cannot be distinguished from the Samsung brand. The performance of the execution is shown in Table 16. Figure 4 and Table 17 show the detail of the clusters generated for the models that belong to the sample of the ACID dataset. Table 18 shows the result of the algorithm execution. In the results shown above for this dataset and with the selected configuration, the RI values obtained for both the brand and the model are greater than 0.80, specifically 0.8128 and 0.8233 respectively. The homogeneity in the case of the brand is higher than in the case of the model because fewer clusters are obtained than models. The opposite occurs with the completeness, which reaches 1.0 in the case of the model.
Finally, Tables 19 and 20 show the comparative results for both samples using the Hierarchical Clustering algorithm.

F. RESULTS OF OPTICS ALGORITHM
All the configurations shown in Tables 8 and 9 have been used to run OPTICS. The remaining parameters are the minPoints and epsilon ($\varepsilon$) values. The first is fixed to 5, as the main difference between OPTICS and Hierarchical Clustering is its ability to ignore noise. $\varepsilon$, however, has been varied. After several executions with different values of the parameter, it has been concluded that $\varepsilon = 0.01$ offers good results in both datasets. Table 21 summarizes the experimental conditions of the OPTICS algorithm.

1) RESULTS OF OPTICS ALGORITHM FOR THE SAMPLE OF VISION DATASET
As can be seen in Figure 5 and Table 22, the algorithm has generated 25 clusters for the 13 brands that belong to the sample of the VISION dataset. As with the hierarchical clustering algorithm, the Apple brand is spread over several clusters; however, none of those clusters contains videos from another brand. Videos from YouTube or WhatsApp are mostly classified in a cluster by model. Therefore, the algorithm is capable of grouping native videos from mobile devices and also videos that have been downloaded from online platforms such as YouTube or WhatsApp. The result of the execution of the algorithm can be seen in detail in Table 23. With the OPTICS algorithm, the RI, homogeneity and completeness values are very similar to those of the hierarchical algorithm. It can be concluded that the identification depends not on the selected algorithm but on the configuration selected for each algorithm. The detail by model is shown in Figure 6 and Table 24. The result of the execution of the algorithm can be seen in Table 25.

2) RESULTS OF OPTICS ALGORITHM FOR THE SAMPLE OF ACID DATASET
Figure 7 and Table 26 show that the algorithm has produced 16 clusters for the 11 brands available in the sample of the ACID dataset. The classification is correct both for videos originating from mobile devices and for videos generated by digital cameras. This algorithm gives better results than the Hierarchical algorithm. The detail of the execution result can be seen in Table 27. The classification by model can be seen in Figure 8 and Table 28. Table 29 shows the detail of the execution result grouped by model.
Finally, two tables are given to compare the experiments that use the OPTICS algorithm. The results grouped by brand for the selected samples (sample of the VISION dataset and sample of the ACID dataset) can be seen in Table 30. The comparative results grouped by model for the two selected samples can be seen in Table 31.

V. CONCLUSION
This work has shown how the information in video files can be exploited to group videos by data source, without prior training of a classifier. In the literature currently available, there is a great shortage of research on video source identification that uses the structure of the video container to obtain the characteristics. An essential point of the proposed methodology has been the correct acquisition of data for further processing. With a good preliminary acquisition, the subsequent treatment through the use of Data Mining classification algorithms has been effective in determining the final clustering of the data. The proposed methodology has been validated on two datasets, to which it has been applied with the same selection of parameters in order to obtain comparable results. The datasets used have been obtained by sampling from the two most recent databases in the literature. The databases contain videos from various sources: native videos from mobile devices, native videos from digital cameras and videos that have been downloaded from platforms such as WhatsApp and YouTube. Care has been taken to obtain samples that are sufficiently significant to carry out the study. The proposed methodology is general enough to be applied and adapted to other types of data, and to be combined with other classification techniques from Multivariate Analysis (non-hierarchical classification techniques and methods based on statistical models [24], among others). As the numerical results obtained from the samples show, the proposed clustering algorithms have provided good results from the perspective of classification. The use of simple algorithms has also proven effective for separating video files by brand. The results were positive, and the algorithms were able to correctly group videos into homogeneous clusters per brand, even if too many clusters appeared.