• ### 3DQoE-Oriented and Energy-Efficient 2D plus Depth based 3D Video Streaming over Centrally Controlled Networks

IP networks have become the dominant platform for video delivery. However, bandwidth-hungry video is pushing networks to their limits: costs are rising for the operators and the viewing experience is not always satisfactory for the users. When considering 3D video delivery, the previous problems are exacerbated because of the higher volume of data that must be communicated, and the difficulty in c...

• ### Field-of-Experts Filters Guided Tensor Completion

Most low-rank tensor approximations are NP-hard problems. In this paper, we introduce a novel concept: Field-of-Experts (FoE) filters guided tensor completion, which aims to integrate the strengths of the emerging tensor completion method and the conventional FoE filters. Specifically, the target image is convolved by FoE filters to produce multi-view features as a high-order tensor, which captur...

• ### Exploiting Web Images for Video Highlight Detection with Triplet Deep Ranking

Highlight detection from videos has been widely studied due to the fast growth of video content. However, most existing approaches to highlight detection, either hand-crafted feature-based or deep learning-based, heavily rely on human-curated training data, which is very expensive to obtain and thus hinders the scalability to large datasets and unlabeled video categories. We observe that the largel...

• ### Query-adaptive Image Retrieval by Deep Weighted Hashing

Hashing methods have attracted much attention for large-scale image retrieval. Some recent deep hashing methods have achieved promising results by taking advantage of the strong representation power of deep networks. However, existing deep hashing methods treat all hash bits equally. On the one hand, a large number of images share the same distance to a query image due to the discrete Hamming distan...

• ### Exploiting Pseudo-Quadtree Structure for Accelerating HEVC Spatial Resolution Downscaling Transcoder

In this paper, a novel method to accelerate the spatial resolution downscaling transcoding operation for High Efficiency Video Coding (HEVC) is proposed. The proposed transcoder first extracts the information about the coding unit (CU) structure during the decoding process and analyzes it to construct a pseudo-quadtree of the target resolution. By utilizing the constructed pseudo-quadtree, the enc...

• ### Dual Graph Regularized Discriminative Multi-task Tracker

Multi-task and low-rank learning methods have attracted increasing attention for visual tracking. However, most trackers only focus on learning the appearance subspace basis or the sparse low-rankness of the representation, and thus do not make full use of the structure information among and inside target candidates (or samples). In this work, we propose a dual graph regularized discriminative low rank learni...

• ### Fusing Geometric Features for Skeleton-Based Action Recognition using Multilayer LSTM Networks

Recent skeleton-based action recognition approaches achieve great improvement by using RNN models. Currently these approaches build an end-to-end network from coordinates of joints to class categories and improve accuracy by extending RNN to spatial domains. First, while such well-designed models and optimization strategies explore relations between different parts directly from joint coordinates,...

• ### Real-Time Long-Term Tracking with Prediction-Detection-Correction

Real-time long-term visual tracking is one of the most challenging problems in computer vision due to various factors such as occlusion and motion ambiguity. To achieve robust long-term tracking, most state-of-the-art methods typically construct an online detector in each frame. However, they fail to achieve real-time performance due to high computational complexity. In this paper, we propose a no...

• ### Visual Sentiment Prediction based on Automatic Discovery of Affective Regions

Automatic assessment of sentiment from visual content has gained considerable attention with the increasing tendency of expressing opinions via images and videos online. This paper investigates the problem of visual sentiment analysis, which involves a high-level abstraction in the recognition process. While most of the current works focus on improving holistic representations, we aim to utilize t...

• ### Robust 3D Human Detection in Complex Environments with Depth Camera

Human detection has received great attention during the past few decades, yet it remains a challenging problem. In this paper, we focus on the problem of 3D human detection, i.e., finding the human bodies and determining their 3D coordinates in complex three-dimensional space using depth data only. Since the traditional sliding window-based approaches for target localization are time-consuming...

• ### Robust Multiview Synthesis For Wide-Baseline Camera Arrays

In many advanced multimedia systems, multiview content can offer more immersion compared to classical stereoscopy. The feeling of immersiveness is increased substantially by offering motion-parallax, as well as stereopsis. This drives both the so-called free-navigation and super-multiview technologies. However, it is currently still challenging to acquire, store, process and transmit this type of ...

• ### Scene Text Detection using Superpixel based Stroke Feature Transform and Deep Learning based Region Classification

Scene text detection is a crucial step in end-to-end scene text recognition, a greatly challenging problem in computer vision. This paper proposes a novel scene text detection method that involves superpixel based stroke feature transform (SSFT) and deep learning based region classification (DLRC). The SSFT is developed for candidate character region (CCR) extraction, which consists in partitionin...

• ### Detecting Socially Significant Music Events using Temporally Noisy Labels

In this paper, we focus on event detection over the timeline of a music track. Such technology is motivated by the need for innovative applications such as searching, non-linear access and recommendation. Event detection over the timeline requires time-code level labels in order to train machine learning models. We use timed comments from SoundCloud, a modern social music sharing platform, to obtain ...

• ### A Novel No-Reference Metric for Estimating the Impact of Frame Freezing Artifacts on Perceptual Quality of Streamed Videos

Online monitoring of multimedia networks is required to ensure seamless and ubiquitous delivery of services to the end users. Quality of multimedia content, such as video streams, often gets degraded due to network losses such as packet loss. Frame freezing artifacts are introduced in a video stream when packet loss or packet delay takes place. Estimating the perceptual impact of these artifacts on q...

• ### Image Style Classification based on Learnt Deep Correlation Features

This paper presents a comprehensive study of deep correlation features on image style classification. Inspired by the observation that correlation between feature maps can effectively describe image texture, we design various correlations and transform them into style vectors, and investigate the classification performance achieved by different variants. In addition to intra-layer correlation, inter-layer correlation ...

• ### MixedEmotions: An Open-Source Toolbox for Multi-Modal Emotion Analysis

Recently, there has been an increasing tendency to embed the functionality of recognizing emotions from user-generated content, to infer richer profiles of the users or content that can be used for various automated systems such as call-center operations, recommendations, and assistive technologies. However, to date, adding this functionality has been a tedious, costly, and time-consuming effort, an...

• ### Quality-guided Fusion-based Co-saliency Estimation for Image Co-segmentation and Co-localization

Despite the advantage of exploiting inter-image information by performing joint processing of images for co-saliency, co-segmentation or co-localization, it introduces a few drawbacks: (i) its necessity in scenarios where the joint processing might not perform better than individual image processing, (ii) increased complexity over individual image processing and (iii) complex parameter tuning. In ...

• ### Depth Adaptive Deep Neural Network for Semantic Segmentation

In this work, we present the depth-adaptive deep neural network using a depth map for semantic segmentation. Typical deep neural networks receive inputs at the predetermined locations regardless of the distance from the camera. This fixed receptive field presents a challenge to generalize the features of objects at various distances in neural networks. Specifically, the predetermined receptive fie...

• ### Joint Optimization of Radio and Virtual Machine Resources with Uncertain User Demands in Mobile Cloud Computing

Resource reservation is one of the key techniques to ensure the Quality-of-Service (QoS) of a multimedia application. In mobile cloud computing (MCC), resource reservation and allocation (RRA) in advance can also significantly reduce the total provisioning cost of cloud service providers. However, the uncertain features of mobile users' demands for resources make RRA challenging. In...

• ### Group Sensitive Triplet Embedding for Vehicle Re-identification

The widespread use of surveillance cameras towards smart and safe cities poses the critical but challenging problem of vehicle re-identification (Re-ID). State-of-the-art research has performed vehicle re-identification relying on deep metric learning with a triplet network. However, most existing methods basically ignore the impact of intra-class variance incorporated embedding on the perfo...

• ### Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild

In order to retrieve unlabeled images by textual queries, cross-media similarity computation is a key ingredient. Although novel methods are continuously introduced, little has been done to evaluate these methods together with large-scale query log analysis. Consequently, how far these methods have brought us in answering real-user queries is unclear. Given baseline methods that use relatively sim...

• ### A novel digital watermarking based on General non-negative matrix factorization

In this paper, we propose a novel General-NMF (General Non-negative Matrix Factorization) based digital watermarking scheme for copyright protection and integrity authentication of digital media content. Specifically, the proposed General-NMF algorithm is able to factorize a matrix $C \in R_{+}^{s \times t}$ into a basis matrix $A \in R^{m \times \cdots}$...

• ### An Audio-Visual System for Object-Based Audio: From Recording to Listening

Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and thus produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This article introduces an end-to-end object-based spatial audio pipeline, from sound recor...

• ### Mobile Instant Video Clip Sharing with Screen Scrolling: Measurement and Enhancement

The rapid development and penetration of mobile social networking have enabled new-generation video sharing services that use smart mobile terminals to instantly capture and share short video clips (usually of several seconds). The instant video clips are then directly consumed at smart-terminals with specially designed mobile interfaces and operations. A number of mobile apps, e.g., Twitter's Vin...

• ### Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

Automatically generating a summary of sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited, and thus traditional methods are not suitable to generate a summ...

