• ### Small Object Sensitive Segmentation of Urban Street Scene With Spatial Adjacency Between Object Classes

Publication Year: 2019, Page(s):2643 - 2653
Recent advancements in deep learning have shown an exciting promise in the urban street scene segmentation. However, many objects, such as poles and sign symbols, are relatively small, and they usually cannot be accurately segmented, since the larger objects usually contribute more to the segmentation loss. In this paper, we propose a new boundary-based metric that measures the level of spatial ad...

• ### User-Ranking Video Summarization With Multi-Stage Spatio–Temporal Representation

Publication Year: 2019, Page(s):2654 - 2664
Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and...

• ### Local Semantic-Aware Deep Hashing With Hamming-Isometric Quantization

Publication Year: 2019, Page(s):2665 - 2679
Hashing is a promising approach for compact storage and efficient retrieval of big data. Compared to the conventional hashing methods using handcrafted features, emerging deep hashing approaches employ deep neural networks to learn both feature representations and hash functions, which have been proven to be more powerful and robust in real-world applications. Currently, most of the existing deep ...

• ### Detecting and Mapping Video Impairments

Publication Year: 2019, Page(s):2680 - 2691
Automatically identifying the locations and severities of video artifacts without the advantage of an original reference video is a difficult task. We present a novel approach to conducting no-reference artifact detection in digital videos, implemented as an efficient and unique dual-path (parallel) excitatory/inhibitory neural network that uses a simple discrimination rule to define a bank of acc...

• ### Structure-Texture Image Decomposition Using Deep Variational Priors

Publication Year: 2019, Page(s):2692 - 2704
Most variational formulations for structure-texture image decomposition force the structure images to have small norm in some functional spaces and to share a common notion of edges, i.e., large-gradients or large-intensity differences. However, such a definition makes it difficult to distinguish structure edges from oscillations that have fine spatial scale but high contrast. In this paper, we in...

• ### Automated Method for Retinal Artery/Vein Separation via Graph Search Metaheuristic Approach

Publication Year: 2019, Page(s):2705 - 2718
Separation of the vascular tree into arteries and veins is a fundamental prerequisite in the automatic diagnosis of retinal biomarkers associated with systemic and neurodegenerative diseases. In this paper, we present a novel graph search metaheuristic approach for automatic separation of arteries/veins (A/V) from color fundus images. Our method exploits local information to disentangle the comple...

• ### Weighted Tensor Rank-1 Decomposition for Nonlocal Image Denoising

Publication Year: 2019, Page(s):2719 - 2730
Natural images often contain patches with high similarity. In this paper, to effectively utilize the local and nonlocal self-similarity for low-rank models, we propose a novel weighted tensor rank-1 decomposition method (termed as WTR1) for nonlocal image denoising. Although the low-rank approximation problem has been well studied for matrices, it remains elusive of the theoretical extension to te...

• ### Fractional-Pel Accurate Motion-Adaptive Transforms

Publication Year: 2019, Page(s):2731 - 2742
Fractional-pel accurate motion is widely used in video coding. For subband coding, fractional-pel accuracy is challenging since it is difficult to handle the complex motion field with temporal transforms. In our previous work, we designed integer accurate motion-adaptive transforms (MAT) which can transform integer accurate motion-connected coefficients. In this paper, we extend the integer MAT to...

• ### Topic-Oriented Image Captioning Based on Order-Embedding

Publication Year: 2019, Page(s):2743 - 2754
We present an image captioning framework that generates captions under a given topic. The topic candidates are extracted from the caption corpus. A given image’s topics are then selected from these candidates by a CNN-based multi-label classifier. The input to the caption generation model is an image-topic pair, and the output is a caption of the image. For this purpose, a cross-modal embedding me...

• ### Correntropy-Induced Robust Low-Rank Hypergraph

Publication Year: 2019, Page(s):2755 - 2769
Hypergraph learning has been widely exploited in various image processing applications, due to its advantages in modeling the high-order information. Its efficacy highly depends on building an informative hypergraph structure to accurately and robustly formulate the underlying data correlation. However, the existing hypergraph learning methods are sensitive to non-Gaussian noise, which hurts the c...

• ### Collective Reconstructive Embeddings for Cross-Modal Hashing

Publication Year: 2019, Page(s):2770 - 2784
In this paper, we study the problem of cross-modal retrieval by hashing-based approximate nearest neighbor search techniques. Most existing cross-modal hashing works mainly address the issue of multi-modal integration complexity using the same mapping and similarity calculation for data from different media types. Nonetheless, this may cause information loss during the mapping process due to overl...

• ### A Robust Group-Sparse Representation Variational Method With Applications to Face Recognition

Publication Year: 2019, Page(s):2785 - 2798
In this paper, we propose a Group-Sparse Representation-based method with applications to Face Recognition (GSR-FR). The novel sparse representation variational model includes a non-convex sparsity-inducing penalty and a robust non-convex loss function. The penalty encourages group sparsity by using an approximation of the $\ell_{0}$...

• ### Action-Stage Emphasized Spatiotemporal VLAD for Video Action Recognition

Publication Year: 2019, Page(s):2799 - 2812
Despite outstanding performance in image recognition, convolutional neural networks (CNNs) do not yet achieve the same impressive results on action recognition in videos. This is partially due to the inability of CNN for modeling long-range temporal structures especially those involving individual action stages that are critical to human action recognition. In this paper, we propose a novel action...

• ### Focal Boundary Guided Salient Object Detection

Publication Year: 2019, Page(s):2813 - 2824
The performance of salient object segmentation has been significantly advanced by using the deep convolutional networks. However, these networks often produce blob-like saliency maps without accurate object boundaries. This is caused by the limited spatial resolution of their feature maps after multiple pooling operations and might hinder downstream applications that require precise object shapes....

• ### Three-Stream Attention-Aware Network for RGB-D Salient Object Detection

Publication Year: 2019, Page(s):2825 - 2835
Previous RGB-D fusion systems based on convolutional neural networks typically employ a two-stream architecture, in which RGB and depth inputs are learned independently. The multi-modal fusion stage is typically performed by concatenating the deep features from each stream in the inference process. The traditional two-stream architecture might experience insufficient multi-modal fusion due to two ...

• ### Class Agnostic Image Common Object Detection

Publication Year: 2019, Page(s):2836 - 2846
Learning similarity of two images is an important problem in computer vision and has many potential applications. Most of the previous works focus on generating image similarities in three aspects: global feature distance computing, local feature matching, and image concepts comparison. However, the task of directly detecting the class agnostic common objects from two images has not been studied b...

• ### Deep Reconstruction of Least Significant Bits for Bit-Depth Expansion

Publication Year: 2019, Page(s):2847 - 2859
Bit-depth expansion (BDE) is important for displaying a low bit-depth image in a high bit-depth monitor. Current BDE algorithms often utilize traditional methods to fill the missing least significant bits and suffer from multiple kinds of perceivable artifacts. In this paper, we present a deep residual network-based method for BDE. Based on the different properties of flat and non-flat areas, two ...

• ### Deep Representation Learning With Part Loss for Person Re-Identification

Publication Year: 2019, Page(s):2860 - 2871
Learning discriminative representations for unseen person images is critical for person re-identification (ReID). Most of the current approaches learn deep representations in classification tasks, which essentially minimize the empirical classification risk on the training set. As shown in our experiments, such representations easily get over-fitted on a discriminative human body part on the train...

