IEEE Transactions on Image Processing - new TOC
http://ieeexplore.ieee.org
TOC Alert for Publication #83, January 27, 2020
IEEE Transactions on Image Processing, vol. 29 (2020)

- Neural Multimodal Cooperative Learning Toward Micro-Video Understanding (pp. 1–14)
- A Comparative Review of Recent Kinect-Based Action Recognition Algorithms (pp. 15–28)
- Fast Collective Activity Recognition Under Weak Supervision (pp. 29–43)
- Hyperspectral Images Denoising via Nonconvex Regularized Low-Rank and Sparse Matrix Decomposition (pp. 44–56)
- RESLS: Region and Edge Synergetic Level Set Framework for Image Segmentation (pp. 57–71)
- FAMED-Net: A Fast and Accurate Multi-Scale End-to-End Dehazing Network (pp. 72–84)
- 4D Light Field Superpixel and Segmentation (pp. 85–99)
- Jointly Using Low-Rank and Sparsity Priors for Sparse Inverse Synthetic Aperture Radar Imaging (pp. 100–115)
- Super-Resolution for Hyperspectral and Multispectral Image Fusion Accounting for Seasonal Spectral Variability (pp. 116–127)
- Weaklier Supervised Semantic Segmentation With Only One Image Level Annotation per Category (pp. 128–141)
- Multi-Channel and Multi-Model-Based Autoencoding Prior for Grayscale Image Restoration (pp. 142–156)
- Multipatch Unbiased Distance Non-Local Adaptive Means With Wavelet Shrinkage (pp. 157–169)
  Abstract (excerpt): ...ω_{ij} only once and keep it unchanged during the subsequent denoising iterations, or use only the structure information of the denoised image to update the weight ω_{ij}. This can limit denoising performance. To address these issues, this paper proposes non-local adaptive means (NLAM) for image denoising. NLAM treats the weight ω_{ij} as an optimization variable and iteratively updates its value. We then introduce three unbiased distances, namely pixel–pixel, patch–patch, and coupled unbiased distances. These unbiased distances are more robust for measuring image pixel/patch similarity than the Euclidean distance. Using the coupled unbiased distance, we propose unbiased distance non-local adaptive means (UD-NLAM).
  Because UD-NLAM uses only a single patch size to compute the weight ω_{ij}, we introduce multipatch UD-NLAM (MUD-NLAM) to adapt to different noise levels. To further improve denoising performance, we then propose MUD-NLAM with wavelet shrinkage (MUD-NLAM-WS). Experimental results show that the proposed NLAM, UD-NLAM, and MUD-NLAM outperform existing NLM methods, and MUD-NLAM-WS achieves better performance than state-of-the-art denoising methods.
- Online-Learning-Based Bayesian Decision Rule for Fast Intra Mode and CU Partitioning Algorithm in HEVC Screen Content Coding (pp. 170–185)
- Constrained Discriminative Projection Learning for Image Classification (pp. 186–198)
- A Graph Embedding Framework for Maximum Mean Discrepancy-Based Domain Adaptation Algorithms (pp. 199–213)
- Temporally Coherent Video Harmonization Using Adversarial Networks (pp. 214–224)
- Coarse-to-Fine Semantic Segmentation From Image-Level Labels (pp. 225–236)
- Unsupervised Online Video Object Segmentation With Motion Property Understanding (pp. 237–249)
- Convolutional Deblurring for Natural Imaging (pp. 250–264)
- Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition (pp. 265–276)
- Model-Free Tracker for Multiple Objects Using Joint Appearance and Motion Inference (pp. 277–288)
- Advanced 3D Motion Prediction for Video-Based Dynamic Point Cloud Compression (pp. 289–302)
- Deep Neural Network Regression for Automated Retinal Layer Segmentation in Optical Coherence Tomography Images (pp. 303–312)
- Geometry Coding for Dynamic Voxelized Point Clouds Using Octrees and Multiple Contexts (pp. 313–322)
- HA-CCN: Hierarchical Attention-Based Crowd Counting Network (pp. 323–335)
- Morphology-Based Noise Reduction: Structural Variation and Thresholding in the Bitonic Filter (pp. 336–350)
- Inpainting Versus Denoising for Dose Reduction in Scanning-Beam Microscopies (pp. 351–359)
- Deep Salient Object Detection With Contextual Information Guidance (pp. 360–374)
- Image Compressed Sensing Using Convolutional Neural Network (pp. 375–388)
- Exemplar-Based Recursive Instance Segmentation With Application to Plant Image Analysis (pp. 389–404)
- Supervised Deep Sparse Coding Networks for Image Classification (pp. 405–418)
- Graph Transform Optimization With Application to Image Compression (pp. 419–432)
- Wavelet-Based Spectral–Spatial Transforms for CFA-Sampled Raw Camera Image Compression (pp. 433–444)
- Enhanced Fuzzy-Based Local Information Algorithm for Sonar Image Segmentation (pp. 445–460)
- High-Resolution Encoder–Decoder Networks for Low-Contrast Medical Image Segmentation (pp. 461–475)
- Learning Rich Part Hierarchies With Progressive Attention Networks for Fine-Grained Image Recognition (pp. 476–488)
- Holistic Multi-Modal Memory Network for Movie Question Answering (pp. 489–499)
- Weighted Guided Image Filtering With Steering Kernel (pp. 500–508)
- Sparse Representation-Based Video Quality Assessment for Synthesized 3D Videos (pp. 509–524)
- Image Representations With Spatial Object-to-Object Relations for RGB-D Scene Recognition (pp. 525–537)
- Semi-Supervised Deep Coupled Ensemble Learning With Classification Landmark Exploration (pp. 538–550)
- Summit Navigator: A Novel Approach for Local Maxima Extraction (pp. 551–564)
- Hyperspectral Image Denoising via Matrix Factorization and Deep Prior Regularization (pp. 565–578)
- Learning Modality-Specific Representations for Visible-Infrared Person Re-Identification (pp. 579–590)
- Unambiguous Scene Text Segmentation With Referring Expression Comprehension (pp. 591–601)
- Geometry-Aware Graph Transforms for Light Field Compact Representation (pp. 602–616)
- Multi-View Image Classification With Visual, Semantic and View Consistency (pp. 617–627)
- RYF-Net: Deep Fusion Network for Single Image Haze Removal (pp. 628–640)
- Deep Learning-Based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression (pp. 641–656)
- Multi-Task Deep Relative Attribute Learning for Visual Urban Perception (pp. 657–669)
- Compressive Color Pattern Detection Using Partial Orthogonal Circulant Sensing Matrix (pp. 670–678)
- Context-Adaptive Neural Network-Based Prediction for Image Compression (pp. 679–693)
- Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning (pp. 694–709)
- Blind Deblurring of Text Images Using a Text-Specific Hybrid Dictionary (pp. 710–723)
- Single-Perspective Warps in Natural Image Stitching (pp. 724–735)
- Robust Feature Matching Using Spatial Clustering With Heavy Outliers (pp. 736–746)
- A Novel Key-Point Detector Based on Sparse Coding (pp. 747–756)
- Multimodal Change Detection in Remote Sensing Images Using an Unsupervised Pixel Pairwise-Based Markov Random Field Model (pp. 757–767)
- Arc-Support Line Segments Revisited: An Efficient High-Quality Ellipse Detection (pp. 768–781)
- Degraded Image Semantic Segmentation With Dense-Gram Networks (pp. 782–795)
- 3D Point Cloud Attribute Compression Using Geometry-Guided Sparse Representation (pp. 796–808)
  Abstract (excerpt): ...ℓ0-norm regularized optimization problem. Also, an inter-block prediction scheme is applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the transform coefficients produced by GSR, an effective entropy coding strategy tailored to our GSR is developed to generate the bitstream.
  Experimental results over various benchmark datasets show that the proposed compression scheme achieves better rate–distortion performance and visual quality compared with state-of-the-art methods.
- Segmenting Cellular Retinal Images by Optimizing Super-Pixels, Multi-Level Modularity, and Cell Boundary Representation (pp. 809–818)
- Robust Seismic Image Interpolation With Mathematical Morphological Constraint (pp. 819–829)
- Multiresolution Localization With Temporal Scanning for Super-Resolution Diffuse Optical Imaging of Fluorescence (pp. 830–842)
- Category-Aware Spatial Constraint for Weakly Supervised Detection (pp. 843–858)
- A Framework of Reversible Color-to-Grayscale Conversion With Watermarking Feature (pp. 859–870)
- Neural Compatibility Modeling With Probabilistic Knowledge Distillation (pp. 871–882)
- Exploiting Related and Unrelated Tasks for Hierarchical Metric Learning and Image Classification (pp. 883–896)
- Pothole Detection Based on Disparity Transformation and Road Surface Modeling (pp. 897–908)
- Structure-Preserving Neural Style Transfer (pp. 909–920)
- An Efficient Algorithm for the Piecewise Affine-Linear Mumford-Shah Model Based on a Taylor Jet Splitting (pp. 921–933)
- Online Multi-Expert Learning for Visual Tracking (pp. 934–946)
- Burst Ranking for Blind Multi-Image Deblurring (pp. 947–958)
- Learning Latent Global Network for Skeleton-Based Action Prediction (pp. 959–970)
- Multi-View Video Synopsis via Simultaneous Object-Shifting and View-Switching Optimization (pp. 971–985)
- Progressive Object Transfer Detection (pp. 986–1000)
- Cross-View Gait Recognition by Discriminative Feature Learning (pp. 1001–1015)
- Deep Cascade Model-Based Face Recognition: When Deep-Layered Learning Meets Small Data (pp. 1016–1029)
- ME R-CNN: Multi-Expert R-CNN for Object Detection (pp. 1030–1044)
- Ring Difference Filter for Fast and Noise Robust Depth From Focus (pp. 1045–1060)
- EleAtt-RNN: Adding Attentiveness to Neurons in Recurrent Neural Networks (pp. 1061–1073)
- Homologous Component Analysis for Domain Adaptation (pp. 1074–1089)
- Improved Robust Video Saliency Detection Based on Long-Term Spatial-Temporal Information (pp. 1090–1100)
- Multiple Cycle-in-Cycle Generative Adversarial Networks for Unsupervised Image Super-Resolution (pp. 1101–1112)
- Video Saliency Prediction Using Spatiotemporal Residual Attentive Networks (pp. 1113–1126)
- Perceptual Evaluation for Multi-Exposure Image Fusion of Dynamic Scenes (pp. 1127–1138)
- From Pairwise Comparisons and Rating to a Unified Quality Scale (pp. 1139–1151)
- Color Control Functions for Multiprimary Displays—I: Robustness Analysis and Optimization Formulations (pp. 1152–1163)
- Color Control Functions for Multiprimary Displays—II: Variational Robustness Optimization (pp. 1164–1176)
- Edge-Sensitive Human Cutout With Hierarchical Granularity and Loopy Matting Guidance (pp. 1177–1191)
- Learning Nonclassical Receptive Field Modulation for Contour Detection (pp. 1192–1203)
- Compositional Attention Networks With Two-Stream Fusion for Video Question Answering (pp. 1204–1218)
- Noise-Robust Iterative Back-Projection (pp. 1219–1232)
- Few-Shot Deep Adversarial Learning for Video-Based Person Re-Identification (pp. 1233–1245)
- Variational-Based Mixed Noise Removal With CNN Deep Learning Regularization (pp. 1246–1258)
- Path-Based Dictionary Augmentation: A Framework for Improving $k$-Sparse Image Processing (pp. 1259–1270)
- Scalable Deep Hashing for Large-Scale Social Image Retrieval (pp. 1271–1284)
- Deep Tone Mapping Operator for High Dynamic Range Images (pp. 1285–1298)
- Deep Heterogeneous Hashing for Face Video Retrieval (pp. 1299–1312)
- Tunable VVC Frame Partitioning Based on Lightweight Machine Learning (pp. 1313–1328)
- Domain-Transformable Sparse Representation for Anomaly Detection in Moving-Camera Videos (pp. 1329–1343)
- Collective Affinity Learning for Partial Cross-Modal Hashing (pp. 1344–1355)
- Towards Weakly-Supervised Focus Region Detection via Recurrent Constraint Network (pp. 1356–1367)
- Deep MR Brain Image Super-Resolution Using Spatio-Structural Priors (pp. 1368–1383)
- 50 FPS Object-Level Saliency Detection via Maximally Stable Region (pp. 1384–1396)
- Anisotropic Guided Filtering (pp. 1397–1412)
- Efficient Single-Stage Pedestrian Detector by Asymptotic Localization Fitting and Multi-Scale Context Encoding (pp. 1413–1425)
- Low-Rank Quaternion Approximation for Color Image Processing (pp. 1426–1439)
- Imaging With SPADs and DMDs: Seeing Through Diffraction-Photons (pp. 1440–1449)
- LCSCNet: Linear Compressing-Based Skip-Connecting Network for Image Super-Resolution (pp. 1450–1464)
- Collision-Free Video Synopsis Incorporating Object Speed and Size Changes (pp. 1465–1478)
- DP-Siam: Dynamic Policy Siamese Network for Robust Object Tracking (pp. 1479–1492)
- A Biological Vision Inspired Framework for Image Enhancement in Poor Visibility Conditions (pp. 1493–1506)
- Cascaded Face Sketch Synthesis Under Various Illuminations (pp. 1507–1521)
- Deep Reinforcement Learning for Weak Human Activity Localization (pp. 1522–1535)
- Local Regression Ranking for Saliency Detection (pp. 1536–1547)
- A Unified Probabilistic Formulation of Image Aesthetic Assessment (pp. 1548–1561)
- Semi-Supervised Human Detection via Region Proposal Networks Aided by Verification (pp. 1562–1574)
- Hierarchical Recurrent Deep Fusion Using Adaptive Clip Summarization for Sign Language Translation (pp. 1575–1590)
- Accurate Pedestrian Detection by Human Pose Regression (pp. 1591–1605)
- Unsupervised Monocular Depth Estimation From Light Field Image (pp. 1606–1617)
- LEARNet: Dynamic Imaging Network for Micro Expression Recognition (pp. 1618–1627)
- Binocular Light-Field: Imaging Theory and Occlusion-Robust Depth Perception Application (pp. 1628–1640)
- Image Processing Methods for Coronal Hole Segmentation, Matching, and Map Classification (pp. 1641–1653)
- Exploiting Block-Sparsity for Hyperspectral Kronecker Compressive Sensing: A Tensor-Based Bayesian Method (pp. 1654–1668)
- Receptive Field Size Versus Model Depth for Single Image Super-Resolution (pp. 1669–1682)
- Deep Coupled ISTA Network for Multi-Modal Image Super-Resolution (pp. 1683–1698)
- Heterogeneous Multireference Alignment for Images With Application to 2D Classification in Single Particle Reconstruction (pp. 1699–1710)
- Image Super-Resolution as a Defense Against Adversarial Attacks (pp. 1711–1724)
- Deep Non-Local Kalman Network for Video Compression Artifact Reduction (pp. 1725–1737)
- Aggregation Signature for Small Object Tracking (pp. 1738–1747)
- Joint Stereo Video Deblurring, Scene Flow Estimation and Moving Object Segmentation (pp. 1748–1761)
- Deep Spatial and Temporal Network for Robust Visual Object Tracking (pp. 1762–1775)
- Hazy Image Decolorization With Color Contrast Restoration (pp. 1776–1787)
- Learning Interleaved Cascade of Shrinkage Fields for Joint Image Dehazing and Denoising (pp. 1788–1801)
- Blind Quality Metric of DIBR-Synthesized Images in the Discrete Wavelet Transform Domain (pp. 1802–1814)
- Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy Text Images (pp. 1815–1826)
- Stereoscopic Image Generation From Light Field With Disparity Scaling and Super-Resolution (pp. 1827–1842)
- Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics (pp. 1843–1855)
- Mumford–Shah Loss Functional for Image Segmentation With Deep Learning (pp. 1856–1866)
- Attention-Aware Multi-Task Convolutional Neural Networks (pp. 1867–1878)
- Saliency Detection via Depth-Induced Cellular Automata on Light Field (pp. 1879–1889)
- Graph Sequence Recurrent Neural Network for Vision-Based Freezing of Gait Detection (pp. 1890–1901)
- Spatiotemporal Knowledge Distillation for Efficient Estimation of Aerial Video Saliency (pp. 1902–1914)
- Self-Motion-Assisted Tensor Completion Method for Background Initialization in Complex Video Sequences (pp. 1915–1928)
- Parallax Tolerant Light Field Stitching for Hand-Held Plenoptic Cameras (pp. 1929–1943)
- Needles in a Haystack: Tracking City-Scale Moving Vehicles From Continuously Moving Satellite (pp. 1944–1957)
- Deep Unbiased Embedding Transfer for Zero-Shot Learning (pp. 1958–1971)
- Attended End-to-End Architecture for Age Estimation From Facial Expression Videos (pp. 1972–1984)
- Fast Single Image Dehazing Using Saturation Based Transmission Map Estimation (pp. 1985–1998)
- Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling (pp. 1999–2012)
- Learning Sparse and Identity-Preserved Hidden Attributes for Person Re-Identification (pp. 2013–2025)
  Abstract (excerpt): ...orthogonal generation module, along with identity-preserving and sparsity constraints. 1) Orthogonal generation: to make DHAs different from each other, Singular Value Decomposition (SVD) is introduced to generate DHAs orthogonally. 2) Identity-preserving constraint: the generated DHAs should be distinct enough to tell different persons apart, so we associate DHAs with person identities.
  3) Sparsity constraint: to enhance the discriminability of DHAs, we introduce a sparsity constraint that restricts the number of effective DHAs for each person. Experiments conducted on public datasets validate the effectiveness of the proposed network. On two large-scale datasets, Market-1501 and DukeMTMC-reID, the proposed method outperforms the state-of-the-art methods.
- Variational Bayesian Blind Color Deconvolution of Histopathological Images (pp. 2026–2036)
- Deep Adversarial Metric Learning (pp. 2037–2051)
- Combining Faster R-CNN and Model-Driven Clustering for Elongated Object Detection (pp. 2052–2065)
- Semantic Image Segmentation by Scale-Adaptive Networks (pp. 2066–2077)
- Mask SSD: An Effective Single-Stage Approach to Object Instance Segmentation (pp. 2078–2093)
- Learning Latent Low-Rank and Sparse Embedding for Robust Image Feature Extraction (pp. 2094–2107)
  Abstract (excerpt): ...$\ell_{2,1}$-norm term to encourage the features to be more compact, discriminative, and interpretable. Then, we enforce a column-wise $\ell_{2,1}$-norm constraint on an error component to resist noise. Finally, we integrate a classification loss term into the objective function to fit supervised scenarios. Our method performs better than several state-of-the-art methods in terms of effectiveness and robustness, as demonstrated on six publicly available datasets.
- Convolutional Analysis Operator Learning: Acceleration and Convergence (pp. 2108–2122)
  Abstract (excerpt): ...patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets (particularly with multi-layered structures, e.g., convolutional neural networks) or when applying the learned kernels to high-dimensional signal recovery problems.
  The so-called convolution approach does not store many overlapping patches and thus, with careful algorithmic design, overcomes the memory problems; it has been studied within the "synthesis" signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer from the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates CAOL convergence compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments on sparse-view computed tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images.
- Optical-Flow Based Nonlinear Weighted Prediction for SDR and Backward Compatible HDR Video Coding (pp. 2123–2138)
- Discriminative and Uncorrelated Feature Selection With Constrained Spectral Analysis in Unsupervised Learning (pp. 2139–2149)
  Abstract (excerpt): ...$\sigma$-norm regularization for interpolating between the Frobenius norm and the $\ell_{2,1}$-norm. Owing to its flexible gradient and global differentiability, our model converges quickly.
  Extensive experiments on benchmark datasets against several state-of-the-art approaches verify the effectiveness of the proposed method.
- Face Hallucination Using Cascaded Super-Resolution and Identity Priors (pp. 2150–2165)
  Abstract (excerpt): ...i) a cascaded super-resolution network that upscales low-resolution facial images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Unlike most competing super-resolution techniques, which rely on a single model for upscaling (even at large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images in steps of $2\times$. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and to incorporate identity constraints at multiple scales. The proposed C-SRIP model (Cascaded Super-Resolution with Identity Priors) is able to upscale (tiny) low-resolution images captured in unconstrained conditions and produce visually convincing results for diverse low-resolution inputs. We rigorously evaluate the proposed model on the Labeled Faces in the Wild (LFW), Helen, and CelebA datasets and report superior performance compared to the existing state of the art.
- Unsupervised Rotation Factorization in Restricted Boltzmann Machines (pp. 2166–2175)
  Abstract (excerpt): ...$\gamma$-score, a measure of the amount of invariance, to demonstrate mathematically and experimentally that our approach indeed learns rotation-invariant features. We show that our method outperforms current state-of-the-art RBM approaches for rotation-invariant feature learning on three different benchmark datasets, measuring performance by the test accuracy of an SVM classifier.
  Our implementation is available at https://bitbucket.org/tuttoweb/rotinvrbm.
- Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford–Shah Model (pp. 2176–2189)
- A Deep Learning Reconstruction Framework for Differential Phase-Contrast Computed Tomography With Incomplete Data (pp. 2190–2202)
- A Volumetric Approach to Point Cloud Compression—Part I: Attribute Compression (pp. 2203–2216)
- A Volumetric Approach to Point Cloud Compression—Part II: Geometry Compression (pp. 2217–2229)
- Toward Intelligent Sensing: Intermediate Deep Feature Compression (pp. 2230–2243)
- Low-Rank Approximation via Generalized Reweighted Iterative Nuclear and Frobenius Norms (pp. 2244–2257)
  Abstract (excerpt): ...Schatten-$p$ norm is stated to be a closer approximation for constraining the singular values in practical applications. However, Schatten-$p$ norm minimization is a challenging nonconvex, nonsmooth, and non-Lipschitz problem. In this paper, inspired by the reweighted $\ell_1$ and $\ell_2$ norms in compressive sensing, the generalized iterative reweighted nuclear norm (GIRNN) and generalized iterative reweighted Frobenius norm (GIRFN) algorithms are proposed to approximate Schatten-$p$ norm minimization. With the proposed algorithms, the problem becomes more tractable, and closed-form solutions are derived for the iteratively reweighted subproblems. In addition, we prove that both proposed algorithms converge at a linear rate to a bounded optimum.
  Numerical experiments on practical matrix completion (MC), robust principal component analysis (RPCA), and image decomposition problems validate the superior performance of both algorithms over several common state-of-the-art methods.
- Visual Saliency Detection via Kernelized Subspace Ranking With Active Learning (pp. 2258–2270)
- 2D Quaternion Sparse Discriminant Analysis (pp. 2271–2286)
  Abstract (excerpt): ...2D-QSDA_w using the weighted pairwise between-class distances. Extensive experiments on RGB and RGB-D databases demonstrate the effectiveness of 2D-QSDA and 2D-QSDA_w compared with peer competitors.
- How is Gaze Influenced by Image Transformations? Dataset and Model (pp. 2287–2300)
  Code and dataset: https://github.com/CZHQuality/Sal-CFS-GAN
- A Wave-Shaped Deep Neural Network for Smoke Density Estimation (pp. 2301–2313)
- Robust Low-Rank Tensor Minimization via a New Tensor Spectral $k$-Support Norm (pp. 2314–2327)
  Abstract (excerpt): ...(TSP-$k$) by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it simultaneously drives minor singular values to zero to induce low-rankness and captures more global information to better preserve intrinsic structure. We provide the proximal operator and the polar operator for the TSP-$k$ norm as key optimization blocks, along with two showcase optimization algorithms for medium- and large-size tensors. Experiments on synthetic, image, and video datasets of medium and large sizes all verify the superiority of the TSP-$k$ norm and the effectiveness of both optimization methods in comparison with existing counterparts.
- Low Cost Gaze Estimation: Knowledge-Based Solutions (pp. 2328–2343)
- Deep Portrait Image Completion and Extrapolation (pp. 2344–2355)
- Local-Adaptive Image Alignment Based on Triangular Facet Approximation (pp. 2356–2369)
- Repeated Look-Up Tables (pp. 2370–2379)
- Deep Active Shape Model for Robust Object Fitting (pp. 2380–2394)
  Abstract (excerpt): ...(i) comparing the performance of several image features used to extract observations from an input image; and (ii) improving the performance of model fitting with a probabilistic framework that allows the use of multiple observations and is robust to the presence of outliers. The goal in (i) is to maximize the quality of the observations by exploring a wide set of handcrafted features (HOG, SIFT, and texture templates) as well as more recent DL-based features.
  Regarding (ii), we use the Generalized Expectation-Maximization algorithm to deal with outliers and to extend the fitting process to multiple observations. The proposed framework is evaluated in the context of facial landmark fitting and segmentation of the endocardium of the left ventricle in cardiac magnetic resonance volumes. We observe experimentally that the proposed approach is robust not only to outliers but also to adverse initialization conditions and to large search regions (from which the observations are extracted). Furthermore, the results of the proposed combination of the ASM with DL-based features are competitive with more recent DL approaches (e.g., FCN [1], U-Net [2], and CNN Cascade [3]), showing that it is possible to combine the benefits of statistical models and DL into a new deep ASM probabilistic framework.
- BMAN: Bidirectional Multi-Scale Aggregation Networks for Abnormal Event Detection (pp. 2395–2408)
- RhythmNet: End-to-End Heart Rate Estimation From Face via Spatial-Temporal Representation (pp. 2409–2423)
  Abstract (excerpt): ...which contains 2,378 visible light videos (VIS) and 752 near-infrared (NIR) videos of 107 subjects. Our VIPL-HR database contains variations such as head movements, illumination changes, and acquisition device changes, replicating a less-constrained scenario for HR estimation. The proposed approach outperforms state-of-the-art methods on both the public-domain and our VIPL-HR databases. VIPL-HR is available at: http://vipl.ict.ac.cn/view_database.php?id=15
- Class-Specific Reconstruction Transfer Learning for Visual Recognition Across Domains (pp. 2424–2438)
  Code: https://github.com/wangshanshanCQU/CRTL
- Graph-Based Compensated Wavelet Lifting for Scalable Lossless Coding of Dynamic Medical Data (pp. 2439–2451)
- Reconstruction of Binary Shapes From Blurred Images via Hankel-Structured Low-Rank Matrix Recovery (pp. 2452–2462)
  Abstract (excerpt): ...rank-$r$ matrix formed by a Hankel structure on the pixels. We further propose efficient ADMM-based algorithms to recover the low-rank matrix in both noiseless and noisy settings. We also analytically investigate the number of samples required for successful recovery in the noiseless case. For this purpose, we study the problem in the random sampling framework and show that with $\mathcal{O}(r \log^4(n_1 n_2))$ random samples (where the image is of size $n_1 \times n_2$) we can guarantee perfect reconstruction with high probability under mild conditions. We further prove the robustness of the proposed recovery in the noisy setting by showing that the reconstruction error is bounded when the input noise is bounded. Simulation results confirm that our proposed method outperforms conventional total variation minimization in the noiseless setting.
- Tensor Multi-Task Learning for Person Re-Identification (pp. 2463–2477)
- Local Proximity for Enhanced Visibility in Haze (pp. 2478–2491)
- Fast and Accurate Depth Estimation From Sparse Light Fields (pp. 2492–2506)
- Adaptive Regularization of Some Inverse Problems in Image Analysis (pp. 2507–2521)
- Evaluating Local Geometric Feature Representations for 3D Rigid Data Matching (pp. 2522–2535)
- Quality Measurement of Images on Mobile Streaming Interfaces Deployed at Scale (pp. 2536–2551)
- Deep Retinal Image Segmentation With Regularization Under Geometric Priors (pp. 2552–2567)
  Abstract (excerpt): ...jointly learned for any given training set.
  To obtain physically meaningful and practically effective representation filters, we propose two new constraints inspired by the expected prior structure of these filters: 1) an orientation constraint that promotes geometric diversity of curvilinear features, and 2) a data-adaptive noise regularizer that penalizes false positives. Multi-scale extensions are developed to enable accurate detection of thin vessels. Experiments performed on three challenging benchmark databases under a variety of training scenarios show that the proposed prior-guided deep network outperforms state-of-the-art alternatives as measured by common evaluation metrics, while being more economical in network size and inference time.
- PML-LocNet: Improving Object Localization With Prior-Induced Multi-View Learning Network (pp. 2568–2582)
  Abstract (excerpt): ...only image-level supervision is provided. The key to solving such problems is to infer object locations accurately. Previous methods usually model the missing object locations as latent variables and alternate between updating their estimates and learning a detector accordingly. However, the performance of such alternating optimization is sensitive to the quality of the initial latent variables, and the resulting localization model is prone to overfitting to improper localizations. To address these issues, we develop a Prior-induced Multi-view Learning Localization Network (PML-LocNet) that exploits both view diversity and sample diversity to improve object localization. In particular, view diversity is imposed by a two-phase multi-view learning strategy, in which the complementarity among features learned from different views and the consensus among instances localized from each view are leveraged to benefit localization. Sample diversity is pursued by harnessing coarse-to-fine priors at both the image and instance levels.
With these priors, more emphasis is placed on reliable samples and the contributions of unreliable ones are decreased, so that the intrinsic characteristics of each sample can be exploited to make the model more robust during network learning. PML-LocNet can be easily combined with existing WSOL models to further improve the localization accuracy. Its effectiveness has been proven experimentally. Notably, it achieves 69.3% CorLoc and 50.4% mAP on PASCAL VOC 2007, surpassing the state of the art by a large margin.]]>29256825826569<![CDATA[Accurate Transmission Estimation for Removing Haze and Noise From a Single Image]]>292583259711213<![CDATA[Super-Resolution Phase Retrieval From Designed Coded Diffraction Patterns]]>29259826095636<![CDATA[Distilling Channels for Efficient Deep Tracking]]>29261026212578<![CDATA[Improved Techniques for Adversarial Discriminative Domain Adaptation]]>29262226374299<![CDATA[Group-Group Loss-Based Global-Regional Feature Learning for Vehicle Re-Identification]]>e.g., the decorations on the windshields. To accelerate the GRF learning and promote its discriminative power, we propose a Group-Group Loss (GGL) to optimize the distance within and across vehicle image groups. Different from the siamese or triplet loss, GGL is directly computed on image groups rather than individual sample pairs or triplets. By avoiding traversing numerous sample combinations, GGL makes the model training easier and more efficient. These two contributions distinguish this work from previous methods on the vehicle Re-ID task, which commonly learn global features with triplet loss or its variants. We evaluate our methods on two large-scale vehicle Re-ID datasets, i.e., VeRi and VehicleID.
Experimental results show that our methods achieve promising performance in comparison with recent works.]]>29263826524321<![CDATA[Color Channel Compensation (3C): A Fundamental Pre-Processing Step for Image Enhancement]]>29265326654275<![CDATA[Fast Online 3D Reconstruction of Dynamic Scenes From Individual Single-Photon Detection Events]]>29266626753632<![CDATA[Learning No-Reference Quality Assessment of Multiply and Singly Distorted Images With Big Data]]>et al., TIP 2018] is a successful NR algorithm, this approach is still limited to the three distortion types. In this paper, we extend MUSIQUE to MUSIQUE-II to blindly assess the quality of images corrupted by five distortion types (white noise, Gaussian blur, JPEG compression, JPEG2000 compression, and contrast change) and their combinations. The proposed MUSIQUE-II algorithm builds upon the classification and parameter-estimation framework of its predecessor by using more advanced models and a more comprehensive set of distortion-sensitive features. Specifically, MUSIQUE-II relies on a three-layer classification model to identify 19 distortion types. To predict the five distortion parameter values, MUSIQUE-II extracts an additional 14 contrast features and employs a multi-layer probability-weighting rule. Finally, MUSIQUE-II employs a new most-apparent-distortion strategy to adaptively combine five quality scores based on outputs of three classification models.
Experimental results on three multiply-distorted and six singly-distorted image quality databases show that MUSIQUE-II yields not only a substantial improvement in quality prediction performance over its predecessor, but also highly competitive performance relative to other state-of-the-art FR/NR IQA algorithms.]]>29267626914856<![CDATA[Unsupervised Single Image Dehazing Using Dark Channel Prior Loss]]>29269227016182<![CDATA[High-Order Feature Learning for Multi-Atlas Based Label Fusion: Application to Brain Segmentation With MRI]]>e.g., the high-order relationship between voxels within a patch) of brain magnetic resonance (MR) images. To address this issue, this paper develops a high-order feature learning framework for multi-atlas based label fusion, where high-order features of image patches are extracted and fused for segmenting ROIs of structural brain MR images. Specifically, an unsupervised feature learning method (i.e., a means-covariances restricted Boltzmann machine, mcRBM) is employed to learn high-order features (i.e., mean and covariance features) of patches in brain MR images. Then, a group-fused sparsity dictionary learning method is proposed to jointly calculate the voting weights for label fusion, based on the learned high-order features and the original image intensity features. The proposed method is compared with several state-of-the-art label fusion methods on the ADNI, NIREP and LONI-LPBA40 datasets. The Dice ratios achieved by our method are 88.30%, 88.83%, 79.54% and 81.02% on the left and right hippocampus of the ADNI dataset and on the NIREP and LONI-LPBA40 datasets, respectively, while the best Dice ratios yielded by the other methods are 86.51%, 87.39%, 78.48% and 79.65%, respectively.]]>29270227133473<![CDATA[PaDNet: Pan-Density Crowd Counting]]>i.e., either a sparse or a dense crowd, meaning they performed well in global estimation while neglecting local accuracy.
To make crowd counting more useful in the real world, we propose a new perspective, named pan-density crowd counting, which aims to count people in crowds of varying density. Specifically, we propose the Pan-Density Network (PaDNet), which is composed of the following critical components. First, the Density-Aware Network (DAN) contains multiple subnetworks pretrained on scenarios with different densities. This module is capable of capturing pan-density information. Second, the Feature Enhancement Layer (FEL) effectively captures the global and local contextual features and generates a weight for each density-specific feature. Third, the Feature Fusion Network (FFN) embeds spatial context and fuses these density-specific features. Further, the metrics Patch MAE (PMAE) and Patch RMSE (PRMSE) are proposed to better evaluate the performance on global and local estimation. Extensive experiments on four crowd counting benchmark datasets, the ShanghaiTech, the UCF_CC_50, the UCSD, and the UCF-QNRF, indicate that PaDNet achieves state-of-the-art recognition performance and high robustness in pan-density crowd counting.]]>29271427279816<![CDATA[MAVA: Multi-Level Adaptive Visual-Textual Alignment by Cross-Media Bi-Attention Mechanism]]>cross-media multi-pathway fine-grained network to extract not only the local fine-grained patches as discriminative image regions and key words, but also visual relations between image regions as well as textual relations from the context of sentences, which contain complementary information to exploit fine-grained characteristics within different media types. Second, we propose a visual-textual bi-attention mechanism to distinguish the fine-grained information with different saliency from both local and relation levels, which can provide more discriminative hints for correlation learning. Third, we propose cross-media multi-level adaptive alignment to explore global, local and relation alignments.
An adaptive alignment strategy is further proposed to enhance the matched pairs of different media types and adaptively discard misalignments, so as to learn more precise cross-media correlation. Extensive experiments on image-sentence matching are conducted on two widely-used cross-media datasets, namely Flickr-30K and MS-COCO, comparing with 10 state-of-the-art methods, which fully verifies the effectiveness of our proposed MAVA approach.]]>29272827414760<![CDATA[A Context Knowledge Map Guided Coarse-to-Fine Action Recognition]]>i.e., sports, cooking, etc. Therefore, in this paper, we propose a novel approach which recognizes human actions from coarse to fine. Taking full advantage of contributions from high-level semantic contexts, a context knowledge map guided recognition method is designed to realize the coarse-to-fine procedure. In this approach, we define semantic contexts with interactive objects, scenes and body motions in action videos, and build a context knowledge map to automatically define coarse-grained groups. Then fine-grained classifiers are proposed to realize accurate action recognition. The coarse-to-fine procedure narrows the action categories in the target classifiers, so it is beneficial to improving recognition performance. We evaluate the proposed approach on the CCV, HMDB-51, and UCF-101 databases. Experiments verify its effectiveness: on average, it improves recognition precision by more than 5% over current approaches. Compared with the state of the art, it also obtains outstanding performance. The proposed approach achieves higher accuracies of 93.1%, 95.4% and 74.5% on the CCV, UCF-101 and HMDB-51 databases, respectively.]]>29274227523020<![CDATA[MonoFENet: Monocular 3D Object Detection With Feature Enhancement Networks]]>MonoFENet. Specifically, with the estimated disparity from the input monocular image, the features of both the 2D and 3D streams can be enhanced and utilized for accurate 3D localization. For the 2D stream, the input image is used to generate 2D region proposals as well as to extract appearance features. For the 3D stream, the estimated disparity is transformed into a 3D dense point cloud, which is then enhanced by the associated front view maps.
With the RoI Mean Pooling layer, 3D geometric features of RoI point clouds are further enhanced by the proposed point feature enhancement (PointFE) network. The region-wise features of the image and point cloud are fused for the final 2D and 3D bounding box regression. Experimental results on the KITTI benchmark reveal that our method achieves state-of-the-art performance for monocular 3D object detection.]]>29275327652870<![CDATA[Semi-Supervised Image Dehazing]]>29276627798174<![CDATA[Adaptive Sample-Level Graph Combination for Partial Multiview Clustering]]>29278027944047<![CDATA[A Multi-Domain and Multi-Modal Representation Disentangler for Cross-Domain Image Manipulation and Classification]]>$M^{2}RD$), with the goal of learning domain-invariant content representation with the associated domain-specific representation observed. By advancing adversarial learning and disentanglement techniques, the proposed model is able to perform continuous image manipulation across data domains with multiple modalities. More importantly, the resulting domain-invariant feature representation can be applied to unsupervised domain adaptation. Finally, our quantitative and qualitative results confirm the effectiveness and robustness of the proposed model over state-of-the-art methods on the above tasks.]]>292795280710035<![CDATA[Deep Guided Learning for Fast Multi-Exposure Image Fusion]]>https://github.com/makedede/MEFNet.]]>29280828193109<![CDATA[Latent Elastic-Net Transfer Learning]]>29282028334547<![CDATA[Unsupervised Deep Contrast Enhancement With Power Constraint for OLED Displays]]>29283428444495<![CDATA[Phase Asymmetry Ultrasound Despeckling With Fractional Anisotropic Diffusion and Total Variation]]>29284528594015<![CDATA[Context-Interactive CNN for Person Re-Identification]]>29286028746527<![CDATA[Discriminative Residual Analysis for Image Set Classification With Posture and Age Variations]]>residual representations into a discriminant subspace.
Such a projection subspace is expected to magnify the useful information of the input space as much as possible, so that the relation between the training set and the test set, described by the given metric or distance, becomes more precise in the discriminant subspace. We also propose a nonfeasance strategy that defines another approach to constructing the unrelated groups, which helps to further reduce the cost of sampling errors. Two regularization approaches are used to deal with the possible small-sample-size problem. Extensive experiments are conducted on benchmark databases, and the results show the superiority and efficiency of the new methods.]]>29287528885652
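The last abstract above describes projecting residual representations into a discriminant subspace that magnifies useful information while regularizing against the small-sample-size problem. As a loose, hypothetical illustration only (not the paper's actual method, whose details are not in this TOC entry), a classical Fisher-style discriminant projection with a small ridge regularizer on the within-class scatter can be sketched as:

```python
import numpy as np

def discriminant_projection(residuals, labels, dim):
    """Fisher-style projection spreading class residuals apart (illustrative sketch).

    residuals: (n_samples, n_features) residual representations
    labels:    (n_samples,) integer class labels
    dim:       target subspace dimension
    Returns a (n_features, dim) projection matrix.
    """
    mean_all = residuals.mean(axis=0)
    n_feat = residuals.shape[1]
    Sw = np.zeros((n_feat, n_feat))  # within-class scatter
    Sb = np.zeros((n_feat, n_feat))  # between-class scatter
    for c in np.unique(labels):
        Xc = residuals[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (d @ d.T)
    # Ridge term keeps Sw invertible when samples are scarce
    # (a stand-in for the paper's regularization approaches).
    Sw += 1e-6 * np.eye(n_feat)
    # Leading eigenvectors of Sw^{-1} Sb span the discriminant subspace.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)[:dim]
    return eigvecs[:, order].real
```

In such a subspace, distances between projected training and test representations emphasize class-discriminative directions, which is the general effect the abstract attributes to its learned projection.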