IEEE Transactions on Pattern Analysis and Machine Intelligence - new TOC
http://ieeexplore.ieee.org
TOC Alert for Publication #34, November 30, 2020
Volume 42, Issue 12

Table of Contents
p. C1

Cover
p. C2

CoRRN: Cooperative Reflection Removal Network
pp. 2969-2982

Gravitational Laws of Focus of Attention
pp. 2983-2995

Guided Attention Inference Network
pp. 2996-3010

Local Deformable 3D Reconstruction with Cartan's Connections
pp. 3011-3026

On the Convergence of Learning-Based Iterative Methods for Nonconvex Inverse Problems
pp. 3027-3039

On the Robustness of Semantic Segmentation Models to Adversarial Attacks
pp. 3040-3053
Abstract: Deep networks are known to be vulnerable to adversarial examples. This phenomenon has recently attracted a lot of attention, but it has not been extensively studied on multiple, large-scale datasets and structured prediction tasks such as semantic segmentation, which often require more specialised networks with additional components such as CRFs, dilated convolutions, skip-connections and multiscale processing. In this paper, we present what to our knowledge is the first rigorous evaluation of adversarial attacks on modern semantic segmentation models, using two large-scale datasets. We analyse the effect of different network architectures, model capacity and multiscale processing, and show that many observations made on the task of classification do not always transfer to this more complex task. Furthermore, we show how mean-field inference in deep structured models and multiscale processing (and, more generally, input transformations) naturally implement recently proposed adversarial defenses. Our observations will aid future efforts in understanding and defending against adversarial examples. Moreover, in the shorter term, we show how to effectively benchmark robustness and which segmentation models should currently be preferred in safety-critical applications due to their inherent robustness.
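As a side note on the robustness entry above: the attacks it evaluates are gradient-based perturbations of the input image. A minimal sketch of a one-step FGSM-style attack against a generic segmentation network is given below; the model interface, loss, and epsilon value are illustrative assumptions, not the paper's exact protocol.

import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label_map, epsilon=0.03):
    """One-step FGSM-style perturbation of an input image for a segmentation model.

    image:     (1, 3, H, W) float tensor with values in [0, 1]
    label_map: (1, H, W) long tensor of per-pixel class indices
    epsilon:   L-infinity perturbation budget (illustrative value)
    """
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)                       # (1, C, H, W) per-pixel class scores
    loss = F.cross_entropy(logits, label_map)   # mean per-pixel cross-entropy
    loss.backward()
    # Step each pixel in the direction that increases the segmentation loss,
    # then clip back to the valid image range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()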
One Shot Segmentation: Unifying Rigid Detection and Non-Rigid Segmentation Using Elastic Regularization
pp. 3054-3070
Abstract: This paper addresses the non-rigid segmentation of deformable objects in image sequences, based on one-shot segmentation that unifies rigid detection and non-rigid segmentation using elastic regularization. The domain of application is the segmentation of a visual object that temporally undergoes a rigid transformation (e.g., an affine transformation) and a non-rigid transformation (i.e., contour deformation). The majority of segmentation approaches to this problem are based on two steps that run in sequence: a rigid detection, followed by a non-rigid segmentation. In this paper, we propose a new approach, where both the rigid and non-rigid segmentation are performed in a single shot using a sparse low-dimensional manifold that represents the visual object deformations. Given the multi-modality of these deformations, the manifold partitions the training data into several patches, where each patch provides a segmentation proposal during the inference process. These multiple segmentation proposals are merged using the classification results produced by deep belief networks (DBNs) that compute the confidence on each segmentation proposal; thus, an ensemble of DBN classifiers is used for estimating the final segmentation. Compared to current methods in the field, the proposed approach is advantageous in four aspects: (i) it is a unified framework to produce rigid and non-rigid segmentations; (ii) it uses an ensemble classification process, which can improve segmentation robustness; (iii) it significantly reduces the dimensionality of the rigid and non-rigid segmentation search spaces, compared to current approaches that divide these two problems; and (iv) this lower dimensionality of the search space can also reduce the need for large annotated training sets for estimating the DBN models. Experiments on the problem of left ventricle endocardial segmentation from ultrasound images, and lip segmentation from frontal facial images using the extended Cohn-Kanade (CK+) database, demonstrate the potential of the methodology through qualitative and quantitative evaluations, and the ability to reduce the search and training complexities without a significant impact on segmentation accuracy.
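The ensemble step in the entry above merges several candidate segmentations according to classifier confidences. A minimal sketch of such a confidence-weighted merge is shown below; the proposals array, confidence scores, and the 0.5 threshold are hypothetical stand-ins, not the authors' DBN-based implementation.

import numpy as np

def merge_proposals(proposals, confidences, threshold=0.5):
    """Fuse candidate segmentations into one binary mask by confidence weighting.

    proposals:   (K, H, W) array of K binary or soft segmentation proposals,
                 e.g., one proposal per manifold patch.
    confidences: (K,) array of non-negative classifier scores, one per proposal.
    Returns an (H, W) boolean mask from the confidence-weighted average.
    """
    proposals = np.asarray(proposals, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    weights = weights / (weights.sum() + 1e-12)       # normalise confidences
    fused = np.tensordot(weights, proposals, axes=1)  # (H, W) weighted average
    return fused >= threshold

# Hypothetical usage with three 4x4 proposals:
# mask = merge_proposals(np.random.rand(3, 4, 4) > 0.5, [0.9, 0.4, 0.7])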
Real-World Image Denoising with Deep Boosting
pp. 3071-3087
Code: https://github.com/ngchc/deepBoosting

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
pp. 3088-3101

Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes
pp. 3102-3118
Abstract: A connection is established between the Fisher vector (FV) of a probabilistic model and the Q-function of the expectation-maximization (EM) algorithm used to estimate its parameters by maximum likelihood. This enables 1) immediate derivation of FVs for any model for which an EM algorithm exists, and 2) leveraging efficient implementations from the EM literature for the computation of FVs. It is then shown that standard FVs, such as those derived from Gaussian or even Dirichlet mixtures, are unsuccessful for the transfer of semantic posteriors, due to the highly non-linear nature of the probability simplex. The analysis of these FVs shows that significant benefits can ensue by 1) designing FVs in the natural parameter space of the multinomial distribution, and 2) adopting sophisticated probabilistic models of semantic feature covariance. The combination of these two insights leads to the encoding of the bag of semantics (BoS) in the natural parameter space of the multinomial, using a vector of Fisher scores derived from a mixture of factor analyzers (MFA). A network implementation of the MFA Fisher Score (MFA-FS), denoted as the MFAFSNet, is finally proposed to enable end-to-end training. Experiments with various object CNNs and datasets show that the approach has state-of-the-art transfer performance. Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification using a large scene dataset (Places). This suggests that holistic analysis is insufficient for scene classification. The modeling of local object semantics appears to be at least equally important. The two approaches are also shown to be strongly complementary, leading to very large scene classification gains when combined, and outperforming all previous scene classification approaches by a sizable margin.
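The FV/Q-function link mentioned in the entry above rests on a standard identity for latent-variable models (Fisher's identity). A brief LaTeX sketch in generic notation, assuming a model p(x; theta) with hidden variable z rather than the paper's specific mixtures, is:

% Fisher's identity for a latent-variable model p(x; theta) with hidden variable z:
% the EM Q-function is the expected complete-data log-likelihood under the
% posterior computed at the current parameters theta'.
\[
  Q(\theta; \theta') = \mathbb{E}_{z \sim p(z \mid x;\, \theta')}\big[\log p(x, z; \theta)\big]
\]
% Differentiating and evaluating at theta' = theta recovers the Fisher score,
% i.e., the gradient used to build a Fisher vector:
\[
  \nabla_{\theta} \log p(x; \theta) = \nabla_{\theta} Q(\theta; \theta')\big|_{\theta' = \theta}
\]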
Trace Quotient with Sparsity Priors for Learning Low Dimensional Image Representations
pp. 3119-3135
Abstract: This work introduces a trace quotient function with sparsity priors (the SparLow function) for jointly learning both a sparsifying dictionary and a dimensionality reduction transformation. The SparLow function is widely applicable for developing various algorithms in three classic machine learning scenarios, namely unsupervised, supervised, and semi-supervised learning. In order to develop efficient joint learning algorithms for maximizing the SparLow function, we deploy a framework of sparse coding with appropriate convex priors that ensures the sparse representations are locally differentiable. Moreover, we develop an efficient geometric conjugate gradient algorithm to maximize the SparLow function on its underlying Riemannian manifold. The performance of the proposed SparLow algorithmic framework is investigated on several image processing tasks, such as 3D data visualization, face/digit recognition, and object/scene categorization. (A generic sketch of the trace-quotient criterion is given after the end of this table of contents.)

Vocabulary-Informed Zero-Shot and Open-Set Learning
pp. 3136-3152

Cover
p. C3

Cover
p. C4
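Returning to the SparLow entry above: the classic trace-quotient criterion that it augments with sparsity priors can be written in generic notation as follows; this is the standard form, not the paper's exact objective.

% Classic trace-quotient criterion for a linear dimensionality reduction U with
% orthonormal columns: A encodes the structure to preserve, B the structure to
% suppress (e.g., between-class vs. within-class scatter).
\[
  U^{\ast} = \arg\max_{U^{\top} U = I}
  \frac{\operatorname{tr}(U^{\top} A U)}{\operatorname{tr}(U^{\top} B U)}
\]
% SparLow-style objectives build A and B from sparse codes of the data obtained
% with a learned dictionary, and optimize over the resulting matrix manifold.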