Alessio Paolo Buccino - IEEE Xplore Author Profile

Showing 1-25 of 109 results

Recently, foundation models (such as ChatGPT) have emerged with powerful learning, understanding, and generalization abilities, showcasing tremendous potential to revolutionarily promote modern industry. Despite significant advancements in various fields, existing general foundation models face challenges in industry when dealing with the data of specialized modalities, the tasks of varying-scenar...
Generating animatable and editable 3D head avatars is essential for various applications in computer vision and graphics. Traditional 3D-aware generative adversarial networks (GANs), often using implicit fields like Neural Radiance Fields (NeRF), achieve photo-realistic and view-consistent 3D head synthesis. However, these methods face limitations in deformation flexibility and editability, hinder...
The action recognition task involves analyzing video content and temporal relationships between frames to identify actions. Crucial to this process are action representations that effectively capture varying temporal scales and spatial motion variations. To address these challenges, we propose the Joint Coarse to Fine-Grained Spatio-Temporal Modeling (JCFG-STM) approach, which is designed to captu...
The subtle and slight motions of micro-expressions (MEs) leave few effective features for micro-expression recognition (MER), making MER a challenging task. Existing works mainly focus on constructing strong representations from entire videos, individual frames, or redundant structural graphs; however, spatial structure feature learning of MEs leaves much room for further improvement. To solve the...
Accurate representations of 3D faces are of paramount importance in various computer vision and graphics applications. However, the challenges persist due to the limitations imposed by data discretization and model linearity, which hinder the precise capture of identity and expression clues in current studies. This paper presents a novel 3D morphable face model, named ImFace++, to learn a sophisti...
Graph Neural Networks (GNNs), such as Graph Convolutional Network, have exhibited impressive performance on various real-world datasets. However, many studies have confirmed that deliberately designed adversarial attacks can easily confuse GNNs on the classification of target nodes (targeted attacks) or all the nodes (global attacks). According to our observations, different attributes tend to b...
Micro-expressions (MEs) are spontaneous facial movements that reveal an individual's genuine emotions and play a crucial role in various domains, including lie detection, criminal analysis, mental health treatment, national security, and others. Micro-expression recognition is a highly complex aspect within the domain of affective computing, aimed at identifying subtle facial motions that are diff...
Facial expression recognition (FER) is a critical area of research in face analysis. While 2D data has been extensively used, 3D data offers inherent advantages, such as increased resilience to illumination and pose variations. However, the limited size of current 3D FER datasets significantly constrains the performance of 3D FER methods. To overcome this challenge, we propose a novel self-supervi...
We focus on the generalization ability of the 6-DoF grasp detection method in this paper. While learning-based grasp detection methods can predict grasp poses for unseen objects using the grasp distribution learned from the training set, they often exhibit a significant performance drop when encountering objects with diverse shapes and structures. To enhance the grasp detection methods' general-...
Recent strides in the development of diffusion models, exemplified by advancements such as Stable Diffusion, have underscored their remarkable prowess in generating visually compelling images. However, the imperative of achieving a seamless alignment between the generated image and the provided prompt persists as a formidable challenge. This paper traces the root of these difficulties to invalid...
Recent advancements in video semantic segmentation have made substantial progress by exploiting temporal correlations. Nevertheless, persistent challenges, including redundant computation and the reliability of the feature propagation process, underscore the need for further innovation. In response, we present Deep Common Feature Mining (DCFM), a novel approach strategically designed to address th...
This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem. In this case, we present a global-to-local method to address hybrid domain gaps in RGB and depth data and insufficient multi-modal feature alignment. First, a self-supervised rotation pre-training strategy is adopted to deliver robust initialization for RGB and depth networks. We ...
In semi-supervised learning (SSL), many approaches follow the effective self-training paradigm with consistency regularization, utilizing threshold heuristics to alleviate label noise. However, such threshold heuristics lead to the underutilization of crucial discriminative information from the excluded data. In this paper, we present OTAMatch, a novel SSL framework that reformulates pseudo-labeli...
In this letter, we propose, for the first time, a binarization embedded weakly-supervised video anomaly detection (BE-WSVAD) method by constructing a binarized GCN-based anomaly detection module. Compared to the existing weakly-supervised video anomaly detection (WS-VAD) methods, BE-WSVAD focuses on the detection efficiency, which is ignored by the existing literature yet vital in real applications. Specifi...
With the emergence of digitalization technology, digital twin bridges the gap between physical and virtual worlds in industrial production with synchronization, reliability, and fidelity. The manufacturing process of complex products needs multiple working procedures, where novel industrial parts occur, causing scenes to be variable for robots to perceive and grasp. Due to the geometric difference...
Time-series prediction plays a crucial role in the Industrial Internet of Things (IIoT) to enable intelligent process control, analysis, and management, such as complex equipment maintenance, product quality management, and dynamic process monitoring. Traditional methods face challenges in obtaining latent insights due to the growing complexity of IIoT. Recently, the latest development of deep lea...
Facial expression recognition (FER) in the wild is challenging due to various unconstrained conditions, i.e., occlusions and head pose variations. Previous methods tend to improve the performance of facial expression recognition through resorting to holistic methods or coarse local-based methods, while ignoring the local fine-grained feature structure knowledge and the correlation between features...
The visual models pretrained on large-scale benchmarks encode general knowledge and prove effective in building more powerful representations for downstream tasks. Most existing approaches follow the fine-tuning paradigm, either by initializing or regularizing the downstream model based on the pretrained one. The former fails to retain the knowledge in the successive fine-tuning phase, thereby pro...
Anomaly detection (AD), aiming to find samples that deviate from the training distribution, is essential in safety-critical applications. Though recent self-supervised learning based attempts achieve promising results by creating virtual outliers, their training objectives are less faithful to AD which requires a concentrated inlier distribution as well as a dispersive outlier distribution. In thi...
Inspired by recent advances in diffusion models, which are reminiscent of denoising autoencoders, we investigate whether they can acquire discriminative representations for classification via generative pre-training. This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners: by pre-training on unconditional image ge...
A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large-scale 3D scenes, especially for distant and/or occluded objects. Despite recent efforts made by Transformers with the long sequence modeling capability, they fail to properly balance the accuracy and efficiency, suffering from inadequate receptive fields or coarse-grained holistic correlations. In this p...
Object detection on drone images with low-latency is an important but challenging task on the resource-constrained unmanned aerial vehicle (UAV) platform. This paper investigates optimizing the detection head based on the sparse convolution, which proves effective in balancing the accuracy and efficiency. Nevertheless, it suffers from inadequate integration of contextual information of tiny object...
Realistic face rendering from multi-view images is beneficial to various computer vision and graphics applications. Due to complex spatially-varying reflectance properties and geometry characteristics of faces, however, it remains challenging to recover 3D facial representations both faithfully and efficiently in the current studies. This paper presents a novel 3D face rendering model, namely NeuF...
Planar grasp detection is one of the most fundamental tasks to robotic manipulation, and the recent progress of consumer-grade RGB-D sensors enables delivering more comprehensive features from both the texture and shape modalities. However, depth maps are generally of a relatively lower quality with much stronger noise compared to RGB images, making it challenging to acquire grasp depth and fuse m...
Group Activity Recognition (GAR) is a challenging task, where modeling spatio-temporal relationships among participants plays a fundamental role. To address this issue, we propose a novel end-to-end trainable network, termed Key Role Guided Transformer (KRGFormer). Different from current methods that concurrently take all individuals into account for global reasoning, it captures crucial contextua...