Sabri M. A. A. Ahmed - IEEE Xplore Author Profile

Showing 1-22 of 22 results

Bundle adjustment (BA) estimates camera poses and builds maps using nonlinear optimization. The update step of the optimization is obtained by solving a linear system, which is the bottleneck of BA efficiency. Many works perform bundle adjustment in a distributed manner to reduce the computational cost of solving the linear system and achieve outstanding performance...
Rich texture information can hurt deep learning by introducing unwanted noise, especially in monocular depth learning. To mitigate the ‘messy textures’ caused by rich texture in monocular depth estimation, this paper exploits intra-patch disparity consistency to reconstruct the image. The joint traini...
Panoptic segmentation, a unifying computer vision task combining semantic and instance segmentation, plays a crucial role in autonomous driving and robot navigation applications. With advancements in deep learning technology, particularly the introduction of the Transformer model, panoptic segmentation has achieved significant progress. However, existing methods still struggle with performance in occl...
Panoptic segmentation is a challenging perception task that helps robots comprehensively perceive the surrounding environment. We observe that the semantic, instance, and panoptic tasks are richly related, yet these relations are rarely explored. In this work, we propose a novel panoptic, instance, and semantic bridged network to delve into the reciprocal relation. To make semantic and insta...
With the increasing demands for perception accuracy in autonomous driving, there is a growing focus on fine-grained 3D semantic occupancy prediction. Effectively representing detailed three-dimensional scenes has become a significant challenge in the development of this task. In this paper, we present a novel transformer-based framework named CVFormer, which leverages two-dimensional circum-views ...
Accurate localization for intelligent robots remains a significant challenge, and self-supervised visual-inertial odometry (VIO) has emerged as a promising solution. However, existing self-supervised VIO works treat inertial information as ordinary input data, losing its ability to recover absolute scale and ignoring the modality difference between acceleration and angular velocity in inertial ...
The Perspective-n-Point (PnP) problem aims to estimate pose from known 3D map points and their projections. Efficient PnP (EPnP), one of the classical PnP solvers, represents camera pose with control points, which are easier to estimate using the least-squares (LS) formulation. However, the geometry refinement procedure performed by most EPnP-based methods is separated from the solution of LS formu...
Unsupervised domain adaptation (UDA) adapts a network trained on labeled synthetic data to unlabeled real-world data, reducing annotation cost. However, most of these methods pay more attention to domain distributions in the input and output stages while ignoring the important differences in semantic expressions and local details in the middle feature stages. Therefore, a n...
Stereo matching has been a hot research topic in 3D reconstruction. Although great progress has been made thanks to deep learning techniques, performance still drops significantly when dealing with a new domain. Recently, LiDAR and radar guidance strategies have been explored to alleviate the generalization problem. However, they rely on extra depth sensor cues. In this paper, a novel SGM-base...
The weakly-supervised audio-visual video parsing (AVVP) task aims to parse the duration and categories of each snippet when only video-level event labels are provided. Most methods either leverage attention mechanisms to explore cross-modal and cross-video event semantics or alleviate label noise to improve performance. However, the distributional modality discrepancy caused by the heterogeneity o...
Category-level 6D object pose estimation aims to predict the full pose and size information for previously unseen instances from known categories, and is an essential part of robot grasping and augmented reality. However, the core challenge of this task remains the enormous shape variation within each category. To address this challenge, we propose a novel framework SD-Pose, which utilize...
Occlusions have long been a hard nut to crack in optical flow estimation due to ambiguous pixel matching between adjacent images. Current methods only take two consecutive images as input, which makes it challenging to capture temporal coherence and reason about occluded regions. In this paper, we propose a novel optical flow estimation framework, namely MFCFlow, which attempts to compensate for the in...
Neighborhood construction plays a key role in point cloud processing. However, existing models only use a single neighborhood construction method to extract neighborhood features, which limits their scene understanding ability. In this paper, we propose a learnable Dual-Neighborhood Feature Aggregation (DNFA) module embedded in the encoder that builds and aggregates comprehensive surrounding knowl...
Depth estimation and semantic edge detection are two key tasks in computer vision, both of which have made great progress. To date, jointly predicting depth and semantic edges has rarely been explored. In this work, we first propose a flexible two-branch framework in which the two tasks benefit each other. Specifically, for the semantic edge detecti...
Recovering depth information from a single image is a long-standing challenge, and self-supervised depth estimation methods have gradually attracted attention due to not relying on high-cost ground truth. Constructing an accurate photometric loss based on photometric consistency is crucial for these self-supervised methods to obtain high-quality depth maps. However, the photometric loss in most st...
Estimating head pose from a single RGB image has recently attracted considerable research attention. Prior art employs a CNN backbone to process face images and then directly output Euler angles. We argue that such methods may ignore essential features that are highly correlated to head pose due to the non-global perspective, and the ambiguity and discontinuity issues of the Euler angle representation could ...
The 3D semantic map plays an increasingly important role in a wide variety of applications, especially for many kinds of task-driven robots. In this paper, we present a semantic mapping methodology for obtaining 3D semantic maps from RGB-D scans. In contrast to existing methods that use 3D annotated information as supervision, we focus on accurate 2D frame labeling and combine labels in 3D space us...
Neural networks have recently achieved impressive success in semantic and instance segmentation on 2D images. However, their capabilities have not been fully explored to address semantic instance segmentation on unstructured 3D point cloud data. Digging into the regional feature representation to boost point cloud comprehension, we propose a region-feature-enhanced structure consisting of adaptive...
Scene understanding is one of the foundations for robots to achieve true artificial intelligence. Semantic segmentation that imitates the mechanism of the human visual system can effectively improve the correctness of scene understanding. It conforms to the basic principle of human environment perception and enables robots to better serve human society. In this paper, we propose a multilevel cross-awa...
Semantic segmentation is the main step towards scene understanding, which is one of the most important tasks of computer vision. As the depth and color information are independent, the combination of depth and RGB images can improve the quality of semantic labeling. In this paper, we propose a multilevel cross-aware network (MCA-Net) for RGBD semantic segmentation to jointly reason about 2D appear...
Motivated by recent successes in learning 3D feature representations, we present a Siamese network to generate representative 3D descriptors for 3D point matching in point cloud registration. Our system, dubbed HAF-Net, consists of a feature extraction module, a hierarchical feature reweighting and recalibration (HRR) module, and a feature aggregation and compression module. The HRR module is pro...
Very deep convolutional neural networks (CNNs) have recently achieved great success in stereo matching. It is still highly desirable to learn a robust feature map to better handle ill-posed regions, such as weakly textured regions, reflective surfaces, and repetitive patterns. Therefore, we propose an end-to-end multi-dimensional residual dense attention network (MRDA-Net) in this paper, focusing on mor...