ROBOVIS 2024 Abstracts


Area 1 - Computer Vision

Full Papers
Paper Nr: 17
Title:

Uncertainty Driven Active Learning for Image Segmentation in Underwater Inspection

Authors:

Luiza Ribeiro Marnet, Yury Brodskiy, Stella Grasshof and Andrzej Wasowski

Abstract: Active learning aims to select the minimum amount of data needed to train a model that performs similarly to a model trained with the entire dataset. We study the potential of active learning for image segmentation in underwater infrastructure inspection tasks, where large amounts of data are typically collected. Pipeline inspection images are usually semantically repetitive but vary greatly in quality. We use mutual information as the acquisition function, calculated using Monte Carlo dropout. To assess the framework's effectiveness, DenseNet and HyperSeg are trained with the CamVid dataset using active learning. In addition, HyperSeg is trained with a pipeline inspection dataset of over 50,000 images. For the pipeline dataset, HyperSeg with active learning achieved 67.5% mean IoU using 12.5% of the data, versus 61.4% with the same number of randomly selected images. This shows that using active learning for segmentation models in underwater inspection tasks can significantly lower the cost.
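
As a rough illustration of the acquisition function described above (not code from the paper), the sketch below scores an image by the mutual information between predictions and model parameters (BALD), estimated with Monte Carlo dropout; the model interface, tensor shapes, and sample count are assumptions.

import torch

def mc_dropout_mutual_information(model, image, n_samples=20):
    """Score one image by per-pixel mutual information (BALD), averaged over pixels.

    Assumes `model` maps a (1, C, H, W) tensor to per-pixel class logits
    (1, K, H, W) and contains dropout layers.
    """
    model.train()  # keep dropout active at inference time (Monte Carlo dropout)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(image), dim=1)
                             for _ in range(n_samples)])      # (T, 1, K, H, W)
    mean_p = probs.mean(dim=0)                                # predictive mean
    entropy_of_mean = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=1)
    mean_of_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=2).mean(dim=0)
    return (entropy_of_mean - mean_of_entropy).mean().item()

The highest-scoring unlabeled images would then be sent for annotation in each active learning round.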

Paper Nr: 26
Title:

UCorr: Wire Detection and Depth Estimation for Autonomous Drones

Authors:

Benedikt Kolbeinsson and Krystian Mikolajczyk

Abstract: In the realm of fully autonomous drones, the accurate detection of obstacles is paramount to ensure safe navigation and prevent collisions. Among these challenges, the detection of wires stands out due to their slender profile, which poses a unique and intricate problem. To address this issue, we present an innovative solution in the form of a monocular end-to-end model for wire segmentation and depth estimation. Our approach leverages a temporal correlation layer trained on synthetic data, providing the model with the ability to effectively tackle the complex joint task of wire detection and depth estimation. We demonstrate the superiority of our proposed method over existing competitive approaches in the joint task of wire detection and depth estimation. Our results underscore the potential of our model to enhance the safety and precision of autonomous drones, shedding light on its promising applications in real-world scenarios.

Paper Nr: 31
Title:

Utilizing Dataset Affinity Prediction in Object Detection to Assess Training Data

Authors:

Stefan Becker, Jens Bayer, Ronny Hug, Wolfgang Hübner and Michael Arens

Abstract: Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. Towards this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.
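
A minimal sketch of what such a data source prediction module could look like; the abstract does not specify the module design, so the class, feature dimension, and dataset count below are illustrative.

import torch
import torch.nn as nn

class AffinityHead(nn.Module):
    """Auxiliary head that predicts which source dataset a detection stems from."""
    def __init__(self, feat_dim=256, n_datasets=4):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_datasets)

    def forward(self, roi_feats):               # (n_detections, feat_dim)
        return self.fc(roi_feats).softmax(-1)   # per-detection dataset affinity

Such a head adds little inference overhead because it is a single linear layer on top of features the detector already computes.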

Paper Nr: 34
Title:

DAFDeTr: Deformable Attention Fusion Based 3D Detection Transformer

Authors:

Gopi Krishna Erabati and Helder Araújo

Abstract: Existing approaches fuse LiDAR points and image pixels by hard association, relying on highly accurate calibration matrices. We propose the Deformable Attention Fusion based 3D Detection Transformer (DAFDeTr) to attentively and adaptively fuse image features to LiDAR features with soft association using a deformable attention mechanism. Specifically, our detection head consists of two decoders for sequential fusion: a LiDAR decoder and an image decoder, powered by deformable cross-attention to link the multi-modal features to the 3D object predictions leveraging a sparse set of object queries. The refined object queries from the LiDAR decoder attentively fuse with the corresponding image features, establishing a soft association and thereby making our model robust to camera malfunction. We conduct extensive experiments and analysis on the nuScenes and Waymo datasets. Our DAFDeTr-L achieves 63.4 mAP, outperforming well-established networks on the nuScenes dataset, and obtains competitive performance on the Waymo dataset. Our fusion model DAFDeTr achieves 64.6 mAP on the nuScenes dataset. We also extend our model to the 3D tracking task, where it outperforms state-of-the-art methods.
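
For intuition only: the soft association can be pictured as cross-attention between object queries and image features. The sketch below uses PyTorch's dense multi-head attention rather than the paper's deformable attention (which samples a sparse set of offsets around reference points); all shapes are assumptions.

import torch
import torch.nn as nn

queries = torch.randn(1, 300, 256)     # sparse object queries (batch, N, C)
img_feats = torch.randn(1, 1400, 256)  # flattened image features (batch, HW, C)

cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
fused, weights = cross_attn(queries, img_feats, img_feats)
# `weights` is the soft association: each query attends to image locations
# instead of being hard-wired to pixels via a calibration matrix.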

Short Papers
Paper Nr: 20
Title:

Weapon Detection Using PTZ Cameras

Authors:

Juan D. Muñoz, Jesus Ruiz-Santaquiteria, Oscar Deniz and Gloria Bueno

Abstract: Mass shootings in public places are a scourge in some countries. Computer vision techniques have been actively researched in recent years to process video from surveillance cameras and immediately detect the presence of an armed individual. Research, however, has focused on images taken from cameras that are (as is typically the case) far from the entrance where the individual first appears. Yet most modern video surveillance cameras have some pan-tilt-zoom (PTZ) capabilities, fully controllable by the operator or by control software. In this paper, we present the first (to the authors' knowledge) exploration of the use of PTZ cameras for this particular problem. Our results unequivocally reveal the transformative impact of integrating PTZ functionality, particularly zoom and tracking capabilities, on the overall performance of these weapon detection models. Experiments were carefully executed in controlled environments, including laboratory and classroom settings, allowing for a comprehensive evaluation. In these settings, the utility of PTZ in improving detection outcomes became evident, especially under challenging conditions such as dim lighting or multiple individuals in the scene. This research underscores the potential of modern PTZ cameras for automatic firearm detection, an advancement that holds the promise of augmenting public safety and security.

Paper Nr: 21
Title:

Improving Semantic Mapping with Prior Object Dimensions Extracted from 3D Models

Authors:

Abdessalem Achour, Hiba Al Assaad, Yohan Dupuis and Madeleine El Zaher

Abstract: Semantic mapping is a critical challenge that must be addressed to ensure the safe navigation of mobile robots. Equipping robots with semantic information enhances their interactions with humans, as well as their navigation and task planning capabilities. Semantic maps go beyond occupancy information, providing supplementary details about mapped elements that empower robots to gain a deeper understanding of their environment. In this study, we present a real-time RGBD-based semantic mapping solution designed for autonomous mobile robots. Our proposal focuses on a specific aspect of this solution: a novel association approach to generate the 2D shape of semantic objects using prior knowledge. We evaluate our approach in two diverse environments, employing the MIR mobile robot. Our experimental results, along with a comparison to existing approaches, demonstrate that our proposal can generate maps that closely approximate the ground truth.

Paper Nr: 24
Title:

GAT-POSE: Graph Autoencoder-Transformer Fusion for Future Pose Prediction

Authors:

Armin D. Pazho, Gabriel Maldonado and Hamed Tabkhi

Abstract: Human pose prediction, interchangeably known as human pose forecasting, is a daunting endeavor within computer vision. Owing to its pivotal role in many advanced applications and research avenues like smart surveillance, autonomous vehicles, and healthcare, human pose prediction models must exhibit high precision and efficacy to curb error dissemination, especially in real-world settings. In this paper, we unveil GAT-POSE, an innovative fusion framework marrying the strengths of graph autoencoders and transformers crafted for deterministic future pose prediction. Our methodology encapsulates a singular compression and tokenization of pose sequences through graph autoencoders. By harnessing a transformer architecture for pose prediction and capitalizing on the tokenized pose sequences, we construct a new paradigm for precise pose prediction. The robustness of GAT-POSE is ascertained through its deployment in three diverse training and testing ecosystems, coupled with the utilization of multiple datasets for a thorough appraisal. The stringency of our experimental setup underscores that GAT-POSE outperforms contemporary methodologies in human pose prediction, bearing significant promise to influence a variety of real-world applications favorably and lay a robust foundation for subsequent explorations in computer vision research.

Paper Nr: 27
Title:

A Quality Based Criteria for Efficient View Selection

Authors:

Rémy Alcouffe, Sylvie Chambon, Géraldine Morin and Simone Gasparini

Abstract: The generation of complete 3D models of real-world objects is a well-known problem. To reconstruct a 3D model, a Next Best View (NBV) algorithm proposes the optimal next viewpoint to complete the reconstruction. The completeness of a model is important to avoid holes or missing parts on the object. Nevertheless, the accuracy of the reconstructed model is also very important. Accuracy can be defined as fidelity to the original model, but in the context of 3D reconstruction the ground-truth model is usually not available. In this paper, we propose to evaluate the accuracy of the model through local intrinsic metrics that reflect the quality of the current reconstructed model and are based on its geometric properties. We then use those metrics to propose new criteria for the view selection problem. Tests performed on real and synthetic data show that the use of quality metrics helps the NBV algorithm focus the view selection on the poor-quality parts of the reconstructed model and improve its overall quality.
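
A toy sketch of how per-point quality metrics could drive view selection; the paper's actual criteria and metrics are more elaborate, and everything below (interfaces, quality range) is illustrative.

import numpy as np

def next_best_view(view_poses, point_quality, visibility):
    """Pick the candidate view that sees the worst-quality surface regions.

    point_quality: per-point quality in [0, 1] from local geometric metrics
    visibility[v, p]: True if point p is visible from candidate view v
    """
    need = 1.0 - point_quality                   # improvement needed per point
    scores = visibility.astype(float) @ need     # accumulated need per view
    return view_poses[int(np.argmax(scores))]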

Area 2 - Intelligent Systems

Full Papers
Paper Nr: 15
Title:

Park Marking Detection and Tracking Based on a Vehicle On-Board System of Fisheye Cameras

Authors:

Ruben Naranjo, Joan Sintes, Cristina Pérez Benito, Pablo Alonso, Guillem Delgado, Nerea Aranjuelo and Aleksandar Jevtić

Abstract: Automatic parking assistance systems based on vehicle perception are becoming increasingly helpful both for the driver's experience and for road safety. In this paper, we propose a complete, embedded-compatible parking assistance system able to detect, classify, and track parking spaces around the vehicle based on a 360º surround-view camera system. Unlike the majority of state-of-the-art studies, the approach outlined in this work is able to detect most types of parking slots without any prior parking slot information. Additionally, the method does not rely on bird's-eye-view images, since it works directly on fisheye images, increasing the coverage area around the vehicle while reducing computational complexity. The authors propose a system to detect and classify, in real time, the parking slots on the fisheye images based on deep learning models. Moreover, the 2D camera detections are projected into a 3D space in which Kalman filter-based tracking provides a unique identifier for each parking slot. Experiments with a configuration of four cameras around the vehicle show that the presented method obtains satisfactory qualitative and quantitative results in different real-life parking scenarios while maintaining real-time performance.
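
The tracking stage can be sketched as one constant-position Kalman filter per slot center on the 3D ground plane; this is a simplification of the paper's tracker, and the noise values below are placeholders.

import numpy as np

class SlotTrack:
    """Kalman filter for one static parking-slot center (x, y) on the ground plane."""
    def __init__(self, track_id, z0, q=0.01, r=0.25):
        self.id = track_id                     # unique identifier of the slot
        self.x = np.asarray(z0, dtype=float)   # state: slot center in meters
        self.P = np.eye(2)                     # state covariance
        self.Q = q * np.eye(2)                 # process noise (slot is static)
        self.R = r * np.eye(2)                 # measurement noise

    def predict(self):
        self.P = self.P + self.Q               # F = I for a static landmark
        return self.x

    def update(self, z):
        S = self.P + self.R                    # innovation covariance (H = I)
        K = self.P @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(2) - K) @ self.P

New projected detections would be associated to the nearest predicted track and fed in as measurements z; unmatched detections spawn new tracks.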

Paper Nr: 18
Title:

Enhancing Connected Cooperative ADAS: Deep Learning Perception in an Embedded System Utilizing Fisheye Cameras

Authors:

Guillem Delgado, Mikel Garcia, Jon Ander Íñiguez de Gordoa, Marcos Nieto, Gorka Velez, Cristina Pérez-Benito, David Pujol, Alejandro Miranda, Iu Aguilar and Aleksandar Jevtić

Abstract: This paper explores the potential of Cooperative Advanced Driver Assistance Systems (C-ADAS) that leverage Vehicle-to-Everything (V2X) communication to enhance road safety. The authors propose a deep-learning-based perception system operating on a 360º surround view within the C-ADAS. This system also utilizes an On-Board Unit (OBU) for V2X message sharing to cater to vehicles lacking their own perception sensors. The feasibility of these systems is demonstrated, showcasing their effectiveness in various real-world scenarios executed in real time. The contributions include the design of a perception system employing fisheye cameras in the context of C-ADAS, with the potential for embedded integration; the validation of the feasibility of Day 2 services in C-ITS; and the expansion of ADAS functions through a Local Dynamic Map (LDM) for a collision warning application. The findings highlight the promising potential of C-ADAS in improving road safety and pave the way for future advancements in cooperative perception and driving systems.

Paper Nr: 37
Title:

A Meta-MDP Approach for Information Gathering Heterogeneous Multi-Agent Systems

Authors:

Alvin Gandois, Abdel-Illah Mouaddib, Simon Le-Gloannec and Ayman Alfalou

Abstract: In this paper, we address the problem of heterogeneous multi-robot cooperation for information gathering and situation evaluation in a stochastic and partially observable environment. The goal is to optimally gather information about targets in the environment with several robots having different capabilities. The classical Dec-POMDP framework is a good tool to compute an optimal joint policy for such problems; however, its scalability is weak. To overcome this limitation, we developed a Meta-MDP model whose actions are individual information-gathering policies based on POMDPs. We compute an optimal exploration policy for each robot-target pair, and the Meta-MDP model acts as a long-term optimal task allocation algorithm. We evaluate our model in a simulation environment, compare it to an optimal MPOMDP approach, and show promising results in solution quality and scalability.
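
The meta level can be pictured as a plain MDP whose actions are whole information-gathering policies. Below is a generic value-iteration sketch; the paper's model construction is richer, and the transition/reward tensors here are assumed given.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Solve the meta-MDP. P[a, s, s'] are transitions, R[s, a] expected rewards;
    each meta-action a = 'run one robot-target gathering policy to completion'."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # backup over meta-actions
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)             # values and greedy allocation
        V = V_new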

Short Papers
Paper Nr: 23
Title:

BiGSiD: Bionic Grasping with Edge-AI Slip Detection

Authors:

Youssef Nassar, Mario Radke, Atmaraaj Gopal, Tobias Knöller, Thomas Weber, ZhaoHua Liu and Matthias Rätsch

Abstract: Object grasping is a crucial task for robots. It is inspired by nature, where humans can flexibly grasp virtually any object and detect whether it is slipping, more by the sense of touch than by vision. In this work we present a bionic gripper with an Edge-AI device that is able to dexterously grasp handled objects and to sense and predict their slippage. A bionic gripper with tactile sensors and a time-of-flight sensor is developed, and we propose an LSTM model to detect slipping; a 6-degree-of-freedom robot manipulator is used for data collection and testing. The aim of this paper is to develop an efficient slip detection system that runs on an edge device, so that our gripper can be a stand-alone product attachable to almost any robotic manipulator. We collected a dataset, trained the model, and achieved a slip detection accuracy of 95.34%. Due to the efficiency of our model, we were able to implement slip detection on an edge device; we use the Nvidia Jetson AGX Orin development board to show this efficiency in a real-time scenario. We demonstrate an experiment in which slipping of grasped objects is detected and the gripper adjusts its grasping force accordingly to prevent it.
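
A minimal PyTorch sketch of an LSTM slip classifier of the kind described; the actual feature count, window length, and architecture used in the paper may differ.

import torch
import torch.nn as nn

class SlipLSTM(nn.Module):
    """Binary slip/no-slip classifier over a window of tactile + ToF readings."""
    def __init__(self, n_features=16, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # slip logit from the last time step

model = SlipLSTM()
window = torch.randn(8, 50, 16)           # 8 windows, 50 samples, 16 sensor channels
slip_prob = torch.sigmoid(model(window))  # per-window slip probability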

Paper Nr: 36
Title:

Operational Modeling of Temporal Intervals for Intelligent Systems

Authors:

J. I. Olszewska

Abstract: Time is a crucial notion for intelligent systems, such as robotic systems, cognitive systems, multi-agent systems, cyber-physical systems, or autonomous systems, since it is inherent to any real-world process and/or environment. Hence, in this paper, we present operational temporal logic notations for modeling the time aspect of intelligent systems in terms of temporal interval concepts. Their application to intelligent systems' scenarios has demonstrated the usefulness and effectiveness of our approach.
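
The abstract does not reproduce the paper's notation, but temporal interval modeling is commonly grounded in Allen-style interval relations; a small illustrative subset, not taken from the paper:

from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float
    end: float

# Four of Allen's thirteen interval relations, written as predicates.
def before(i, j):   return i.end < j.start
def meets(i, j):    return i.end == j.start
def overlaps(i, j): return i.start < j.start < i.end < j.end
def during(i, j):   return j.start < i.start and i.end < j.end

sense = Interval(0.0, 1.0)
plan = Interval(1.0, 2.5)
assert meets(sense, plan)   # e.g., planning starts exactly when sensing ends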

Area 3 - Robotics

Full Papers
Paper Nr: 43
Title:

Intuitive Multi-Modal Human-Robot Interaction via Posture and Voice

Authors:

Yuzhi Lai, Mario Radke, Youssef Nassar, Atmaraaj Gopal, Thomas Weber, ZhaoHua Liu, Yihong Zhang and Matthias Rätsch

Abstract: Collaborative robots promise to greatly improve the quality of life of the aging population and to ease elder care. However, existing systems often rely on hand gestures, which can be restrictive and less accessible for users with cognitive disabilities. This paper introduces a multi-modal command input, combining voice and deictic postures, to create natural human-robot interaction. In addition, we combine our system with a chatbot to make the interaction responsive. The demonstrated deictic postures, voice, and the perceived table-top scene are processed in real time to extract the human's intention. The system is evaluated on increasingly complex tasks using a real Universal Robots UR3e 6-DoF robot arm. The preliminary results demonstrate a high success rate in task completion and a notable improvement over gesture-based systems: controlling the robot through multi-modal commands saves up to 48.1% of the time taken to issue commands. Our system adeptly integrates the advantages of voice commands and deictic postures to facilitate intuitive human-robot interaction. Compared to conventional gesture control methods, our approach requires minimal training, eliminates the need to memorize complex gestures, and results in shorter interaction times.

Short Papers
Paper Nr: 12
Title:

Compute Optimal Waiting Times for Collaborative Route Planning

Authors:

Jörg Roth

Abstract: Collaborative routing tries to discover paths for multiple robots that avoid mutual collisions while optimising a common cost function. A collision can be avoided in two ways: a robot modifies its route to pass another robot, or one robot waits for the other to move first. Recent work assigns priorities to robots or models waiting as an 'action' similar to driving. However, these methods have certain disadvantages. This paper introduces a new approach that computes theoretically optimal waiting times for given multi-routes. If all collisions can be avoided through waiting, the algorithm computes the optimal places and durations to wait. We use this approach as a component of a collaborative routing system capable of solving complex routing problems involving mutual blocking.
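
In the simplest setting, the optimal wait is the smallest delay that lets every conflicting segment clear first. Below is a deliberately simplified sketch (the paper computes per-place waiting along the route, not just a single pre-trip delay as here); all names are illustrative.

def waiting_time(conflicts, safety=0.5):
    """conflicts: list of (other_exit_time, own_entry_time) pairs for segments
    shared with higher-priority robots. Returns the minimal single wait that
    clears all of them (0 if no wait is needed)."""
    return max([0.0] + [other_exit + safety - own_entry
                        for other_exit, own_entry in conflicts])

print(waiting_time([(4.0, 3.2), (7.0, 8.0)]))  # -> 1.3: set by the first conflict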

Paper Nr: 22
Title:

Offline Deep Model Predictive Control (MPC) for Visual Navigation

Authors:

Taha Bouzid and Youssef Alj

Abstract: In this paper, we propose a new visual navigation method based on a single RGB perspective camera. Using the Visual Teach & Repeat (VT&R) methodology, the robot acquires a visual trajectory consisting of multiple subgoal images in the teaching step. In the repeat step, we propose two network architectures, namely ViewNet and VelocityNet; their combination allows the robot to follow the visual trajectory. ViewNet is trained to generate a future image based on the current view and a velocity command. The generated future image is combined with the subgoal image for training VelocityNet. We develop an offline Model Predictive Control (MPC) policy within VelocityNet with the dual goals of (1) reducing the difference between current and subgoal images and (2) ensuring smooth trajectories by mitigating velocity discontinuities. Offline training conserves computational resources, making it a suitable option for scenarios with limited computational capabilities, such as embedded systems. We validate our approach in a simulation environment, demonstrating that our model can effectively minimize the metric error between real and replayed trajectories.
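
The two training goals can be sketched as a single offline loss; the network interfaces and the weighting below are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def mpc_loss(view_net, velocity_net, current_img, subgoal_img, prev_cmd, alpha=0.1):
    """(1) the view imagined by ViewNet under the commanded velocity should match
    the subgoal image; (2) commands should change smoothly between steps."""
    cmd = velocity_net(current_img, subgoal_img)   # e.g., (batch, 2): v, omega
    predicted_view = view_net(current_img, cmd)    # imagined next camera frame
    image_term = F.mse_loss(predicted_view, subgoal_img)
    smooth_term = F.mse_loss(cmd, prev_cmd)        # penalize velocity discontinuities
    return image_term + alpha * smooth_term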

Paper Nr: 28
Title:

Multi-UAV Weed Spraying

Authors:

Ali Moltajaei Farid, Malek Mouhoub, Tony Arkles and Greg Hutch

Abstract: In agriculture, weeds reduce soil productivity and harvest quality. A common practice for weed control is spraying. Ground spraying of weeds may be harmful, destructive, and too slow, while aerial UAV spraying can be safe, non-destructive, and quick. Spraying efficiency and accuracy can be further enhanced by adopting multiple UAVs. In this context, we propose a new multi-UAV spraying system that autonomously and accurately sprays weeds within the field. In our proposed system, a weed pressure map is first clustered. Then, the Voronoi approach generates the appropriate number of waypoints. Finally, a variant of the Traveling Salesman Problem (TSP) is solved to find the best UAV tour for each cluster. The latter task is performed using two nature-inspired techniques, namely NSGA2 and MOEA/D. To assess the performance of each method, we conducted a set of simulation tests. The results reported in this paper demonstrate the superiority of NSGA2 over MOEA/D. In addition, the heterogeneity of UAVs is studied, where we have a mix of fixed-wing and multi-rotor drones for spraying.
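
The per-cluster tour objective handed to NSGA2/MOEA/D can be as simple as the closed tour length over the cluster's waypoints (one of possibly several objectives; this sketch is illustrative, not the paper's exact cost function).

import numpy as np

def tour_length(waypoints, order):
    """Length of the closed UAV tour visiting `waypoints` in the given order."""
    pts = waypoints[np.asarray(order)]
    legs = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    return float(legs + np.linalg.norm(pts[-1] - pts[0]))  # return to start

wps = np.array([[0, 0], [0, 3], [4, 3], [4, 0]], dtype=float)
print(tour_length(wps, [0, 1, 2, 3]))  # -> 14.0 for this rectangle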

Paper Nr: 29
Title:

Human Comfort Factors in People Navigation: Literature Review, Taxonomy and Framework

Authors:

Matthias Kalenberg, Christian Hofmann, Sina Martin and Jörg Franke

Abstract: Due to demographic shifts and improvements in medical care, person navigation systems (PNS) for people with disabilities are becoming increasingly important. So far, PNS have received less attention than mobile robots. However, the work on mobile robots cannot always be transferred to PNS because there are important differences in navigating people. In this paper, we address these differences by providing a comprehensive literature review on human comfort factors in people navigation, presenting a unified taxonomy for PNS and proposing a framework for integrating these factors into a navigation stack. Based on the results, we extract the key differences and human comfort factors that have been addressed in current literature. Furthermore, the literature review shows that there is no unified taxonomy in this field. To address this, we introduce the term people navigation and a taxonomy to categorize existing systems. Finally, we summarize the human comfort factors that have been considered so far and provide an outlook on their implementation. Our survey serves as a foundation for a comprehensive research in people navigation and identifies open challenges.

Paper Nr: 30
Title:

Region Prediction for Efficient Robot Localization on Large Maps

Authors:

Matteo Scucchia and Davide Maltoni

Abstract: Recognizing already explored places (a.k.a. place recognition) is a fundamental task in Simultaneous Localization and Mapping (SLAM) to enable robot relocalization and loop closure detection. In topological SLAM the recognition takes place by comparing a signature (or feature vector) associated to the current node with the signatures of the nodes in the known map. However, as the number of nodes increases, matching the current node signature against all the existing ones becomes inefficient and thwarts real-time navigation. In this paper we propose a novel approach to pre-select a subset of map nodes for place recognition. The map nodes are clustered during exploration and each cluster is associated with a region. The region labels become the prediction targets of a deep neural network and, during navigation, only the nodes associated with the regions predicted with high probability are considered for matching. While the proposed technique can be integrated in different SLAM approaches, in this work we describe an effective integration with RTAB-Map (a popular framework for real-time topological SLAM) which allowed us to design and run several experiments to demonstrate its effectiveness.
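
A sketch of the pre-selection step: keep only the map nodes whose region falls in the smallest set of regions covering most of the classifier's probability mass. The threshold and interfaces are assumptions, not RTAB-Map specifics.

import numpy as np

def candidate_nodes(region_probs, node_regions, top_p=0.9):
    """region_probs: softmax output of the region predictor, shape (n_regions,)
    node_regions: region label of every map node, shape (n_nodes,)
    Returns indices of the map nodes worth matching against."""
    order = np.argsort(region_probs)[::-1]
    keep, mass = [], 0.0
    for r in order:                    # smallest region set covering top_p mass
        keep.append(r)
        mass += region_probs[r]
        if mass >= top_p:
            break
    return np.flatnonzero(np.isin(node_regions, keep))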

Paper Nr: 38
Title:

Interacting with a Visuotactile Countertop

Authors:

Michael Jenkin, Francois R. Hogan, Kaleem Siddiqi, Jean-François Tremblay, Bobak Baghi and Gregory Dudek

Abstract: We present the See-Through-your-Skin Display (STS-d), a device that integrates visual and tactile sensing with a surface display to provide an interactive user experience. This expands the application of visuotactile optical sensors to Human-Robot Interaction (HRI) and, more generally, Human-Computer Interaction (HCI) tasks. A key finding of this paper is that it is possible to display graphics on the reflective membrane of semi-transparent optical tactile sensors without interfering with their sensing capabilities, thus permitting simultaneous sensing and animation. A proof-of-concept demonstration of the technology is presented in which the STS-d provides an animated countertop that responds to visual and tactile events. We show that the integrated sensor can monitor interactions with the countertop, such as predicting the timing and location of contact with an object, or the amount of liquid in a container being placed on it.

Paper Nr: 39
Title:

A Color Event-Based Camera Emulator for Robot Vision

Authors:

Ignacio Bugueno-Cordova, Miguel Campusano, Robert Guaman-Rivera and Rodrigo Verschae

Abstract: Event-based cameras are gaining popularity due to the asynchronous nature of their information sensing and their speed, efficiency, and high-dynamic-range advantages. Despite these benefits, the adoption of these sensors has been hindered, mainly by their high cost. While prices are decreasing and commercial options exist, researchers and developers face barriers to exploring the potential of event-driven vision, especially with more specialized models. Although accurate simulators and emulators exist, their primary limitations are that they cannot operate in real time and that they are designed only for greyscale video streams. This creates a gap between theoretical exploration and practical application, impeding the seamless integration of event-based systems into real-world applications, especially in fields such as robotics. Moreover, the importance of color information is well recognized for many tasks, yet, with a few exceptions, existing event-based cameras do not handle color. To address this challenge, we propose a ROS-based real-time color event camera emulator, presenting its software design and implementation to help close the gap between theory and the real-world applicability of event-based color cameras. The paper also provides a preliminary evaluation to demonstrate its efficacy.
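
The core of such an emulator is the standard event-generation model applied per pixel and per color channel: an event fires when the log intensity changes by more than a contrast threshold. A simplified frame-based sketch follows (the paper's ROS implementation involves more, e.g., real-time scheduling and message publishing); names and the threshold are illustrative.

import numpy as np

def frame_to_events(prev_log, frame, t, threshold=0.2):
    """Emit (x, y, channel, polarity, t) events from one new RGB frame.

    prev_log holds the per-pixel, per-channel log intensity at the last event."""
    log_i = np.log(frame.astype(np.float32) + 1e-3)
    diff = log_i - prev_log
    ys, xs, cs = np.nonzero(np.abs(diff) >= threshold)
    polarity = np.sign(diff[ys, xs, cs]).astype(np.int8)
    prev_log[ys, xs, cs] += polarity * threshold   # per-pixel reference memory
    events = list(zip(xs.tolist(), ys.tolist(), cs.tolist(),
                      polarity.tolist(), [t] * len(xs)))
    return events, prev_log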

Paper Nr: 40
Title:

Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking

Authors:

Elham Amin Mansour, Hehui Zheng and Robert K. Katzschmann

Abstract: The world around us is full of soft objects that we as humans learn to perceive and deform with dexterous hand movements from a young age. In order for a robotic hand to control soft objects, it needs online state feedback of the deforming object. While RGB-D cameras can collect (partially occluded) point cloud data at a rate of 30 Hz, such data does not represent a continuously trackable object surface. Hence, in this work, we developed a method that creates deforming meshes from deforming point clouds at speeds above 50 Hz for different object categories. The reconstruction of meshes from point clouds has long been studied in computer graphics under 3D and 4D reconstruction; however, both lack the speed and generalizability needed for robotics applications. Our model is designed using a point cloud auto-encoder and a Real-NVP architecture, the latter being a continuous-flow neural network with manifold-preservation properties. Our model takes a template mesh, the mesh of an object in its canonical state, and deforms it to match a deformed point cloud of the object. Our method can perform mesh reconstruction and tracking at a rate of 58 Hz for deformations of six different YCB categories. An instance of a downstream application is a control algorithm for a robotic hand that requires online feedback on the state of the manipulated object, which would allow online grasp adaptation in a closed-loop manner. Furthermore, the tracking capacity our method provides can help in the system identification of deforming objects in a marker-free approach. In future work, we will extend our method to more object categories and real-world deforming point clouds.

Paper Nr: 41
Title:

Estimation of Optimal Gripper Configuration Through an Embedded Array of Proximity Sensors

Authors:

Jonathas Henrique H. Pereira, Carlos Fernando Joventino, João Alberto Fabro and André Schneider de Oliveira

Abstract: The task of picking up and handling objects is a great robotic challenge. Estimating the best points at which the gripper fingers should come into contact with the object before performing the pick-up task is essential to avoid failures. This study presents a new approach to estimating the grasping pose of objects using a database generated by a gripper through its proximity sensors. The grasping pose estimation simulates the points where the fingers should be positioned to obtain the best grasp of the object. In this study, we used a database generated by a reconfigurable gripper with three fingers that can scan different objects through distance sensors attached to the fingers and palm of the gripper. The grasping poses of 13 objects, classified according to their geometries, were estimated. The analysis of the grasping pose estimates considered the versatility of the gripper used. These estimates were validated using the CoppeliaSim software, where it was possible to configure the gripper according to the generated estimates and pick up the objects using just two or three fingers of the reconfigurable gripper.

Paper Nr: 42
Title:

The Twinning Technique of the SyncLMKD Method

Authors:

Fabiano S. Cardoso, Ronnier F. Rohrich and André S. de Oliveira

Abstract: This article introduces a novel technique for establishing a digital twin counterpart twinning methodology, aiming to attain elevated fidelity levels for mobile robots. The proposed technique, denominated Synchronization Logarithmic Mean Kinematic Difference (SyncLMKD), is elucidated in detail in this study. Addressing the diverse fidelity requirements intrinsic to Industry 4.0's dynamic landscape necessitates a sophisticated numerical method. The SyncLMKD technique, being numerical, facilitates the dynamic and decoupled adjustment of compensations related to trajectory planning. Consequently, this numerical methodology empowers the definition of various degrees of freedom when configuring environmental layouts. Moreover, the technique incorporates considerations such as the predictability of distances between counterparts and path planning. The article also comprehensively explores tuning control, insights, metrics, and control strategies associated with the SyncLMKD approach. Experimental validations of the proposed methodology were conducted on a virtual platform designed to support the SyncLMKD technique, affirming its efficacy in achieving the desired high fidelity for mobile robots across diverse operational scenarios.

Paper Nr: 13
Title:

Robot Vision and Deep Learning for Automated Planogram Compliance in Retail

Authors:

Adel Merabet, Abhishek V. Latha, Francis A. Kuzhippallil, Mohammad Rahimipour, Jason Rhinelander and Ramesh Venkat

Abstract: In this paper, an automated planogram compliance technique is proposed for retail applications. A mobile robot with camera vision capabilities provides images of the products on shelves, which are processed to reconstruct an overall image of the shelves to be compared to the planogram. The image reconstruction includes frame extraction from the live video stream, image stitching, and concatenation. Object detection for the products is achieved using a deep learning tool based on the YOLOv5 model. A dataset for algorithm training and testing is built to identify the products based on their image, identification number, and location on the shelf. A small-scale shelf setup with products is built, and different configurations of products on shelves are tested in a laboratory environment. It was found that the YOLOv5 algorithm detects the various products on shelves with a precision of 0.98, recall of 0.99, F-measure of 0.98, and classification loss of 0.006.
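
Once products are detected and localized on the reconstructed shelf image, compliance checking reduces to a cell-by-cell comparison against the planogram. A minimal illustrative sketch (the paper's matching logic is not specified in the abstract; the cell encoding is an assumption):

def compliance_score(planogram, detected):
    """Fraction of shelf cells whose detected product matches the plan.

    Both arguments map (shelf_row, slot) -> product_id; absent cells are empty."""
    cells = set(planogram) | set(detected)
    hits = sum(planogram.get(c) == detected.get(c) for c in cells)
    return hits / len(cells) if cells else 1.0

plan = {(0, 0): "soda", (0, 1): "chips"}
seen = {(0, 0): "soda", (0, 1): "candy"}
print(compliance_score(plan, seen))  # -> 0.5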

Paper Nr: 16
Title:

Analysis of Age Invariant Face Recognition Efficiency Using Face Feature Vectors

Authors:

Anders Hast, Yijie Zhou, Congting Lai and Ivar Blohm

Abstract: One of the main problems for face recognition when comparing photos of various ages is the influence of age progression on the facial features. The face undergoes many changes as a person grows older, including geometrical changes, but also changes in facial hair, the presence of glasses, etc. Even though biometric markers such as computed face feature vectors should preferably be invariant to such factors, face recognition becomes less reliable as the age span grows larger. Therefore, this study was conducted with the aim of exploring the efficiency of such feature vectors in recognising individuals despite variations in age, and how to measure face recognition performance and behaviour in the data. It is shown that they are indeed discriminative enough to achieve age-invariant face recognition without synthesising age images through generative processes or training on specialised age related features.
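
Verification from face feature vectors typically reduces to a thresholded cosine similarity, which is what makes the study's question, whether the vectors stay discriminative across age gaps, directly measurable. A generic sketch, not the paper's exact protocol:

import numpy as np

def same_identity(feat_a, feat_b, threshold=0.5):
    """Decide whether two face feature vectors belong to the same person.

    The threshold is illustrative; in practice it is tuned on validation pairs."""
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(a @ b) >= threshold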

Paper Nr: 32
Title:

Optimizing Mobile Robot Navigation Through Neuro-Symbolic Fusion of Deep Deterministic Policy Gradient (DDPG) and Fuzzy Logic

Authors:

Muhammad Faqiihuddin Nasary, Azhar Mohd Ibrahim, Suaib Al Mahmud, Amir Akramin Shafie and Muhammad Imran Mardzuki

Abstract: Mobile robot navigation has been a sector of great importance in the autonomous systems research arena for some time. Several rule-based traditional approaches have previously been employed to ensure successful navigation in complex environments, but these possess drawbacks in terms of navigation and obstacle avoidance efficiency. Reinforcement learning is a novel technique lately being assessed for this purpose; however, the constant reward values in reinforcement learning algorithms limit their performance. This study enhances the Deep Deterministic Policy Gradient (DDPG) algorithm by integrating fuzzy logic, creating a neuro-symbolic approach that imparts advanced reasoning capabilities to mobile agents. The outcomes observed in an environment resembling real-world scenarios highlight remarkable performance improvements of the neuro-symbolic approach: a success rate of 0.71 compared to 0.39, an average path length of 35 meters compared to 25 meters, and an average execution time of 120 seconds compared to 97 seconds. The results suggest that the employed approach enhances navigation performance in terms of obstacle avoidance success rate and path length, and hence could be reliable for the navigation of mobile agents.
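
For intuition, one simple way to fuse fuzzy reasoning with a learned policy is to let a small rule base modulate the policy output, for example scaling commanded speed by obstacle proximity. This is purely illustrative; the paper's actual integration with DDPG may differ.

def fuzzy_speed_scale(obstacle_dist, near=0.5, far=2.0):
    """Two-rule Mamdani-style inference with linear memberships:
    IF obstacle is NEAR THEN speed is SLOW; IF FAR THEN speed is FAST."""
    near_m = max(0.0, min(1.0, (far - obstacle_dist) / (far - near)))
    far_m = 1.0 - near_m
    slow, fast = 0.2, 1.0
    return near_m * slow + far_m * fast  # weighted-average defuzzification

print(fuzzy_speed_scale(0.4), fuzzy_speed_scale(1.25), fuzzy_speed_scale(3.0))
# -> 0.2 (creep), 0.6 (moderate), 1.0 (full speed)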

Paper Nr: 35
Title:

MDC-Net: Multimodal Detection and Captioning Network for Steel Surface Defect

Authors:

Anthony Ashwin Peter Chazhoor, Shanfeng Hu, Bin Gao and Wai Lok Woo

Abstract: In the highly competitive steel sector, product quality, particularly surface integrity, is critical. Surface defect detection (SDD) is essential in maintaining high production standards, as it directly impacts product quality and manufacturing efficiency. Traditional SDD approaches, which rely primarily on manual inspection or classical computer vision techniques, are plagued with difficulties, including reduced accuracy and potential health risks to inspectors. This research describes an innovative solution that uses a sequence generation model with transformers to improve defect detection during the manufacturing of hot-rolled steel sheets while generating captions describing each defect and its spatial location. This method, which views object detection as a sequence generation problem, allows for a more sophisticated understanding of image content and a complete, contextually rich investigation of surface defects. While the method can potentially improve detection accuracy, its real strength lies in its scalability and flexibility across industrial applications. Furthermore, the technique can be extended to visual question-answering applications, opening up opportunities for interactive and intelligent image analysis.

Paper Nr: 45
Title:

Virtual Model of a Robotic Arm Digital Twin with MuJoCo

Authors:

Mauricio Becerra Vargas, Bernardo Perez Inturias and João Pedro G. Marques de Oliveira

Abstract: In this paper, a digital twin architecture for a Robotis manipulator arm is constructed on the MuJoCo physics engine SDK. The virtual model in the MuJoCo OpenGL virtual environment runs synchronously with the real robot via a TTL-USB physical link and a C++ program running in Linux. The Dynamixel servomotor SDK and the MuJoCo SDK are segregated into threads for parallel execution in the C++ program. From the data flow perspective, we propose three real-time scenarios: Digital Shadow, Digital Driven, and the Digital Twin itself. A preliminary test is performed to confirm the system functions as expected; this test compares the motor's real and virtual torque in a static home position under the digital twin scenario. As this study is intended as an exemplar for future research on digital twin frameworks, we propose future work to continue this research.
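
A minimal digital-shadow sketch using MuJoCo's official Python bindings instead of the paper's C++ setup; "arm.xml" is a placeholder for the manipulator's MJCF model.

import mujoco

model = mujoco.MjModel.from_xml_path("arm.xml")  # placeholder MJCF of the arm
data = mujoco.MjData(model)

def shadow_update(measured_qpos):
    """Mirror measured Dynamixel angles into the virtual arm and return the
    inverse-dynamics torque at that static pose for real-vs-virtual comparison."""
    data.qpos[:] = measured_qpos
    data.qvel[:] = 0.0                           # static pose: no motion
    data.qacc[:] = 0.0
    mujoco.mj_inverse(model, data)               # compute required joint torques
    return data.qfrc_inverse.copy()

In a full digital twin, a second thread would poll the servo SDK while a loop like this keeps the MuJoCo state synchronized.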