ROBOVIS 2026 Abstracts


Area 1 - Computer Vision

Full Papers
Paper Nr: 17
Title:

Diffusion and Sensor Cross-Attention Based Deblurring of Camera Images

Authors:

Alexander Schwind, Johann Nikolai Hark, Bernd Schaeufele and Ilja Radusch

Abstract: Automated driving requires highly precise and up-to-date maps. One option is to use camera data from mobile phones in large vehicle fleets to derive map data. While this is a promising approach, handling the effects of motion blur remains challenging. This paper investigates the performance of diffusion models for image deblurring. To this end, a diffusion model is trained, and a modified attention mechanism is employed to incorporate inertial measurement unit sensor data into the generative process. A transformer decoder is trained with cross-attention blocks that correlate inertial measurement unit data with pixel values. A new mechanism is introduced that uses two nested cross-attention blocks to dynamically modify pixel dependencies based on dependencies derived from the inertial measurement unit. The improved attention mechanism yields significant gains in visual quality, as shown in both direct comparisons and extended training experiments. Integrating inertial measurement unit data via an attention mechanism has the potential to improve the efficiency and accuracy of image restoration techniques by providing a larger contextual basis for modeling the recovery process.
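The core idea — pixel features attending over IMU samples — can be illustrated with a minimal single-head cross-attention step in pure Python. This is an illustrative sketch only (no learned projections, no nesting), not the authors' mechanism:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(pixel_feats, imu_feats):
    """Single-head cross-attention: pixel features act as queries and
    attend over IMU samples (used here as both keys and values).
    Learned Q/K/V projections are omitted for brevity."""
    d = len(imu_feats[0])
    scale = math.sqrt(d)
    out = []
    for q in pixel_feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in imu_feats]
        w = softmax(scores)
        # weighted sum of IMU values per output dimension
        out.append([sum(wj * v[i] for wj, v in zip(w, imu_feats))
                    for i in range(d)])
    return out
```

With one-hot IMU "values", the output directly exposes the attention weights, which is a convenient sanity check.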

Paper Nr: 50
Title:

Generating Human-Understandable Descriptions of Novel Objects for Verbal Interactions with Edge-Based Robots

Authors:

Sarah Schneider, Evan Krause, Marlow Fawn, Doris Antensteiner, Csaba Beleznai, Daniel Soukup and Matthias Scheutz

Abstract: Mobile robots are becoming increasingly prevalent across a wide range of environments. They must effectively perceive the open world despite constraints in computational power and network resources, while also communicating their understanding to human partners. We present a compact neural structural encoder that supports object-level open-world understanding by decomposing novel objects into a set of known primitives drawn from a component vocabulary. Embedded within a cognitive architecture, the system maps geometric information into human-language descriptions and visualizations that prioritize structured interpretability over unrestricted expressiveness. Our approach uses synthetic data generation, model training on synthetic data, and reconstruction consistency estimation to indicate description reliability. A user study confirms that the generated descriptions are informative for human collaborators and shows how our human-language descriptions compare to GPT-generated descriptions, which rely on far greater computational resources. Different description versions are compared based on user preferences, and an on-robot demonstration illustrates the practical feasibility of our method. This work serves as a blueprint for an efficient and accessible vision-based object description system suited for open-world robotic collaboration.

Short Papers
Paper Nr: 24
Title:

Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments

Authors:

Raman Jha, Yang Zhou and Giuseppe Loianno

Abstract: In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments using RGB-D sensors. The proposed method integrates two complementary modules: an error-based selection module utilizing photometric and structural similarity (SSIM) errors derived from depth-based warping, and a momentum-based update module that dynamically adjusts keyframe selection thresholds according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows for the creation of high-quality 3D world representations from a compressed data stream, a critical step towards scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings highlight a meaningful advancement toward adaptive perception systems that can dynamically respond to complex and evolving visual scenes. We evaluate our proposed adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Furthermore, an extensive ablation study confirms the effectiveness of each individual component in our method, underlining their contribution to the overall performance gains.
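The interplay of an error-based selector with a momentum-updated threshold can be sketched as follows. This is an illustrative simplification (mean absolute photometric error only, no SSIM or depth warping), not the authors' implementation:

```python
def photometric_error(frame_a, frame_b):
    """Mean absolute intensity difference between two frames,
    each given as a flat list of pixel intensities."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def select_keyframes(frames, init_thresh=10.0, momentum=0.9):
    """Keep a frame as a keyframe when its error against the last
    keyframe exceeds an adaptive threshold. The threshold tracks
    recent errors with a momentum update, so fast-changing scenes
    raise it and static scenes lower it."""
    keyframes = [0]
    thresh = init_thresh
    for i in range(1, len(frames)):
        err = photometric_error(frames[i], frames[keyframes[-1]])
        if err > thresh:
            keyframes.append(i)
        thresh = momentum * thresh + (1.0 - momentum) * err
    return keyframes
```

A static sequence yields a single keyframe, while large frame-to-frame changes promote new keyframes even as the threshold adapts upward.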

Paper Nr: 51
Title:

Hierarchical Semantic Gating for Efficient Real-Time Action Recognition in Critical Infrastructure

Authors:

Julen Beldarrain Portugal, Javier Calle Armendariz, David Redó and Peter Leškovský

Abstract: Surveillance in Critical Infrastructures (CI) demands real-time understanding of human actions. However, deploying continuous Action Recognition (AR) models on edge devices (such as UGVs or smart CCTVs) presents a severe computational bottleneck. In a target-based approach, the processing cost scales linearly with the number of people in the scene, making standard heavy video classifiers infeasible for crowded environments or battery-constrained robots. This position paper proposes a cost-effective perception pipeline designed to operate independently on both fixed cameras and mobile robots. Instead of relying on complex multi-sensor data fusion, we present a hierarchical workflow governed by a "Semantic Gate". This lightweight mechanism utilizes geometric and temporal metadata from the object detector, specifically exploiting the temporal stability of NMS-free end-to-end detectors to suppress routine behaviours. This gating strategy triggers heavy action recognition models only when non-routine dynamics are observed, drastically reducing energy consumption while maintaining responsiveness to critical events such as intrusions or fallen-person scenarios.
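The gating principle — cheap track metadata deciding when to invoke an expensive classifier — can be sketched as below. The speed-based routine test is a hypothetical stand-in for the paper's geometric/temporal criteria:

```python
def track_speed(track):
    """Average per-frame centroid displacement of a detection track,
    where the track is a list of (x, y) box centroids over frames."""
    if len(track) < 2:
        return 0.0
    steps = [((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
             for (x0, y0), (x1, y1) in zip(track, track[1:])]
    return sum(steps) / len(steps)

def semantic_gate(tracks, speed_thresh, heavy_model):
    """Run the expensive action-recognition model only on tracks whose
    motion exceeds the routine-behaviour threshold; everything else is
    suppressed without any heavy inference."""
    results = {}
    for tid, track in tracks.items():
        if track_speed(track) > speed_thresh:
            results[tid] = heavy_model(track)   # non-routine: classify
        else:
            results[tid] = "routine"            # gated: no AR call
    return results
```

Because the gate only reads detector metadata, its cost stays negligible even as the number of people in the scene grows.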

Paper Nr: 15
Title:

Visual Perception as Constraint Resolution: A Spiral-Time Perspective

Authors:

Siamak Khatibi, Linus de Petris and Yuan Zhou

Abstract: Visual perception is often modeled as a causal sequence of feature extraction and decision stages. In contrast, this paper explores a non-causal account of perception based on spiral-time, where globally coherent scene interpretations emerge as fixed points of a constraint-resolution process. We formalize visual understanding as the search for label assignments over image regions that jointly satisfy a set of declarative structural constraints (e.g., part–whole relations, mutual exclusions, and co-occurrence patterns). We instantiate this formulation in a simple spiral-time-inspired simulation that searches over candidate region–label configurations and retains those that satisfy the specified constraints. The framework is illustrated in toy scenarios that involve arrangement of objects in parts and basic relational structure. We evaluate the resolved concepts using semantic, structural, and string-based coherence metrics, showing that the constraint mechanism can consistently recover the intended concepts in randomized initializations. Although our current experiments operate on symbolic region descriptions rather than low-level image features and are not intended as a competitive vision system, they serve as a proof of concept for viewing perception as constraint resolution instead of feed-forward causal computation. We discuss connections to classical constraint-satisfaction methods, structured inference in computer vision, and non-causal perspectives in philosophy of science, and outline how future work could ground the framework in real visual data and learned constraints.
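The search for label assignments that jointly satisfy declarative constraints can be sketched as an exhaustive enumeration, mirroring the paper's symbolic toy setting. The regions, labels, and constraints below are illustrative, not the paper's scenarios:

```python
from itertools import product

def resolve(regions, labels, constraints):
    """Exhaustively search region -> label assignments and keep those
    that satisfy every declarative constraint. Each constraint is a
    predicate over the full assignment dict, mirroring mutual-exclusion
    or co-occurrence rules."""
    solutions = []
    for combo in product(labels, repeat=len(regions)):
        assignment = dict(zip(regions, combo))
        if all(c(assignment) for c in constraints):
            solutions.append(assignment)
    return solutions
```

For example, a mutual-exclusion rule plus one co-occurrence-style rule can pin down a unique coherent interpretation regardless of initialization, which is the fixed-point behaviour the abstract describes.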

Area 2 - Intelligent Systems

Full Papers
Paper Nr: 19
Title:

Gesture-Aware Federated Learning with KAN-Based Feature Modeling for Biometric Recognition

Authors:

Berke Cansiz, Murat Taskiran and Nihan Kahraman

Abstract: Federated learning approaches make it possible to integrate biometric systems with different client distributions. In this context, gesture-related pattern differences can be effectively utilized to create distinct clients, particularly when individuals perform different gestures. The variety of gestures performed by users enriches the patterns learned across clients. However, conventional aggregation methods disregard these pattern-based differences and treat all data as if it were of the same type, which is likely to degrade performance. Therefore, we hypothesize that classification performance would improve if clients are aware of the gestures performed in their datasets and the server is informed about both the updated model parameters and the corresponding gesture of each client. Within this scope, we propose FedGest, an approach that accounts for gesture-based heterogeneity in federated learning. It provides two key modifications: assigning coefficients to clients based on the initial subject-classification performance of each gesture's data, and adding gesture information at the feature level. FedGest was evaluated against benchmark aggregation methods using a model architecture that integrates a Kolmogorov-Arnold Network-based feature extraction module. Results reveal that the proposed FedGest demonstrates superior performance compared to benchmark aggregation methods.
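Coefficient-weighted server aggregation of client parameters can be sketched generically as below. The exact coefficient derivation in FedGest is not specified in the abstract, so this is an illustrative weighted federated average only:

```python
def aggregate(client_params, coeffs):
    """Coefficient-weighted federated averaging. client_params is a
    list of flat parameter vectors (lists of floats); coeffs are
    per-client weights (e.g. derived from per-gesture subject-
    classification performance). Weights are normalized first."""
    total = sum(coeffs)
    weights = [c / total for c in coeffs]
    n = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params))
            for i in range(n)]
```

With equal coefficients this reduces to plain FedAvg, so the coefficients only change the result when clients are genuinely weighted differently.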

Short Papers
Paper Nr: 31
Title:

Accreditation and Conformity Assessment as a Bridge between Regulation and Intelligent Systems: An Ontology-based Cybersecurity Framework under Article 15(5) of the European Artificial Intelligence Act

Authors:

Danny Neubauer, Loui Al Sardy and Reinhard German

Abstract: In Europe, national accreditation and conformity assessment bodies serve as a bridge between regulations and the products of manufacturers. Conformity assessment activities ensure the production of high-quality products for civil society, fostering trust, promoting free trade, and ensuring safety. However, security is equally crucial, especially when intelligent technology systems are involved. For example, products such as small robot toys that utilize computer vision and intelligent systems for emotion recognition should comply with ethical principles, ensuring that neither mental health is harmed nor children's cognitive development is manipulated. The European Union has recognized this and introduced the risk-based Artificial Intelligence Act to prevent such misuse. In particular, Article 15 of the European Union Artificial Intelligence Act requires technical and organizational measures for high-risk Artificial Intelligence systems that interact and communicate with natural persons to ensure cybersecurity. The aim of this research project is to identify gaps in the application of cybersecurity standards, such as the Common Criteria, to meet the requirements of Article 15 of the European Artificial Intelligence Act during conformity assessment activities. Furthermore, it seeks to demonstrate an ontology-based approach for a cybersecurity framework for non-deterministic intelligent technology systems. The gap analysis and demonstration of an alternative solution approach will focus on high-risk children's robot toys with emotional Artificial Intelligence. Generally, the project aims to enhance the security of non-deterministic intelligent technology systems for the European market through adapted conformity assessment activities. In simple terms, the goal is secure intelligent technology systems, succinctly referred to as SEC IT ("secure it").

Area 3 - Robotics

Full Papers
Paper Nr: 22
Title:

An Automated Modular In Situ Machine Vision System for Real-Time Phytoplankton Monitoring Using the AFTI-scope

Authors:

Andreas Vik Aasum, Isak Orlando Wangensteen, Mathias Haugum, Glaucia Moreira Fragoso, Christian Schellewald, Tor Arne Johansen and Annette Stahl

Abstract: Phytoplankton are central to global biogeochemical cycles and aquatic food webs, yet their monitoring remains constrained by costly instrumentation and labor-intensive workflows. We present a modular, automated machine vision system for in situ, real-time detection, classification, and tracking of phytoplankton using the open-source AFTI-scope imaging platform. Our approach systematically benchmarks one-stage, two-stage, and transformer-based detectors, identifying YOLOv11 as the most effective balance of accuracy and efficiency for resource-constrained deployment. To improve species-level identification, we integrate an ensemble classifier (EfficientNet_B0 + Swin_T), boosting F1-scores to 0.9852, and couple this with BoT-SORT tracking to enable robust, non-redundant abundance estimation under dynamic flow conditions. Unlike prior systems, our pipeline is designed for embedded GPU hardware (Jetson AGX Orin) and validated on real-world deployments, demonstrating feasibility for scalable, autonomous plankton monitoring. This work presents an open-source flow-through imaging system with a real-time machine learning pipeline, along with integrated and standardized datasets combining public and AFTI-scope imagery. It further provides comprehensive benchmarking of detection, classification, and tracking architectures under in-situ conditions, and demonstrates in-field validation of real-time species-level monitoring with a design optimized for deployment on autonomous surface vehicles. This work provides the first embedded, low-cost platform capable of continuous, automated phytoplankton observation, supporting future large-scale marine ecosystem monitoring.

Paper Nr: 28
Title:

GeoFetch: Design and Proof-of-Concept of a Soil Sampling Module for Quadruped Robots

Authors:

Sleiman El Bobbou, Aleksander Dabrowski, Silas Gramlich, Lisa Hessenthaler, Tobias Hogh, Michael Poncelet and Arne Rönnau

Abstract: This paper presents a modular soil-sampling system developed as a proof of concept for agricultural mobile robots. The use of legged robots for soil sampling in precision agriculture remains relatively underexplored compared to wheeled solutions, despite offering better rough-terrain mobility. GeoFetch is designed to be attached to a Unitree Go2 walking robot. It employs an auger drilling mechanism to collect soil samples in compliance with precision farming requirements. For the product development process, the VDI 2221 design methodology was used to create a lightweight and portable prototype. Laboratory and initial field tests demonstrate reliable soil sample retrieval. Our design offers advantages in terrain accessibility, modular adaptability, and reduced human labor. This multipurpose robotic solution has the potential to support fertilizer optimization, while additional modules could be integrated to perform other tasks throughout the crop cycle.

Paper Nr: 33
Title:

ORVIS: An Ontology-Based Robot Vision System

Authors:

Mark Adamik, Ilaria Tiddi, Stefan Schlobach and Michaela Kuempel

Abstract: Most robotic systems rely on perceptual capabilities to perform a task, and their level of autonomy often depends on the granularity of such perception. Although the field of computer vision now offers many advanced, deep-learning-based tools that are publicly available, most of the developed models remain decoupled from robotic applications. One of the key limitations is that the outputs of computer vision models currently lack structured semantic integration across models. As a result, system engineers must manually interpret the meaning of the outputs and the contextual information they convey. To address this gap, we propose using ontologies to enable the perceptual data integration of diverse vision models into robotic systems, and present an Ontology-based Robotic Vision System (ORViS) which makes the computer vision models from the HuggingFace platform available as ROS services. ORViS uses the Ontology for Robotic Knowledge Acquisition (ORKA) to organize the output provided by the vision models and to enrich them with semantic information, and we propose a Perceived-Entity Linking algorithm to link the outputs of the vision algorithms to external knowledge graphs, enabling data integration across models. We evaluate ORViS through a demonstration involving a mobile manipulator equipped with an RGB-D camera, where the robot needs to prepare a meal by selecting a variety of objects. ORViS is a step towards improving the perception capabilities in the perception-action loop of robotic agents. Furthermore, this work seeks to strengthen the bridge between the two communities by aiding roboticists in using advanced computer vision models. ORViS is also publicly available (https://github.com/Dorteel/orvis).

Paper Nr: 52
Title:

An Investigation into the Influence of Robot Kinematics and LiDAR Positioning on the Mapping Performance of the Cartographer SLAM Algorithm

Authors:

Denis N'chot, Ian Sandall and Heba Lakany

Abstract: Mobile robots require reliable navigation systems to carry out tasks in domains such as exploration, warehouse logistics and farming. Navigation relies on localisation, which in turn depends on an accurate representation of the robot’s surroundings; Simultaneous Localisation and Mapping (SLAM) provides this capability by enabling a robot to build a map of its environment and estimate its pose within that map while moving. This paper investigates how the robot’s starting position, mapping speed and LIDAR sensor location affect the quality of maps created with the Cartographer SLAM algorithm. These three factors are systematically varied in a design of experiments of 90 runs. This study addresses the critical lack of standardised evaluation by introducing a robust, multi-metric cost function that synthesises five distinct performance dimensions into a single, objective quality score. By unifying ground-truth-based metrics (pose error, map overlap) with map-internal measures (enclosed areas, occupancy, corner count), this work provides a comprehensive framework for SLAM performance benchmarking. This novel single-score approach empowers practitioners to move beyond subjective visual inspection toward a rigorous, quantifiable assessment of map definition and structural integrity.
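The idea of collapsing several map-quality metrics into one objective score can be sketched as a normalized weighted sum. The metric names, weights, and normalization ranges below are illustrative placeholders, not the paper's calibrated cost function:

```python
def map_quality_cost(metrics, weights, ranges):
    """Collapse several map-quality metrics into one cost score
    (lower is better). Each metric is min-max normalized to [0, 1]
    using an expected range, then combined as a weighted sum."""
    cost = 0.0
    for name, value in metrics.items():
        lo, hi = ranges[name]
        norm = min(max((value - lo) / (hi - lo), 0.0), 1.0)
        cost += weights[name] * norm
    return cost / sum(weights.values())
```

Clamping the normalized values keeps any single out-of-range metric from dominating the combined score, which is one design choice such a synthesis has to make.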

Short Papers
Paper Nr: 11
Title:

On a Decentralized Task-Aware Adaptive Multi-Robot Positioning

Authors:

Menaxi J. Bagchi and Shivashankar Nair

Abstract: Positioning robots at the right places, so that they remain close to the locations where they are needed most, can have a great impact on productivity in a multi-robot scenario. This is particularly beneficial in multi-robot settings, where effective coordination and timely task completion are crucial and the robots need to navigate to task locations to carry out their tasks. In this work, we propose a decentralized algorithm that enables the robots to autonomously determine and position themselves at such appropriate locations, termed goal locations, so that they remain in close proximity to all the locations where their typically allocated tasks arise. The robots use the proposed Decentralized Adaptive Robot Positioning (DARP) algorithm to determine goal locations and use Deep Q-Networks (DQN) to learn to move towards them while avoiding collisions with both stationary and moving obstacles. The robots also periodically evaluate their task-allocation histories and may relocate to new goal locations if necessary. The experiments conducted in Webots confirm the effectiveness of the proposed method and emphasize the importance of positioning robots at their goal locations.
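One simple way to derive a goal location from a robot's task history is a frequency-weighted centroid of past task locations. The abstract does not give DARP's actual update rule, so this is purely a hypothetical stand-in to illustrate the concept:

```python
def goal_location(task_history):
    """Pick a robot's goal location as the frequency-weighted centroid
    of the task locations it has served. This is a stand-in for the
    DARP rule, whose exact update is not given in the abstract.
    task_history: list of ((x, y), count) pairs."""
    total = sum(count for _, count in task_history)
    gx = sum(x * count for (x, _), count in task_history) / total
    gy = sum(y * count for (_, y), count in task_history) / total
    return (gx, gy)
```

Re-running this after each evaluation period naturally relocates the robot as its task-allocation history shifts.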

Paper Nr: 12
Title:

An Improved RRT*M Algorithm for Path Planning in 2D Dynamic Environment

Authors:

Runjin Wang, Tianyu Zhang, Feiyang Xiao, Kunpeng Wang and Saicheong Fok

Abstract: Dynamic path planning is critical for autonomous vehicles, where changing obstacles make static algorithms such as A*, Dijkstra, APF, and RRT* less effective. This paper presents an improved RRT*M algorithm for 2D dynamic environments with time-varying obstacles. By integrating a global path-examination module that validates paths against obstacles at precise timestamps, the algorithm ensures collision-free navigation. The algorithm also optimizes the sampling process by generating and storing multiple random nodes per time step, enhancing path diversity and quality. Compared with the phased RRT and improved Informed-RRT*, the proposed algorithm achieves higher success rates and smoother, shorter paths, demonstrating superior path efficiency and adaptability, with potential for further optimization.
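The time-stamped validation idea can be sketched as follows: check each waypoint against the obstacle's position at the exact time the robot would arrive there. This is an illustrative single-obstacle check, not the paper's full examination module:

```python
def path_collision_free(waypoints, speed, obstacle_at, radius):
    """Validate a path against a moving obstacle at the exact
    timestamps the robot reaches each waypoint. obstacle_at(t)
    returns the obstacle centre at time t; a waypoint closer than
    `radius` at its arrival time invalidates the path."""
    t, pos = 0.0, waypoints[0]
    for wp in waypoints:
        dx, dy = wp[0] - pos[0], wp[1] - pos[1]
        t += (dx * dx + dy * dy) ** 0.5 / speed   # arrival time at wp
        ox, oy = obstacle_at(t)
        if ((wp[0] - ox) ** 2 + (wp[1] - oy) ** 2) ** 0.5 < radius:
            return False
        pos = wp
    return True
```

The same path can thus be valid or invalid depending only on the obstacle's trajectory in time, which is exactly what static planners cannot express.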

Paper Nr: 13
Title:

Multiperspective Approach for Semi-Autonomous Robot Control Using Fusion of Exocentric Cameras

Authors:

Grimaldo Silva, Khansa Rekik and Rainer Müller

Abstract: This paper explores semi-autonomous robot control through interaction with augmented video feeds broadcast by exocentric cameras. With the increasing pervasiveness of cameras in everyday life, their coverage of both private and public spaces continues to grow. Accounting for this trend, our approach allows an operator to direct a robot toward a person or location by issuing commands through exocentric RGB camera feeds, using a homography-based mapping between robot and camera coordinate spaces. Moreover, we are able to track a person in motion using a single-track Kalman filter and update the robot's path plan accordingly. This work also examines the potential challenges and limitations associated with using exocentric cameras for this purpose. For validation, live tests and pre-recorded videos of real-world scenarios involving people are used to evaluate the correctness and usability of our approach. In our experiments, the base-footprint detector achieved 0.99 precision and 0.90 recall, and the robot reached the positions selected by the operator with a mean error of 26 cm, roughly half of which was caused by localization rather than mapping issues. A user study with nine participants further indicated high perceived correctness (4.55/5), usability (4.78/5), and real-world applicability (4.11/5).
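The homography-based mapping reduces to applying a 3x3 matrix with a perspective divide, which can be sketched in a few lines. The matrices below are illustrative; in practice H would be estimated from point correspondences between the camera image and the robot's ground plane:

```python
def apply_homography(H, point):
    """Map an image pixel (u, v) to robot ground-plane coordinates via
    a 3x3 homography H (row-major nested lists), with the standard
    perspective divide by the third homogeneous coordinate."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)
```

An operator click in the camera feed thus becomes a metric goal position for the robot's planner in a single function call.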

Paper Nr: 14
Title:

From Chat to Grasp Using VLM-Controlled Dual Robot Arm

Authors:

Jonas Stehr, Nermeen Abou Baker and Uwe Handmann

Abstract: The growing capabilities of multimodal large language models (LLMs) have resulted in an expanding focus on their use in robotic control tasks. Natural language control simplifies robot operation, and its capability of function calling reduces the implementation overhead. This paper presents a setup for controlling two collaborative robot arms using a modern Vision Language Model (VLM). The VLM acts as the decision-making agent and manages planning, task allocation, environmental and error perception, as well as error correction. Low-level controllers are used to control the robots. This work also provides a graphical user interface (GUI) that allows users to input tasks for the system and control its workflow. The setup uses segmentation methods to identify objects that the robots need to interact with. The study compares two segmentation methods and analyses different interaction tasks to demonstrate the efficacy of a modern multimodal LLM to control robotic arms.

Paper Nr: 34
Title:

Comprehensive Design and High Performance Control of an Autonomous Underwater Vehicle

Authors:

Sri Sai Deep Duduka, Amit Saraf, Ankur Vadlamani, Sumanth Kandikattu and Gopinath G R

Abstract: Autonomous Underwater Vehicles (AUVs) play an essential role in ocean exploration and monitoring, but challenges in efficient control and power management limit their endurance and reliability. This paper presents the development of a custom AUV platform designed and fabricated at Mahindra University, Hyderabad, integrating a custom hull, propulsion, and electronics system. A key contribution is the implementation of an analytically derived Proportional-Derivative (PD) controller tailored for underwater dynamics, ensuring a stable trajectory under hydrodynamic disturbances. The proposed control framework is portable across AUVs of varying weights, geometries, and sizes. In parallel, a custom power management system is designed to optimize energy efficiency and extend operational duration. Parameters for the AUV are obtained through extensive Computational Fluid Dynamics (CFD) simulations, enabling the design of robust controllers and faster dynamics. This paper highlights the effectiveness and generality of the proposed design and control strategy for future AUVs.
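A PD control law on depth error can be sketched against a toy first-order plant. The gains and plant below are illustrative placeholders, not the paper's CFD-derived values:

```python
def pd_step(error, prev_error, kp, kd, dt):
    """One step of the PD control law u = Kp*e + Kd*de/dt."""
    return kp * error + kd * (error - prev_error) / dt

def simulate_depth(target, depth=0.0, kp=2.0, kd=0.5, dt=0.1, steps=200):
    """Drive a toy velocity-commanded depth plant with the PD
    controller and return the final depth; gains and plant model
    are illustrative only."""
    prev_err = target - depth
    for _ in range(steps):
        err = target - depth
        u = pd_step(err, prev_err, kp, kd, dt)
        depth += u * dt          # plant: commanded vertical velocity
        prev_err = err
    return depth
```

The derivative term damps the approach to the setpoint; with the gains above the closed loop converges to the target depth without sustained oscillation.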

Paper Nr: 37
Title:

Design and Field Evaluation of a Precision Robotic End-Effector for Table-Grade Citrus Harvesting

Authors:

Alaeddin Bani Milhim, Juan Espinoza, Caleb Bahne, The Nguyen and Brendan Chinnock

Abstract: This paper presents the design, implementation, and field evaluation of a precision-oriented robotic end-effector for autonomous citrus harvesting. Targeting table-grade quality, the proposed system integrates a three-finger gripper, servo-actuated shears, and a dual-axis YZ positioning stage, coordinated via an Arduino-based control system to achieve millimeter-level stem-length control with gentle handling. Field trials on 142 fruits demonstrated a 99% end-effector detachment success rate with zero damage and an average residual stem length of 0.8 mm, with 74% of samples meeting the study’s precision stem-length target of ≤ 1 mm. Although the average harvest cycle time was 10.8 s, slower than some speed-optimized systems, results highlight a deliberate quality-centric trade-off and a clear path to accelerate motion without sacrificing precision. Compared to prior work, this study reports detailed, quality-relevant metrics, such as residual stem length and stem-diameter distributions, providing new insights and evaluation standards that are rarely reported in the literature. The findings validate the end-effector’s readiness for integration into autonomous harvesting platforms and establish a benchmark for improving speed and overall system efficiency while preserving table-grade fruit quality.

Paper Nr: 41
Title:

Improving Obstacle Avoidance in End-to-End Deep Learning by Incremental Training on Collisions

Authors:

Alexander Seewald

Abstract: Obstacle avoidance is an essential feature for autonomous robots. Although it is possible to train end-to-end deep learning models on this task, performance is not always competitive with specialized sensors, and the need for human-generated training data makes building such systems complex and costly. Here, we propose an incremental training method starting from the model described in Seewald (2020). By successively training the model on stereo images taken just before observed collisions, the collision rate can be significantly reduced after just a few iterations. Additionally, since training is stopped relatively early, the computational effort is much lower than for a more traditional full training on the original and new collision stereo images. No human-generated training data is needed and human intervention is minimal. A test with a model retrained on all data even suggests that our method may perform significantly better than full training.

Paper Nr: 49
Title:

Privacy-Preserving Person Identification for Robotics in Built Environments via Garbled Circuits

Authors:

Barış Şahintekin, Zehra Gülru Çam Taşkıran and Nihan Kahraman

Abstract: Privacy-preserving person identification is a critical requirement for robotic systems operating in human-centered built environments. Robots and embedded sensing nodes frequently rely on biometric or behavioral data to enforce access control and adapt their behavior; however, transmitting raw biometric data introduces privacy and regulatory challenges. This paper presents a privacy-preserving identification framework for robotic systems based on garbled circuit secure computation. A pre-trained multi-class logistic regression model operating on soft biometric features, such as gesture and RFID-based motion patterns, is transformed into a garbled digital circuit and deployed to robots or embedded sensing nodes. Identification is performed locally without exposing raw biometric data, and only minimal authorization outcomes are communicated. The proposed approach combines fixed-point quantization and circuit-level optimization to enable efficient secure inference on honest-but-curious platforms. Experimental results demonstrate that the secure implementation achieves an identification accuracy of 85.7%, matching the plaintext model while eliminating raw biometric data transmission. The framework supports both mobile robots and stationary robotic sensors for privacy-aware access control and human–robot interaction.
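Before a linear model can be compiled into a garbled digital circuit, its arithmetic must be reduced to integers. The sketch below illustrates that fixed-point quantization and integer-only scoring step in the clear; the garbling itself, the scale factor, and the class layout are omitted or assumed:

```python
SCALE = 1 << 8   # 8 fractional bits (illustrative choice)

def quantize(vec):
    """Fixed-point quantization: floats -> scaled integers, as needed
    before compiling a linear layer into a boolean/garbled circuit."""
    return [round(v * SCALE) for v in vec]

def secure_scores(x_q, weight_rows_q, bias_q):
    """Integer-only multi-class linear scores. Inside a garbled
    circuit only these integer multiply-adds and the final argmax
    comparisons would be evaluated; no float ops, no raw features
    leave the client."""
    return [sum(wi * xi for wi, xi in zip(row, x_q)) + b * SCALE
            for row, b in zip(weight_rows_q, bias_q)]

def identify(x, weights, biases):
    """Plaintext reference: quantize, score, return the argmax class."""
    scores = secure_scores(quantize(x),
                           [quantize(r) for r in weights],
                           quantize(biases))
    return max(range(len(scores)), key=scores.__getitem__)
```

Because argmax is invariant to the common SCALE factor, the quantized decision matches the float model whenever the quantization error does not flip the score ordering.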

Paper Nr: 54
Title:

Initial Investigation of Low-Cost ArUco-Marker Based Localization for an Underwater Glider

Authors:

Jun Niel Paquibot, Takayuki Takahashi and Luis Gerardo Canete

Abstract: Underwater gliders (UGs) provide an energy-efficient alternative to thruster-based underwater robots, offering a potential solution for traversing large bodies of water such as Lake Inawashiro, which requires long sampling durations for environmental surveying. However, developing specific control systems for an underwater glider requires extensive analysis of glider dynamics, the verification of which is only as effective as the accuracy of the localization system used. While acoustic systems, underwater motion-capture systems, and inertial navigation are common, they often suffer from prohibitively high costs, signal interference, or integration drift. Existing studies have shown the potential of ArUco markers for underwater localization; however, these studies often yield errors around 0.1 m despite sensor fusion, indicating a need to streamline the tuning process before underwater deployment. This study proposes a best-effort approach to reducing error for a real-time, vision-based localization system. The proposed system utilizes a Raspberry Pi 4B and Raspberry Pi Camera v3, optimizing intrinsic parameters through least-squares focal-length tuning and mitigating lens distortion through strategic frame cropping. To ensure real-time performance, Adaptive Region of Interest (ROI) and Adaptive Blob Size Thresholding algorithms were implemented. For validation, results are benchmarked against a Computer Numerical Control (CNC) machine. The findings show a significant accuracy improvement over existing literature, providing a viable methodology for streamlining marker-accuracy optimization in controlled environments, in preparation for underwater localization and the development of control strategies.
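Least-squares focal-length tuning can be sketched from the pinhole relation s = f * S / z, where S is the physical marker size, z the known distance, and s the observed pixel size. This is an illustrative reconstruction of the idea, not the authors' exact procedure:

```python
def fit_focal_length(marker_size, distances, pixel_sizes):
    """Least-squares focal length (in pixels) from the pinhole
    relation s = f * S / z: given the known marker size S, ground-
    truth distances z_i, and observed pixel sizes s_i, minimize
    sum_i (s_i - f * S / z_i)^2 in closed form."""
    xs = [marker_size / z for z in distances]
    num = sum(x * s for x, s in zip(xs, pixel_sizes))
    den = sum(x * x for x in xs)
    return num / den
```

Ground-truth distances from a positioning rig (such as the CNC machine used for validation) provide exactly the (z_i, s_i) pairs this fit needs.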

Paper Nr: 16
Title:

CLIP-Driven Visual Servoing Scheme for Transect Following in Coral Reef Monitoring

Authors:

Waseem Akram, Atif Sultan, Muhayy Ud Din, Tarek Taha and Irfan Hussain

Abstract: Monitoring coral-reef health is crucial for marine biodiversity and coastal livelihoods; however, traditional surveys conducted by expert divers or manually operated ROVs are labour-intensive, poorly scalable, and fragile in dynamic marine environments. We present an automated ROV line-following pipeline that enables autonomous coral-reef inspection without task-specific training. The method uses a prompt-based vision approach in which each ROV camera frame is contrast-enhanced (CLAHE) and processed once by the CLIPSeg vision model with the prompt “line” to produce a line-probability map. A lightweight post-processing stack (morphology and skeletonization) then yields a center-point estimate, from which we compute the yaw (heading) error and lateral offset that drive a clamped, image-based visual servoing controller. We validate the system using a BlueROV2 in a pool setup that emulates a coral-reef environment under low light, waves, and varying depths. The experimental results demonstrate stable and smooth generation of heading and lateral commands, maintaining line following across different conditions. These results indicate that zero-shot vision, combined with simple control, provides a practical foundation for autonomous, line-guided reef surveys in support of coral-health assessment.
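Turning a binary line mask into servoing errors can be sketched as: take per-row centroids, fit a line by least squares, and read off yaw and lateral offset. This simplified stand-in replaces the morphology/skeletonization stack and assumes the mask is already thresholded:

```python
import math

def line_errors(mask):
    """From a binary line mask (list of rows of 0/1), take per-row
    centroids, fit u = a*v + b by least squares, and return
    (yaw_error_rad, lateral_offset_px) measured at the bottom row
    relative to the image centre column."""
    pts = []
    for v, row in enumerate(mask):
        cols = [u for u, p in enumerate(row) if p]
        if cols:
            pts.append((v, sum(cols) / len(cols)))
    n = len(pts)
    mv = sum(v for v, _ in pts) / n
    mu = sum(u for _, u in pts) / n
    a = sum((v - mv) * (u - mu) for v, u in pts) / \
        max(sum((v - mv) ** 2 for v, _ in pts), 1e-9)
    b = mu - a * mv
    centre = (len(mask[0]) - 1) / 2.0
    bottom = len(mask) - 1
    return math.atan(a), (a * bottom + b) - centre
```

A clamped controller would then map these two errors to bounded yaw-rate and lateral-velocity commands for the ROV.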

Paper Nr: 25
Title:

6D Pose Estimation of Heavily Occluded and Ill-Textured Objects in a Cluttered Workspace

Authors:

Sukhan Lee and Seokjong Hyeon

Abstract: Estimation of the 6D pose of objects in a workspace is crucial for numerous robotic tasks in manufacturing and logistics. Recently, significant progress has been made in 6D object pose estimation, particularly through various deep learning-based approaches that utilize RGB and RGB-D data, supported by benchmark challenges and datasets. Despite this progress, the 6D pose estimation of heavily occluded and ill-textured objects in a cluttered workspace remains an open issue. In this study, we propose an approach, referred to here as the triple-associative point autoencoder (TAP-AE) framework, that addresses 6D pose estimation of heavily occluded and ill-textured objects within the model-based seen-object pose estimation framework from a single view. The proposed TAP-AE approach transforms an occluded-partial point cloud captured from the visible portion of an object into the corresponding full-partial point clouds in both the camera and object frames, as they could have been obtained without occlusion. In particular, TAP-AE is trained to maintain one-to-one point correspondence between the reconstructed camera-frame and object-frame full-partial point clouds, resulting in improved 6D pose accuracy with reduced reconstruction errors and direct coordinate transformations. For the experiment, we trained TAP-AE on a workspace point cloud dataset generated by applying the Isaac Lab physics simulator to laboratory-generated object point clouds. Testing was conducted on both the physics-simulated and custom-collected real-world datasets. The test results demonstrated an average ADD-S performance of 0.98 (3.6mm/5.7° average error) and 0.94 (11.4mm/13.4° average error) for the simulated and real test datasets, respectively.
In particular, for heavily occluded objects with more than 60% of the visible surface occluded, the proposed approach achieved an average ADD-S performance of 0.86 (15.0mm/24.5° average error) and 0.85 (14.8mm/26.1° average error) for the simulated and real test datasets, respectively, representing state-of-the-art occlusion robustness for ill-textured objects.

Paper Nr: 26
Title:

Emerging Multi-Robot Cooperation through Action Selection Robust to Communication Disruptions

Authors:

Erwan Martin, Antoine Nongaillard and Philippe Mathieu

Abstract: In critical scenarios, especially in hazardous environments, multi-robot systems (MRS) become an unavoidable solution to overcome the limitations of human operators. However, efficient communication in such systems remains a major challenge in large-scale, constrained environments. Distributed methods are a promising approach, but they often require a global consensus to validate a task allocation, which is difficult to achieve in environments where communications are unreliable or limited. In this paper, we present an emergent coordination mechanism in which each robot of an MRS makes individual decisions relying exclusively on local knowledge acquired through its perceptions and through proximity communications. This mechanism does not require any communication for the robots to execute tasks and shows robustness to communication disturbances.

Paper Nr: 35
Title:

Autonomous Camera-Based Navigation for Visual, Markerless Pallet Re-Identification

Authors:

Nicolás Duque-Suárez, Shania Alessandra Martínez Anaya, Jérôme Rutinowski and Alice Kirchheim

Abstract: Efficient pallet identification and tracking are essential for automating warehouse operations, yet existing mobile solutions rely on human intervention or depend on expensive sensors or computationally intensive SLAM systems. To address these limitations, an integrated approach for detection, tracking, and re-identification of Euro pallets in a warehouse environment, using only an RGB camera, is proposed and implemented on an autonomous mobile robot. The system integrates a YOLOv4-based pallet and block detector, used by a pallet-tracking algorithm, and a re-identification module within a ROS2 framework deployed on a camera-equipped AGV. Experiments show that under controlled conditions, the system successfully executes the full task in 97% of cases and correctly re-identifies pallets in 85% of tests. The main sources of error are odometry inconsistencies and the sensitivity of the block detector to scene complexity, which occasionally leads to pallets being mistakenly inventoried instead of recognized. These results demonstrate the feasibility of this approach as a lightweight alternative for pallet tracking and inventory management.

Paper Nr: 53
Title:

Robotic Vessel Using Optimal Autonomous Pilotage Scheduling while Reducing Risk in St-Laurence and Saguenay Inland Waterway

Authors:

Mahassen Ardhaoui, Jean d'Amour Umuhoza, Martin J.-D. Otis and Oussama Jebbar

Abstract: Robotic vessels require integrated decision-making capabilities that extend beyond local vessel control to encompass comprehensive logistics and path-planning optimization. Inland waterway navigation, particularly in ecologically sensitive regions such as the St. Lawrence River and Saguenay Fjord, poses significant environmental challenges, including underwater noise pollution, vessel-cetacean proximity risks, and potential collisions. This study presents an intelligent scheduling and route optimization system designed to minimize the environmental impact on cetacean populations while maintaining operational efficiency for robotic bulk carriers. The proposed approach employs a multi-objective genetic algorithm that simultaneously optimizes departure/arrival times, vessel velocities, and path planning under multiple constraints, including fuel consumption, environmental conditions (currents and wind), tidal variations, and cetacean spatiotemporal locations. A key contribution of this work lies in integrating cetacean spatiotemporal locations near Tadoussac (the Saguenay-St. Lawrence confluence) during high tides as explicit constraints within the logistics optimization framework. The methodology is validated using historical Automatic Identification System (AIS) data from three commercial bulk carriers operating in the region. Comparative analysis demonstrates substantial reductions in risk indices and acoustic disturbance metrics while achieving comparable or improved transit efficiency relative to conventional routing practices.