This study investigates the application of generative artificial intelligence visual language models to object detection and obstacle recognition for underwater remotely operated vehicles (ROVs). By combining open-source underwater image datasets with images collected by ROVs, we systematically compare the performance of multiple advanced visual language models. The experimental design covers three typical underwater scenarios (aquaculture, marine exploration, and environmental monitoring) to evaluate the models' adaptability under varying underwater conditions. We use four indicators for quantitative evaluation: precision, which reflects the model's ability to minimize false positives; recall, which measures how completely it detects true targets; F1-score, the harmonic mean that balances the two; and average precision, which assesses the model's localization accuracy at an overlap (intersection-over-union) threshold of 50%. The results indicate that model performance is significantly influenced by environmental complexity. For instance, in turbid waters the recall of all models decreases by approximately 15%, underscoring the particular challenges of underwater scenes. We also found that the models' ability to recognize small targets is generally inadequate, which calls for further optimization of the feature extraction architecture or the introduction of domain adaptation training in future work.
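As an illustration of how the first three indicators relate to the 50% overlap threshold, the following is a minimal sketch and not the authors' evaluation code. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), predictions already sorted by descending confidence, and a greedy one-to-one matching rule; the names `iou` and `detection_metrics` are hypothetical. Full average precision additionally requires integrating precision over recall across confidence thresholds, which is not shown here.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(
    preds: List[Box], truths: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one image.

    Assumption: preds are sorted by descending confidence, and each
    ground-truth box may be matched by at most one prediction.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(p, t)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching target
    fn = len(truths) - tp  # targets that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(detection_metrics(preds, truths))
```

In this toy case one prediction overlaps a target well enough to count as a true positive, one prediction is a false positive, and one target is missed, so a drop in recall such as the one reported for turbid water would correspond to a growing number of missed targets.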
Published: 19 November 2025 by MDPI in The 1st International Online Conference on Marine Science and Engineering, session Ocean Engineering
Keywords: Underwater object detection, visual language model, generative, edge computing.
