In order to validate the performance of the model, we have performed experimentation using historical CTD data covering the years 2013-2021. We use a 5-fold cross-validation methodology to ensure robustness in the measured results.
+
+
It is important that the model is able to effectively classify both the scans to be deleted as well as the scans to be preserved, or in the context of machine learning, the positive and negative samples. While the performance for classification of positive and negative samples can be measured separately, we have configured the model to balance performance across these two goals. Therefore, we report only the overall accuracy of the model here for simplicity. The global accuracy achieved by the model is roughly 92.6% measured against the historical decisions made by oceanographers. This demonstrates that the model has learned to effectively observe the patterns indicative of poor-quality scans and replicate the decision-making used by oceanographers with a high degree of accuracy.
+
+
It is also important to note that the quality of the model flags is impacted by the depth of the scans being processed. The shallowest depths are the most difficult to handle as the greatest variations in the physical properties of the water occur near the surface. A breakdown of model performance by depth ranges can be seen in the figure on the right. The first 50 meters are quite challenging, resulting in a model accuracy of roughly 88.7%. While the scans beneath 500 meters are more regular, resulting in a model accuracy of roughly 94.1%.
+
+
Obtaining perfect accuracy will likely never be obtainable as there is uncertainty in the decision-making even for oceanographers. There are many complex factors at play influencing the data that is ultimately recorded such as choppy water causing the scanning equipment to descend irregularly, winds and strong weather conditions causing mixing of waters near the surface, and currents causing mixing of waters under the surface. As the distinction between proper and corrupted scan data can be very difficult to identify under these conditions, the human decision-making process is not guaranteed to be perfect. This in fact highlights a potential area where the development of more mature AI models that can exploit additional information on these factors may have the potential to lead to automated approaches that could eventually augment the human decisions to reach higher overall accuracy.
+
+
[[File:Performance.png|alt=|right|477x477px]]
+
+
−
[[File:Performance.png|Model performance and dataset histogram over the depth range from which CTD scans are collected.]]