Changes

AI-Assisted Quality Control of CTD Data

Revision as of 12:32, 22 December 2022

5 bytes added, 12:32, 22 December 2022

no edit summary

Line 26: Line 26:

Finally, data classification is achieved using one MLP per cluster of CTD scans. Each MLP is trained for binary classification to flag CTD scans as poor quality (to be deleted) or good quality (to be preserved). The ground truth used to produce the training data is derived from historical instances of quality control, using the decisions oceanographers made in the past about which scans should be deleted. Deleted scans are far fewer than preserved scans, which makes the training data highly imbalanced. This is often problematic in a machine learning setting because it biases the model towards the majority class. To address this, we apply random oversampling: deleted scans are duplicated at random in the training data until their number balances that of the preserved scans.
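The oversampling step can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function name `oversample_deleted`, the label convention (1 for deleted, 0 for preserved) and the fixed seed are assumptions made for illustration, and only the Python standard library is used.

```python
import random


def oversample_deleted(scans, labels, deleted_label=1, seed=0):
    """Randomly duplicate deleted (minority) scans until both classes are equal in size.

    Hypothetical helper: labels are assumed to be 1 for deleted scans and
    0 for preserved scans. Returns new, shuffled lists; inputs are not modified.
    """
    rng = random.Random(seed)

    # Indices of each class in the training data.
    deleted = [i for i, lab in enumerate(labels) if lab == deleted_label]
    preserved = [i for i, lab in enumerate(labels) if lab != deleted_label]

    # Number of duplicates needed to match the majority class.
    n_extra = len(preserved) - len(deleted)
    if not deleted or n_extra <= 0:
        return list(scans), list(labels)

    # Sample deleted scans with replacement, then shuffle everything together.
    extra = rng.choices(deleted, k=n_extra)
    order = list(range(len(labels))) + extra
    rng.shuffle(order)
    return [scans[i] for i in order], [labels[i] for i in order]


# Toy example: 10 scans with 2 features each, of which 2 were deleted.
scans = [[float(i), float(i) * 2] for i in range(10)]
labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
balanced_scans, balanced_labels = oversample_deleted(scans, labels)
```

In a sketch like this, duplication should be applied only to the training split, after any validation or test data has been set aside, so that copies of the same scan do not appear on both sides of the split.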

−

+

[[File:Ml pipeline.png|Three-step process used in the machine learning pipeline.|992x992px]]

Lee.croft
