==== '''IBM AI Fairness 360''' ====
An open-source toolkit by IBM that helps AI practitioners easily check for bias at multiple points along their ML pipeline, using the appropriate bias metric for their circumstances. It comes with more than 70 fairness metrics and 11 unique bias mitigation algorithms <ref name=":1">IBM Developer Staff, "AI Fairness 360," IBM, 14 November 2018. [Online]. Available: <nowiki>https://developer.ibm.com/open/projects/ai-fairness-360/</nowiki>. [Accessed 28 July 2021].</ref>.
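As a minimal illustrative sketch (a toy example, not taken from this investigation), a labelled dataset can be checked with AIF360 by wrapping it in the toolkit's <code>BinaryLabelDataset</code> class; the column names and values below are hypothetical:

<syntaxhighlight lang="python">
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Hypothetical toy data: a binary protected attribute and a binary label.
# AIF360 requires all columns to be numeric.
df = pd.DataFrame({
    "group":   [0, 0, 0, 1, 1, 1],
    "feature": [2.1, 3.4, 1.2, 0.8, 2.9, 3.1],
    "label":   [1, 0, 0, 1, 1, 0],
})

# Wrap the frame so the toolkit knows which column is the label
# and which is the protected attribute.
dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["group"],
)

# Dataset-level fairness metrics, computed before any model is trained.
metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{"group": 0}],
    privileged_groups=[{"group": 1}],
)
print("Disparate impact:", metric.disparate_impact())
print("Mean difference: ", metric.mean_difference())
</syntaxhighlight>

Model-level metrics, such as differences in error rates between groups, are exposed analogously through the toolkit's <code>ClassificationMetric</code> class.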
    
== Case Study: Predictive Model for Detecting Vessels’ Fishing Behavior ==
==== Bias Detection ====
To assess model fairness across different gear types, gear type is treated as the protected attribute: the data is first partitioned by gear type, and the False Positive Rate (FPR) disparity <ref name=":1" /> is then measured. The FPR is the percentage of negative instances (of fishing activity) that are mislabeled by the model as positive instances. The FPR disparity is measured as the greatest difference between the FPRs of the groupings of vessels by gear type. The greater this difference, the greater the degree of bias and unfairness in the system.
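Expressed concretely, for each gear-type group <math>g</math> with <math>FP_g</math> false positives and <math>TN_g</math> true negatives:

:<math>\mathrm{FPR}_g = \frac{FP_g}{FP_g + TN_g} \times 100, \qquad \text{FPR disparity} = \max_g \mathrm{FPR}_g - \min_g \mathrm{FPR}_g</math>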
In this investigation, Fairlearn <ref name=":0" /> has been applied to implement the FPR disparity measurement experiments. As a point of reference, in an interactive demo <ref>"AI Fairness 360 - Demo," IBM Research Trusted AI. [Online]. [Accessed 26 July 2021].</ref>, IBM AI Fairness 360 <ref name=":1" /> applies a threshold of 10 on similar disparity metrics as the point beyond which a model is considered to be unfair. Results from the initial fishing detection model are shown in Figure 6. Due to an excessively high FPR for the troll gear type, there is an FPR disparity difference of 52.62. '''<u>These results highlight an unacceptable level of bias present in the model, which must be mitigated</u>'''.
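A minimal sketch of how such a per-group FPR measurement can be expressed with Fairlearn's <code>MetricFrame</code> follows; the labels, predictions, and gear types below are hypothetical stand-ins for the investigation's data:

<syntaxhighlight lang="python">
import numpy as np
from fairlearn.metrics import MetricFrame, false_positive_rate

# Hypothetical labels, model predictions, and per-vessel gear types.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
gear = np.array(["troll", "trawl", "troll", "longline",
                 "trawl", "troll", "longline", "trawl"])

# Compute the FPR separately for each gear type. Fairlearn returns
# rates as fractions; multiply by 100 for percentages.
mf = MetricFrame(
    metrics=false_positive_rate,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=gear,
)
print(mf.by_group * 100)                             # FPR per gear type
print(mf.difference(method="between_groups") * 100)  # max FPR - min FPR
</syntaxhighlight>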
 
[[File:Chart 1.png|alt=Results for FPR and accuracy are shown for each gear type. The FPR disparity difference is measured as the difference between the highest and lowest FPR, giving a value of 52.62.|center|thumb|703x703px|'''Figure 6: Results for FPR and accuracy are shown for each gear type. The FPR disparity difference is measured as the difference between the highest and lowest FPR, giving a value of 52.62.''']]
{| class="wikitable"
  −
|
  −
|'''Fixed'''
  −
|'''Longline'''
  −
|'''Pole and Line'''
  −
|'''Purse Seine'''
  −
|'''Trawl'''
  −
|'''Troll'''
  −
|'''Unknown'''
  −
|-
  −
|'''False Positive Rate'''
  −
|15.98
  −
|11.78
  −
|1.41
  −
|2.67
  −
|5.18
  −
|54.03
  −
|4.62
  −
|-
  −
|'''Accuracy'''
  −
|90.25
  −
|92.95
  −
|98.92
  −
|96.65
  −
|96.63
  −
|90.64
  −
|97.17
  −
|}
  −
 
  −
'''Table 1''': Results for FPR and detection accuracy are shown for each gear type. The FPR disparity difference is measured as the difference between the highest and lowest FPR, giving a value of 52.62.
      
==== Bias Mitigation ====
Bias mitigation algorithms implemented in Fairlearn and other similar tools can be applied at various stages of the ML pipeline. In general, there is a trade-off between model performance and bias, such that mitigation algorithms induce a loss in model performance. Initial experimentation demonstrated this trade-off, with a notable loss in performance incurred to reduce bias. This can be observed in the results shown in Figure 7, where a mitigation algorithm has been applied to reduce the FPR disparity to 28.83 at the cost of a loss in fishing detection accuracy.
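One way such a mitigation can be set up is through Fairlearn's reductions API, which wraps a base classifier in a fairness constraint such as FPR parity. The sketch below is illustrative only; the base estimator, data, and constraint configuration are assumptions, not necessarily what was used in this investigation:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, FalsePositiveRateParity

# Hypothetical training data: vessel features, binary fishing labels,
# and the gear type of each vessel as the sensitive feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)
gear = rng.choice(["troll", "trawl", "longline"], size=200)

# Wrap a base classifier in a reduction that searches for a model
# satisfying (approximate) FPR parity across gear types.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=FalsePositiveRateParity(),
)
mitigator.fit(X, y, sensitive_features=gear)
y_pred = mitigator.predict(X)
</syntaxhighlight>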
{| class="wikitable"
+
[[File:Chart2.png|alt=Results for FPR and accuracy after bias mitigation via Fairlearn. The FPR disparity difference is 28.83.|center|thumb|691x691px|'''Figure 7: Results for FPR and accuracy after bias mitigation via Fairlearn. The FPR disparity difference is 28.83.''']]
Based on these results, it was determined that bias mitigation efforts should be focused on the data cleaning and preparation stages of the ML pipeline, as this is the earliest point at which bias can be addressed. Additional exploratory data analysis identified notable issues in terms of disproportionate representation across gear types and imbalance between positive and negative instances of fishing activity for some gear types. Through adjustments to the data cleaning process, as well as the application of techniques such as data balancing and data augmentation (sketched below), a new version of the model training data, better suited to the task of bias mitigation, was produced. The results of these modifications can be seen in Figure 8. The FPR has been greatly reduced across all gear types while preserving an acceptable level of fishing detection accuracy. Notably, the FPR disparity difference has been reduced from its original value of 52.62 down to 2.97.
[[File:Chart3.png|alt=Results for FPR and accuracy after improvements were made to the data preparation process. The FPR disparity difference is 2.97.|center|thumb|704x704px|'''Figure 8: Results for FPR and accuracy after improvements were made to the data preparation process. The FPR disparity difference is 2.97.''']]
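As an illustration of the kind of per-group data balancing described above, the following is a minimal sketch using pandas; the column names, function, and upsampling strategy are hypothetical, and the actual cleaning and augmentation steps used for the model are not reproduced here:

<syntaxhighlight lang="python">
import pandas as pd

def balance_by_group(df: pd.DataFrame, group_col: str, label_col: str,
                     random_state: int = 0) -> pd.DataFrame:
    """Within each group, upsample the minority label class so positive
    and negative instances are equally represented."""
    parts = []
    for _, group in df.groupby(group_col):
        counts = group[label_col].value_counts()
        target = counts.max()
        for label, n in counts.items():
            subset = group[group[label_col] == label]
            if n < target:
                # Upsample with replacement to match the majority class.
                subset = subset.sample(n=target, replace=True,
                                       random_state=random_state)
            parts.append(subset)
    return pd.concat(parts, ignore_index=True)

# Hypothetical usage with 'gear_type' groups and a binary 'fishing' label:
# train_df = balance_by_group(train_df, "gear_type", "fishing")
</syntaxhighlight>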
      
=== Bias Assessment Next Steps ===
DFO is in the process of defining guiding principles for the development of AI applications and solutions. Once defined, various tools will be considered and/or developed to operationalize these principles.
== Bibliography ==
<references />
{| class="wikitable"
  −
|[1]
  −
|A. Jobin, M. Ienca and E. Vayena, "The global    landscape of AI ethics guidelines," ''Nature Machine Intelligence,'' p.    389–399, 2019.
  −
|-
  −
|[2]
  −
|S. Bird, M. Dudík, R. Edgar, B. Horn, R. Lutz, V.    Milan, M. Sameki, H. Wallach and K. Walker, "Fairlearn: A toolkit for    assessing and improving fairness in AI," Microsoft, May 2020.    [Online]. Available: <nowiki>https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/</nowiki>.    [Accessed 30 November 2021].
  −
|-
  −
|[3]
  −
|IBM Developer Staff, "AI Fairness 360,"    IBM, 14 November 2018. [Online]. Available: <nowiki>https://developer.ibm.com/open/projects/ai-fairness-360/</nowiki>.    [Accessed 28 July 2021].
  −
|-
  −
|[4]
  −
|"AI Fairness 360 - Demo," IBM Reasearch    Trusted AI. [Online]. [Accessed 26 July 2021].
  −
|-
  −
|[5]
  −
|V. Fomins, "The Shift from Traditional    Computing Systems to Artificial Intelligence and the Implications for    Bias," ''Smart Technologies and Fundamental Rights,'' pp. 316-333,    2020.
  −
|}
  −
 
  −
 
  −
----[ER1]Earlier  you have indicated that the process will follow the 4 previously mentioned  steps. I think you missed step 2 here “Identification of protected  attributes”. Isn’t it the gear type? Can you add something to elude to that?
 
