Line 8: |
Line 8: |
| The 2018 Data Strategy Roadmap for the Federal Public Service and the ECCC Data and Analytics Strategy emphasized the need to treat data as a strategic asset and to harness the power of data to generate insights. Data science and analytics are data innovation techniques that allow a deep dive into data to uncover hidden patterns and unknown correlations and to make explicit predictions that consider multiple factors across multiple disciplines. | | The 2018 Data Strategy Roadmap for the Federal Public Service and the ECCC Data and Analytics Strategy emphasized the need to treat data as a strategic asset and to harness the power of data to generate insights. Data science and analytics are data innovation techniques that allow a deep dive into data to uncover hidden patterns and unknown correlations and to make explicit predictions that consider multiple factors across multiple disciplines. |
| | | |
− | A centre of expertise armed with knowledge of modern data science methodologies creates opportunities to utilize advanced analytics, statistics, machine learning4 and deep learning mechanisms throughout the organization. At ECCC, the Office of the Chief Data Officer (OCDO) under the Strategic Policy Branch is home to the Data Science and Analytics Centre of Expertise (DS&A CoE), which aims to support the department and promote the use of data science and analytics to help extract value from ECCC’s data assets. This charter provides clarity into why the DS&A CoE exists, the services it provides and when or how to engage these services. | + | A centre of expertise armed with knowledge of modern data science methodologies creates opportunities to utilize advanced analytics, statistics, machine learning and deep learning mechanisms throughout the organization. At ECCC, the Office of the Chief Data Officer (OCDO) under the Strategic Policy Branch is home to the Data Science and Analytics Centre of Expertise (DS&A CoE), which aims to support the department and promote the use of data science and analytics to help extract value from ECCC’s data assets. This charter provides clarity into why the DS&A CoE exists, the services it provides and when or how to engage these services. |
| | | |
| ====Mission==== | | ====Mission==== |
Line 73: |
Line 73: |
| * Brainstorm and ideation sessions on how data science and analytics could support the programs or branches. | | * Brainstorm and ideation sessions on how data science and analytics could support the programs or branches. |
| | | |
− | * Development of proofs of concept or prototypes in the areas of data science, data mining and predictive analytics. Examples include machine learning such as neural networks, clustering, and Natural Language Processing7. | + | * Development of proofs of concept or prototypes in the areas of data science, data mining and predictive analytics. Examples include machine learning such as neural networks, clustering, and Natural Language Processing. |
| * Development of dashboards and visualizations. | | * Development of dashboards and visualizations. |
| * Development of methods for computer vision; for example, image classification and Optical Character Recognition. | | * Development of methods for computer vision; for example, image classification and Optical Character Recognition. |
Line 102: |
Line 102: |
| === Examples of DS&A CoE Data Science Projects === | | === Examples of DS&A CoE Data Science Projects === |
| {| class="wikitable" | | {| class="wikitable" |
− | |Project | + | |'''Project''' |
− | |Objective | + | |'''Objective''' |
− | |Methodology | + | |'''Methodology''' |
− | |Business Value | + | |'''Business Value''' |
| |- | | |- |
− | |Soil Moisture QC (MSC) | + | |'''Soil Moisture QC (MSC)''' |
| |Assess the potential of using ML to assist in the QC process of soil moisture data from the Kenaston Network | | |Assess the potential of using ML to assist in the QC process of soil moisture data from the Kenaston Network |
| |Trained Random Forest Classifier models to predict which instances of the dataset should be removed and why | | |Trained Random Forest Classifier models to predict which instances of the dataset should be removed and why |
| |Reduce efforts and time required for quality control | | |Reduce efforts and time required for quality control |
| |- | | |- |
− | |Harmonized System (HS) Code Misuse Detection (EB) | + | |'''Harmonized System (HS) Code Misuse Detection (EB)''' |
| |Detect the misuse of HS codes (an international standard maintained by the World Customs Organization (WCO) that classifies traded products). | | |Detect the misuse of HS codes (an international standard maintained by the World Customs Organization (WCO) that classifies traded products). |
| |The model predicts appropriate sets of HS codes to detect outliers. | | |The model predicts appropriate sets of HS codes to detect outliers. |
| |Helps reduce the misuse of HS codes | | |Helps reduce the misuse of HS codes |
| |- | | |- |
− | |Analog Identification Model and IBM Pairs (MSC) | + | |'''Analog Identification Model and IBM Pairs (MSC)''' |
| |MSC working with IBM to develop an Analog Identification Model using AI. | | |MSC working with IBM to develop an Analog Identification Model using AI. |
| |Making predictions by comparing current weather patterns to similar patterns from the past. | | |Making predictions by comparing current weather patterns to similar patterns from the past. |
| |Faster and more accurate weather forecasting using Analogs | | |Faster and more accurate weather forecasting using Analogs |
| |- | | |- |
− | |Automated Data Mining (MSC) | + | |'''Automated Data Mining (MSC)''' |
| |Automate the Twitter data extraction using web-scraping tools. | | |Automate the Twitter data extraction using web-scraping tools. |
| |Scrape Twitter data using web scraping and data extraction | | |Scrape Twitter data using web scraping and data extraction |
| |Reduce manual workload and improve the overall process | | |Reduce manual workload and improve the overall process |
| |- | | |- |
− | |Propensity for Non-Compliance at NPRI facilities (EB) | + | |'''Propensity for Non-Compliance at NPRI facilities (EB)''' |
| | | |
− | | + | '''<br /> |
− | NOTE: this project was closed due to non-availability of adequate amounts of data. | + | NOTE: this project was closed due to non-availability of adequate amounts of data.''' |
| |Find indicators of non-compliance that are generalizable across industries in NPRI self-reported dataset | | |Find indicators of non-compliance that are generalizable across industries in NPRI self-reported dataset |
| |Regression-Based model to predict propensity percentage for non-compliance | | |Regression-Based model to predict propensity percentage for non-compliance |
Line 137: |
Line 137: |
| Reduce manual workload for data cleaning and analysis | | Reduce manual workload for data cleaning and analysis |
| |- | | |- |
− | |Water Quality Predictor | + | |'''Water Quality Predictor''' |
| |Explore the application of machine learning to predict water quality (fecal coliform) at shellfish sanitation locations. Features considered included precipitation and land use. | | |Explore the application of machine learning to predict water quality (fecal coliform) at shellfish sanitation locations. Features considered included precipitation and land use. |
| |Tested nine different machine learning algorithms. | | |Tested nine different machine learning algorithms. |
Line 201: |
Line 201: |
| The framework below combines the strategic themes with the typical project life cycle, demonstrating the metrics relevant to each stage that will facilitate reporting against the MAF criteria. | | The framework below combines the strategic themes with the typical project life cycle, demonstrating the metrics relevant to each stage that will facilitate reporting against the MAF criteria. |
| {| class="wikitable" | | {| class="wikitable" |
− | |Strategic Theme / Project Lifecycle | + | |'''Strategic Theme / Project Lifecycle''' |
− | |Intake | + | |'''Intake''' |
− | |Development | + | |'''Development''' |
− | |Testing | + | |'''Testing''' |
− | |Production | + | |'''Production''' |
| |- | | |- |
− | |Committing resources | + | |'''Committing resources''' |
| | colspan="3" |O&M invested | | | colspan="3" |O&M invested |
| | | |
Line 213: |
Line 213: |
| | | | | |
| |- | | |- |
− | |Generating Rigorous Evidence | + | |'''Generating Rigorous Evidence''' |
| |Number of projects onboarded | | |Number of projects onboarded |
| | colspan="2" |Number of projects in flight | | | colspan="2" |Number of projects in flight |
Line 224: |
Line 224: |
| |Number of subsequent business cases | | |Number of subsequent business cases |
| |- | | |- |
− | |Evidence-Informed Decisions | + | |'''Evidence-Informed Decisions''' |
| | colspan="4" |Benefit achieved by phase | | | colspan="4" |Benefit achieved by phase |
| |} | | |} |