[[File:Charter image.jpg|1000px|center]]
 
=== Executive Summary ===

The 2018 Data Strategy Roadmap for the Federal Public Service and the ECCC Data and Analytics Strategy emphasized the need to treat data as a strategic asset and to harness the power of data to generate insights. Data science and analytics are data innovation techniques that allow a deep dive into data to uncover hidden patterns and unknown correlations, and to make explicit predictions considering multiple factors across multiple disciplines.

A centre of expertise armed with knowledge of modern data science methodologies creates opportunities to utilize advanced analytics, statistics, machine learning and deep learning throughout the organization. At ECCC, the Office of the Chief Data Officer (OCDO) under the Strategic Policy Branch is home to the Data Science and Analytics Centre of Expertise (DS&A CoE), which aims to support the department and promote the use of data science and analytics to help extract value from ECCC’s data assets. This charter provides clarity on why the DS&A CoE exists, the services it provides, and when and how to engage these services.

====Mission====
 
The mission of the DS&A COE is to foster a data-driven culture in the department through the use of data innovation.
 
<br>
====Vision====
 
ECCC generates insights through data science techniques to derive greater public value from data and improve services and operations.
 
<br>
====Mandate====
 
OCDO’s DS&A CoE works with branches and programs in the areas of business intelligence, evidence-based decision-making, and science-specific analytics to: <br>
 
* Promote the potential uses for data science and analytics to support decision-making and priorities, and communicate the business value of data science and analytics prototypes.
<br>
 
[[File:Innovation goals.png|750px|center]]
 
== Structure and Governance ==

=== Team structure and interdependencies ===

==== Core Team ====

The OCDO DS&A CoE is an interdisciplinary team of data scientists with expertise in applying artificial intelligence, machine learning, and other data science methods to support cross-branch, horizontal initiatives, allowing the department to become a leader in leveraging its data assets. Our areas of specialization include computer programming and scientific knowledge, mathematics, statistics, data integration, data storytelling and visualization, and machine learning algorithms. The team consists of the EC (economist) and PC (physical scientist) classification groups.

==== Interdependencies ====

As a CoE that promotes and demonstrates innovation, the DS&A CoE depends on others in order to be successful. We rely on other business units for our IT infrastructure needs; for example, the Office of the Chief Information Officer (OCIO) for cloud computing. The DS&A CoE is not responsible for operationalizing successful data science proofs of concept. If a proof of concept is going to be promoted to production, subject matter experts in the branches and programs would need to engage OCIO to discuss the necessary steps for operationalization. The DS&A CoE is available to support operationalization by documenting the methodology and approaches taken in a proof of concept. Data readiness and data management are broader aspects that are not under the direct responsibility of the DS&A CoE.

=== Identified Challenges ===

We have identified the following challenges to the success of the DS&A CoE:

* '''Innovation Risk:''' All innovation requires some level of risk acceptance. Some projects may not be able to proceed through all stages of the lifecycle (development, testing, production).
** '''Mitigation Approach:''' The DS&A CoE documents the approaches taken and any challenges encountered. This knowledge is applied during the ideation and intake phases of future work.
* '''Process maturity:''' Developed proofs of concept require the client to assume ownership and decide how best to integrate them. The client may not be prepared or able to fully modernize their processes to incorporate the innovation. If a client wants a PoC to be operationalized, subsequent steps may require support from OCIO.
** '''Mitigation Approach:''' Measures are taken at project intake to understand and highlight any limitations arising from the client’s business environment.
* '''Client maturity:''' The proofs of concept developed by the DS&A CoE require the client to be prepared to support and use the end product. The DS&A CoE provides documentation detailing the methodologies and approaches but does not provide training on data science techniques and computer programming.
** '''Mitigation Approach:''' Measures are taken at project intake to understand any limitations or opportunities regarding the client’s knowledge, capacity, and resource level. The DS&A CoE does support the community of data scientists at the department (GCconnex ECCC Data Science and Analytics Community) and can provide support on terminology for data science recruitment.
* '''Data readiness:''' Not all data is ready for data science and analytics techniques to be applied.
** '''Mitigation Approach:''' The DS&A CoE will evaluate the state of data readiness (including quality and accessibility) as an element of the intake process for a project request.
* '''Culture-related barriers:''' New techniques to understand and visualize data are sometimes met with resistance.
** '''Mitigation Approach:''' The DS&A CoE will prioritize explaining the concepts and the performance metrics of a given data science technique to support the discussion and decisions around the innovation.

=== DS&A CoE Services, Project Categorization, and Intake ===

The DS&A CoE provides the following services:

* Brainstorming and ideation sessions on how data science and analytics could support the programs or branches.
* Development of proofs of concept or prototypes in the areas of data science, data mining and predictive analytics. Examples include machine learning methods such as neural networks, clustering, and Natural Language Processing (see the illustrative sketch following this list).
* Development of dashboards and visualizations.
* Development of methods for computer vision; for example, image classification and Optical Character Recognition.
* Expertise and advice on data science and data analytics best practices and/or recruitment.
* Raising awareness of the Directive on Automated Decision Making.
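
To make these offerings more concrete, the sketch below shows the kind of lightweight prototype that might come out of an ideation or proof-of-concept exercise. It is a minimal clustering example using scikit-learn on synthetic stand-in data; the feature values and the choice of two clusters are illustrative assumptions only, not an ECCC dataset or an endorsed toolchain.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two loose groups of observations, three features each
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 3)),
    rng.normal(loc=5.0, scale=1.0, size=(100, 3)),
])

# Standardize the features, then group the observations into two clusters
X_scaled = StandardScaler().fit_transform(X)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

print("Cluster sizes:", np.bincount(model.labels_))
print("Cluster centres (in standardized units):")
print(model.cluster_centers_)
</syntaxhighlight>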

All DS&A CoE initiatives are '''categorized''' into the following areas:

* Innovation and Experimental
* Administrative
* Environmental Program
* Data Science and Analytics Capacity Augmentation

To make a request to the DS&A CoE, a client can contact the manager of the DS&A CoE and describe their needs. The manager will then assess the following components:

* Whether the scope of the request fits within the DS&A CoE service offering.
* The type of project being requested.
* Whether the DS&A CoE has available expertise and capacity to undertake the project.
* Whether there is a quantifiable business impact.
* Whether the potential solution is technically feasible as a proof of concept or prototype.
* Whether the data required is available and of sufficient quality.

Upon acceptance of the project concept, the client will be asked to review and approve a Project Intake form with an assigned DS&A CoE resource. The resource will discuss options and communicate project details and milestones.

Project requests that fall under the expertise of the Office of the Chief Information Officer will be referred to an appropriate contact. The DS&A CoE is not responsible for operational run or maintenance, nor for long-term data storage. Proofs of concept and prototypes are delivered to the client in the form of dashboards, programming code and documentation on methods and results. Clients who want a successful data science and analytics innovation project to be operationalized will need to contact the OCIO.

=== Examples of DS&A CoE Data Science Projects ===

{| class="wikitable"
!Project
!Objective
!Methodology
!Business Value
|-
|Soil Moisture QC (MSC)
|Assess the potential of using ML to assist in the QC process of soil moisture data from the Kenaston Network
|Trained Random Forest Classifier models to predict which instances of the dataset should be removed and why (see the illustrative sketch below this table)
|Reduce the effort and time required for quality control
|-
|Harmonized System (HS) Code Misuse Detection (EB)
|Detect the misuse of HS codes (an international standard maintained by the World Customs Organization (WCO) that classifies traded products)
|The model predicts appropriate sets of HS codes to detect outliers
|Helps reduce the misuse of HS codes
|-
|Analog Identification Model and IBM Pairs (MSC)
|MSC working with IBM to develop an Analog Identification Model using AI
|Makes predictions by comparing current weather patterns to similar patterns from the past
|Faster and more accurate weather forecasting using analogs
|-
|Automated Data Mining (MSC)
|Automate Twitter data extraction using web-scraping tools
|Scrape Twitter data using web-scraping and data-extraction tools
|Reduce manual workload and improve the overall process
|-
|Propensity for Non-Compliance at NPRI facilities (EB)

NOTE: this project was closed due to the non-availability of adequate amounts of data.
|Find indicators of non-compliance that are generalizable across industries in the NPRI self-reported dataset
|Regression-based model to predict propensity percentage for non-compliance
|Create cleaner NPRI data and help identify generalizable features across industries.

Reduce manual workload for data cleaning and analysis
|-
|Water Quality Predictor
|Explore the application of machine learning to predict water quality (fecal coliform) at shellfish sanitation locations. Features considered included precipitation and land use.
|Tested nine different machine learning algorithms.
|Synthesize data from various sources and provide predictive capabilities on a water quality parameter.
|}
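
As a hedged illustration of the Soil Moisture QC approach described in the table, the sketch below trains a Random Forest classifier to flag observations that a quality-control process would remove. The feature names, thresholds, and synthetic data are placeholder assumptions for illustration only; they are not the Kenaston Network dataset or the model actually used by MSC.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic placeholder observations (NOT the Kenaston Network data)
rng = np.random.default_rng(0)
n = 2000
soil_moisture = rng.uniform(0.05, 0.45, n)                        # volumetric water content
soil_temp = rng.normal(10.0, 8.0, n)                              # soil temperature, deg C
jump = np.abs(np.diff(soil_moisture, prepend=soil_moisture[0]))   # step change between readings

# Placeholder "remove" label: frozen ground or implausible jumps get flagged by QC
y = ((soil_temp < 0) | (jump > 0.15)).astype(int)
X = np.column_stack([soil_moisture, soil_temp, jump])

# Train a Random Forest to reproduce the QC decision, then check it on held-out data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), target_names=["keep", "remove"]))
</syntaxhighlight>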
=== Future Work and Opportunities ===

Established in 2019, the DS&A CoE is relatively new to the department and recognizes that there are opportunities for growth. The following topics have been noted in discussions with partners and clients as areas where we could expand our service offerings to the department:

* Advice and expertise on the ethical use of AI.
* Support to the department on the Directive on Automated Decision Making.
* Setting standards and best practices for data science and analytics projects.
* Identifying business process improvements to automate data flows and analytics capabilities.
* Support to the branches to clean and transform data.

In addition, governance, processes, and roles and responsibilities in the data innovation space continue to evolve in the department.

Areas for improvement include:

* Guidance on a governance strategy for different data science projects at ECCC to enable scalability.
* Understanding Machine Learning Operations (MLOps) roles and responsibilities with the Office of the Chief Information Officer.
* Developing a funding model to support innovation and interactions with the branches and programs.

== Metrics ==

Metrics are a critical component of this CoE because they speak to how we measure success. They give us insight into the growth, impact, and effectiveness of our work and allow us to adapt and evolve in quick and meaningful ways.

Our metrics are built on a comprehensive framework that tracks projects and innovation throughout the entire lifecycle, aligning outcomes and success criteria to the Innovation Area of Management (from the 2022-23 Management Accountability Framework (MAF)) for visibility and transparency.

The metrics below are used to track progress and are based on industry-leading practices.

==== Initiation metrics ====

* Proofs of concept or dashboards delivered in a given fiscal year
* Number of projects onboarded
* Number of projects presented to executive-level decision-making committees
* Number of projects categorized as innovation projects
* Ratio of innovation projects to total projects
* Number of stakeholders engaged during ideation
* Source of funds
* Number of FTEs committed
* Expected benefit

==== Execution metrics ====

* Project request-to-execution lifecycle time
* Time spent at each phase
* Number of projects in flight
* Capital invested
* Actual vs. budget

==== Closure metrics ====

* Number of projects closed
* Number of projects submitted for operationalization funding
* Achieved benefits / outcomes for Canadians or public servants

Given existing constraints and governance processes, the metrics below will be prioritized in the short and medium term, with the expectation of increasing data collection and tracking of additional measures over time.

The framework below combines the strategic themes with the typical project lifecycle, showing the metrics relevant to each stage that will facilitate reporting against the MAF criteria.
{| class="wikitable"
!Strategic Theme / Project Lifecycle
!Intake
!Development
!Testing
!Production
|-
|Committing resources
| colspan="3" |O&M invested

Number of FTEs committed
|
|-
|Generating Rigorous Evidence
|Number of projects onboarded
| colspan="2" |Number of projects in flight

Number of projects presented to executive level

Number of projects closed
|Number of subsequent business cases
|-
|Evidence-Informed Decisions
| colspan="4" |Benefit achieved by phase
|}

The quantitative measures will be enriched by qualitative evidence and lessons learned gathered from each initiative, continuously improving the DS&A CoE service to branches, directorates, and units, and strengthening interactions across ECCC.

== Appendix ==

=== Criticality of Terminology ===

Data has never been as important or as integrated into our professional and personal lives as it is today. The availability of data, as well as tools for visualizing, sharing, and collaborating, presents massive opportunities for ECCC. It also presents new and significant challenges. The primary challenge is the lack of a common vocabulary around data. This charter provides a reference for common terminology around data to help streamline conversations throughout ECCC about data science and analytics. It includes frequently used terminology in common, industry-aligned language.

=== Terminology  ===

{| class="wikitable"
!Term ID
!Term Name
!Term Description
|-
|ToR.01
|Analytics
|Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns toward effective decision-making. It can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance. (Wikipedia - Analytics, n.d.)
|-
|ToR.02
|Analytics vs. analysis
|Data analysis focuses on the process of examining past data through business understanding, data understanding, data preparation, modeling and evaluation, and deployment. It is a subset of data analytics, which combines multiple data analysis processes to focus on why an event happened and what may happen in the future based on previous data. Data analytics is used to inform larger organizational decisions.

Data analytics is a multidisciplinary field. It makes extensive use of computer skills, mathematics, statistics, descriptive techniques and predictive models to gain valuable knowledge from data.
|-
|ToR.03
|Data Science
|Data Science is an interdisciplinary field that uses scientific methods and algorithms to extract information and insights from diverse data types. It combines domain expertise, programming skills and knowledge of mathematics and statistics to solve analytically complex problems. (Statistics Canada - Data science terminology, n.d.)
|-
|ToR.04
|Projects
|Proofs of concept, prototypes, or visualizations in the form of shareable code, algorithms, results, and dashboards.
|-
|ToR.05
|Artificial Intelligence
|Artificial intelligence is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence such as learning, problem solving, visual perception and speech and pattern recognition. (Statistics Canada - Data science terminology, n.d.)
|-
|ToR.06
|Machine Learning
|"Machine learning is the science of getting computers to automatically learn from experience instead of relying on explicitly programmed rules and generalizing the acquired knowledge to new settings." (United Nations Economic Commission for Europe's Machine Learning Team, ''The use of machine learning in official statistics'', 2018)

In essence, machine learning automates analytical model building through optimization algorithms and parameters that can be modified and fine-tuned. (Statistics Canada - Data science terminology, n.d.) A toy illustration appears below this table.
|-
|ToR.07
|NLP - Natural language processing
|Natural language processing (NLP) is a method to translate between computer and human languages. It is a method of getting a computer to understandably read a line of text without the computer being fed some sort of clue or calculation. In other words, NLP automates the translation process between computers and humans. (Techopedia - NLP, 2022)
|-
|ToR.08
|Centre of Expertise (CoE)
|A centre of expertise (CoE) is a single team that focuses the vision, strategy, and infrastructure for a discipline. This is particularly useful for data science because of the niche skills and technology the discipline requires. The CoE team typically acts as an internal consultant, working with multiple divisions in the organization to identify and exploit opportunities in data science. A centre of expertise may not be the only team wielding data science in an organization, but it acts as a leader, innovator, and standards setter. Additionally, a successful CoE team will often serve as a learning resource for individuals practicing data science outside of the team. (Logic 2020 - Data Science CoE, n.d.)
|-
|ToR.09
|Data management
|Data management is the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles. (DAMA-I DMBoK2, n.d.)
|}
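
To ground the ToR.06 definition of machine learning, the toy sketch below shows a model learning a rule from labelled examples instead of being explicitly programmed with it. The temperature threshold, labels, and data are invented purely for illustration and do not represent ECCC data.

<syntaxhighlight lang="python">
from sklearn.tree import DecisionTreeClassifier

# Labelled examples (invented for illustration): feature = [air temperature in deg C],
# label 1 = "frozen", label 0 = "not frozen"
X = [[-12.0], [-3.5], [-0.5], [0.5], [4.0], [11.0]]
y = [1, 1, 1, 0, 0, 0]

# The model recovers the decision rule from the examples rather than being told the rule
model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.predict([[-2.0], [7.0]]))  # expected output: [1 0]
</syntaxhighlight>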