== Information architecture ==
 
Information architecture is defined as the management and organization of data for a business. The best practices and principles aim to support the needs of a business service and business capability orientation. To facilitate effective sharing of data and information across government, information architectures should be designed to reflect a consistent approach to both structured and unstructured data, such as the adoption of federal and international standards. Information architecture should also reflect responsible data management, information management and governance practices, including the source, quality, interoperability, and associated legal and policy obligations related to the data assets. Information architectures should also distinguish between personal and non‑personal data: how personal information is treated, including its collection, use, sharing (disclosure), and management, must respect the requirements of the ''[https://laws-lois.justice.gc.ca/eng/ACTS/P-21/index.html Privacy Act]'' and its related policies. Below this paragraph is a model of data architecture showing the core components that data flows through in an enterprise to produce insights and analytics.
 
[[File:Generic Data Architecture Model.png|800px|center]]
 
Data Producer: A service or device that collects, processes, and stores data for a business. Data producers also monitor the data they obtain to ensure the quality of the data asset.
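
The sketch below illustrates this idea in Python; it is a minimal, hypothetical example (the class, field names, and the 0–100 quality range are invented for illustration), not a prescribed implementation.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Reading:
    sensor_id: str
    value: float
    collected_at: datetime

@dataclass
class DataProducer:
    """Hypothetical producer: collects readings, checks quality, then stores them."""
    store: list = field(default_factory=list)      # accepted, quality-checked data
    rejected: list = field(default_factory=list)   # readings that failed the quality gate

    def collect(self, sensor_id: str, value: float) -> None:
        reading = Reading(sensor_id, value, datetime.now(timezone.utc))
        # Simple quality check: only keep values inside the expected range.
        if 0.0 <= value <= 100.0:
            self.store.append(reading)
        else:
            self.rejected.append(reading)

producer = DataProducer()
producer.collect("sensor-1", 42.5)   # accepted
producer.collect("sensor-1", 999.0)  # rejected by the quality check
print(len(producer.store), len(producer.rejected))  # 1 1
</syntaxhighlight>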
 
Data Source: A data source is made up of fields and groups. In the same way that folders on your hard disk contain and organize your files, fields contain the data that users enter into forms, and groups contain and organize those fields.
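
As a small illustration of fields and groups, the Python snippet below models a group as a container of fields, much like a folder containing files; the names are hypothetical.

<syntaxhighlight lang="python">
# A "field" holds a single value a user entered; a "group" contains and organizes
# related fields, much like a folder contains files.
applicant = {                       # group
    "name": "Jane Doe",             # field
    "date_of_birth": "1990-01-01",  # field
    "address": {                    # nested group with its own fields
        "street": "123 Main St",
        "city": "Ottawa",
    },
}

print(applicant["address"]["city"])  # Ottawa
</syntaxhighlight>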
 
Data Integration: Data integration is the process of combining data from several disparate sources to provide users with a single, unified view. Integration is the act of bringing together smaller components into a single system so that it can function as one; in an IT context, it means stitching together different data subsystems to build a more extensive, more comprehensive, and more standardized system across multiple teams, helping to build unified insights for all.
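
A minimal sketch of this process, assuming two hypothetical source extracts that share a common key and using pandas to join them into one unified view:

<syntaxhighlight lang="python">
import pandas as pd

# Two disparate sources describing the same clients (hypothetical extracts).
crm = pd.DataFrame({"client_id": [1, 2], "name": ["Acme Ltd.", "Borealis Inc."]})
billing = pd.DataFrame({"client_id": [1, 2], "balance": [250.0, 0.0]})

# Integrate on the shared key to produce a single, unified view.
unified = crm.merge(billing, on="client_id", how="outer")
print(unified)
</syntaxhighlight>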
 
Data Lake: A data lake is a centralized repository that ingests and stores large volumes of data in its original form. The data can then be processed and used as a basis for a variety of analytic needs. Due to its open, scalable architecture, a data lake can accommodate all types of data from any source, from structured (database tables, Excel sheets) to semi-structured (XML files, webpages) to unstructured (images, audio files, tweets), all without sacrificing fidelity. The data files are typically stored in staged zones—raw, cleansed, and curated—so that different types of users may use the data in its various forms to meet their needs. Data lakes provide core data consistency across a variety of applications, powering big data analytics, machine learning, predictive analytics, and other forms of intelligent action.
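
The staged-zone idea can be sketched as follows; a local folder stands in for the lake, and the zone names simply follow the raw/cleansed/curated convention described above (all paths are hypothetical).

<syntaxhighlight lang="python">
import json
from pathlib import Path

lake = Path("data-lake")  # a local folder standing in for the lake
for zone in ("raw", "cleansed", "curated"):
    (lake / zone).mkdir(parents=True, exist_ok=True)

# Ingest into the raw zone in its original form, without altering fidelity.
record = {"Name": " jane doe ", "visits": "3"}
(lake / "raw" / "record.json").write_text(json.dumps(record))

# Cleansed zone: normalize types and trim text so analysts get consistent data.
cleansed = {"name": record["Name"].strip().title(), "visits": int(record["visits"])}
(lake / "cleansed" / "record.json").write_text(json.dumps(cleansed))

# The curated zone would hold analysis-ready, aggregated datasets (not shown).
</syntaxhighlight>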
 
Data Mart: To understand a data mart, it helps to first define a data warehouse: a system that aggregates data from multiple sources into a single, central, consistent data store to support data mining, artificial intelligence (AI), and machine learning, which ultimately can enhance sophisticated analytics and business intelligence. Through this strategic collection process, data warehouse solutions consolidate data from the different sources to make it available in one unified form.
 
A data mart, by contrast, is a focused version of a data warehouse that contains a smaller subset of data important to and needed by a single team or a select group of users within an organization. A data mart is built from an existing data warehouse (or other data sources) through a complex procedure that involves multiple technologies and tools to design and construct a physical database, populate it with data, and set up access and management protocols.
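
A simplified sketch of the warehouse-to-mart relationship, using an in-memory SQLite database from Python; the table, view, and column names are hypothetical, and the mart is shown as a view over the subset of warehouse data one team needs.

<syntaxhighlight lang="python">
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Central warehouse table consolidating data from many sources.
    CREATE TABLE warehouse_benefits (
        claim_id INTEGER, program TEXT, region TEXT, amount REAL
    );
    INSERT INTO warehouse_benefits VALUES
        (1, 'dental',  'ON', 120.0),
        (2, 'dental',  'BC',  95.0),
        (3, 'pension', 'ON', 800.0);

    -- Data mart: a focused subset serving only the dental-program team.
    CREATE VIEW mart_dental AS
        SELECT claim_id, region, amount
        FROM warehouse_benefits
        WHERE program = 'dental';
""")

print(conn.execute("SELECT COUNT(*) FROM mart_dental").fetchone())  # (2,)
</syntaxhighlight>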
 
Data Consumers: Data consumers are services or applications, such as Power BI or Dynamics 365 Customer Insights, that read data in Common Data Model folders in Data Lake Storage Gen2. Other data consumers include Azure data-platform services (such as Azure Machine Learning, Azure Data Factory, and Azure Databricks) and turnkey software as a service (SaaS) applications (such as Dynamics 365 Sales Insights). A data consumer might have access to many Common Data Model folders to read content throughout the data lake. If a data consumer wants to write back data or insights that it has derived from a data producer, the data consumer should follow the pattern described for data producers above and write within its own file system.
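
The read-then-write-back pattern can be sketched generically as below; local folders stand in for the producer's Common Data Model folder and the consumer's own file system, no Azure-specific API is used, and all paths and file names are hypothetical.

<syntaxhighlight lang="python">
import csv
from pathlib import Path

producer_folder = Path("lake/producer-cdm")  # producer's folder (read-only to the consumer)
consumer_folder = Path("lake/consumer-app")  # the consumer's own file system
producer_folder.mkdir(parents=True, exist_ok=True)
consumer_folder.mkdir(parents=True, exist_ok=True)

# Hypothetical data published by a producer.
(producer_folder / "sales.csv").write_text("region,amount\nON,100\nON,250\nBC,80\n")

# The consumer reads the producer's data ...
with open(producer_folder / "sales.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# ... derives an insight, and writes it back within its own file system,
# rather than into the producer's folder.
totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])

with open(consumer_folder / "regional_totals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "total"])
    writer.writerows(totals.items())
</syntaxhighlight>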
 
The points below outline the objectives to be fulfilled in order to maintain information architecture standards.
=== Collect data to address the needs of the users and other stakeholders ===
 
         * Data quality mechanism  
 
     <b>Tools:</b>  
 
     * Data Foundation – Implement (Leverage the standard definition)
 
         * Data Catalogue
 
         * Benefits Knowledge hub
 
   <b>Tools:</b>
 
     * Target State
 
     * Data Foundation – Implement (Leverage the standard definition)
 
         * Data Catalogue
 
         * Benefits Knowledge hub
 
* contribute to and align with enterprise and international data taxonomy and classification structures to manage, store, search and retrieve data
 
   <b>How to achieve:</b>
 
     * Summarize the alignment to departmental/GC:
 
       * Data taxonomy structure
 
       * Data classification structure
 
   <b>Tools:</b>
 
     * Data Foundation – Implement (Leverage the standard definition)
 
         * Data Catalogue
 
     * Theoretical Foundation
 
Organizations should be able to adhere to ethical guidelines on data sharing to address and meet emerging standards and legislative requirements. It is an organization’s responsibility to be transparent about, and respectful of, how data is used within the organization. Using and sharing data in an ethical manner can build trust between the public and the organization. Failing to prioritize privacy, security, consent, and ownership of data can harm an organization’s reputation and credibility, and may even threaten its continued existence. To share data ethically and legally, an organization must request participants’ consent. How personal data will be used and shared must be communicated transparently to avoid misleading anyone. Furthermore, to keep data private and more generic for future sharing purposes, it can be anonymized by removing participants’ tombstone information such as name, address, and occupation. If data anonymization is considered, it is ideal to plan for it during the collection phase. It is necessary to inform third-party readers when data has been anonymized; this may be done by using markings in the text for content that has been removed. Additionally, an original data repository copy should always be kept separately and secured, to keep a record of all data that has been anonymized in the final product. Third-party readers should have valid reasons and the right qualifications to access the original data, to ensure the data is treated in a careful manner. Data must not be shared when there is a conflict of interest with the need to protect personal identities, when an organization does not have ownership of the data, or when releasing the data presents a security risk.
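
A minimal sketch of the anonymization step described above, using pandas with hypothetical column names: the original records are kept separately and secured, while the shareable copy has the tombstone columns removed.

<syntaxhighlight lang="python">
import pandas as pd

original = pd.DataFrame({
    "name": ["Jane Doe", "John Roe"],          # tombstone information
    "address": ["123 Main St", "45 Oak Ave"],  # tombstone information
    "occupation": ["analyst", "teacher"],      # tombstone information
    "outcome": ["approved", "denied"],         # non-identifying content to share
})

# The original repository copy is kept separately and secured (not shown here).
# The shareable copy drops the identifying tombstone columns.
anonymized = original.drop(columns=["name", "address", "occupation"])
print(anonymized)
</syntaxhighlight>
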
 
* share data openly by default as per the ''[https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=28108 Directive on Open Government]'' and the Digital Standards, while respecting security and privacy requirements; data shared should adhere to existing enterprise and international standards, including on data quality and ethics
 
   <b>How to achieve:</b>
 
     * Summarize how the architecture supports sharing data openly by default as per Directive on Open Government and Digital Standards given:
 
         * Ethics  
 
   <b>Tools:</b>
 
     * Data Foundation – Implement (Leverage the standard definition)
 
         * Data Catalogue
 
         * Benefits Knowledge Hub
 
* ensure data formatting aligns to existing enterprise and international standards on interoperability; where none exist, develop data standards in the open with key subject matter experts
 
   <b>How to achieve:</b>
 
     * Summarize how the architecture utilises existing enterprise and international data standards
 
     * Summarize how the architecture has developed any data standards through open collaboration with key subject matter experts and the Enterprise Data Community of Practice.
 
   <b>Tools:</b>
 
* ensure that combined data does not risk identification or re‑identification of sensitive or personal information
 
   <b>How to achieve:</b>
 
      * Summarize how the architecture ensures the aggregation and combining of data does not pose a risk to information sensitivity or personal information