Data Loss Prevention Initiative
The Government of Canada (GC) relies heavily on information technology to conduct its day-to-day business activities. The information processed and stored on GC networks and systems ranges in importance and in sensitivity, including private information about Canadian citizens, sensitive information dealing with Canada’s economic and political interests, and classified information related to national security. Unauthorized disclosure of sensitive information could not only result in risks to national security, but also put at risk the well-being of Canadian citizens and other individuals and organizations that do business with the GC.
Data Security is concerned with the confidentiality, integrity and availability of data-in-motion (e.g., network actions), data-at-rest (e.g., data storage), and data-in-use. The loss of control over protected or sensitive GC assets is a serious threat to GC business operations and national security. The GC needs to explore methods that can be used to protect the data against accidental threats, such as employees inadvertently releasing sensitive data, or deliberate threats, such as cyber threats that penetrate outer boundary defences and infiltrate internal systems and networks. The GC must take into account the protection of GC data within, as well as outside, the GC boundary, especially if data is stored, transmitted, and/or handled by third parties. Protection of electronically stored information is essential for an organization to meet not only risk management requirements, but also those of compliance and governance. In addition, cyber threats are not limited to malware and other exploits originating from external threat agents; insider threats, whether accidental or malicious, must also be addressed.
As Shared Services Canada and other GC departments move towards virtual and cloud-based environments for e-mail and applications, traditional network/enclave access control mechanisms need to be improved to prevent data generated by one department from being accessed by other departments, or by other groups within a department, in order to ensure mission-critical data is secure. These risks need to be addressed in today’s network-centric enterprise while enabling the transition to data-centric operations in the future.
Data Loss Prevention (DLP) Initiative
The purpose of Data Loss Prevention (DLP) is to prevent the unauthorized transfer or exfiltration of sensitive or private information from the GC enterprise IT/IS infrastructure to entities external to the GC. DLP controls are based on policy, and include classifying sensitive data, discovering that data across an enterprise, enforcing controls, and reporting and auditing to ensure policy compliance.
The same technology may also be used to control the flow of sensitive information between organizations within the GC, such as departments. DLP tools may also be referred to as data leak prevention, information loss prevention, extrusion prevention, content filtering, insider threat detection, and insider threat prevention. The GC ESA ConOps Annex A: Data Loss Prevention discusses the operational aspects of the DLP capability and provides important context for the information in this section.
There are many products available that claim to provide DLP capabilities, but a true DLP product is characterized by its ability to perform deep content analysis. For example, a product that simply blocks write access to a USB thumb drive is not a true DLP product - a true DLP product analyzes the content of a file being written to a USB thumb drive and, based on its analysis, determines whether the write operation can proceed. Similarly, full disk encryption is not DLP as any authenticated user may have the ability to read and make unauthorized copies of data protected by the full disk encryption. In addition to requiring data-at-rest encryption, a true DLP product analyzes that data for content that is prohibited on that platform or is not properly protected by logical access controls.
DLP protects against three threats:
- Accidental data exfiltration by insiders,
- Deliberate data exfiltration by malicious insiders, and
- Deliberate data exfiltration by malicious outsiders who managed to penetrate Computer and Network Defence (CND) protections.
A distinguishing characteristic of DLP is content-awareness. A data object examined by a DLP sensor consists of content (the actual data that the user wishes to communicate) and context (information about the data, also known as metadata). Both the content and context of a data object are examined to determine whether the presence of sensitive data within the content is authorized. A product that examines only context may be valuable in its own right, but is not considered a DLP product.
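The interplay of content and context can be illustrated with a minimal Python sketch. It is not taken from any GC design document: the `DataObject` fields, the nine-digit pattern, and the `AUTHORIZED_DESTINATIONS` set are hypothetical stand-ins chosen only to show that content analysis decides whether sensitive data is present, while context decides whether that presence is authorized.

```python
import re
from dataclasses import dataclass

# Hypothetical content rule: nine digits, optionally grouped 3-3-3.
NINE_DIGIT_ID = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")

# Hypothetical context rule: destinations allowed to receive such data.
AUTHORIZED_DESTINATIONS = {"hr.internal.example"}


@dataclass
class DataObject:
    content: str      # the data the user wishes to communicate
    sender: str       # context (metadata)
    destination: str  # context (metadata)


def contains_sensitive_content(obj: DataObject) -> bool:
    """Content analysis: does the payload match a sensitive pattern?"""
    return bool(NINE_DIGIT_ID.search(obj.content))


def is_authorized(obj: DataObject) -> bool:
    """Combine content and context to decide whether the flow is allowed."""
    if not contains_sensitive_content(obj):
        return True  # nothing sensitive found, context is irrelevant
    return obj.destination in AUTHORIZED_DESTINATIONS


if __name__ == "__main__":
    memo = DataObject("Employee ID 123 456 789", "alice", "mail.external.example")
    print(is_authorized(memo))
```

A context-only product would have to allow or block every transfer to a given destination; the content check is what lets the same destination receive harmless data while sensitive data is stopped.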
The current scope of the DLP Initiative is for an unclassified Protected environment, although the same principles are applicable to a classified environment. By themselves, commercially-available DLP capabilities are not suitable for mediating traffic between unclassified and classified environments, or between different classified environments (e.g. Secret and Top Secret). In these cases, a high-assurance solution such as a cross-domain guard must be deployed that either complements or integrates DLP functionality. Overall, DLP is not a standalone technology - it must be used together with existing technologies to provide a comprehensive confidentiality solution, as shown in the image on the left.
For more information, please read the GC ESA Data Loss Prevention High-Level Design document.
GC Enterprise DLP Strategic Goals
The deployment of automated DLP capabilities is intended to prevent the loss of sensitive data held by the Canadian government. In addition to threats to national security, data breaches may result in loss of confidence in government by the Canadian public, particularly if breaches lead to financial loss and/or significant inconvenience to Canadian citizens, for example through resulting identity fraud.
The current state of GC DLP is as follows:
- Only an ad-hoc departmental approach to data protection
- Lack of an automated enterprise data loss prevention capability
- Processes are manual and reactive
- Lack of an enterprise labelling capability
- Lack of an enterprise rights management capability
- Need for consideration of data encryption, access control mechanisms, and/or information management approaches
From experience, it is clear that the existing manual approach to preventing data loss is of limited effectiveness. While upgrading existing CND capabilities may help, there is no realistic alternative to deploying DLP tools that perform deep content analysis. Ultimately, DLP tools that examine data-in-transit, data-at-rest, and data-in-use are required, but the rollout may be prioritized and performed in phases. The highest priority should be DLP of data-in-transit since that is the last line of defence before data leaves the GC. Depending on its effectiveness, the deployment of other types of DLP may be delayed or even abandoned based on a cost/risk assessment.
The GC Enterprise DLP Strategic Goals are:
- Mitigate the impact of threats, such as accidental or deliberate leaks by insiders, or theft by malicious outsiders
- Protect the confidentiality of Data-at-Rest, Data-in-Use, and Data-in-Motion
- Develop and implement a target architecture and high-level design that meets GC business needs and requirements
- Establish policies and procedures for managing information governance
The goal state or outcome is as follows:
- A whole-of-government, automated approach to preventing the unauthorized disclosure of sensitive GC data in compliance with GC legislation and policies
- Provide the ability to discover where sensitive data is stored and where it flows - reducing the number of storage places and egress points reduces the risk that undetected exfiltration can occur
- Provide the ability to block unauthorized flows of sensitive data, both from within the GC to the outside, and between GC departments
- Provide the ability to centrally track and report authorized flows of sensitive data for reporting and auditing purposes
- The combination of network DLP, endpoint DLP, and physical security supports a defence-in-depth multi-layered approach to detecting and blocking attempts to exfiltrate sensitive data
For more information about the GC Enterprise DLP Strategic Goals, please review the GC ESA Data Protection Strategy - DLP Initiative.
DLP Concepts and Architecture
DLP Sensor Types
The image on the right depicts the high-level architecture of a DLP system, including the technical capabilities and the primary classes of users (actors) who interact with the system. There are three types of DLP sensors that may be deployed in a DLP subsystem:
- Network DLP sensors are deployed in network perimeters to analyze data flowing over, and out of, an enterprise network. The type of data processed by a network DLP sensor is known as "Data-in-Motion" (DIM) or "Data-in-Transit" (DIT).
- Storage DLP sensors are deployed on dedicated enterprise data storage devices such as Network Attached Storage (NAS) devices, Storage Area Networks (SAN), and Database Management Systems (DBMS) to detect the presence of sensitive data in files and database records. Unlike endpoints that are capable of running a variety of applicable software, storage devices typically only include software or firmware for performing backup, recovery, and other specialized data management functions. The type of data processed by a storage DLP sensor is known as "Data-at-Rest" (DAR).
- Endpoint DLP sensors are deployed as agents on general-purpose computers. Endpoint DLP sensors monitor data leaving the endpoint over wired and wireless interfaces such as Ethernet, USB, Wi-Fi, Bluetooth, and Near Field Communications (NFC). General-purpose computers include end-user devices (e.g. desktops, laptops, tablets) and application servers. Data actively being processed, including moving to and from external interfaces (e.g. portable storage devices), is known as "Data-in-Use" (DIU). Data resident in non-removable storage on an endpoint (e.g. hard drive) is a form of "Data-at-Rest" (DAR). Capabilities for analyzing DAR on an endpoint may be a subset of those on an enterprise storage device.
- With respect to Endpoint DLP, there is an important difference between data being written to removable storage and data being written to non-removable storage. Data written to a removable storage device (such as a USB thumb drive) must be examined while being written (as data-in-use) on the assumption that the storage device will be removed by the user as soon as the write is complete. In contrast, data being written to a non-removable storage device such as an internal hard drive can be examined after the write is complete (as data-at-rest) as there is little short-term risk of the device being removed. Given the expected high rate of writes to non-removable storage, scanning new and updated files on a periodic basis (e.g. when the endpoint is idle) also minimizes degradation of performance.
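The removable versus non-removable distinction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `EndpointSensor` class, the `analyze` callback, and the idle hook are hypothetical names, not part of any particular DLP product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class ScanMode(Enum):
    INLINE = "scan while writing (data-in-use)"
    DEFERRED = "scan later, e.g. when idle (data-at-rest)"


@dataclass
class WriteEvent:
    path: str
    removable: bool  # True for devices such as a USB thumb drive


def choose_scan_mode(event: WriteEvent) -> ScanMode:
    # Removable media may be unplugged as soon as the write completes,
    # so the content has to be examined before the write is allowed.
    return ScanMode.INLINE if event.removable else ScanMode.DEFERRED


class EndpointSensor:
    """Hypothetical endpoint agent; analyze(path) returns True if allowed."""

    def __init__(self, analyze: Callable[[str], bool]):
        self.analyze = analyze
        self.deferred_queue: List[str] = []

    def on_write(self, event: WriteEvent) -> bool:
        """Return True if the write may proceed."""
        if choose_scan_mode(event) is ScanMode.INLINE:
            return self.analyze(event.path)
        self.deferred_queue.append(event.path)  # examine later
        return True

    def on_idle(self) -> List[str]:
        """Scan queued files and return the paths that violate policy."""
        flagged = [p for p in self.deferred_queue if not self.analyze(p)]
        self.deferred_queue.clear()
        return flagged
```

Deferring the scan for non-removable storage keeps the high volume of ordinary disk writes off the critical path, which is the performance point made above.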
The primary user classes (actors) who participate in the DLP process are shown in the image on the right and they consist of:
- End Users create, use, delete, modify, and initiate processing of the information examined by the DLP subsystem.
- The Information System Owner is responsible for the deployment, operation, and maintenance of the DLP subsystem. The Information System Owner does not create DLP policies or handle DLP incidents unless those responsibilities have been explicitly delegated by the Information Owner or Information Steward.
- The Information Owner or Information Steward is responsible for the creation of DLP policies and handling DLP incidents. These responsibilities may be performed with the assistance of the Information System Owner.
- The Chief Information Officer, Human Resources, Legal, Auditor, and other executive personnel provide support and oversight of DLP operations. They may not directly interact with the DLP subsystem. Incidents are escalated to these personnel for review if employee discipline and/or legal action may be warranted.
DLP Sensor Deployments
The implementation process for DLP capabilities takes a phased approach and the notional DLP deployments for different implementation phases are defined in the GC Enterprise Data Loss Prevention Implementation Strategy. Corresponding images illustrate deployment of DLP capabilities within an unclassified Protected environment, although the same principles are applicable to a classified environment. Please read the GC Enterprise Data Loss Prevention Implementation Strategy for details about the five implementation phases of DLP capabilities.
DLP Target Environment
In order to create a series of steps or phases to achieve the DLP goal, the target state must be defined first. In addition to identifying the DLP specific characteristics of this target state, the completion of several initiatives and projects currently underway has an impact on the target state, as do principles and goals shaping the future of the GC IT/IS infrastructure.
Consolidation of IT assets is a significant current thrust of the GC. The initial steps in this process are to co-locate data centres, reduce the number of Internet access points (IAPs), and collapse the myriad of GC networks into a consolidated core to reduce costs. Additional next steps include the deployment of common enterprise applications, further reduction of data centre resources through cloud computing, and adoption of standard end user devices for GC workers. The vision for the consolidated enterprise is a "colourless" core network and centralized enterprise services serving multiple departments, sensitivities, and geographically dispersed data centres and end users.
This includes the availability of higher trusted endpoints that employ hardware roots of trust and information-centric security mechanisms that enforce data access and usage policies at the data level rather than at the data container level.
For example, a long-term goal is to transition the GC Enterprise IT/IS infrastructure from a network-centric architecture to an information-centric architecture with the following key characteristics:
- Consolidation of multiple networks operating at different sensitivity levels and classifications into a single unclassified "core" network,
- Mandatory policy-driven enterprise-wide encryption and selective access to data (data objects and data streams) based on the sensitivity/classification of the data, endpoint attributes, and recipient attributes.
The transition steps outlined in ESA documentation and the associated DLP artefacts are intended to show a superset of the capabilities present over time within the architecture. It is not a mandate to adopt futuristic technological concepts, but to allow for them over time. If robust endpoints, self-protecting data, and other mechanisms never come to fruition in the GC consolidated IT/IS enterprise, the DLP system described in the HLD and the target architecture shown in the GCIS will provide state-of-the-art protection against unauthorized exfiltration of sensitive GC data. These "futuristic" technological concepts only serve to make the GC enterprise more secure and provide positive control of the data beyond the borders of the GC enterprise.
DLP Target Architecture
The image on the right provides a legend describing the meaning of various symbols in the image below and to the left. The DLP Target Architecture overlay is illustrated in the image below. This architecture considers the current functionality in the GC IT/IS baseline as a departure point and incorporates a security architecture evolution with future enterprise technologies based on industry trends and GC business needs.
The image on the left shows the DLP Target Architecture with a GC Core Network connecting an information-centric private (or community) cloud and a set of network-centric enclaves and clouds. This diagram is not intended to be a network diagram, but an abstracted view of the GC IT/IS infrastructure to highlight the relationships between GC and non-GC entities, as well as the support for security enclaves and the evolution to information-centric approaches in the future. The GC Core Network provides a single controlled point of access to the Open Network (the Internet) and public cloud services. The ESA DLP Initiative is focused on the Protected/Unclassified aspects of the GC Enterprise. The SECRET enclaves are shown for completeness and as a likely source of net-centric processing for many years.
All traffic between enclaves and clouds is routed via the GC Core Network. Network DLP sensors are located in the perimeters that surround the GC Core Network. EUDs and servers that process and/or store user data are deployed with Endpoint and Storage DLP capabilities as appropriate. In the image on the right, the Application Services icon is intended to encompass not only application servers, but also file servers, database servers, web servers, network-attached storage, and any other server-like capability that processes user data.
The end state of the target architecture is a robust DLP capability and framework with the hooks to add capabilities as they emerge, evolve data analysis capabilities to support new formats and data sensitivities over time (including information-centricity), and the inclusion of powerful DLP policy management tools to tune system behaviour and response.
The flexibility of the DLP framework is required to handle evolving technology and special cases. Evolving areas of DLP technology include support for cloud computing, unstructured data analysis, mobile device support, hybrid endpoint DLP solutions with profiles for docked and remote operations, information-centric support through automated information rights management, and encrypted traffic analysis. Special cases for the GC include cross security domain aspects of DLP, as some event and event combinations occurring on Protected assets will be classified, resulting in data spills. DLP functions need to support additional functions for the analysis of Protected traffic based on classified data inputs to include prohibited word checks, text matchers, and classified signatures.
Several challenges with the current state of the DLP supply chain include a lack of standards, proprietary approaches resulting in vendor lock-in, lack of coverage for all end device types, and the slow evolution of data formats, types and contents supported by the tools.
DLP Subsystem Context and Dependencies
The DLP subsystem consists of Endpoint, Network, and Storage DLP sensors together with a set of DLP policy, DLP incident, and enterprise management capabilities. The DLP subsystem interfaces with multiple ESA Security Focus Areas (ESFA) components, as shown in the image below on the right-hand side (green boxes).
DLP Subsystem Design and Rationale
DLP Lifecycle Phases
The primary objective of the DLP subsystem is to detect and respond to the unauthorized exfiltration of potentially sensitive data. Enforcing DLP policy by blocking or deleting data in response to an apparent violation is a drastic action that can have significant negative impacts on business operations. DLP products should implement capabilities that enable phased introduction of DLP discovery, detection, and response capabilities that match the DLP data lifecycle shown in the image on the right. The goal of a phased introduction is to refine the configured policy rules over a period of time to minimize the rates of false positives (allowed content incorrectly identified as prohibited) and false negatives (prohibited content incorrectly identified as allowed). Products will ideally support different modes of operation for different classes of data identified by the policy rules:
- Define: An initial set of policy rules is defined that enables collection of potentially sensitive data in the Discover phase. This is initially a manual process that requires business owners and information owners to define sensitive data, agree on marking/labelling standards, and consider the legislative environment for protection of privacy. The process should also encompass incident escalation and workflow rules. Once the rules are complete, they are converted to digital policy statements and configuration data in formats compatible with the selected DLP technical solution.
- Discover: The DLP subsystem is configured to identify and collect information about data, but not report policy violations as DLP incidents. Reported information includes both positive and negative matches. During this phase, the deployment team gains an understanding of the types of data present in different locations within the enterprise, the effectiveness of the DLP digital policy, and refines the DLP digital policy to reduce false positives and false negatives.
- Respond: Once DLP policies are refined to reduce the rate of false positives to an acceptable level, the DLP subsystem is configured to automatically respond to selected incidents. A number of response actions are possible when sensitive or potentially sensitive data is detected. Response actions can be automated or manual. For data that is moving in near real-time (e.g. flowing across a network) where any significant delay is detrimental, response actions should be automated. For data that is not time critical, such as an email passing through a message transfer agent (MTA), a human user can evaluate the incident and determine the response. Three main categories of response action exist. In all cases, the incident must be reported to the DLP incident management function and, if configured, to the originator of the data:
- Do nothing beyond reporting the incident: The transfer of data-in-motion or data-in-use is allowed to complete as normal, and data-at-rest is allowed to remain with existing access controls. A variation of this approach is to ask the originating user to confirm that the data is being used for an authorized purpose in accordance with applicable policy instruments. If the user responds negatively, the data is blocked or erased. The availability of this variation depends on the characteristics of the data, where it is analyzed, and the ability to identify and interact with the originator.
- Block or erase the sensitive data: The transfer of data-in-motion or data-in-use is blocked. It may be deleted or quarantined for manual evaluation. Data-at-rest is either deleted or quarantined for manual evaluation. Quarantining of data-in-motion or data-in-use is only appropriate for certain types of data, such as data that is transferred using a store-and-forward model (e.g. email).
- Protect the sensitive data: The transfer of data-in-motion or data-in-use is allowed to complete, and/or data-at-rest is allowed to remain, after protection has been applied to prevent unauthorized access. A variety of protection mechanisms are possible that can be applied to the sensitive data item(s) only, to the entire content that contains the sensitive data items, or to the content container.
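The response categories above can be sketched as a small dispatch routine. This is a minimal illustration, assuming a hypothetical `Incident` record and `Action` enumeration. It encodes two points from the text: every incident is reported regardless of the action taken, and quarantining data-in-motion or data-in-use is only appropriate for store-and-forward transfers such as email.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    REPORT_ONLY = "report only"
    BLOCK = "block or erase"
    QUARANTINE = "quarantine for manual evaluation"
    PROTECT = "apply protection, then allow"


@dataclass
class Incident:
    data_state: str          # "in-motion", "in-use", or "at-rest"
    store_and_forward: bool  # e.g. email passing through an MTA
    policy_action: Action    # action requested by the matching policy rule


def resolve_action(incident: Incident) -> Action:
    """Downgrade quarantine to block where quarantine is not practical."""
    moving = incident.data_state in ("in-motion", "in-use")
    if (incident.policy_action is Action.QUARANTINE
            and moving and not incident.store_and_forward):
        # Near-real-time flows cannot wait for manual evaluation.
        return Action.BLOCK
    return incident.policy_action


def handle(incident: Incident, report_log: List[Tuple[str, Action]]) -> Action:
    action = resolve_action(incident)
    # In all cases the incident is reported to incident management.
    report_log.append((incident.data_state, action))
    return action
```

Falling back to blocking when quarantine is impractical is an assumption of this sketch; a deployment could equally choose to report only, depending on the class of data and the organization's risk tolerance.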
Data Analysis Techniques
This section identifies various types of analysis techniques implemented by the DLP industry. Analysis techniques range in complexity from simple mechanical matching to advanced assessment of business intent and behavioural patterns that incorporate artificial intelligence.
The initial definition and validation of DLP digital policy rules that incorporate analysis techniques, and the subsequent refinement of those rules during the Discover and Detect phases of the DLP lifecycle, to minimize false positives and false negatives, are critical to successful DLP deployment. Different analysis techniques are best suited to different types of data, so a thorough understanding of the organization's data is vital in identifying the right combination.
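Rule refinement of this kind needs a way to measure progress. As a minimal sketch, assuming a manually reviewed sample in which each item records whether the DLP subsystem flagged it and whether it was actually prohibited, the two error rates could be computed as follows (the function name and input format are illustrative only):

```python
from typing import Iterable, Tuple


def error_rates(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    Each item is (flagged_by_dlp, actually_prohibited).
    False positive: allowed content incorrectly identified as prohibited.
    False negative: prohibited content incorrectly identified as allowed.
    """
    items = list(results)
    allowed = [flagged for flagged, prohibited in items if not prohibited]
    prohibited = [flagged for flagged, is_bad in items if is_bad]

    false_positives = sum(1 for flagged in allowed if flagged)
    false_negatives = sum(1 for flagged in prohibited if not flagged)

    fp_rate = false_positives / len(allowed) if allowed else 0.0
    fn_rate = false_negatives / len(prohibited) if prohibited else 0.0
    return fp_rate, fn_rate
```

Tracking both numbers across successive policy revisions shows whether a change that reduces false positives has done so at the cost of more false negatives.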
Each type of DLP sensor can use the same techniques for identifying prohibited content. The following techniques are not necessarily mutually exclusive and may overlap, but they provide a general guide to the types of content analysis available. They can be grouped into two basic categories. Fingerprinting is the process of creating a hash value over known sensitive data and registering the hash value with the DLP subsystem. Fingerprinting identifies single instances of known prohibited data. The other techniques rely on rules that essentially search for patterns indicative of sensitive content. They are easier to manage (rules can be defined once, whereas new fingerprints need to be continuously registered) but result in higher rates of false positives. For example, an arbitrary nine-digit number that matches a pattern rule for SINs may or may not be a SIN, whereas a nine-digit number that matches the fingerprint of a known SIN is likely to be that SIN. Data identified using fingerprints may be referred to as registered data and data identified using rules as described data.
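The difference between registered and described data can be shown with a minimal Python sketch. The nine-digit pattern, the `FingerprintRegistry` class, and the sample numbers are all hypothetical. SHA-256 is used here only as a convenient hash; the text does not say which hash function DLP products use.

```python
import hashlib
import re
from typing import List

# Pattern rule ("described data"): any nine digits, optionally grouped 3-3-3.
NINE_DIGITS = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")


def fingerprint(value: str) -> str:
    """Hash the digits only, so formatting differences do not matter."""
    digits = re.sub(r"\D", "", value)
    return hashlib.sha256(digits.encode("ascii")).hexdigest()


class FingerprintRegistry:
    """Holds hashes of known sensitive values ("registered data")."""

    def __init__(self) -> None:
        self._hashes = set()

    def register(self, value: str) -> None:
        self._hashes.add(fingerprint(value))

    def matches(self, value: str) -> bool:
        return fingerprint(value) in self._hashes


def find_described(text: str) -> List[str]:
    """Every nine-digit candidate, including likely false positives."""
    return NINE_DIGITS.findall(text)


def find_registered(text: str, registry: FingerprintRegistry) -> List[str]:
    """Only the candidates whose fingerprint was registered in advance."""
    return [c for c in find_described(text) if registry.matches(c)]


if __name__ == "__main__":
    registry = FingerprintRegistry()
    registry.register("123 456 789")  # made-up value for illustration
    text = "Order ref 987654321, applicant number 123-456-789."
    print(find_described(text))
    print(find_registered(text, registry))
```

The pattern rule flags both numbers, while the fingerprint lookup flags only the value that was registered. Note that a plain hash over a nine-digit value can be reversed by trying every possible value, so a real deployment would likely need a keyed hash or similar protection for the registry itself.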
The defining characteristic of true DLP is the ability to perform deep content analysis, but it is also necessary to examine the environment in which a suspect data item exists to make a determination of whether the content is prohibited. This examination of the environment is known as context analysis of which there are several types.
For more information about different types of content and context analysis, please read the GC ESA Data Loss Prevention High-Level Design.
DLP Target Organizational Constructs
The components of a holistic approach to an enterprise DLP capability are shown in the image on the right. A balance of Policy, Technology, and People & Process is required to provide the supporting framework around the DLP technology to garner the best results:
- Policy: Guidance documentation to drive DLP into the technology, operational, and people aspects of the GC in concert with related areas of IM, IT, Security, and Privacy
- Technology: The architecture and IT components required to provide automated DLP operation in GC IT/IS infrastructure
- People & Process: Definition of DLP related roles and responsibilities, authorities, governance, boundaries, and operational procedures.
For more information about the DLP target environment, architecture, and organizational constructs, please read the GC Enterprise Data Loss Prevention Implementation Strategy.