CCDIC Refining Strategic Data Management Initiative


Welcome to the CCDIC Refining Strategic Data Management Initiative page

What follows is the story of the initiative’s investigations, experiments and findings.

I, William (Bill) Taylor, BA, joined the PPMG section of the Strategic Issues and Integrated Management division of the Centre for Communicable Disease and Infection Control in October 2019.

October

During October, I attempted to introduce the team to a User Story / Scrum / Sprint process.  However, it soon emerged that the Team did not have the time to contribute to this issue-resolution process.

Hoping that the workload would diminish and allow the staff to get involved, I continued the process alone, eventually completing 17 Sprints successfully and generating 30+ Business Analyst artifacts: reports, presentations, spreadsheets, databases, studies and recommendations.

The User Stories captured many of the issues, pain points and suggestions for improvements from staff as well as contributions from Audits and Evaluations. During the course of the initiative, I went through all of the issues: investigating and developing solutions as well as carefully tracking their development types (User eXperience, Integration, Data Management, Data Gathering and/or Reporting).

During the month I completed Systems, Software and Data inventories in order to determine what was in place.  This set of facilities was all we had to work with. 

I also completed a Reports inventory and data model study building on existing work generated by the Team’s exposure to Lean concepts and methods.  This work allowed me to move quickly along the lines suggested in that work towards an integrated, local data repository.

I created a presentation extolling the virtues of “decoupling” the data gathering function from the reporting function.  The idea needed to be combined with other solutions in order to meet the local need, but it seemed like a move in the right direction with respect to improving the culture.

I began a search for a repository capable of providing the data integrity required by the Performance Measurement System.  I met with the Data Hub team to gather more information about their goals, standards and views on Protected “B” storage issues and future solutions.  They could not provide much information about the facilities that would be in place in the future, which I was attempting to anticipate in our solution designs.

I explored the availability of Protected “B” storage options in PHAC.  I found that decisions had not yet been made about facilities.  I explored cloud options by reviewing Treasury Board bids and found there was a bid in progress that would define a group of providers and services to be delivered.  This implied that future cloud services would be compliant, easing the privacy concerns in our improvement designs.

November

I began to explore more obscure data gathering opportunities based on the facilities available in the inventories.  One method was based on ODBC connections established within workbooks.  These “connections” appear as “live” sheets in the workbook on the connecting PC.  Inputs are received centrally without emailing worksheets.  My testing did not include large scale tests, but the method works well and in an intuitive manner.  I produced a presentation demonstrating its value to data collection efforts.
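
The rough shape of the read side of such a connection, as a minimal Visual Basic sketch and not the experiment's actual code (the share path, database file and table name are hypothetical):

  Sub AddLiveConnection()
      ' Create a query table backed by an ODBC source so the sheet
      ' refreshes from central storage instead of being emailed around.
      Dim qt As QueryTable
      Set qt = ActiveSheet.QueryTables.Add( _
          Connection:="ODBC;Driver={Microsoft Access Driver (*.mdb, *.accdb)};" & _
                      "DBQ=\\shared\PPMG\Collection.accdb;", _
          Destination:=ActiveSheet.Range("A1"), _
          Sql:="SELECT * FROM Inputs")
      qt.RefreshOnFileOpen = True        ' the sheet appears "live" on open
      qt.Refresh BackgroundQuery:=False
  End Sub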

I reviewed the validation performed in the data gathering spreadsheets sent to PPMG staff.  The current solution performed only 20% of the tests that would minimize data cleaning before the data can be imported into a database.  Something better was required.  I began to investigate Visual Basic solutions.
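
As one direction for that investigation, a minimal sketch of worksheet-level Visual Basic validation (the column and the limits are hypothetical): an entry that breaks the input rule is flagged in red the moment it is typed, so cleaning happens at the source rather than at import time.

  Private Sub Worksheet_Change(ByVal Target As Range)
      ' Validate column B as it is edited: numeric, between 0 and 100.
      Dim cell As Range
      For Each cell In Target.Cells
          If Not Intersect(cell, Me.Range("B:B")) Is Nothing Then
              If Not IsNumeric(cell.Value) Then
                  cell.Interior.Color = vbRed                  ' flag for cleaning
              ElseIf cell.Value < 0 Or cell.Value > 100 Then
                  cell.Interior.Color = vbRed                  ' out of range
              Else
                  cell.Interior.ColorIndex = xlColorIndexNone  ' clear the flag
              End If
          End If
      Next cell
  End Sub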

On Nov 11, I submitted a draft of the plan going forward.  The plan contained possible configurations and data collection methods based on the current facilities.  By way of contrast, I developed the “Big” solution.  This contained all of the facilities I thought would be required to support a PHAC-side RBM effort.  By analysing the gap, I was able to deduce what changes would have to be put in place over the next few years, who the likely providers of facilities would be and what form the facilities would take.  I began basing all of my improvement plans on this information in lieu of information from PHAC which was said to be “coming”.

It was becoming apparent that integration and harmonization were going to be important over the next year as silos prepared for their exposure to the new “open”, “sharing” world in which no one is a neighbour; rather, we all share the same “room”.

I had identified a local requirement to manage planning, financial and operational data in PPMG.  Now, how could the planning system be made to appear more like the financial system, of which the Team had an intuitive understanding based on its hierarchical, cascading model?

To that end, I carefully attempted to inject information into a discussion intended to resolve the nomenclature issues between the CCDIC PMR planning reporting system and the Branch-level BOP reporting system.  But I found a lively, organic discussion was already in place, driven by a desire to include local planning information in the BOP data gathering spreadsheet.  I stepped back, realizing that as long as we treated the data objects appropriately, it did not matter what we called each.  The team resolved the issue and moved on.  However, discussing the two data models together can be very difficult because each uses project management object names in its own way, which is inherently confusing.  One has to indicate carefully whether one is speaking with one’s BOP or PMR hat on, or risk not being understood.  This is a soft barrier to further development.

December

I extended my data modeling scope to include reports that use data provided by PPMG, any PPMG data submitted to planning or financial reporting authorities and any other data created by PPMG.

By this time, my intention was clear: integrate BOP and PMR planning in order to achieve time savings.  If sufficient data were collected in PPMG’s planning phase, most, if not all, of the data required by the BOP data gathering process would already be on hand.  I felt that this time-saving incentive would lead to a resolution of the nomenclature issues as well.  Further, it is sensible to provide data gatherers with requirements.  I recommend that PPMG require that BOP activity names be selected soon enough to ensure none of the section and division planning processes are “negatively influenced”.

In addition, if PPMG preserved its inputs to the financial process in such a data collection, that un-aggregated data would be available when investigating variance reports, since the financial system cannot return the original un-aggregated inputs to the area being asked to answer variance questions about them.  Truly a “keep your receipts” kind of situation.  I called this data collection the CCDIC Planning System.  Perhaps not all of the data contributes to, or is created by, the planning process, but its availability enables solutions to many of the issues in our User Stories.

I investigated the current Balanced Scorecard planning initiative based on the Lean training in 2019.  I found the artifacts created by the team quite useful as inputs to my work on inventories and data models.  But PPMG performs much of its planning in an ad hoc manner driven by the Branch request for planning data for its BOP reporting.  The opportunity to practice new methods, share new concepts and learn about the strengths of the organization on which new plans could be leveraged never materializes.

I investigated the current state of the PMT.  I found a survey prototype which exceeded its obvious technical requirements in data gathering but did not do so well with respect to ease of back office processing and meeting cost requirements.  I fear that no requirements were gathered for the prototype in the first place, so there was no “course correction” guidance during the design stage.

The future of surveys, which score well on the UX side, required investigation.  Was the Team going to take on all the back office processing requirements in order to avoid the costs?  Were the skills and experience present?  Did job descriptions allow it?  The Team was already pitching in with skills and experience not normally associated with their job classifications.  Could they really be expected to increase that participation in the face of their own operational requirements?

While there are exceptional tools coming in the near future enabling us to take this function in house, its activities will grow, eclipsing the available bandwidth in a heartbeat.

Another way must be found to continue to develop the value of all of our systems without enslaving our current staff to them – they have work to do.

I reflected that I was seeing this problem in all of the issues I had investigated.  They all had the following characteristics:

  • Each represented a point at which the “out of the box” solutions could go no further.
  • Each system was in a state of paralysis and its value was being questioned.
  • Each system was in that development phase in which report outcomes, not the appearance or expected value of the reports, were driving changes.

I experimented with code which could be used to import data sent in email directly into a database table.  The code works nicely, importing data as it arrives at the proposed CCDIC.DataInput@canada.ca address, but needs to be refined for the various body content expected.
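
A minimal sketch of the approach, not the experiment's actual code: it assumes Outlook is installed, a hypothetical Access file path and table, and a comma-separated message body.

  Sub ImportInboxToTable()
      ' Walk the inbox and insert each mail body as a row in a table.
      Dim ol As Object, ns As Object, inbox As Object, item As Object
      Dim db As Object, fields() As String
      Set ol = CreateObject("Outlook.Application")
      Set ns = ol.GetNamespace("MAPI")
      Set inbox = ns.GetDefaultFolder(6)                ' 6 = olFolderInbox
      Set db = CreateObject("DAO.DBEngine.120") _
                   .OpenDatabase("\\shared\PPMG\Collection.accdb")
      For Each item In inbox.Items
          If item.Class = 43 Then                       ' 43 = olMail
              fields = Split(Trim(item.Body), ",")      ' naive parse; refine per body format
              If UBound(fields) = 2 Then
                  db.Execute "INSERT INTO Inputs (Indicator, Period, Value) " & _
                             "VALUES ('" & fields(0) & "','" & fields(1) & _
                             "','" & fields(2) & "')"
              End If
          End If
      Next item
      db.Close
  End Sub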

January

In January, I began to develop a repository for incoming data, as a communal work area for reports under construction, local data, etc.  The goal was to create a working environment which was a “preview” of the groupware the Team will use in the future.

I built the tool as an Excel Workbook with the sharing function turned on.  Right away, I began to find that the sharing feature was flaky.  First, there were unadvertised constraints like a prohibition on the use of tables.  That is a huge loss – pivot tables are really the only way of “creating” reports, other than creating a separate data page and mapping values to cells in a sheet in order to lay data out on a page like a report.

Furthermore, the workbook behaved in unexpected ways (halting, freezing or closing right in front of you) as the number of users increased.  The idea will have to wait for the new tools in PHAC’s future.

I created an Access database version of the repository to hold all of the “helpful” database solutions I came up with or for which I had been given a suggestion.  That collection of unimplemented tools is planned for implementation on the timeline into the future that I leave with the final documentation.

As a very quick, Lean experiment, I tried to embed a survey form as a watermark in an email.  The idea was clever and worthy of testing in case it actually worked, but the outcome was wildly different for most recipients based on things I could not control, like the number of lines in a signature or the number of lines used by the list of addresses in the To: field.  Lesson learned: cheap shortcuts flirting with the edge of the envelope are simply too risky, and so unlikely to work out that one ought not spend too much time on them.

Now, I began to build the CCDIC Planning System as an example of how the two planning worlds could co-exist as a PMR-BOP integrated system.  My intention was to break the mindset of the old Excel spreadsheet data gathering world: data from any source could be collected and added to any of our reports without having to gather that new data when the PMR data is gathered.  The data can be pre-existing or can be gathered later, prior to report generation.  In this way CCDIC could expand its RBM reporting.

The response was good.  Managers knew that things were missing and now they could bring those things into the reporting.  So, by expanding the scope of the reporting, we were able to meet some of our more sophisticated user stories, like those from Audits and Evaluations.

Now, I needed to socialize the data models I had been capturing with the Team, introduce them to the common planning paradigm and demonstrate the value of our local data to our work.  I planned to deliver a new form built in Excel to show that a good user experience is possible in spreadsheets.  The form was capable of sending the data it collected to any drive.

The transmission idea makes it possible for the data provider to transmit and the data gatherer to receive data in an asynchronous manner.  This opens the door to continuous reporting and it effectively decouples data entry from collection.  Data is entered when environmental conditions dictate, not to meet the timelines of the reporting authority.
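
A minimal sketch of that hand-off, with a hypothetical drop folder and hypothetical named ranges: the form's submit routine writes the record to the share, and the gatherer imports the accumulated files on their own schedule.

  Sub SubmitRecord()
      ' Write the captured record to the drop folder as a small CSV file.
      Dim f As Integer, dropPath As String
      dropPath = "\\shared\PPMG\Incoming\" & Environ$("USERNAME") & "_" & _
                 Format$(Now, "yyyymmdd_hhnnss") & ".csv"
      f = FreeFile
      Open dropPath For Output As #f
      Print #f, Join(Array(CStr(Range("Indicator").Value), _
                           CStr(Range("Period").Value), _
                           CStr(Range("Value").Value)), ",")
      Close #f
  End Sub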

I met with the TB folks to plan the creation of a repository (a worksheet) capable of holding incoming indicator data for reports.  Regrettably, the pandemic intervened.

February

I gathered some information about the Departmental Plan approval process when I heard PPMG was not included at a critical stage.  I created a process chart from the information I had gathered.  The process was flawed by a discontinuity which the team had noticed.  They were able to present the information to others in order to correct a process step.

Any discussion of a process is improved by sharing a chart of that process.

I refined the PMR data input form to include editing records and added professional services and resources data input sections.

I had an important experience helping out in the effort to sort out financial records in order to enable the requested cuts.  I imported spreadsheet data into a database in order to perform reporting that was difficult in Excel.  I also explored the Power Pivot feature.  My hopes that this Excel feature would be more stable than the standard pivot table tool were undermined by our continuing negative experiences.  A search of Excel issue and Power Pivot help sites on the net produced hundreds of stories from individuals like us who felt the software was a bit shaky when pushed too far.  The trouble is that you will lose work and repeatedly return to your last backup of the data.  This means that you must keep backups of each stage of your effort to develop the data.  Analysts are able to build a path or strategy of data development in their minds, but are disappointed when Excel “lets them down” as they attempt to follow that strategy.  This is one of the reasons groups move to SAS or other proven applications for this kind of sophisticated data manipulation.
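
For the import itself, a minimal sketch run from within Access (the file path, sheet range and table name are hypothetical):

  Sub ImportFinancials()
      ' Pull a worksheet range into a local table for easier reporting.
      DoCmd.TransferSpreadsheet _
          TransferType:=acImport, _
          SpreadsheetType:=acSpreadsheetTypeExcel12Xml, _
          TableName:="Financials", _
          FileName:="\\shared\PPMG\Financials.xlsx", _
          HasFieldNames:=True, _
          Range:="Inputs!A1:F500"
  End Sub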

I was asked to create a Publications SOP in a process diagram.  During the data gathering process, I participated in several discussions about blending the Publications process tracking with the current RBM reporting.  I explained what was possible.  In an attempt to organically expand the scope of the current RBM, I planted the seeds of the CCDIC data collection model.  I explained that the content of RBM reports depends on the data in the collection, not on the one form used to gather RBM data.  People tend to derive data concepts from the existing tools – like data input forms.  A slide explaining the data model of the collection was enabling.  Ideas for data inclusion in the RBM reporting were quickly incorporated into the plan: MRAP commitments, Publications, etc.

In the email study, I assessed the value of an attachments management user story.  The concept was to drop attachments into a common repository in CCDIC rather than “attach” them to email or use common drive space with its confusing directory structure.  The calculated impact on email volumes seemed impressive, but I did not have estimates of the total volume of email in CCDIC.  I created the beginnings of a database in the Access Repository for this purpose.
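
The mechanics, as a minimal Visual Basic sketch run from Outlook with a hypothetical repository path: detach the files of the selected message into the common repository and note where they went.

  Sub DetachToRepository()
      ' Save each attachment of the selected message to the repository.
      Dim msg As Object, att As Object, savedPath As String
      Const REPO As String = "\\shared\CCDIC\Attachments\"
      Set msg = Application.ActiveExplorer.Selection.Item(1)
      For Each att In msg.Attachments
          savedPath = REPO & Format$(Now, "yyyymmdd_hhnnss_") & att.FileName
          att.SaveAsFile savedPath
          Debug.Print "Saved: " & savedPath   ' a link could replace the attachment
      Next att
  End Sub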

CCDIC Privacy Queries: A tracker of all of the questions of concern folks have about the data they work with and the places they might like or need to send data.  Added to the Access database Repository.

I was asked if I could bring some refinements into a tracking spreadsheet for human resources in place, vacant positions, changed positions and the current status of each position.  I created the spreadsheet.  I offered to show staff how to build an Excel data input form as a data input, viewing and editing tool.  However, the pandemic intervened.

March

I built the Common Planning DBs using related BOP and PMR tables, as well as a PMR-to-Publications relationship, in order to demonstrate the data behaving as if it were all in one “spreadsheet”, so folks could begin to change their mindsets about the value of the data around us and witness new strategies towards data management.
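
A minimal sketch of the kind of cross-table view this demonstrates, with hypothetical table and column names: one saved Access query reads BOP, PMR and Publications data as if it lived on a single sheet.

  Sub BuildCommonPlanningView()
      ' Create (or replace) a saved query joining the three related tables.
      Dim sql As String
      sql = "SELECT b.ActivityName, p.IndicatorName, p.Target, pub.Title " & _
            "FROM (BOP AS b INNER JOIN PMR AS p ON b.ActivityID = p.ActivityID) " & _
            "LEFT JOIN Publications AS pub ON p.PMRID = pub.PMRID;"
      On Error Resume Next
      CurrentDb.QueryDefs.Delete "CommonPlanningView"   ' drop any prior version
      On Error GoTo 0
      CurrentDb.CreateQueryDef "CommonPlanningView", sql
  End Sub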

It is from this old mindset that the concept of a “repository” came as the solution to the problems faced by the RBM system.  It would have the following characteristics:

  1. A one-stop-shop for all the data required for operational functions, reporting and course correction.
  2. Ease of interaction, a good user experience.
  3. A system that would lead to an easing of the tension between data producers and data gatherers.

The requirements are easy to elicit.  However, it is the path to the solution which is in jeopardy.  How do we get there?  How do we improve once we do get there? Will improvements discovered by the BA be implemented?  Will the improvements put in place survive?

The current situation is difficult to understand when working with only one section, PPMG.  However, I had several outstanding pieces of empirical evidence from over a dozen solutions:

1) All of my solutions utilized Visual Basic.

2) Each of the solutions can be associated with higher levels of development in the maturity descriptions for systems: Integration, Harmonization, User Experience (UX) and Data Gathering/Transmission/ETL.

These are not the immediate goals of system designers who start out in a modest way building an “out-of-the-box” system.  Indeed, they are not achievable using “out-of-the-box” strategies.

Out-of-the-Box: Any functionality that comes shipped directly from the software vendor or can be configured easily (where “easily” means configured by a business user, not an IT user) with built-in workflow tools, templates and/or best practices provided directly by the vendor.

Typically, “out-of-the-box” systems which meet a need initially will attract attention and collaborators with similar requirements.  Then, a period of improvement begins.  Requests for improvements fall into two classes: i) those that can be implemented using “out-of-the-box” capabilities, and ii) those that cannot be implemented without configuration (using VB code).  Now there are some requests that cannot be met.  Those requests begin to accumulate with no sign of solution.  This is a type of developmental paralysis.

In the meantime, those who volunteered this overlay of time and effort in order to achieve the current result can make a heroic effort to achieve what is asked of them; however, they will discover that they do not have the skills, all while performing tasks outside of their current job description and classification.

As well, there is frustration at management levels based on the facts: the current system has achieved so much so quickly, so why can these few improvements not be achieved?

Most of the solutions I contrived for the issues in the User Stories were achieved by using Visual Basic and/or database technology to configure or attempt to “shape the application to fit the business”.  Excel and Access, both Microsoft Office products, are sold as and widely recognized as “configurable”.

Configurable: Any functionality that can be created using built-in workflow tools shipped by the vendor. To be considered configurable, functionality should be forward-compatible with future releases.

This scenario can be understood further using maturity modeling.  In short, the paralysis of the current situation will persist until CCDIC brings in the capability to address the demands of the next level of system maturity, i.e., those improvements not yet implemented due to the paralysis, and more.

These local systems have achieved a maturity level that requires configuration solutions.   Out-of-the-Box solutions are appropriate for basic systems to which the business has typically conformed.  

Currently, the cultural attitude towards spreadsheets is, “Not another spreadsheet!”.

This suggests that there is probably one negative characteristic that most do not enjoy.  Typically, this attitude is driven by the “out-of-the-box” user experience, and this negative experience is triggered by every spreadsheet.  If more than about a dozen inputs are required by the spreadsheet, the degree of negativity begins to grow disproportionately.  Symptoms include late inputs, incomplete inputs, disproportionate resistance to adding one new data item as if the sheet were at a “tipping point”, procrastination and ever-present questions like, “What are we doing this for?”

To correct this attitude, a breakthrough is required in user experience.  Not an incremental improvement, but a breakthrough likely to bring unexpected benefits as well as addressing the issue.  I felt I had such a breakthrough in ActiveX-object-embedded, Visual Basic-managed forms.  The forms have had an instant approval reaction from everyone who has seen them.  These single-record forms use a full screen design, using color to designate groups of data, employing drop-down menus wherever possible, and applying sensible validation that turns the background of any data item red when it is entered without respecting the input rules.  There is navigation and record editing as well.
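
The red-background rule in those forms looks roughly like this, as a minimal sketch for a hypothetical embedded text box named txtTarget whose rule is “numeric entry only”:

  Private Sub txtTarget_Change()
      ' Flag the field the moment the input rule is broken.
      If Len(txtTarget.Value) = 0 Or IsNumeric(txtTarget.Value) Then
          txtTarget.BackColor = vbWhite     ' rule respected (or field empty)
      Else
          txtTarget.BackColor = vbRed       ' rule broken: show red background
      End If
  End Sub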

I was quite confident we could perform a trial of the form in March; however, the pandemic intervened.  But the form is ready to go and is documented.  Similarly, I have created forms for Publications and for Travel information.  These forms can be used on any PC as the form sheet feeding a data sheet in a workbook.  The data input can be saved to any drive location in an automated fashion, using the file type that meets the requirements of the receiving workbook or database.

So the form breakthrough pushes back on the negativity but, more importantly, it is badly needed by at least 50% of the spreadsheet initiatives in the Agency.  Furthermore, it is a method of providing an Excel data-gathering and data-transmitting front end to a database system in an organization in which database licences are rarely granted.  This is an intermediate solution that will stand up well until new tools are procured which deliver database technology to each machine, meaning input forms will be created by the database, sent anywhere, filled in by users, and the data gathered transparently.


William (Bill) Taylor,

william.taylor@canada.ca