“Organizations today need to have both lawyers and engineers involved in privacy compliance efforts” (Dennedy, Fox, and Finneran, The Privacy Engineer’s Manifesto: Getting from Policy to Code to QA to Value, 2014, p. 90). This is echoed in Determann’s Field Guide to Data Privacy Law (2nd ed., Elgar, 2015) at 6.114, by Professor Determann’s recommendation that Counsel be integrated into the PbD process.
In the past, every motivated enterprise has had to “reinvent the wheel” of privacy engineering. However, for some time I’ve been working on a project integrating privacy governance (Legal/Compliance) and privacy engineering (IT) into a relatively “hard” template with software support, usable by any enterprise. The objective is to simplify, structure, facilitate, and (so far as possible) automate a collaborative law/compliance/IT multi-disciplinary approach to Privacy-By-Design (“PbD”) engineering, putting multi-jurisdictional privacy impact / risk assessments at the heart of the architecture.
In this post I’ll work through a topical case study, define “privacy architecture” schematically and by reference to capability, explore practical usage, and define the architecture’s “privacy metadata” by reference to a linked data model document.
Don’t worry if this document takes you a little outside your own professional field’s comfort zone – being multi-disciplinary, that’s inevitable. The first section, “Case Study”, should be reasonably accessible to all; bear in mind that the difficult components are explained later.
Case Study – with breach table output
A simple but topical case study may be found in late last week’s breaking news on the Xora fiasco (and the accompanying USD 500k Californian lawsuit for intrusion upon seclusion and consequential losses): http://arstechnica.com/tech-policy/2015/05/worker-fired-for-disabling-gps-app-that-tracked-her-24-hours-a-day/ . Even just looking at the supplier’s own marketing(!) video (helpfully supplied in the link) at the 16-second mark, there seems sufficient information right there to populate PrivacyImpactAssessment (“PIA”) metadata triggering a risk assessment “failure” with an estimated cost of breach. (As a trial lawyer I need hardly point out the same frame’s possible use as evidence for the case – and doubtless for the supplier’s interesting marketing future.)
Let us assume our enterprise is in a similar scenario but lacks last week’s hindsight. We wish to avoid similar fates by proactively defining a “privacy architecture” for the enterprise and conducting a risk assessment upon it. For simplicity, we store just one PII dataset, the Xora employee information; and one process/dataflow – the Xora web application itself. All we really need to do is think carefully and identify the salient point: that the “data scopes” of the dataflow (in legal terms, subject matter) include information on “logged-out” employees – by contextual inference, persons who are not ordinarily acting in the capacity of employees. (For completeness, later the video also identifies tracking of employees’ statutory breaks. It’s unclear to me whether the surveillance applies to those periods as well, or whether surveillance within breaks would be legitimated by Californian statute; in any event I haven’t yet encoded any metadata for Californian statutes, so we’ll ignore that aspect.)
For detailed table and field definitions and annotations, it may be useful to refer as required to the subheading “Metadata Schema” below.
In the PrivacyImpactAssessment metadata table (picking out the most important fields for current purposes), from the information in the Xora marketing video I probably would classify the “data scopes” of the dataflow (chosen from a long overlapping list of legal subject matter) as “employment, surveillance, location, tracking” (at a stretch, possibly “eavesdropping” as well). I would enter “US-CA” (i.e. California) as my best-guess pseudo-ISO coding of the data subjects’ domicile(s), and nominally “US-CA” for the data storage jurisdiction(s). There are no transfer jurisdiction(s) as I assume we are not transferring the data outwith California.
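To make this concrete, here is a minimal sketch (in Python, purely for illustration – the field names and structure are my own shorthand, not the real EGPLib schema) of what the PIA record for the Xora dataflow might hold:

```python
# Hypothetical sketch of a PrivacyImpactAssessment record for the Xora
# dataflow. Field names are illustrative shorthand, not the EGPLib schema.
xora_pia = {
    "dataflow_id": "XORA-WEBAPP-01",
    "data_scopes": {"employment", "surveillance", "location", "tracking"},
    "subject_jurisdictions": {"US-CA"},  # pseudo-ISO coding of domicile(s)
    "storage_jurisdictions": {"US-CA"},
    "transfer_jurisdictions": set(),     # no transfer outwith California
}
```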
Then the “application programming interface” (API) is run, requesting a risk assessment of the dataflow. By “API is run” I mean the project software’s “risk assessment component” is invoked, by way of being embedded into any software product, web service, etc. (It has to be made available in API form because only in that form can the enterprise directly inject its transactional-level breach reporting into the enterprise’s Operations IT systems by way of a PbD “layer”.)
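The embedding pattern is simple enough to sketch. In this illustration (the function names and shapes are assumptions of mine, not the project’s actual API), a thin PbD “layer” wraps a transactional operation, calls the risk assessment component, and routes any breach records into the Operations reporting stream:

```python
# Illustrative embedding sketch: a PbD "layer" wrapping a transaction.
# assess() stands in for the real risk-assessment API call; its name and
# signature are assumptions, not the project's actual interface.
def pbd_layer(assess, ops_log, dataflow_id, transaction):
    """Run the assessment for one transaction; inject any breach
    records into Operations IT; report whether the transaction is clean."""
    breaches = assess(dataflow_id, transaction)  # the embedded API call
    for breach in breaches:
        ops_log.append(breach)                   # breach reporting feed
    return len(breaches) == 0

ops_log = []
clean = pbd_layer(lambda d, t: [], ops_log, "XORA-WEBAPP-01", {"user": "emp42"})
```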
Commencing the risk assessment, the API determines from the metadata we entered that there is only one jurisdiction of immediate interest, US-CA. The API now wants to check through all the laws matching California or its “super-jurisdictions” to see if any of them have anything interesting to say about our dataflow. It determines the full list of jurisdictions by first checking the super-jurisdictions registered against the Jurisdiction table’s entry for US-CA. As it happens California has only one super-jurisdiction registered: the United States. So, initially, the API selects all Federal and Californian statutes and torts as theoretical “candidate” laws for relevance to this dataflow.
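The super-jurisdiction lookup amounts to walking a small tree. A sketch, with an invented table shape (the real Jurisdiction table is richer than this):

```python
# Illustrative Jurisdiction table: each entry lists its registered
# super-jurisdiction(s). Contents and shape are my guesses, not EGPLib's.
JURISDICTION = {
    "US-CA": {"super": ["US"]},
    "US":    {"super": []},
}

def jurisdiction_chain(code):
    """Return the jurisdiction plus all transitive super-jurisdictions,
    i.e. every jurisdiction whose laws are candidates for this dataflow."""
    chain = [code]
    for parent in JURISDICTION[code]["super"]:
        chain.extend(jurisdiction_chain(parent))
    return chain
```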
As luck would have it, the US variant of the tort of intrusion upon seclusion is entered in the ApplicableLaw metadata table under the less-than-imaginative code “US-SECLUSION”, unsurprisingly registered against the jurisdiction-code “US”. It therefore is a candidate. Unlike some non-US variants of the tort, there is only one sub-jurisdiction excluded from this law, Louisiana (this is because, until someone corrects me on the point, with the a priori exception of Louisiana as a civil law jurisdiction I am unaware of any State judiciaries that have excluded the tort).
As US is US-CA’s super-jurisdiction from which US-CA is not excluded in respect of intrusion into seclusion, the API therefore will recognize US-SECLUSION as a candidate law (if you prefer, metalaw) to be tested against this dataflow’s metadata “facts” (the opposite of the way lawyers normally think, but bear with me). One of the first (of many) tests the API applies for each candidate law is to match the “data scopes” (aka legal subject matter) coverage of the dataflow, against the data scope coverage of the candidate law.
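The candidacy test, then, combines the super-jurisdiction walk with the law’s exclusion list. A sketch (again, the table shapes are illustrative assumptions, not the real ApplicableLaw schema):

```python
# Illustrative sub->super jurisdiction mapping and ApplicableLaw entry.
SUPER = {"US-CA": "US", "US-LA": "US"}

APPLICABLE_LAW = {
    "US-SECLUSION": {
        "jurisdiction": "US",
        "excluded": {"US-LA"},  # Louisiana, as a civil-law jurisdiction
    },
}

def is_candidate(law_code, dataflow_jurisdiction):
    """True if the law's registered jurisdiction covers the dataflow's
    jurisdiction (directly or as a super-jurisdiction) and the dataflow's
    jurisdiction is not on the law's exclusion list."""
    law = APPLICABLE_LAW[law_code]
    if dataflow_jurisdiction in law["excluded"]:
        return False
    j = dataflow_jurisdiction
    while j is not None:
        if j == law["jurisdiction"]:
            return True
        j = SUPER.get(j)  # climb to the super-jurisdiction, if any
    return False
```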
As it happens US-SECLUSION’s data scopes are classified (ultimately modelled on Professor Prosser’s Second Restatement) as “eavesdropping, film, surveillance, photographic, privatecorrespondence, sexualpractices, sexualorientation”. The API compares the law’s scopes with the PIA’s scopes determined by us earlier, and discovers one common element: “surveillance”. This causes the API to recognize the existence of a subject matter overlap between the dataflow and the tort of intrusion upon seclusion, so it provisionally decides the law may be applicable to this dataflow. (If no match is found, the API simply discards the tort as a candidate law for this dataflow, and moves on to check the next candidate law.)
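That overlap test is, at heart, just a set intersection. Sketched with the scopes from this example:

```python
# The data-scope overlap test as a set intersection. Scope values are
# taken from the example above; the test itself is a simplification of
# the API's fuller matching logic.
LAW_SCOPES = {"eavesdropping", "film", "surveillance", "photographic",
              "privatecorrespondence", "sexualpractices", "sexualorientation"}
PIA_SCOPES = {"employment", "surveillance", "location", "tracking"}

overlap = LAW_SCOPES & PIA_SCOPES          # common subject matter
law_provisionally_applicable = bool(overlap)  # any overlap keeps the candidate
```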
For brevity, I’ll stop there. Of course the testing process from end-to-end is a whole lot more complicated than that, as you will infer from the metadata schema set out below, but I trust you get a high-level feel of how the architectures are processed by the API.
The API’s breach table delivery from this particular exercise is just one record, which may be found at https://www.dropbox.com/s/w1yjfpo61tfv1sv/Breach16.xls (I’ve inverted / “transposed” the spreadsheet’s columns to rows so you can see each field of the record more clearly on Dropbox.)
In summary, the idea is that the risk assessment process is a sausage machine. You just plug in your architecture, crank the handle, and then decide to which jurisdictions’ lawyers (if any) you need to run screaming – or not.
“Of particular importance to the engineers, lawyers, other privacy professionals and other disciplines is the terminology and establishing a common framework which allows disparate teams of people to operate” – Dr Ian Oliver, Privacy Engineering: a Dataflow and Ontological Approach, 2014, p244.
Anyone wishing to view EGPLib’s current metadata model (computer-generated, refreshed occasionally) can look at https://www.dropbox.com/s/npy1jrmxbpvox63/PBDDataModel-Logical-Annotated.txt. This comprises one part of the project’s “common language” – its frame of reference to objects both within and outwith the enterprise. Most other parts of the “common framework” are taxonomies of atomic elements, for example the “data scopes” encountered above.
Metadata quality (essential) is guaranteed by the API which will decline to operate against non-compliant metadata (giving extremely specific contextual feedback as to which metadata it doesn’t like and precisely why – usually typos or user misunderstanding). For that reason the enterprise, auditors, insurers, regulators, etc can be confident that any relied-upon published compliance reports generated by the API are founded upon a semantically consistent and conceptually coherent privacy architecture.
Regulators and insurers may be particularly interested in the enterprise-specific PrivacyImpactAssessment table, which allows the enterprise formally to classify dataflows of privacy interest and “drives” the rest of the architecture including transactional PbD.
Compliance and Audit professionals may be reassured by the AcceptedRisks and CustomRules tables – custom rules being particularly useful for simply “dis-applying” laws or analytics with whose effects you disagree (at the price of “dis-apply” rules etc being automatically written into audit documents)
Legal and IT geeks may wish to pore over the Jurisdiction, ApplicableLaw, and Analytic tables, which model the (multi-jurisdictional) legal context against which the privacy architecture is evaluated, for both architectural risk and transactional breach.
The linked metadata model prima facie looks like a hybrid data architecture / design document, and indeed others could theoretically use it as such. However, it emerged as an internal “sanity-check” API deliverable to verify that the source code remains in sync with the live metadata tables; it is dynamically auto-generated and thus always up to date (providing I remember to refresh the web copy).
Privacy Architectures – EGPLib Schematic Definition
An enterprise-wide “Privacy (Governance) Architecture”:
- Is expressed in and distinguishable by formal “privacy architecture metadata”
- Populating tables PrivacyImpactAssessment, AcceptedRisks, and CustomRules*;
- Engaging material parts of the enterprise (typically Compliance/IT/Legal)
- Building on (or creating) IT’s pre-existing Data/Information Architectures
- Articulating the enterprise’s privacy architecture to stakeholders
- Is formally validated and used by Governance IT (built, web service, or off the shelf)
- Against customizable metadata tables Jurisdiction, ApplicableLaw, Analytic*, thus
- Forwards-compatible with emerging statutes/torts (e.g. GDPR, coming-into-force sections of PIPEDA, etc.)
- Embedding the architecture as PbD into enterprise transactional level IT
- Implemented by IT via non-intrusive calls to “Application Programming Interface” (API) wrappers
- Dynamically providing transaction-by-transaction breach identification;
- Retrofitting legacy IT as easily as embedding into new systems
- Providing “notification list” capability for external data breaches (theft etc).
- Providing on-demand quantified Privacy Architecture Risk/ Impact Assessments
- Predicting cost of anticipated breaches across multiple jurisdictions
- Current, future, and legacy IT projects and systems, against law
- Effect of future (changes to) law, on current and legacy projects
* note that the asterisked (*) metadata and other tables are particularized under subheading “Metadata Schema” above, by reference to a hyperlinked data model
Don’t Panic! This sounds more complicated than it is. For example the metadata tables (where not gold-plated as SQL or XML) are articulated by default as ordinary spreadsheets which are then read as input (and tightly validated) by the API.
(Intentionally, this approach renders privacy architectures accessible to relatively small enterprises. Even those altogether lacking an IT department can set out their architecture in spreadsheet(s), and then can conduct risk assessment on that architecture without even needing to procure software: I or anyone else using the API could provide that as an automated web service)
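The spreadsheet input path is easy to picture. A sketch using CSV export (column names and the pipe-separated scope convention are assumptions of mine, not the project’s actual file format):

```python
import csv
import io

# Sketch of reading spreadsheet-exported PIA metadata. Column names and
# the "|"-separated scope list are illustrative, not the real format.
CSV_TEXT = """dataflow_id,data_scopes,subject_jurisdictions
XORA-WEBAPP-01,employment|surveillance|location|tracking,US-CA
"""

def load_pia_rows(text):
    """Parse CSV rows into PIA records, splitting multi-valued fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["data_scopes"] = set(row["data_scopes"].split("|"))
        rows.append(row)
    return rows

pias = load_pia_rows(CSV_TEXT)
```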
Practical usage by the enterprise
“Typically, it is too difficult and time-consuming to determine the exact nature and details of formal and substantive compliance obligations in other countries, where laws may be presented in unfamiliar formats and languages.” – Lothar Determann, op. cit., 2.05.
The project and its API cannot provide legal advice – as a logical category mistake, no metadata codification of law could ever do that. What it can do is give heads-up “canary-in-the-coalmine” quantified risk warnings against specific dataflows relative to specific jurisdictions. In turn, this obviously implies that the enterprise should consider consulting Counsel in the identified jurisdiction(s) on dataflows whose risk assessment deliverables for those jurisdictions exhibit significant financial risk, whether public or private (the latter growing more and more prominent in proportion with class actions and the gradual de-coupling of tort remedy from proof of material damage). With or without legal advice, the enterprise logically then has five clear non-exclusive options:
- To re-engineer the architecture (together with dependent IT systems), and re-assess;
- To alter relevant metadata on which the assessment is based, and re-assess; *
- To “disapply” jurisdiction(s), law(s), or analytic(s) (in response to legal advice); *
- Formally to authorize acceptance of specified risks in context; * or
- (To ignore any insights and carry on regardless.)
The API may record (depending on exact context) in audit documentation any asterisked (*) options chosen. This provides maximum transparency for auditors and stakeholders, including regulators. Ultimately the API’s production of such documents, with notes recording the embedded “due diligence” enterprise derogations from the standard, is designed to promote minimization of Court risk (in the context of aggravation versus mitigation and thus quantum of awards).
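How the asterisked options might be captured for audit can be sketched as follows (the entry structure, field names, and function are my assumptions, not the project’s actual audit format):

```python
# Illustrative sketch of recording a formally accepted risk into audit
# documentation. Names and structure are assumptions, not EGPLib's format.
audit_log = []

def accept_risk(dataflow_id, law_code, rationale, authorized_by):
    """Record a formal risk acceptance so auditors, regulators, and other
    stakeholders can see the derogation and who authorized it."""
    entry = {
        "action": "AcceptedRisk",
        "dataflow": dataflow_id,
        "law": law_code,
        "rationale": rationale,
        "authorized_by": authorized_by,
    }
    audit_log.append(entry)  # written into the audit trail
    return entry

accept_risk("XORA-WEBAPP-01", "US-SECLUSION",
            "surveillance limited to logged-in periods after re-engineering",
            "Chief Privacy Officer")
```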
Forum-shopping – race to the bottom?
Going beyond trivial examples such as the case study set out above, I would be remiss if I did not mention the opportunities for savagely anti-social misuse of such risk reduction technology. Want to plan the best times to shift your health data between jurisdictions, ahead of citizens acquiring a private right of action against you? No problem. When hiring Canadians, want to know how and when (currently lawfully) to discriminate against them based on their province of birth? No worries: something for legislators, regulators, and judges to ponder. Sadly, facilitating such arbitrage / risk reduction techniques necessarily facilitates such externalization / rent-seeking behaviors.
Governance framework and extensibility
Though a “parent” enterprise governance framework sits at the conceptual center of the project, for current purposes its only significance is that privacy governance is one of its “child” architectures (hence “EGPLib” – Enterprise Governance – Privacy Library).
For example, a possible next-step complementary risk assessment / transactional analysis library, indeed inheriting and reusing most of the same metadata tables and common reference framework with suitably altered taxonomies, might be Basel III-compliant financial governance (which on any level seems far simpler to implement than privacy).
Alternative to “schematic definition” view – capabilities
In terms of the API’s technical capabilities, people can, if they prefer, think of this project as a multi-jurisdictional law-based Java API facilitating:
- semantically validated specification of enterprise Privacy by Design architectures
- automated compliance/audit reporting and financial risk assessments for specified PbD architecture(s)
- seamless embedding of specified PbD architectures (including dynamic compliance reporting) into enterprise IT systems, including legacy
- automated preparation of breach reports/notification lists
- web-publishable standardized presentation of privacy architectures to regulators and consumers alike
- extensibility by enterprises or regulators to other purposes as they arise
The IT has been coming together for some time. Currently I hope to release the initial API code into open source in circa six weeks (such release being predicated on “critical-mass” interest from IT developers) to get it off my hands. The default-populating of the ApplicableLaw and Analytic metadata tables for the many jurisdictions on which I haven’t yet started obviously has to be an ongoing activity, but I’m happy to continue with that (my two primary drivers being “interesting” laws and course delegate jurisdictions).
Driven by the imminent EU General Data Protection Regulation, Preterlex www.preterlex.com last year commissioned what they believe to be the world’s first in-depth course on an architectural approach to corporate privacy governance. The preferred audience profile is a multi-disciplinary mix of IT, compliance, and legal professionals (of necessity the course imparts to each of them an improved understanding of the others’ fields and concerns). EGPLib is used to illustrate the case studies and a practical approach to privacy governance. Currently the course is available privately to companies, but Preterlex plans a series of publicly available sessions worldwide. I declare an interest as the course developer and initial primary presenter (any enquiries should be addressed to firstname.lastname@example.org please, rather than myself).
As this is a cross-cutting project, comments and clarification requests are expected as well as welcome – please respond to the LinkedIn copy at .