Privacy scholars and practitioners the world over have now noted that the current regulation of privacy simply does not work well in a big data world.[1] Thus, to the extent that they openly welcome, or at least acknowledge the inevitability of, such a world, many of them (us) are beginning to seek new approaches. Among the major concerns regarding the application of current privacy law to big data are the following:
- Current privacy law focuses on data minimization at the point of collection, while big data extracts unpredictable value from combinations of data whose collection might not have appeared necessary.
- Current privacy law often requires that data be destroyed when no longer needed for the purpose for which it was collected, while big data looks for derivative uses and opportunities.
- Current privacy law often relies solely on notice and consent at the time the data is collected (although this is changing with privacy by design, which often emphasizes “just in time” notices for particular uses and disclosures of data), while big data uses are generally not known at the time of collection.
- Current privacy law allows for free alienation of all personal rights at the time of consent or authorization, which makes no sense when the uses of the data are not known.
- Current privacy law exempts information that has been anonymized or de-identified, but big data facilitates the re-identification of anonymized data,[2] as illustrated in the sketch following this list.
- The White House Privacy Bill of Rights, drawing on the work of scholar Helen Nissenbaum, made “respect for context” one of its core principles, while big data companies like Google take big data in precisely the opposite (context-disruptive) direction.
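To make the re-identification concern above concrete, here is a minimal sketch of a classic linkage attack, in which a dataset stripped of names is joined to a public auxiliary dataset on quasi-identifiers such as ZIP code, birth date, and sex. All records, names, and field names below are hypothetical, for illustration only.

```python
# Minimal sketch of a linkage ("re-identification") attack.
# All records, names, and field names are hypothetical.

# "Anonymized" dataset: direct identifiers removed, quasi-identifiers retained.
medical_records = [
    {"zip": "02138", "birth_date": "1945-07-21", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "60601", "birth_date": "1980-03-02", "sex": "M", "diagnosis": "asthma"},
]

# Public auxiliary dataset (e.g., a voter roll) that still carries names.
voter_roll = [
    {"name": "Jane Doe", "zip": "02138", "birth_date": "1945-07-21", "sex": "F"},
    {"name": "John Roe", "zip": "60601", "birth_date": "1980-03-02", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(anonymized, public):
    """Join two datasets on shared quasi-identifiers, re-attaching names."""
    index = {tuple(row[q] for q in QUASI_IDENTIFIERS): row["name"] for row in public}
    for record in anonymized:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        if key in index:
            yield index[key], record["diagnosis"]

for name, diagnosis in link(medical_records, voter_roll):
    print(f"{name} -> {diagnosis}")  # sensitive attribute re-linked to a name
```

When a combination of quasi-identifiers is rare in the population, as ZIP code plus birth date plus sex frequently is, the join resolves to a single individual and the “anonymized” record is re-identified.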
If big data leads us to (a) give up on data minimization and on destruction as soon as the primary use has been completed, (b) limit reliance on complete alienation of rights based on an initial notice and consent, and (c) place less reliance on the effectiveness of de-identification, then the protection of privacy must have what are called in information security “compensating controls.” Those controls are particularly important to emphasize here, because in my view they go to the very heart of using cloud computing in big data:
- EXTREMELY good information security. Insofar as cloud computing may raise both regulatory- and risk-based information security issues that, say, a DoD-certified facility does not, I would suggest that cloud-based big data providers hold themselves to a strong service-organization-oriented assurance standard such as a SOC 2 Report on Controls at a Service Organization Relevant to Security, Availability, Processing Integrity, Confidentiality, or Privacy. Big data cloud repositories are already targets for hackers, and will increasingly be so as the value of the data grows.
- A focus on big data company accountability for appropriate use of the data (to complement, in my view, some continued reliance on informed end-user notice and consent). This point is made well in two ways by Mayer-Schonberger and Cukier.[3] First, they stress the need for a formal big data use assessment and plan based on regulatory ground rules. The plan would incorporate “differential privacy” (now being explored by Microsoft and others), which deliberately obscures or masks the data (a minimal sketch appears after the quotation below), along with maximum retention periods prior to secure data destruction. The problem posed by this idea is big data’s black box problem stressed in Section 2 above: the complexity of big data analysis and proprietary innovation make public accountability difficult. Second, I believe their call for quasi-auditors, both independent/external and internal employees, given the infelicitous name of “Algorithmist,” will be a necessary and likely development, particularly in the absence of expanded individual rights like those discussed below.
These new professionals would be experts in the areas of computer science, mathematics, and statistics; they would act as reviewers of big-data analyses and predictions. Algorithmists would take a vow of impartiality and confidentiality, much as accountants and certain other professionals do now. They would evaluate the selection of data sources, the choice of analytical and predictive tools, including algorithms and models, and the interpretation of results.[4]
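Since differential privacy may be unfamiliar, here is a minimal sketch of its core mechanism under simple assumptions: a count query has sensitivity 1 (adding or removing one person changes the true answer by at most 1), so adding Laplace noise with scale 1/ε yields an ε-differentially private answer. The dataset, predicate, and ε value below are hypothetical.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    e1 = -math.log(1.0 - random.random())  # Exp(1) sample
    e2 = -math.log(1.0 - random.random())
    return scale * (e1 - e2)

def dp_count(records, predicate, epsilon):
    """Answer a count query with epsilon-differential privacy.

    A count has sensitivity 1 (one person's data changes it by at most 1),
    so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: how many users in the dataset are over 40?
users = [{"age": 34}, {"age": 51}, {"age": 67}, {"age": 29}]
print(dp_count(users, lambda u: u["age"] > 40, epsilon=0.5))
```

The smaller the privacy budget ε, the noisier, and hence more privacy-protective, the published answer; a quantified budget of this kind is also exactly the sort of commitment that the “Algorithmists” described above could audit.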
These two “compensating controls,” as substitutes for the more traditional privacy regulatory requirements listed at the beginning of this section, would put a great deal of regulatory and auditing pressure on big data firms and cloud providers alike to become less messy and more transparent.
A third force that would have a similar impact might come from Europe in the next year. Among the sets of ideas under consideration in the transformation of European data protection regulation now underway are many that would put more power in the hands of individuals. In the US, Tene and Polonetsky made the case for such a shift in control in a popular paper,[5] and Rubinstein extended their thinking, incorporating Doc Searls’ work on Vendor Relationship Management (VRM).[6] If the European Union proceeds in this direction, it will likely spur the creation of aggregators that lower transaction costs, opening the door for individuals to play an active role in the big data economy. And where would individuals store their big data but in the “personal clouds” that many of them already have and others soon will, given the consumerization of IT?
One way or another, value will be delivered to the individual as an individual, thanks to big data. The question as between an American approach and a European approach may be how much the individual will be consciously involved in the creation of that value.
[1] For two good looks at where big data may lead privacy regulation, see Christopher Kuner, Fred H. Cate, Christopher Millard and Dan Jerker B. Svantesson, “The challenge of ‘big data’ for data protection,” International Data Privacy Law, Vol. 2, Issue 2 (2012), pp. 47-49, and Ira Rubinstein, “Big Data: The End of Privacy or a New Beginning?,” International Data Privacy Law (2013).
[2] But see Ann Cavoukian & Khaled El Emam, Info. & Privacy Comm’r of Ont., Dispelling the Myths Surrounding De-identification: Anonymization Remains a Strong Tool for Protecting Privacy 7 (2011), available at http://www.ipc.on.ca/images/Resources/anonymization.pdf.
[3] Mayer-Schonberger and Cukier, op. cit., pp. 172-184.
[4] Ibid., pp. 179-182.
[5] Omer Tene and Jules Polonetsky, “Big Data for All: Privacy and User Control in the Age of Analytics,” Northwestern Journal of Technology and Intellectual Property (forthcoming).
[6] Rubinstein, op. cit., at p. 8.