Just a Theory

By David E. Wheeler

Posts about Privacy

Borderline

In just about any discussion of GDPR compliance, two proposals always come up: disk encryption and network perimeter protection. I recently criticized the focus on disk encryption, particularly its inability to protect sensitive data from live system exploits. Next I wanted to highlight the deficiencies of perimeter protection, but in doing a little research, I found that Goran Begic has already made the case:

Many organizations, especially older or legacy enterprises, are struggling to adapt systems, behaviors, and security protocols to this new-ish and ever evolving network model. Outdated beliefs about the true nature of the network and the source of threats put many organizations, their information assets, and their customers, partners, and stakeholders at risk.

What used to be carefully monitored, limited communication channels have expanded into an ever changing system of devices and applications. These assets are necessary for your organization to do business—they are what allow you to communicate, exchange data, and make business decisions and are the vehicle with which your organization runs the business and delivers value to its clients.

Cloud computing and storage, remote workers, and the emerging preference for micro-services over monoliths1 vastly complicate network designs and erode boundaries. Uber-services such as Kubernetes recover some control by wrapping all those micro-services in the warm embrace of a monolithic orchestration layer, but by no means restore the simplicity of earlier times. Once the business requires the distribution of data and services to multiple data centers or geographies, the complexity claws its way back. Host your data and services in the cloud and you’ll find the boundary all but gone. Where’s the data? It’s everywhere.

In such an environment, staying on top of all the vulnerabilities — all the patches, all the services listening on this network or that, inside some firewall or out, accessed by whom and via what means — becomes exponentially more difficult. Even the most dedicated, careful, and meticulous of teams sooner or later overlook something. An unpatched vulnerability. An authentication bug in an internal service. A rogue cloud storage container to which an employee uploads unencrypted data. Any and all could happen. They do happen. Strive for the best; expect the worst.

Because it’s not a matter of whether or not your data will be breached. It’s simply a matter of when.

Unfortunately, compliance discussions often end with these two mitigations, disk encryption and network perimeter protection. You should absolutely adopt them, and a discussion rightfully starts with them. But then it’s not over. No, these two basics of data protection are but the first step to protect sensitive data and to satisfy the responsibility for security of processing (GDPR Article 32). Because sooner or later, no matter how comprehensive the data storage encryption and firewalling, eventually there will be a breach. And then what?

“What next” bears thinking about: How do you further reduce risk in the inevitable event of a breach? I suggest taking the provisions of the GDPR at face value, and consider three things:

  1. Privacy by design and default
  2. Anonymization and aggregation
  3. Pseudonymization

Formally, items two and three fall under item one, but I would summarize them as:

  1. Collect only the minimum data needed for the job at hand
  2. Anonymize and aggregate sensitive data to minimize its sensitivity
  3. Pseudonymize remaining sensitive data to eliminate its breach value

Put these three together, and the risk of sensitive data loss and the costs of mitigation decline dramatically. In short, take security seriously, yes, but also take privacy seriously.
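To make the second measure concrete, here is a minimal sketch of anonymization by aggregation, in Python with invented sample records (the field names and bucketing scheme are illustrative assumptions, not a prescription): direct identifiers are dropped, ages are coarsened into buckets, and only counts survive.

```python
from collections import Counter

# Hypothetical raw records: each ties a person to a city and an age.
records = [
    {"name": "Ada", "city": "Portland", "age": 36},
    {"name": "Grace", "city": "Portland", "age": 41},
    {"name": "Alan", "city": "Seattle", "age": 29},
]

def age_bucket(age: int) -> str:
    """Coarsen an exact age into a ten-year bucket like '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Aggregate: counts per (city, age bucket). Names never enter the
# output, and no individual record survives the aggregation.
aggregate = Counter((r["city"], age_bucket(r["age"])) for r in records)
```

The aggregate retains analytic value (how many users per city and age band) while shedding the sensitive detail that made the raw records worth stealing.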


  1. It’s okay, as a former archaeologist I’m allowed to let the metaphor stand on its own.

The Problem With Disk Encryption

Full disk encryption provides incredible data protection for personal devices. If you haven’t enabled FileVault on your Mac, Windows Device Encryption on your PC, or Android Device Encryption on your phone, please go do it now (iOS encrypts storage by default). It’s easy, efficient, and secure. You will likely never notice the difference in usage or performance. Seriously. This is a no-brainer.

Once enabled, device encryption prevents just about anyone from accessing device data. Unless a malefactor possesses both device and authentication credentials, the data is secure.

Mostly.

Periodically, vulnerabilities arise that allow circumvention of device encryption, usually by exploiting a bug in a background service. OS vendors tend to fix such issues quickly, so keep your system up to date. And if you work in IT, enable full disk encryption on all of your users’ devices and drives. Doing so greatly reduces the risk of sensitive data exposure via lost or stolen personal devices.

Servers, however, are another matter.

The point of disk encryption is to prevent data compromise by entities with physical access to a device. If a governmental or criminal organization takes possession of encrypted storage devices, gaining access to any of the data presents an immense challenge. Their best bet is to power up the devices and scan their ports for potentially-vulnerable services to exploit. The OS allows such services transparent access to the file system via automatic decryption. Exploiting such a service allows access to any data the service can access.

But, law enforcement investigations aside,1 who bothers with physical possession? Organizations increasingly rely on cloud providers with data distributed across multiple servers, perhaps hundreds or thousands, rendering the idea of physical confiscation nearly meaningless. Besides, when exfiltration typically relies on service vulnerabilities, why bother taking possession of hardware at all? Just exploit vulnerabilities remotely and leave the hardware alone.

Which brings me to the issue of compliance. I often hear IT professionals assert that simply encrypting all data at rest2 satisfies the responsibility for the security of processing (GDPR Article 32). This interpretation may be legally correct3 and relatively straightforward to achieve: simply enable disk encryption, protect the keys via an appropriate and closely-monitored key management system, and migrate data to the encrypted file systems.4

This level of protection against physical access is absolutely necessary for protecting sensitive data.

Necessary, but not sufficient.

When was the last time a breach stemmed from physical access to a server? Sure, some reports in the list of data breaches identify “lost/stolen media” as the breach method. But we’re talking lost (and unencrypted) laptops and drives. Hacks (service vulnerability exploits), accidental publishing,5 and “poor security” account for the vast majority of breaches. Encryption of server data at rest addresses none of these issues.

By all means, encrypt data at rest, and for the love of Pete please keep your systems and services up-to-date with the latest patches. Taking these steps, along with full network encryption, is essential for protecting sensitive data. But don’t assume that such steps adequately protect sensitive data, or that doing so will achieve compliance with GDPR Article 32.

Don’t simply encrypt your disks or databases, declare victory, and go home.

Bear in mind that data protection comes in layers, and those layers correspond to levels of exploitable vulnerability. Simply addressing the lowest-level requirements at the data layer does nothing to prevent exposure at higher levels. Start disk encryption, but then think through how best to protect data at the application layer, the API layer, and, yes, the human layer, too.


  1. Presumably, a legitimate law enforcement investigation will compel a target to provide the necessary credentials to allow access by legal means, such as a court order, without needing to exploit the system. Such an investigation might confiscate systems to prevent a suspect from deleting or modifying data until such access can be compelled — or, if such access is impossible (e.g., the suspect is unknown, deceased, or incapacitated) — until the data can be forensically extracted.

  2. Yes, and in transit.
  3. Although currently no precedent-setting case law exists. Falling back on PCI standards may drive this interpretation.

  4. Or databases. The fundamentals are the same: encrypted data at rest with transparent access provided to services.

  5. I plan to write about accidental exposure of data in a future post.

The Farce of Consent

Cornell Tech Professor of Information Science Helen Nissenbaum, in an interview for the Harvard Business Review:

The farce of consent as currently deployed is probably doing more harm as it gives the misimpression of meaningful control that we are guiltily ceding because we are too ignorant to do otherwise and are impatient for, or need, the proffered service. There is a strong sense that consent is still fundamental to respecting people’s privacy. In some cases, yes, consent is essential. But what we have today is not really consent.

It still feels pretty clear-cut to me. I chose to check the box.

Think of it this way. If I ask you for your ZIP code, and you agree to give it to me, what have you consented to?

I’ve agreed to let you use my ZIP code for some purpose, maybe marketing.

Maybe. But did you consent to share your ZIP code with me, or did you consent to targeted marketing? I can combine your ZIP code with other information I have about you to infer your name and precise address and phone number. Did you consent to that? Would you? I may be able to build a financial profile of you based on your community. Did you consent to be part of that? I can target political ads at your neighbors based on what you tell me. Did you consent to that?

Well this is some essential reading for data privacy folx. Read the whole thing.

I myself have put too much emphasis on consent when thinking about and working on privacy issues. The truth is we need to think much deeper about the context in which consent is requested, the information we’re sharing, what it will be used for, and — as Nissenbaum describes in this piece — the impacts on individuals, society, and institutions. Adding her book to my reading list.

(Via Schneier on Security)

Token Dimensions

C’est moi, in my second post on Tokenization for the iovation blog:

These requirements demonstrate the two dimensions of tokenization: reversibility and determinism. Reversible tokens may be detokenized to recover their original values. Deterministic tokens are always the same given the same inputs.

The point is to evaluate the private data fields to be tokenized in order to determine where along these dimensions they fall, so that one can make informed choices when evaluating tokenization products and services.
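To illustrate the two dimensions, here is a minimal Python sketch — not iovation’s implementation — of a deterministic, irreversible token derived with an HMAC, and a reversible, non-deterministic token backed by a hypothetical in-memory vault. A real service would hold the key in a key management system and back the vault with hardened, persistent storage.

```python
import hashlib
import hmac
import secrets

# Hypothetical key; in practice this lives in a key management system.
SECRET_KEY = secrets.token_bytes(32)

def deterministic_token(value: str) -> str:
    """Deterministic, irreversible: the same input always yields the
    same token, but the original value cannot be recovered from it."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

class ReversibleTokenizer:
    """Reversible, non-deterministic: each call mints a fresh random
    token and records the mapping in a private lookup table (a "vault"),
    so the original value can later be recovered by detokenizing."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(16)
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

# Deterministic tokens support joins and lookups without exposing data.
assert deterministic_token("user@example.com") == deterministic_token("user@example.com")

# Reversible tokens can be detokenized to recover the original value.
vault = ReversibleTokenizer()
token = vault.tokenize("555-12-3456")
assert vault.detokenize(token) == "555-12-3456"
```

The design choice follows the dimensions directly: deterministic tokens trade unlinkability for the ability to match records, while reversible tokens concentrate risk in the vault, which must be protected accordingly.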

GDPR and the Professionalization of Tech

Happy GDPR day.

The GDPR is a big deal. It encodes significant personal and private data rights for EU subjects, including, among others:

Organizations that process personal data, referred to as “data controllers,” accept serious responsibilities to respect those rights, and to protect the personal data they process. These responsibilities include, among others:

The regulations have teeth, too; fines for non-compliance add up to a considerable financial penalty. Failure to notify in the event of a breach, for example, may result in a fine of up to €20 million or 4% of global revenue, whichever is greater.

There’s a lot more, but the details have been extensively covered elsewhere. In contrast, I want to talk about the impact of the GDPR on internet products and services.

Impacts

In my GDPR advocacy for iovation, I’ve argued that the enshrinement of personal data rights marks a significant development for human rights in general, and therefore is not something to be resisted as an imposition on business. Yes, compliance requires a great deal of work for data controllers, and few would have taken it on voluntarily. But the advent of the GDPR, with application to over 500 million EU subjects, as well as to any and all organizations that process EU subject personal data, tends to even out the cost. If the GDPR requires all companies to comply, then no one company is disadvantaged by the expense of complying.

This argument is true as far as it goes — which isn’t far. Not every company has equal ability to ensure compliance. It might be a slog for Facebook or Google to comply, but these monsters have more than enough resources to make it happen.2 Smaller, less capitalized companies have no such luxury. Some will struggle to comply, and a few may succumb to the costs. In this light, the GDPR represents a barrier to entry, a step in the inevitable professionalization3 of tech that protects existing big companies that can easily afford it, while creating an obstacle to new companies working to get off the ground.

I worry that the GDPR marks a turning point in the necessary professionalization of software development, increasing the difficulty for a couple people working in their living room to launch something new on the internet. Complying with the GDPR is the right thing to do, but requires the ability to respond to access and deletion requests from individual people, as well as much more thorough data protection than the average web jockey with a MySQL database can throw together. For now, perhaps, they might decline to serve EU subjects; but expect legislation like the GDPR to spread, including, eventually, to the US.

Personal data rights are here to stay, and the responsibility to adhere to those rights applies to us all. While it might serve as a moat around the big data controller companies, how can leaner, more agile concerns, from a single developer to a moderately-sized startup, fulfill these obligations while becoming and remaining a going concern?

Tools

Going forward, I envision two approaches to addressing this challenge. First, over time, new tools will be developed, sold, and eventually released as open-source that reduce the overhead of bootstrapping a new data processing service. Just as Lucene and Elasticsearch have commoditized full-text search, new tools will provide encrypted data storage, anonymous authentication, and tokenization services on which new businesses can be built. I fear it may take some time, since the work currently underway may well be bound by corporate release policies, intellectual property constraints, and quality challenges.4 Developing, vetting, releasing, and proving new security solutions takes time.

Commercial tools will emerge first. Already services like Azure Information Protection secure sensitive data, while authentication services like Azure Active Directory and Amazon Cognito delegate the responsibility (if not the breach consequences) for secure user identities to big companies. Expect such expensive services to eventually be superseded by more open solutions without vendor lock-in — though not for a couple years, at least.

Ingenuity

I’m into that, even working on such tools at work, but I suspect there’s a more significant opportunity to be had. To wit, never underestimate the ingenuity of people working under constraints. And when such constraints include the potentially high cost of managing personal data, more people will work harder to dream up interesting new products that collect no personal data at all.

Internet commerce has spent a tremendous amount of time over the last 10 years figuring out how to collect more and more data from people, primarily to commoditize that information — especially for targeted advertising. Lately, the social costs of such business models have become increasingly apparent, including nonconsensual personal data collection, massive data breaches and, most notoriously, political manipulation.

So what happens when people put their ingenuity to work to dream up new products and services that require no personal data at all? What might such services look like? What can you do with nothing more than an anonymized username and a properly hashed password? To what degree can apps be designed to keep personal data solely on a personal device, or transmitted exclusively via end-to-end encryption? Who will build the first dating app on Signal?
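As a sketch of the “properly hashed password” piece, here is what minimal-data authentication might look like using scrypt from Python’s standard library, a deliberately slow, memory-hard key derivation function. The cost parameters below are illustrative assumptions, not a tuning recommendation.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a fresh per-user random salt using scrypt.
    Only the salt and digest need to be stored; the password never is."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```

Pair this with an anonymized username and the service holds nothing worth breaching: a stolen table of salts and digests yields neither identities nor passwords without an expensive brute-force effort per entry.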

I can’t wait to see what creative human minds — both constrained to limit data collection and, not at all paradoxically, freed from the demand to collect ever more personal data — will come up with. The next ten years of internet inventiveness will be fascinating to watch.


  1. This requirement has largely driven the avalanche of “We’ve updated privacy policy” messages in your inbox.

  2. Or to mount legal challenges that create the legal precedents for the interpretation of the GDPR.

  3. This Ian Bogost piece isn’t specifically about the professionalization of tech, but the appropriation of the title “engineer” by developers. Still, I hope that software developers will eventually adopt the Calling of the Engineer, which reads, in part, “My Time I will not refuse; my Thought I will not grudge; my Care I will not deny toward the honour, use, stability and perfection of any works to which I may be called to set my hand.” Ethical considerations will have to become a deep responsibility for software developers in the same way it has for structural and civil engineers.

  4. Like the old saw says: “Never implement your own crypto.” Hell, OpenSSL can’t even get it right.

iovation Tokenization

C’est moi, in the first of a series for the iovation blog:

Given our commitment to responsible data stewardship, as well as the invalidation of Safe Harbor and the advent of the GDPR, we saw an opportunity to reduce these modest but very real risks without impacting the efficacy of our services. A number of methodologies for data protection exist, including encryption, strict access control, and tokenization. We undertook the daunting task to determine which approaches best address data privacy compliance requirements and work best to protect customers and users — without unacceptable impact on service performance, cost to maintain infrastructure, or loss of product usability.

The post covers encryption, access control, and tokenization.

A Porous “Privacy Shield”

Glyn Moody, in Ars Technica, on the proposed replacement for the recently struck-down Safe Harbor framework:

However, with what seems like extraordinarily bad timing, President Obama has just made winning the trust of EU citizens even harder. As Ars reported last week, the Obama administration is close to allowing the NSA to share more of the private communications it intercepts with other federal agencies, including the FBI and the CIA, without removing identifying information first.

In other words, not only will the new Privacy Shield allow the NSA to continue to scoop up huge quantities of personal data from EU citizens, it may soon be allowed to share them widely. That’s unlikely to go down well with Europeans, the Article 29 Working Party, or the CJEU—all of which ironically increases the likelihood that the new Privacy Shield will suffer the same fate as the Safe Harbour scheme it has been designed to replace.

So let me get this straight. Under this proposal:

  • The NSA can continue to bulk collect EU citizen data.
  • That data may be shared with other agencies in the US government.
  • Said collection must fall under six allowed cases, one of which is undefined “counter-terrorism” purposes. No one ever abused that kind of thing before.
  • The US claims there is no more bulk surveillance, except that there is under those six cases.
  • The appointed “independent ombudsman” to address complaints by EU citizens will be a single US Undersecretary of State.
  • Complaints can also be addressed to US companies housing EU citizen data, even though, in the absence of another Snowden-scale whistle-blowing, they may have no idea their data is being surveilled.

Color me skeptical that this would work, let alone not be thrown out by another case similar to the one that killed Safe Harbor.

I have a better idea. How about eliminating mass surveillance?

Anthem Breach Harms Consumers

Paul Roberts in Digital Guardian:

Whether or not harm has occurred to plaintiffs is critical for courts to decide whether the plaintiff has a right – or “standing” – to sue in the first place. But proving that data exposed in a breach has actually been used for fraud is notoriously difficult.

In her decision in the Anthem case, [U.S. District Judge Lucy] Koh reasoned that the theft of personal identification information is harm to consumers in itself, regardless of whether any subsequent misuse of it can be proven. Allegations of a “concrete and imminent threat of future harm” are enough to establish an injury and standing in the early stages of a breach suit, she said.

Seems like a no-brainer to me. Personal information is just that: personal. Organizations that collect and store personal information must take every step they can to protect it. Failure to do so harms their users, exposing them to increased risk of identity theft, fraud, surveillance, and abuse. It’s reasonable to expect that firms not be insulated from litigation for failing to protect user data.

Apple Challenges FBI Decryption Demand

Incredible post from Apple, signed by Tim Cook:

The government is asking Apple to hack our own users and undermine decades of security advancements that protect our customers — including tens of millions of American citizens — from sophisticated hackers and cybercriminals. The same engineers who built strong encryption into the iPhone to protect our users would, ironically, be ordered to weaken those protections and make our users less safe.

We can find no precedent for an American company being forced to expose its customers to a greater risk of attack. For years, cryptologists and national security experts have been warning against weakening encryption. Doing so would hurt only the well-meaning and law-abiding citizens who rely on companies like Apple to protect their data. Criminals and bad actors will still encrypt, using tools that are readily available to them.

I only wish there was a place to co-sign. Companies must do all they can to safeguard the privacy of their users, preferably such that only users can unlock and access their personal information. It’s in the interest of the government to ensure that private data remain private. Forcing Apple to crack its own encryption sets a dangerous precedent likely to be exploited by cybercriminals for decades to come. Shame on the FBI.

Tune In, Opt-Out

Mindful of the fact that Ovid has suffered identity theft, and more than once, at that, and that it’s incredibly easy for someone to get a credit card in your name, I was happy to hear from a friend that you can have your name removed from the pre-approved credit card application database, so to speak. In the U.S., all you need to do is call +1 (888) 5-OPT-OUT, provide some information (including your Social Security number–yikes!), and that’s it. Given that it requires your SSN, I wasn’t sure about it, but a quick Googling yielded this FTC page legitimizing the number. And it has links to get me off of direct mail and direct email lists, too. Sweet!

You should visit the FTC page right now and get off those lists, too.
