Is Data Hoarding Necessary for Lawful Surveillance?

The NSA's mass surveillance activities, including the collection of billions of U.S. cell phone records every day, have sparked vigorous debate about whether such surveillance is legal, consistent with democratic principles, or effective in catching the terrorists it ostensibly targets. One essential question has received little attention, however: Is amassing mountains of privacy-sensitive "metadata" technically necessary for effective, lawful electronic tracking and surveillance of legitimate targets?

The answer is emphatically no. Well-understood cryptographic techniques can enable lawful intercept and surveillance without the creation of centralized hoards of personal information. This is not a geeky footnote in the mass surveillance saga. Such hoards are dangerous as well as unnecessary; they could be leaked or sold to a foreign state or criminal gang by a future, more venal incarnation of Edward Snowden.

The FBI is already adept at catching criminals without hoarding the cell phone metadata of all Americans. The High Country Bandits were two men who robbed 16 rural banks in Arizona and Colorado before being caught. After one bandit was observed using a cell phone near a robbery site, the FBI obtained cell tower dumps -- records from cellular providers listing all cell phones that had electronically "checked in" around the locations and times of three past robberies. This request yielded three sets of phone numbers, one from each cell tower, containing approximately 150,000 numbers in total. However, only one phone number appeared in the intersection of these sets, i.e., in all three: that of the phone one bandit had carried during the robberies. The bandit need not have made any calls; his phone merely needed to have been powered on and communicating with the cell towers.

In computer security, this is known as an intersection attack, with the FBI in this case playing the role of "attacker." Intersection attacks are a powerful, general, and in this case effective method of answering questions of the form, "What is common to several large heaps of otherwise meaningless-looking data?"
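To make the idea concrete, here is a minimal sketch of an intersection attack in Python. The phone numbers and the function name are invented for illustration; real tower dumps would hold tens of thousands of entries each, but the logic is the same.

```python
def intersection_attack(dumps):
    """Return the identifiers that appear in every one of the given dumps."""
    sets = [set(dump) for dump in dumps]
    return set.intersection(*sets)


# Three made-up "tower dumps", one per robbery site (illustrative data only).
tower_dumps = [
    ["555-0101", "555-0142", "555-0199"],
    ["555-0199", "555-0173", "555-0110"],
    ["555-0164", "555-0199", "555-0142"],
]

suspects = intersection_attack(tower_dumps)
print(suspects)  # only the number present at all three sites remains
```

Note that whoever runs this code sees every number in every dump, not just the one in the intersection. That is precisely the privacy problem.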

Intersection attacks are also evidently the foundation of the NSA's CO-TRAVELER program. To find unknown associates of a known target, the NSA collects cell tower dumps of all users carrying cell phones near the target at different locations and times. The NSA then identifies previously unknown cell phone numbers common to several of these sets, representing people who may be "traveling with" the target. Although the U.S. government has offered scant evidence that location-tracking methods like CO-TRAVELER are effective at catching terrorists, let us assume for the sake of argument that they are.

The FBI did not need to hoard the cell phone records of all Americans to catch the High Country Bandits, but it still swept 149,999 other phone numbers into its intersection attack: numbers probably belonging to innocent people who happened to be in the vicinity of one robbery site but not all three. Did the FBI immediately delete the rest of these phone numbers, or were they stashed for possible use in future investigations? Part of what fueled the widespread opposition to New York City's "stop-and-frisk" regime was the NYPD's policy of retaining the names, addresses, and descriptions of people who had been stopped, frisked, determined to be doing nothing illegal, and released without ever having been arrested, much less convicted of a crime. Should the FBI need to do the equivalent of a retroactive "stop and frisk" of 149,999 innocent cell phone users, gathering their phone numbers and potentially storing them forever to use in ways yet to be determined, in the process of catching one pair of bandits? Even if the FBI were to declare a policy of deleting data incidentally collected on users not under suspicion, must Americans simply trust that every FBI agent will follow this policy faithfully?

The answer is still no. Modern cryptography has moved far beyond merely encrypting and decrypting data. We can now perform many computations on encrypted data, while keeping it encrypted and unknown to the parties performing the computation. For example, we have efficient methods for privacy-preserving set intersection, which start with several sets of encrypted items, decrypt only the elements in the intersection, and leave items not in the intersection encrypted and unreadable by anyone. Thus, cell phone carriers could have stored cell tower data in encrypted form, used privacy-preserving set intersection, and delivered only the bandit's phone number to the FBI without disclosing the other 149,999 phone numbers to anyone. This may sound like magic, but it is merely an illustration of sci-fi author Arthur C. Clarke's maxim that "any sufficiently advanced technology is indistinguishable from magic."
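For readers who want to see the magic trick, here is a toy sketch of one classic approach to private set intersection, based on commutative blinding in the style of Diffie-Hellman. It is not the specific scheme a carrier or agency would deploy, and it covers only two sets rather than three. The modulus is far too small to be secure, and all names are our own assumptions. Each party raises hashed items to its own secret exponent; because exponentiation commutes, items held by both parties end up with identical doubly blinded values, while all other items remain unreadable blobs.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy prime modulus; NOT a secure parameter choice


def hash_to_group(item):
    """Map an item to an integer in [1, P-1] (illustrative, not a vetted encoding)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret():
    """Pick a secret exponent coprime to P-1 so that blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    return [pow(v, key, P) for v in values]


def private_intersection(set_a, set_b):
    """Simulate both parties in one process; in practice each key stays with its owner."""
    a_items, b_items = sorted(set_a), sorted(set_b)
    key_a, key_b = new_secret(), new_secret()
    a_once = blind([hash_to_group(x) for x in a_items], key_a)  # A blinds its own items
    b_once = blind([hash_to_group(x) for x in b_items], key_b)  # B blinds its own items
    a_twice = blind(a_once, key_b)       # B re-blinds A's values without seeing A's items
    b_twice = set(blind(b_once, key_a))  # A re-blinds B's values without seeing B's items
    # A learns only which of its own items have a matching doubly blinded value.
    return {item for item, tag in zip(a_items, a_twice) if tag in b_twice}


common = private_intersection(
    {"555-0101", "555-0142", "555-0199"},
    {"555-0199", "555-0173", "555-0110"},
)
print(common)
```

A production system would use a standard elliptic-curve group, authenticated channels, and a carefully reviewed protocol; the point here is only that matching can happen without either side reading the other's non-matching entries.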

Like any technology, modern cryptography can be misused: for example, to conceal spy agencies' activities without accountability or privacy protection for innocent users. Proper uses should ensure that lawful electronic surveillance activities protect the innocent, are properly authorized and limited in scope, are subject to robust oversight, and follow transparent processes that the public can debate or challenge in court. With proper system design, adequately informed by both policy and technological capabilities, this combination of surveillance power and privacy safeguards is achievable with existing technology.

For example, cell phone carriers could encrypt their lawful intercept records so that neither the carriers themselves nor any single government agency can decrypt them. These records would be useless to malicious insiders at the carriers or hackers who might compromise the carriers' networks, mitigating one valid reason carriers don't want to hold this hot potato. Records could be "unlocked" only when independent agencies representing all three branches of government coordinate, e.g., when an intelligence agency electronically requests a warrant, a judge digitally signs it, and a legislative oversight agency digitally attests that the warrant has been tallied in statistics reported to Congress. This electronic coordination need not be slow; the process could occur within seconds of the judge's signing the warrant.
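One simple way to obtain the "no single agency can decrypt" property is to split the record-encryption key into shares, one per branch, so that the key can be rebuilt only when every share is presented. The sketch below uses plain XOR-based secret sharing; the function names and the three-party setup are assumptions for illustration, and a real deployment would more likely use threshold cryptography combined with the digital signatures described above.

```python
import secrets


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key, parties):
    """Split key into `parties` shares; all of them are needed to rebuild it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(parties - 1)]
    last = key
    for share in shares:
        last = xor_bytes(last, share)
    return shares + [last]


def combine(shares):
    """XOR the given shares together."""
    result = bytes(len(shares[0]))
    for share in shares:
        result = xor_bytes(result, share)
    return result


record_key = secrets.token_bytes(32)
# Hypothetical share holders: intelligence agency, court, legislative oversight body.
agency_share, court_share, oversight_share = split_key(record_key, 3)

assert combine([agency_share, court_share, oversight_share]) == record_key
# Any two shares alone yield only random-looking bytes, not the key.
assert combine([agency_share, court_share]) != record_key
```

Because each share on its own is statistically independent of the key, a leak at any one institution reveals nothing about the records.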

With privacy-preserving set intersection, an agency need not have a name or phone number to request a warrant. For example, the FBI could have issued a "John Doe" warrant merely listing the cell tower dumps of interest in the High Country Bandits case. The judge authorizing this warrant could limit its scope by specifying a threshold number of these dumps that a phone number must appear in before that phone number can be decrypted and revealed to the FBI. The judge could also specify the maximum number of phone numbers that the warrant may reveal. If, for example, the three requested cell tower dumps unexpectedly coincided with three Justin Bieber concerts, then the warrant might net the phone numbers of thousands of innocent regular teenage fans without yielding useful intelligence. In this case, the set-intersection process would abort without revealing any phone numbers, protecting the fans and requiring the FBI agent to request different cell tower dumps or otherwise narrow the search.
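The warrant logic just described can be sketched as follows. This version runs on plaintext for readability; in the system we envision, the same rule would be enforced on encrypted records so that nobody sees numbers that fail the test. The function and parameter names (`run_warrant`, `threshold`, `max_reveal`) are hypothetical.

```python
from collections import Counter


def run_warrant(dumps, threshold, max_reveal):
    """Return numbers seen in at least `threshold` dumps, or None if too many match."""
    # Count each number at most once per dump.
    counts = Counter(number for dump in dumps for number in set(dump))
    matches = sorted(n for n, c in counts.items() if c >= threshold)
    if len(matches) > max_reveal:
        return None  # abort: the search is too broad, reveal nothing
    return matches


dumps = [
    ["555-0101", "555-0142", "555-0199"],
    ["555-0199", "555-0173", "555-0110"],
    ["555-0164", "555-0199", "555-0142"],
]

print(run_warrant(dumps, threshold=3, max_reveal=5))  # a narrow, useful result
print(run_warrant(dumps, threshold=2, max_reveal=1))  # too many matches, so None
```

The second call illustrates the concert scenario: when more numbers qualify than the judge allowed, the process returns nothing at all rather than a partial list.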

Recent breakthroughs may soon make it practical to perform any computation on encrypted data. Currently, the use of encrypted input data may impose some performance cost, but often such costs are not showstoppers for intelligence agencies following targeted leads. And the costs are falling: DARPA is funding a major effort in computing on encrypted data as part of its PROCEED program.
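To give a flavor of what "computing on encrypted data" means, here is a toy version of the Paillier cryptosystem, an older and much narrower scheme than the recent breakthroughs mentioned above: it supports only addition of encrypted values, not arbitrary computation. The primes are tiny and chosen purely for illustration, so this sketch offers no security. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

P_TOY, Q_TOY = 101, 113  # tiny illustrative primes; real keys use primes of 1024+ bits
N = P_TOY * Q_TOY
N2 = N * N
LAM = (P_TOY - 1) * (Q_TOY - 1) // math.gcd(P_TOY - 1, Q_TOY - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # modular inverse of LAM mod N


def encrypt(m):
    """Encrypt an integer 0 <= m < N using generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return c1 * c2 % N2


total = decrypt(add_encrypted(encrypt(20), encrypt(22)))
print(total)
```

The party calling `add_encrypted` never sees 20 or 22; only the holder of the private values `LAM` and `MU` can read the sum.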

The NSA is a major employer of cryptographers and computer-security experts. If the U.S. government had directed the agency to work with the broader security-research community on proper application of privacy-preserving technology for warrant-based surveillance, instead of directing it to hoard cell phone metadata of U.S. citizens, a giant and still-ongoing controversy might have been avoided. It is not too late to begin such a collaboration, but that window of opportunity may be closing.