Personal data is best protected by not being revealed at all
The protection of data and privacy on the Internet is an issue of increasing
concern and importance. Such protection requires not only that general standards
and laws be agreed upon but also that technical measures be taken. An example
of the latter is the "idemix" project at IBM's Zurich Research Laboratory
in Rüschlikon.
Modern forms of electronic communication and commerce are such that practically
any transaction leaves a data trail on the global Internet, i.e., information
about who conducted which transaction with whom and when. Novel in this respect
is not the fact that these data trails exist but rather the ease with which
large amounts of data from many diverse sources can be gathered, combined, and
exploited in a wide variety of ways.
A scenario to illustrate what this can mean for each one of us is quickly given:
Anyone who books a hotel room will probably be registered in an electronic system
with his or her name, home address, and length of stay. This in itself may have
numerous advantages, including for the hotel guest, for example in terms of frequent-flyer
mileage. On the other hand, a malicious hotel employee could supply the guest's
home address to an accomplice for a low-risk burglary while the owner is away.
A regular customer of an online bookshop might appreciate receiving reading
suggestions based on his or her specific preferences, but will be less than
pleased if personal data is passed on to a third party, and the reading preferences
are exploited for other purposes.
Using the Internet in a variety of ways increasingly puts personal data at
risk. But just as modern technology is a threat to one's privacy, it also provides
many tools for its protection: Making consistent use of encryption
when storing and transmitting data protects personal data from uninvolved third
parties. A conscientious check of a business partner's identity, for example
using digital signatures and public-key infrastructures (PKIs), can help to
ensure that personal data are only given to partners deemed trustworthy.
Finally, various technical tools exist to specify the privacy preferences of
an individual, for example, which data one is willing to reveal for what purpose
and to whom. Moreover, many organizations that process data publish details
of their privacy policy, such as which data are gathered, and to whom and for
what purpose they may eventually be forwarded.
All these technologies help to prevent an accidental transfer of personal data
or a transfer under unclear conditions. But they cannot prevent data from being
passed on by, for example, a malicious hotel employee or, probably much more
frequently, from being accessed by a curious employee or accidentally revealed
because of technical problems.
This is why researchers at IBM's Zurich Research Laboratory in Rüschlikon,
Switzerland, go one step further and investigate concepts that embrace "data
parsimony." The basic concept is very simple: personal data is best protected
if not revealed at all, i.e., if the amount of data revealed is kept to a minimum.
The idea is not new: many laws on the protection of personal data contain data
frugality as an implicit guideline.
The question then is how "minimum" is defined. Anybody who rents
a car has to produce a valid driver's license, thereby - whether voluntarily
or involuntarily - revealing a wealth of personal data. Actually, the car rental
agency needs the renter's name and address only in the event of
an emergency. As long as there is no accident, it would suffice to know that
the renter is in possession of a valid driver's license. In this
case, the data minimum could be quite easily achieved: the name and address
in the driver's license could simply be replaced by a randomly chosen artificial
name, a pseudonym, provided that in an emergency the name hidden by a pseudonym
could be retrieved.
The "idemix" system developed in Rüschlikon uses precisely such
pseudonyms for e-commerce transactions. Today, anybody who subscribes to an
online service has to register with the service using a user name and a password,
which must be produced each time he or she wants to access the service.
Under "idemix" a user would first select a pseudonym, then register
using this pseudonym and receive corresponding credentials bearing an electronic
signature. If the user later wants to access the service, he or she need only
prove to the service that the corresponding, digitally signed
credentials are in his or her possession.
Of course, a user could merely present his or her pseudonym and the credentials;
however, in many cases this would invalidate the desired data-protection advantages
in that the online service could monitor when and how a user uses the service,
which in turn could result in an involuntary de-anonymization. In addition,
the online service would often even know who owns the pseudonym, for example,
for billing purposes.
By employing modern cryptographic techniques, so-called zero-knowledge
proofs, researchers at IBM have succeeded in resolving this issue: the pseudonym
and credentials are given to the online service only in encrypted form. Although
the online service cannot decrypt the information, it can still engage in a
carefully designed interaction with the user to verify the authenticity of the
encrypted pseudonym and that the user does indeed own correct, digitally signed
credentials.
In an equally secure manner the user can supply credentials received from another
organization to the online service. The car rental agency in our example could
in this way receive proof of possession of a valid driver's license from the
authorities, and of a valid credit card from a bank. A user can in principle
present his or her credentials any number of times in this way.
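The principle of proving possession of a secret without revealing it can be
illustrated with a small sketch. The example below is a textbook Schnorr-style
proof of knowledge, not the protocol actually used by "idemix"; the group
parameters P, Q and G and all function names are assumptions chosen for
illustration, and the numbers are far too small to offer any real security.
The prover convinces the verifier that it knows the secret x behind the public
value y without ever sending x.

```python
import secrets

# Toy group parameters (illustrative assumption, NOT secure):
# P = 2*Q + 1 with P and Q prime; G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def keygen():
    """Return (secret x, public value y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def commit():
    """Prover, step 1: pick a fresh random r and send t = G^r mod P."""
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(G, r, P)

def challenge():
    """Verifier, step 2: send a random challenge c."""
    return secrets.randbelow(Q)

def respond(x, r, c):
    """Prover, step 3: answer with s = r + c*x mod Q (x stays hidden behind r)."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier, step 4: accept if G^s == t * y^c mod P."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

def run_proof(x, y):
    """Run one full round between an honest prover and verifier."""
    r, t = commit()
    c = challenge()
    s = respond(x, r, c)
    return verify(y, t, c, s)
```

A call such as `x, y = keygen()` followed by `run_proof(x, y)` plays both roles
in one process. The verifier only ever sees t, c and s, and because r is chosen
afresh for every run, these values look different each time.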
Because a new encryption is used every time, the repeated use is hidden from
the online service, i.e., the user is not re-identified and thus, so to speak,
can act completely anonymously. However, for many applications this total anonymity
is undesirable: for example, if a rented car is not returned, the identity of
the person who rented the car has to be retrievable. Therefore, the idemix system
also provides for a designated authority who can uncover such an identity.
In the case of an "anonymized" driver's license, it could for example
be the office that issued the license; in a business context it could be a third
party trusted by both business partners.
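How fresh randomness can hide repeated use while still allowing a designated
authority to recover an identity can be sketched with textbook ElGamal
encryption. This is an illustration under stated assumptions, not the "idemix"
construction: the parameters P, Q and G, the notion of an "identity tag" (a
group element the authority is assumed to have registered for the user), and
all function names are hypothetical. A real system would additionally have to
prove that the ciphertext really contains the right identity; that step is
omitted here.

```python
import secrets

# Toy group parameters (illustrative assumption, NOT secure):
# P = 2*Q + 1 with P and Q prime; G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def authority_keygen():
    """Designated authority: return (secret key, public key)."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)

def encrypt_identity(identity_tag, authority_pk, k=None):
    """User: encrypt the identity tag under the authority's public key.

    A fresh random k is drawn for every presentation unless one is
    passed in explicitly (the explicit k exists only for demonstration).
    """
    if k is None:
        k = secrets.randbelow(Q - 1) + 1
    c1 = pow(G, k, P)
    c2 = (identity_tag * pow(authority_pk, k, P)) % P
    return c1, c2

def reveal_identity(ciphertext, authority_sk):
    """Authority: decrypt to recover the identity tag, e.g. after a dispute."""
    c1, c2 = ciphertext
    # c1 has order Q, so c1^(Q - sk) is the inverse of c1^sk.
    return (c2 * pow(c1, Q - authority_sk, P)) % P

# Example: the same tag presented twice with different randomness.
sk, pk = authority_keygen()
tag = pow(G, 123, P)                      # hypothetical registered identity tag
first = encrypt_identity(tag, pk, k=5)
second = encrypt_identity(tag, pk, k=7)
```

In this sketch `first` and `second` are different ciphertexts, so the online
service cannot tell from them that they belong to the same user, yet
`reveal_identity` returns the same tag for both when run by the authority.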