MAC Addresses and De-Identification
Location analytics companies log the hashed MAC address of mobile devices in range of their sensors at airports, malls, retail locations, stadiums and other venues. They do so primarily in order to create statistical reports that provide useful aggregated information such as average wait times in line, store “hot spots,” and the percentage of devices that never make it into a zone that includes a checkout register. FPF worked with the leading companies providing these services to create an enforceable Mobile Location Code of Conduct that restricts discriminatory uses of data, creates a central opt-out, promotes in-store notice, and provides other protections. We filed comments last week with the FTC describing the program in detail.
The only data transmitted by mobile devices that most location companies can log is the MAC address – the Wi-Fi or Bluetooth identifier devices broadcast when Wi-Fi or Bluetooth is turned on. The privacy debate around the use of this technology and the Code has centered on the sensitivity of logging and maintaining hashed MAC addresses, and hinges on whether a MAC address should be considered personal information.
Is a MAC address personal information? Well, it is linked to an individual consumer device as a consistent Wi-Fi or Bluetooth identifier. If enough data is linked to any consistent identifier over time, it is in the realm of technical possibility that the identity of the user can be ascertained. If there were a commercially available database of MAC addresses, it is possible that such a database could be used to identify users. We are not aware of any such MAC address look-up database, but we do recognize that the data collected is linked to a specific device. For this reason, the Code of Conduct treats hashed MAC addresses associated with unique devices as something in between fully anonymized data and explicitly personal data. This reflects the view Professor Daniel Solove has argued persuasively: that PII exists not as a binary, but on a spectrum, with no risk of identification at one end and individual identification at the other. In many real-world instances of data collection, the privacy standards in place reflect where the data lies on this spectrum; they consist not only of technical measures to protect the data, but also of internal security and administrative controls, as well as enforceable legal commitments. In the case of Mobile Location Analytics, many companies are confident that by hashing MAC addresses, keeping them under administrative and security controls, and publicly committing not to attempt to identify users, they have adequately de-identified the data they log.
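For readers unfamiliar with what “hashing” a MAC address involves, the sketch below shows the general idea in Python. The choice of SHA-256, the secret salt, and the normalization step are illustrative assumptions, not a description of any particular vendor’s implementation.

```python
import hashlib

# Illustrative only: an assumed per-deployment secret salt, kept under
# administrative controls so outsiders cannot recompute the same hashes.
SECRET_SALT = b"example-deployment-salt"

def hash_mac(mac: str) -> str:
    """Return a salted SHA-256 digest of a MAC address.

    The raw MAC (e.g. "AC:DE:48:00:11:22") is normalized, combined with
    the secret salt, and hashed. The result is a consistent per-device
    token that does not directly reveal the hardware address.
    """
    normalized = mac.strip().lower().replace("-", ":").encode("utf-8")
    return hashlib.sha256(SECRET_SALT + normalized).hexdigest()

print(hash_mac("AC:DE:48:00:11:22"))
```

Because the space of possible MAC addresses is small and highly structured, an unsalted hash can often be reversed by brute force, which is one reason the Code pairs hashing with administrative controls and enforceable commitments rather than treating it as full de-identification.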
However, it is important to understand that the Code does NOT take the position that hashing MAC addresses amounts to a de-identification process that fully resolves privacy concerns. According to the Code, data is only considered fully “de-identified” where it may not reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device. To qualify as de-identified under the Code, a company must take measures such as aggregating data, adding noise to data, or statistical sampling. These are considered reasonable measures that de-identify data under the Code, as long as an MLA company also publicly commits not to try to re-identify the data and contractually prohibits downstream recipients from trying to re-identify it. To ensure transparency, any company that does de-identify data in this way must describe how it does so in its privacy policy.
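As a rough illustration of what aggregation and noise addition might look like in practice, the sketch below publishes only noisy per-zone visit counts and discards the device-level rows. The zone names, records, and noise scale are hypothetical, not drawn from the Code or any vendor.

```python
import random
from collections import Counter

# Hypothetical per-visit records: (hashed device token, store zone).
visits = [
    ("a1f3...", "entrance"), ("a1f3...", "checkout"),
    ("9b7c...", "entrance"), ("42de...", "electronics"),
]

def aggregate_with_noise(records, noise_scale=2.0):
    """Aggregate per-zone visit counts and perturb them with Laplace noise.

    Only the noisy totals are returned; the device-level rows are not
    part of the output, so the published report cannot be traced back
    to an individual device token.
    """
    counts = Counter(zone for _, zone in records)
    noisy = {}
    for zone, count in counts.items():
        # Difference of two exponentials yields Laplace-distributed noise.
        noise = (random.expovariate(1 / noise_scale)
                 - random.expovariate(1 / noise_scale))
        noisy[zone] = max(0, round(count + noise))
    return noisy

print(aggregate_with_noise(visits))
```

The key property is that only aggregated, perturbed totals leave the system; once the per-device rows are discarded, the published figures cannot reasonably be linked back to a particular device.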
Because most of the companies involved in mobile location analytics do link hashed MAC addresses to individual devices, the data they collect to track devices over time does not qualify as strictly “de-identified” under the Code and is not exempt from it. Rather, the companies collect and use what the Code terms “de-personalized” data.* De-personalized data is defined in the Code as data that can be linked to a particular device, but cannot reasonably be linked to a particular consumer. Companies using de-personalized data must:
- take measures to ensure that the data cannot reasonably be linked to an individual (for instance, hashing a MAC address or deleting personally identifiable fields);
- publicly commit to maintain the data as de-personalized; and
- contractually prohibit downstream recipients from attempting to use the data to identify a particular individual.
When companies merely hash MAC addresses, they are thus fully subject to the Code’s requirements, including signage, consumer choice, and non-discrimination.
Different kinds of data along the PII/non-PII spectrum, given the inherent risks and benefits of each, merit careful consideration of the most suitable combination of reasonable technical measures (such as hashing or encryption), administrative controls, and legal commitments. After all, if “completely unidentifiable by any technical means, no matter how complex or unlikely” were the standard for the use of any data in the science and business worlds, much valuable research and commerce would come to an end. The MLA Code represents a pragmatic view that allows vendors to provide a service that is useful for businesses and consumers, while applying responsible privacy standards.
* Suggestions for a better term than “de-personalized” are welcome. We considered “pseudonymized” but found the term awkward.