As you browse the internet, online advertisers track nearly every site you visit, amassing a trove of information on your habits and preferences. When you visit a news site, they might see you’re a fan of basketball, opera and mystery novels, and accordingly select ads tailored to your tastes.
Advertisers use this information to create highly personalized experiences, but they typically don’t know exactly who you are. They observe only your digital trail, not your identity itself, and so you might feel that you’ve retained a degree of anonymity.
But in a paper I coauthored with Ansh Shukla, Sharad Goel and Arvind Narayanan, we show that these anonymous web browsing records can in fact often be tied back to real-world identities.
To test our approach, we built a website where people could donate their browsing history for the purposes of this study. We then tried to link their histories back to their Twitter profiles using only publicly available data. Seventy-two percent of the people we tried to deanonymize were correctly identified as the top candidate in the search results, and 81 percent were among the top 15 candidates.
This is, to our knowledge, the largest-scale demonstration of deanonymization to date, since it picks the correct user out of hundreds of millions of possible Twitter users. In addition, our method requires only that a person clicks on the links appearing in their social media feeds, not that they post any content – so even people who are careful about what they share on the internet are still vulnerable to this attack.
How it works
At a high level, our approach is based on a simple observation. Each person has a highly distinctive social network, comprising family and friends from school, work and various stages of their life. As a consequence, the set of links in your Facebook and Twitter feeds is highly distinctive. Clicking on these links leaves a tell-tale mark in your browsing history.
By looking at the set of web pages an individual has visited, we were able to pick out similar social media feeds, yielding a list of candidates who likely generated that web browsing history. In this manner, we can tie a person’s real-world identity to the nearly complete set of links they have visited, including links that were never posted on any social media site.
Carrying out this strategy involves two key challenges. The first is theoretical: How do you quantify how similar a specific social media feed is to a given web browsing history? One simple way is to measure the fraction of links in the browsing history that also appear in the feed. This works reasonably well in practice, but it overstates similarity for large feeds, since those simply contain more links. We instead take an alternative approach. We posit a stylized, probabilistic model of web browsing behavior, and then compute the likelihood that a user with a given social media feed generated the observed browsing history. We then choose the feed that is most likely to have produced that history.
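The difference between the two scoring ideas can be seen in a short Python sketch. It is illustrative only and is not the model from the paper: the two-source mixture, the mixing weight `q`, the `background` popularity table and the toy feeds are all assumptions made for the example.

```python
import math


def overlap_score(history, feed):
    """Fraction of links in the browsing history that also appear in the feed."""
    if not history:
        return 0.0
    return sum(1 for url in history if url in feed) / len(history)


def log_likelihood(history, feed, background, q=0.5, floor=1e-9):
    """Log-likelihood of the history under a toy two-source mixture.

    Assumption (hypothetical, for illustration only): each visit comes from
    the feed with probability q, uniformly over the feed's links, and
    otherwise from a background popularity distribution.
    """
    total = 0.0
    for url in history:
        p_feed = 1.0 / len(feed) if url in feed else 0.0
        p_background = background.get(url, floor)
        total += math.log(q * p_feed + (1.0 - q) * p_background)
    return total


def best_candidate(history, feeds, background):
    """Return the user whose feed makes the observed history most likely."""
    return max(feeds, key=lambda user: log_likelihood(history, feeds[user], background))


# Made-up data: both feeds contain every visited link, but one feed is huge.
history = ["a.example/1", "b.example/2", "c.example/3"]
feeds = {
    "small": set(history) | {"d.example/4"},
    "huge": set(history) | {f"filler.example/{i}" for i in range(10000)},
}
background = {url: 1e-6 for url in history}

print(overlap_score(history, feeds["small"]), overlap_score(history, feeds["huge"]))
print(best_candidate(history, feeds, background))
```

Both feeds contain every link in the history, so the overlap fraction cannot tell them apart. The likelihood score favors the small feed, because under the toy model any single link in a small feed is far more likely to be clicked than a link buried in a very large one.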
The second challenge involves identifying the most similar feeds in real time. Here we turn to Twitter, since Twitter feeds (in contrast to Facebook) are largely public. However, even though the feeds are public, we cannot simply create a local copy of Twitter against which we can run our queries. Instead we apply a series of techniques to dramatically reduce the search space. We then combine caching techniques with on-demand network crawls to construct the feeds of the most promising candidates. On this reduced candidate set, we apply our similarity measure to produce the final results. Given a browsing history, we can typically carry out this entire process in under 60 seconds.
Our method is more accurate for people who browse Twitter more actively. Ninety percent of participants who had clicked on 100 or more links on Twitter could be matched to their identity.
Many companies have the tracking resources to carry out an attack like this one, even without the consent of the participant. We attempted to deanonymize each of our experiment participants using only the parts of their browsing histories that were visible to specific tracking companies (because the companies have trackers on those pages). We found that several companies had the resources to accurately identify the participants.
Other deanonymization studies
Several other studies have used publicly available footprints to deanonymize sensitive data.
Perhaps the most famous study along these lines was performed by Latanya Sweeney at Harvard University in 2002. She discovered that 87 percent of Americans were uniquely identifiable based on a combination of their ZIP code, gender and date of birth. Those three attributes were available in both public voter registration data (which she bought for US$20) and anonymous medical data (which were widely distributed, because people thought the data were anonymous). By connecting these data sources, she found the medical records of the governor of Massachusetts.
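This kind of linkage is essentially a join on the shared attributes. The Python sketch below shows the idea on made-up records; the field names, the people and the diagnoses are all hypothetical, and it is not a reconstruction of the original study.

```python
from collections import defaultdict

# Attributes present in both datasets (the "quasi-identifiers").
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def link_records(anonymous_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Pair each anonymous row with a name when exactly one public row matches."""
    names_by_key = defaultdict(list)
    for row in public_rows:
        names_by_key[tuple(row[k] for k in keys)].append(row["name"])

    matches = []
    for row in anonymous_rows:
        names = names_by_key.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # ambiguous or missing matches are skipped
            matches.append((names[0], row["diagnosis"]))
    return matches


# Hypothetical voter roll (public, with names).
voters = [
    {"name": "Alice Example", "zip": "02138", "gender": "F", "dob": "1960-01-15"},
    {"name": "Bob Example", "zip": "02139", "gender": "M", "dob": "1975-06-02"},
    {"name": "Carl Example", "zip": "02139", "gender": "M", "dob": "1975-06-02"},
]

# Hypothetical "anonymous" medical records (no names).
medical = [
    {"zip": "02138", "gender": "F", "dob": "1960-01-15", "diagnosis": "asthma"},
    {"zip": "02139", "gender": "M", "dob": "1975-06-02", "diagnosis": "fracture"},
]

print(link_records(medical, voters))
```

Only the first medical record is linked to a name here, because two voters share the second record's ZIP code, gender and date of birth. The more distinctive the combination of attributes, the fewer such ties there are.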
In 2006, Netflix ran a contest to improve the quality of its movie recommendations. It released an anonymized dataset of people’s movie ratings, and offered $1 million to the team that could improve its recommendation algorithm by 10 percent. Computer scientists Arvind Narayanan and Vitaly Shmatikov noticed that the movies people watched were very distinctive, and most people in the dataset were uniquely identifiable based on a small subset of their movies. In other words, based on Netflix movie choices and IMDb reviews, the researchers were able to determine who those Netflix users actually were.
With the rise of social media, more and more people are sharing information that seems innocuous but actually reveals a lot about them. A study led by Michal Kosinski at the University of Cambridge used Facebook likes to predict people’s sexual orientation, political views and personality traits.
Another team, led by Gilbert Wondracek at Vienna University of Technology, built a “deanonymization machine” that figured out which groups people were part of on the social network Xing, and used that to figure out who they were – since the groups you are part of are often enough to uniquely identify you.
What you can do
Most of these attacks are tricky to defend against, unless you stop using the internet or participating in public life.
Even if you stop using the internet, companies can still collect data on you. If several of your friends upload their phone contacts to Facebook, and your number is in all of their contact lists, then Facebook can make predictions about you, even if you don’t use their service.
The best way to defend against deanonymizing algorithms like ours is to limit the set of people who have access to your anonymous browsing data. Browser extensions like Ghostery block third-party trackers. That means that, even though the company whose website you’re visiting will know that you’re visiting them, the advertising companies that show ads on their page won’t be able to gather your browsing data and aggregate it across multiple sites.
If you are a webmaster, you can help protect your users by letting them browse your site using HTTPS. Browsing using HTTP allows attackers to get your browsing history by sniffing network traffic, which lets them carry out this attack. Many websites have already switched to HTTPS; when we repeated our deanonymization experiment from the perspective of a network traffic sniffer, only 31 percent of participants could be deanonymized.
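A simplified Python sketch of what a passive network observer can learn from a single request shows why the switch matters. It assumes the common case in which HTTPS hides the path and query but the hostname still leaks (for example through DNS lookups or the TLS Server Name Indication field); the function name and URLs are made up for the example.

```python
from urllib.parse import urlsplit


def visible_to_sniffer(url):
    """Return what a passive observer can typically see for one request.

    Simplification: plain HTTP exposes the full URL; HTTPS encrypts the
    path and query but usually still reveals the hostname.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        return url
    if parts.scheme == "https":
        return parts.hostname
    return None  # other schemes are out of scope for this sketch


# Hypothetical browsing history.
history = [
    "http://news.example/story/42",
    "https://social.example/user/alice/status/7",
]

print([visible_to_sniffer(url) for url in history])
```

For the HTTPS visit the observer learns only that the user contacted `social.example`, not which page was loaded, and that leaves far less of the distinctive link-level trail the attack relies on.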
However, there is very little you can do to protect yourself against deanonymization attacks in general, and perhaps the best course of action is to adjust your expectations. Nothing is private in this digital age.
About The Author
Jessica Su, Ph.D. Student, Stanford University