Big Data and Filter Bubbles

Personalized search has clear benefits: search engines that develop better ranking algorithms attract more users (Google versus Bing or Yahoo!, for example) because they help the searcher find what is most relevant to them more easily and efficiently. Social networking sites like Facebook use similar data to assemble a user's feed, keeping the user on the app or website longer because the posts happen to match their interests. There are negatives, however. "Filter bubbles" arise from this increased capability for personalization: a user is fed mostly information that is relevant to them (based on click history, search history, location, and so on) and becomes separated from information that opposes their viewpoints or does not directly concern them (news from developing countries, for example). Personalized search therefore benefits many Internet-goers (usually people in more developed countries with access to the Internet and smartphones) by making their online experience more of what they want to see.

Choosing to eliminate personalized search would undoubtedly make our lives more difficult because we are so used to it, but continuing down this path carries real dangers. Eli Pariser, the Internet activist who coined the term "filter bubble," describes harmful effects that reach both computer and smartphone users and people in developing countries. A filter bubble forms when a social media algorithm builds a personalized feed or suggestion list from what the user is predicted to enjoy [1]. Over time, filter bubbles exacerbate the "splinternet" or "cyberbalkanization": the Internet divides into sub-groups of people who agree with each other and live and interact online in an echo chamber of their own beliefs and viewpoints, never encountering different or opposing ideas, and never even realizing it.
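The feedback loop behind a filter bubble can be sketched in a few lines. This is a deliberately simplified toy simulation, not any real platform's algorithm: the topic names, click probabilities, and ranking rule are all hypothetical illustrations of the general mechanism, where past clicks drive the ranking and the ranking drives future clicks.

```python
import random

# Toy filter-bubble feedback loop (all numbers hypothetical):
# the feed ranks two topics by the user's past click counts,
# the user usually clicks the top item, and that click feeds
# back into the ranking, narrowing the feed toward one topic.

random.seed(0)
clicks = {"politics-left": 1, "politics-right": 1}

def feed():
    # Rank topics by past clicks, most-clicked first.
    return sorted(clicks, key=clicks.get, reverse=True)

for _ in range(50):
    ranked = feed()
    # The user clicks the top-ranked item 90% of the time.
    chosen = ranked[0] if random.random() < 0.9 else ranked[1]
    clicks[chosen] += 1

print(clicks)  # one topic comes to dominate the click counts
```

Whichever topic gets an early lead keeps attracting roughly 90% of the clicks, so the user ends up seeing mostly one viewpoint even though the other was equally available at the start.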

Personalized search definitely involves "big data," a concept Professor Barreto introduced in Lecture 1 to describe datasets so large or complex that traditional data processing applications are inadequate. For example, personalized search engines analyze millions upon millions of people (their locations, search histories, click histories, frequently visited websites) and then sift through billions of web pages to find the most "relevant" ones. This made me think about "the cost of free": Google searches are free, but that likely means our search data is being collected, stored, and passed to third-party companies for advertising, marketing, or data collection. Personalized search also relies on algorithms to mathematically calculate and return the best results, including the carry-over effect: when a user performs one search and follows it with another, the second search is influenced by the first on the assumption that the two are connected. Overall, personalized search is used by everyone from Google Search to Facebook's personalized feed to Yahoo! and Bing, all to help users find the websites most relevant to them.
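The carry-over effect described above can be illustrated with a toy ranker. This is a minimal sketch, not a real search engine's algorithm: the two example pages, the term-overlap scoring, and the `carry_weight` discount are all hypothetical, chosen only to show how a previous query can tilt the ranking of an ambiguous follow-up query.

```python
# Toy personalized ranking with a "carry-over" effect: each query's
# results are re-scored using the immediately preceding query, on the
# assumption that consecutive searches are related. All pages and
# weights here are made up for illustration.

def score(page_terms, query, prev_query=None, carry_weight=0.5):
    """Overlap with the current query, plus a discounted overlap
    with the previous query (the carry-over)."""
    s = len(set(query.split()) & page_terms)
    if prev_query:
        s += carry_weight * len(set(prev_query.split()) & page_terms)
    return s

# Tiny "index": page -> set of terms it contains.
index = {
    "jaguar-animal.example": {"jaguar", "cat", "habitat", "rainforest"},
    "jaguar-cars.example": {"jaguar", "car", "engine", "price"},
}

def search(query, prev_query=None):
    return sorted(index,
                  key=lambda p: score(index[p], query, prev_query),
                  reverse=True)

print(search("jaguar"))
print(search("jaguar", prev_query="used car price"))
```

On its own, "jaguar" matches both pages equally, but a preceding search for "used car price" carries over and pushes the car page to the top, which is exactly the behavior that lets a sequence of searches quietly shape what a user sees.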

As a result of personalized search, more of the information we read online has become polarized. Because we have access to so much information, we can select exactly which of it we want to view, and this can have severe repercussions. For instance, news media polarization reared its head during the 2016 election: a Pew Research Center study conducted after President Trump's election found that 64 percent of adults "believe fake news stories cause a great deal of confusion" [2]. Fundamentally, social media makes it particularly effortless to share incorrect information, which, in this case, led to a critical absence of political fact-checking.


[1] Will Rinehart, "The Election of 2016 and the Filter Bubble Thesis in 2017," Medium.

[2] Janna Anderson and Lee Rainie, "The Future of Truth and Misinformation Online," Pew Research Center.