The Unbiased Algorithm is a Myth

This week Gizmodo published an in-depth story about bias in Facebook’s trending news product. Gizmodo’s first paragraph is a concise summary:

Facebook workers routinely suppressed news stories of interest to conservative readers from the social network’s influential ‘trending’ news section… Workers prevented stories about [conservative topics] from appearing in the highly-influential section, even though they were organically trending among the site’s users.

Manually intervening in an algorithm to neutralize one flavor of political voice feels surprising and wrong at a gut level. It created enough controversy to prompt some political theater from a conservative senator. But why does it feel so wrong? It’s the “manual” part, right? It’s a deliberate intervention: the unbiased algorithm says, “X, Y, and Z are trending on Facebook right now,” and when Facebook hand-tunes the results to remove “Z” but leave “X” and “Y,” it feels like censorship and manipulation.

Users expect an unbiased result from a trending news algorithm. Is that a reasonable expectation? Looking at the feature itself helps point toward an answer. Here’s what Facebook’s trending news feature looks like right now:

[Screenshot: Facebook’s Trending News Feature]

The trending news product has a one-word description, “TRENDING,” plus an up-and-to-the-right arrow icon implying that everything here is trending. Everything else is content. Nothing in the presentation suggests the list is curated in any way, so it seems reasonable for a reader to assume that the list is comprehensive and reflects what is actually trending on Facebook right now.

But looking twice, I see that the second item in the list is about a stabbing in New Bedford, Massachusetts. I’m writing this post from Boston, so this list is very likely personalized to me based on geolocation (either IP geolocation or the location listed in my Facebook profile). The entire list is also in English, with stories most relevant to Americans rather than, say, Russian stories written in Russian for Russian readers. Personalization is built into this algorithm, so there is no single, objective list of what’s trending on Facebook right now.

This digression into the list’s presentation and the user expectations it creates matters because those expectations contrast with how we usually regard news sources. News is never unbiased, and I don’t think any reasonable person expects a single news source to be unbiased. News is written by humans who, try as they might, cannot write from a 100% objective point of view. The very effort to write unbiased news often introduces a form of bias known as false balance, where presenting alternate opinions or interpretations as equally valid ends up misrepresenting the evidence.

People know that the NYTimes will have a more liberal slant on the news than the NYPost, and that’s OK. So why is the trending news section of Facebook held to some higher standard? It’s because the description “trending” on the feature implies there is an algorithm at work, and people expect a computer algorithm to operate without bias. 2+2 will always equal 4 even if the computer executing the algorithm would vote for Trump (if machines were granted suffrage) because the algorithm for addition is known, consistent, and unbiased.

Let’s Play Algorithm Designer

“Trending” implies an algorithm that finds the most popular news stories on Facebook right now. A reasonable (albeit oversimplified) strawman for this algorithm would be:

Produce a list of news stories sorted in descending order of the number of times users mention each story within a time window of a couple of hours.
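The strawman fits in a few lines of code. Here is a minimal Python sketch. The `Mention` record, the two-hour default window, and the `top_n` cutoff are assumptions made for illustration, not anything Facebook has described.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Mention:
    """Hypothetical record: one user mentioning one story at one moment."""
    user_id: str
    story_id: str
    timestamp: datetime


def trending_strawman(mentions, now, window=timedelta(hours=2), top_n=10):
    """Count mentions per story inside the recent window, most-mentioned first.

    The window length and top_n are illustrative assumptions.
    """
    cutoff = now - window
    counts = Counter(
        m.story_id for m in mentions if cutoff <= m.timestamp <= now
    )
    # most_common() yields (story_id, count) pairs ordered by count, descending.
    return [story_id for story_id, _ in counts.most_common(top_n)]
```

Every mention inside the window counts equally, no matter who posted it. That is exactly the property spammers exploit.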

The uncomfortable bias layered on top of this strawman algorithm is Facebook selectively removing items about conservative politics from the list.

But having seen the inside-baseball view of a number of social media properties over the years, I suspect the output of our strawman algorithm would be pretty sad. Spammers bulk-create accounts on a breathtaking scale to artificially amplify noise for one of these purposes: (A) make money from crappy behavior (eg “Cl1ck here: Buy V1agra!”), (B) build support for an agenda (eg “Mugatu for President!”), or (C) grief other users (eg Gamergate garbage). Should Facebook, in the spirit of algorithmic purity, just let “V1agra” become a trending topic if our strawman algorithm surfaces this trend? Of course not! This would be seen as a bug, flagged by QA, and the algorithm would be revised to remove this crap.

So we iterate on our strawman algorithm. Maybe we do something like:

Produce a list of news stories sorted in descending order of the number of times users mention each story within a time window of a couple of hours, where the users all have at least 100 friends and accounts more than a year old. [The final clause is the addition.]

This modification will hopefully remove the votes of spam accounts. It assumes that spammers don’t have many legitimate Facebook friends and that their accounts are still young because they get banned soon after creation. But now my Aunt Harriet, who just joined Facebook last week, doesn’t get her vote counted in the trending algorithm when she posts a news story about a M*A*S*H cast reunion in Tampa Bay. We are losing signal in our attempt to filter noise.
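The trade-off shows up as soon as the filter is written down. The following Python sketch is a hypothetical version of the revised strawman. The `User` record, the sample accounts, and the way the thresholds are parameterized are assumptions for illustration only. The filter drops the spam accounts as intended, and it drops Aunt Harriet along with them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class User:
    """Hypothetical account record with the two signals the filter uses."""
    user_id: str
    friend_count: int
    joined: datetime


@dataclass
class Mention:
    user_id: str
    story_id: str
    timestamp: datetime


def trending_filtered(mentions, users, now, window=timedelta(hours=2),
                      min_friends=100, min_account_age=timedelta(days=365),
                      top_n=10):
    """Strawman ranking that counts only mentions from 'trusted' accounts."""
    cutoff = now - window

    def trusted(user_id):
        user = users.get(user_id)
        # Unknown users, small friend lists, and young accounts are all dropped.
        return (user is not None
                and user.friend_count >= min_friends
                and now - user.joined >= min_account_age)

    counts = Counter(
        m.story_id for m in mentions
        if cutoff <= m.timestamp <= now and trusted(m.user_id)
    )
    return [story_id for story_id, _ in counts.most_common(top_n)]


# Hypothetical accounts: two spammers, Aunt Harriet, and two established users.
now = datetime(2016, 5, 10, 12, 0)
users = {
    "spam1": User("spam1", 3, now - timedelta(days=1)),
    "spam2": User("spam2", 2, now - timedelta(days=2)),
    "harriet": User("harriet", 12, now - timedelta(days=7)),
    "pat": User("pat", 250, now - timedelta(days=2000)),
    "sam": User("sam", 180, now - timedelta(days=900)),
}
recent = now - timedelta(minutes=20)
mentions = [
    Mention("spam1", "v1agra", recent),
    Mention("spam2", "v1agra", recent),
    Mention("harriet", "mash-reunion", recent),
    Mention("pat", "election", recent),
    Mention("sam", "election", recent),
]
print(trending_filtered(mentions, users, now))  # ['election']
```

Loosening the thresholds enough to let Harriet back in is another subjective call. Each setting of `min_friends` and `min_account_age` encodes a judgment about whose voice counts.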

Here begins the first step down a slippery slope, where we find that algorithms cannot be completely objective or pure. Designing algorithms requires compromise, especially when the input and output involve human messiness. Algorithms are made by humans, who bring their own biases to optimizing an algorithm’s output toward a goal. Designing an algorithm is a highly iterative process: you take a stab, look at the output, and if it’s not great, you tweak and try again. With each iteration, you are optimizing toward your subjective interpretation of an ideal goal, such as “Display what is trending on Facebook right now, but filter out crap.”

Beyond the design stage, an algorithm designer can also add bias before the first line of code is written. More than half the time spent developing an algorithm goes into upfront data munging: collecting data sets from disparate sources and normalizing them all into a workable form. At this stage the developer decides to drop data deemed irrelevant, rescale data to balance weighting, and remove duplicates. Each of these choices pulls the data away from its ground truth. So if your expectation is an unbiased algorithm, in practice you’ve failed before you’ve even begun.
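Here is a hedged Python sketch that makes the munging point concrete with two routine cleanup choices. The first normalizes story URLs so that variants count as one story. The second counts each user at most once per story. The `normalize_url` rule, the `count_mentions` helper, and the sample URLs are all hypothetical. The point is only that a defensible cleanup choice can flip which story ranks first.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize_url(url):
    """Hypothetical cleanup rule: lowercase the host, drop query string,
    fragment, and trailing slash so variants map to one story key."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def count_mentions(posts, dedupe_per_user):
    """posts: list of (user_id, url) pairs. Returns a Counter of story keys.

    With dedupe_per_user=True, repeat posts of the same story by the same
    user are counted once (another hypothetical munging choice).
    """
    seen = set()
    counts = Counter()
    for user_id, url in posts:
        key = normalize_url(url)
        if dedupe_per_user:
            if (user_id, key) in seen:
                continue
            seen.add((user_id, key))
        counts[key] += 1
    return counts


# One enthusiastic user posts story "a" five times; three users post "b" once,
# each via a slightly different URL variant.
posts = [("u1", "https://news.example/a?utm=fb")] * 5 + [
    ("u2", "https://news.example/b/"),
    ("u3", "https://News.example/b"),
    ("u4", "https://news.example/b?ref=x"),
]
print(count_mentions(posts, dedupe_per_user=False).most_common(1))
print(count_mentions(posts, dedupe_per_user=True).most_common(1))
```

Without deduplication, story "a" wins 5 to 3. With it, story "b" wins 3 to 1. Neither answer is the ground truth. Each reflects a decision made before any ranking code runs.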

Bring It Home

Perhaps my argument doesn’t feel satisfying in the context of this Facebook anti-conservative bias story. A fair objection is that spam and conservative news are many shades of gray apart. I agree, but if you stop there, you miss my point. My goal is not to defend Facebook. I think Facebook is generally transparent about its liberal point of view. Mark Zuckerberg got on stage last month at F8 and opposed “fearful voices for building walls” by saying that “it takes courage to choose hope over fear.” Good for Mark and good for Facebook, regardless of your politics. I like people and companies who take a stance and don’t pretend otherwise.

Facebook has a point of view, but that’s not where I spilled most of my pixelated ink today. Instead, I want to disabuse people of the mythical notion that algorithms are somehow magically free of their creators’ biases. Even when the goal is an unbiased algorithm, there are too many opportunities to go astray, deliberately or unintentionally. The word “algorithm” should not carry a connotation of objectivity.

You might say that Facebook perverted its trending news algorithm by directing humans to post-process the machine’s results. But what really is the difference between instructions interpreted by a machine (eg code) and instructions interpreted by human news curators (eg directions)? Both can fairly be called algorithms if their output is consistent and repeatable. Whatever Facebook’s trending news feature displays is the algorithm’s output; this algorithm just happens to be bionic.
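Here is a small, hypothetical Python sketch that writes a curator’s directive (“remove stories about topic Z”) as an ordinary function applied after the machine ranking. The `curate` function and the topic labels are invented for illustration and do not describe Facebook’s actual process. Once written down, the human step is as deterministic and repeatable as the ranking it post-processes.

```python
def curate(ranked_story_ids, topic_of, excluded_topics):
    """Hypothetical editorial pass: drop stories whose topic is excluded,
    keeping the remaining stories in their original ranked order.
    Stories with no known topic are kept."""
    return [s for s in ranked_story_ids if topic_of.get(s) not in excluded_topics]


# The machine says X, Y, and Z are trending; the directive removes Z.
ranked = ["X", "Y", "Z"]
topic_of = {"X": "sports", "Y": "weather", "Z": "politics"}
print(curate(ranked, topic_of, {"politics"}))  # ['X', 'Y']
```

Composing the two steps gives one function from mentions to displayed stories. From the outside, that composite is simply the trending algorithm.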

People reacting to the Gizmodo article are either genuinely surprised that Facebook’s trending news algorithm is biased or feigning surprise to push an agenda. I hope to encourage those in the former camp to approach “objective” algorithms with healthy intellectual skepticism in the future. It’s tempting to say, “oh, algorithms are just math,” but that is a naive view of how they are made.

“Seeing ourselves clearly is the project of a lifetime.” -The Nix