Network Connection: What's in an Algorithm?

By Philip Baczewski, executive director, University IT

[Graphic: how an algorithm may serve up online news based on your previous choices of news.]

Recent news has been filled with stories related to the algorithmic workings of some of the most popular services on the internet. You've probably seen these under headlines about "fake news", "Russian ads", or "misidentified shooter." What you may not realize is that the services involved (Facebook, Twitter, Google, and others) are operated more by computers than by people. Large parallel computer systems process data and user interactions with complex algorithms that manage and present information based on the user's profile and the nature of the information.

Who Makes the Rules?

If you are not a programmer or computer scientist, you may not know that an algorithm is simply a set of logical steps or rules that can be repeated to solve a particular problem or generate a desired output. You probably have personal algorithms that you use in your own life. For example, your grocery shopping algorithm combines your memory of which pantry and refrigerator items you have used up, your anticipated dining needs for the coming week, and the funds available in your account to achieve an optimal shopping outcome that will allow you to feed yourself and/or your family. While you are shopping, you are continually evaluating these data inputs and making decisions in support of your goal of eating for the next week. Your available funds might lead you to decide that the week's menus will feature chicken and rice rather than steak and potatoes.
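To make the idea concrete, here is a minimal sketch of what such a decision procedure might look like in code. It is purely illustrative – the items, prices, and budget are invented, and a real shopper's "algorithm" is far fuzzier than this.

```python
# A toy version of the grocery-shopping "algorithm" described above.
# All items, prices, and the budget are invented for illustration.

def plan_shopping(pantry_gaps, weekly_needs, prices, budget):
    """Pick items to buy: cover what ran out and what the week requires,
    working from cheapest to most expensive while the budget holds out."""
    basket, remaining = [], budget
    for item in sorted(set(pantry_gaps) | set(weekly_needs), key=lambda i: prices[i]):
        if prices[item] <= remaining:
            basket.append(item)
            remaining -= prices[item]
    return basket, remaining

prices = {"rice": 2, "chicken": 6, "potatoes": 3, "steak": 14}
basket, left_over = plan_shopping(
    pantry_gaps=["rice"],
    weekly_needs=["chicken", "steak", "potatoes"],
    prices=prices,
    budget=12,
)
print(basket, left_over)  # steak is the first thing to drop out when funds run short
```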

[Photo: Cathy O'Neil, author of Weapons of Math Destruction]

Social media algorithms are like your grocery shopping process, but possibly more complex and involving a far greater volume of data. Similar algorithms are used in many areas of commerce, finance, government, and education. Given the complexity of these data models, they often operate within complex computer systems, out of sight of those whom they may affect. This raises the question of who is, or should be, providing oversight of these automated processes. At a recent conference about data science sponsored by O'Reilly Media, three speakers were featured on this particular topic, according to the Datanami website. Mathematician Cathy O'Neil, pictured right, spoke about the potential harm from big data algorithms; Microsoft's Danah Boyd addressed the problem of skewed training data; and the eponymous Tim O'Reilly urged developers to carefully consider the desired goals and outcomes of the big data systems they develop.

Weapons of Math Destruction

[Image: cover of the book Weapons of Math Destruction by Cathy O'Neil]

I recently read Cathy O'Neil's book, "Weapons of Math Destruction." In it, she provides a number of examples where data-driven algorithms have been used, incorrectly or unjustly, to mischaracterize or deny services to particular classes of people. She points out that many of the decisions that go into building an algorithm can encode existing biases or prejudices, especially if the algorithm uses historical data to predict future behavior or viability. (There's also a TED Talk in which she provides the gist of her argument.)

O'Neil points out that the most egregious of the "WMDs" don't measure their own effectiveness and often substitute proxy data items when more useful data points are not available. So instead of becoming more effective tools, they often compound their built-in biases as more data is collected through their own use. For example, if you predict that a particular neighborhood will have a higher crime rate and send your police there, the prediction may become self-fulfilling – the same amount of crime may be happening elsewhere, where there isn't the same level of policing, but it's the police activity that generates the crime statistics.
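A toy simulation can show how that feedback loop works. The neighborhoods, rates, and detection probabilities below are invented for illustration (they are not from O'Neil's book); the point is only that identical underlying behavior can produce diverging statistics once the algorithm's own output decides where the data gets collected.

```python
# Toy simulation of the feedback loop described above: two neighborhoods
# with the same true crime rate, but patrols go wherever the *recorded*
# statistics are highest, so recorded crime diverges anyway.
# All numbers are invented for illustration.
import random

random.seed(0)
true_rate = {"A": 0.10, "B": 0.10}            # identical underlying crime
recorded  = {"A": 12, "B": 10}                # a small initial difference in the stats
detection = {"patrolled": 0.9, "other": 0.3}  # patrols record more of what happens

for week in range(52):
    patrolled = max(recorded, key=recorded.get)   # send police where the stats look worst
    for hood in true_rate:
        incidents = sum(random.random() < true_rate[hood] for _ in range(100))
        rate = detection["patrolled"] if hood == patrolled else detection["other"]
        recorded[hood] += sum(random.random() < rate for _ in range(incidents))

print(recorded)  # the initially "worse" neighborhood keeps looking worse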

All the News That's Fit to Tweet

Google, Facebook, and Twitter all run on algorithmic processes designed for a particular outcome – mostly to maximize the impact of advertising that will influence users to buy or think in a particular way. Advertising existed for many years before the invention of the internet or the creation of social media sites. However, it has never before been as active and personalized as it is online, nor operated on such large volumes of data with the computing power to instantaneously target individuals. And, unlike newspapers and magazines (and even TV), many online services rely on advertising as their primary income stream.

Recently, we've seen that in the quest for your attention, automated processes can lead to atrocious errors. For example, Google promoted a story that incorrectly identified the culprit in the Las Vegas mass shooting. Also coming to light is the story that Google, Facebook, and Twitter may have been used to place ads or promote information in an attempt to influence the 2016 U.S. Presidential election. The processes that enable this kind of activity are largely automated and unregulated (at least governmentally). Facebook allows anyone to place an ad for a few dollars, and there's no salesperson or editor involved who might notice inappropriate references or possibly illegal activity (like a foreign agency or government buying politically influential ads).

Video to the Extreme

[Image: confirmation bias – cultural influences and evidence selected for or by the user]

We are all subject to a psychological phenomenon called confirmation bias. It is a human trait: once we have embraced an opinion or belief, we tend to gravitate toward information that confirms that belief while avoiding or rejecting information that would weaken it. In the online world, this sometimes leads to a news "echo chamber" that reinforces a particular version or viewpoint of an event or person. A recent study (funded by Google) downplayed this phenomenon, while a Washington Post writer has praised it.

It seems that confirmation bias has been a particular feature of the YouTube video service (owned by Google). This has been discussed on NPR, where it was noted that "a platform like YouTube has algorithms designed to recommend to you things that it thinks will be more engaging." Those recommendations can lead to a string of similar and related videos, which in the case of cute cats could be simply amusing, but in the case of religious extremism could promote radicalization and deadly actions. It's also been reported that YouTube's automated recommendation of 2016 election-related content was biased. Google has as much as admitted an issue: in July of 2017 it announced that it would change the redirection algorithm for individuals viewing "terrorist content on YouTube" and "steer them toward video content that confronts extremist messages and debunks its mythology." (Of course, one person's extremism is the next person's religion – good luck with that approach.) It seems that Google is replacing one biased algorithm with another ("partnerships with NGOs that are experts in this field"), and while perceived terrorist content will now possibly be filtered, the change appears to be based on prior data without a stated measure of success.
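For illustration, here is a bare-bones sketch of an engagement-driven recommender of the general kind the NPR quote describes. It is not YouTube's actual algorithm – the titles, tags, and scoring are invented – but it shows how ranking candidates by predicted engagement and similarity to past viewing naturally produces a string of more-of-the-same recommendations.

```python
# A minimal "recommend whatever looks most engaging" loop.
# Not YouTube's actual algorithm; titles, tags, and scores are invented.

def predict_watch_time(video, history):
    """Score a candidate higher the more it resembles what was already watched."""
    watched_tags = {tag for v in history for tag in v["tags"]}
    overlap = len(set(video["tags"]) & watched_tags)
    return video["base_appeal"] + 2 * overlap

def recommend(candidates, history):
    return max(candidates, key=lambda v: predict_watch_time(v, history))

candidates = [
    {"title": "Cute cat compilation",    "tags": {"cats"},              "base_appeal": 5},
    {"title": "Even angrier commentary", "tags": {"politics", "anger"}, "base_appeal": 4},
    {"title": "Calm explainer",          "tags": {"politics"},          "base_appeal": 3},
]
history = [{"title": "Angry commentary", "tags": {"politics", "anger"}}]

print(recommend(candidates, history)["title"])  # similarity pulls toward more of the same
```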

You are the Product

The problem of confirmation bias reinforced by online news and information seems to be eating away at some of the foundations of U.S. politics. Our two-party system requires compromise to be effective, but political divisions in the U.S. appear to be widening, and social media sites are easily manipulated to reinforce those divisions. While online services are not the cause of, or the only influence on, this trend, current events do nothing to absolve them of some culpability.

What should be done about big data algorithms? O'Neil recommends regulating them: measuring the hidden costs of WMDs and auditing the assumptions behind such algorithms to test them for fairness. This response would be similar to the regulation put in place in reaction to the excesses of the 19th-century industrial revolution. However, we don't seem to be living in a political climate that would foster government regulation of private industry. Perhaps, in the interest of self-preservation, these services (like Google with YouTube) will attempt to better regulate their own activities. I think that YouTube's redirect method, rather than redirecting to similar content, should redirect to opposite content. If you are viewing a cat video, then YouTube should recommend a dog video. Learning how to replace a computer hard drive should be followed by how to replace a toilet (but maybe that's not enough of an opposite). And that Beethoven symphony should be followed up by a Metallica performance.
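As a rough illustration of what the audits O'Neil proposes might start with, here is a sketch that compares an algorithm's approval rates across two groups. The decisions and groups are invented, and the 0.8 threshold simply echoes the "four-fifths rule" used in U.S. employment-discrimination guidance; a real audit would go much deeper than a single ratio.

```python
# A rough sketch of one check an algorithm audit might begin with:
# compare outcomes across groups for the same automated decision process.
# The decisions, groups, and 0.8 threshold below are illustrative only.

def approval_rate(decisions):
    return sum(decisions) / len(decisions)

def disparate_impact(decisions_a, decisions_b):
    """Ratio of the lower group's approval rate to the higher group's."""
    rate_a, rate_b = approval_rate(decisions_a), approval_rate(decisions_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# 1 = approved by the algorithm, 0 = denied, for two hypothetical groups.
group_a = [1, 1, 1, 0, 1, 1, 0, 1]
group_b = [1, 0, 0, 1, 0, 0, 1, 0]

ratio = disparate_impact(group_a, group_b)
print(f"impact ratio: {ratio:.2f}",
      "-> flag for review" if ratio < 0.8 else "-> within threshold")
```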

[Image: an eye with numbers running through it]

I think it's best to remember that when you are using Google, Facebook, Twitter, or other similar services, you are not the customer – you are the product. Those sites are selling your attention to the customers who provide their income. We can lessen the impact of the confirmation bias built into their algorithms by not reposting, not retweeting, not liking, and not recommending. Just consume the content. Look for opposing viewpoints. Remember that credentials count – look for the "about" link on news sites to find out who is producing the content you are reading or viewing. Is it a trained journalist writing for a major metropolitan newspaper, or some guy writing a column for an IT newsletter? (No judgment here – just an observation.)

Editor's Note: Please note that information in each edition of Benchmarks Online is likely to change or degrade over time, especially the links to various websites. For current information on a specific topic, search the UNT website, UNT's UIT Help Desk or the world wide web. Email your questions and comments to the UNT University Information Technology Department or call 940-565-2324.