Social Listening - Use of Rules Vs Machine Learning
Signal and Noise
Hewlett Packard has a filtering problem on its Twitter channel. The tech giant’s customers use the terms @HP, #HP, and HP to reach out to the company via tweets. However, when Harry Potter fans post on Twitter, they often use the same terms, @HP, #HP, and HP, to refer to their favorite Harry Potter books and movies. Twitter users describing the power of their cars often use HP as an abbreviation for horsepower. As a result, Hewlett Packard’s default listening on @HP, #HP, and HP produces a cluttered inbound feed that contains thousands of irrelevant posts every month.
Chase Bank has to contend with posts from doting teens tweeting about boyfriends named ‘Chase’, as well as tweets from people seeking to ‘chase’ their dreams. The team covering the @United Twitter handle for United Airlines always knows when the English soccer team #Manchester #United has a game, and when there is big news regarding the #United #Nations.
Companies can choose between two broad approaches for separating the signal from the noise: rules and machine learning.
Rules
The use of rules, which may include Boolean operators such as AND, OR, and NOT, is a popular method that most social media monitoring systems employ to let users filter inbound data. Scripts are written to determine whether posts or comments containing certain keyword patterns should be included in a data stream. Going back to the HP example, a basic script might look like this: Include (HP or #HP or @HP) NOT (Hogwarts). Hogwarts is a term that appears frequently in the Harry Potter books.
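As a rough illustration, the basic script above can be expressed as a small Python function. This is a minimal sketch, not the syntax of any particular monitoring product; the tokenizer and the names `INCLUDE_TERMS`, `EXCLUDE_TERMS`, `tokenize`, and `matches_rule` are hypothetical.

```python
import re

# Hypothetical rule mirroring: Include (HP or #HP or @HP) NOT (Hogwarts)
INCLUDE_TERMS = {"hp", "#hp", "@hp"}
EXCLUDE_TERMS = {"hogwarts"}


def tokenize(post: str) -> set[str]:
    """Lower-case word tokens, keeping a leading # or @ so hashtags and handles stay distinct."""
    return set(re.findall(r"[#@]?\w+", post.lower()))


def matches_rule(post: str) -> bool:
    tokens = tokenize(post)
    # OR over the include terms, then NOT over the exclude terms.
    return bool(tokens & INCLUDE_TERMS) and not (tokens & EXCLUDE_TERMS)


posts = [
    "@HP my laptop won't boot after the update",
    "Rereading #HP before bed, I miss Hogwarts",
    "Just bought a car with 300 HP",
]
for post in posts:
    print(matches_rule(post), "|", post)
```

The first and third posts pass and the second is filtered out. The horsepower post still gets through, so catching it would need yet another NOT clause, which is how rule sets tend to grow.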
Machine Learning
The machine learning approach is simpler for the user. It does not require the user to craft complicated scripts; instead, it relies on the system to learn the attributes and patterns in text. Consider the need to refine Hewlett Packard’s Twitter stream in order to identify posts that are actionable. A machine learning algorithm would analyze the posts that have been replied to (i.e., acted upon) to build up a pattern of the posts it characterizes as actionable. For instance, it would notice that posts containing both #HP and Hogwarts tend not to receive replies from the Hewlett Packard support team, so the system would learn to classify such posts as not actionable. A similarity index is then used to compare future posts against the learned pattern and determine whether each one should be classified as actionable or not.
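The sketch below shows one simple way such a similarity index could work. It assumes a bag-of-words representation and cosine similarity against one learned "pattern" per class (a nearest-centroid classifier); the sample history and all function names are hypothetical, and a real system would use richer features and far more data.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts, keeping # and @ prefixes."""
    return Counter(re.findall(r"[#@]?\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical history: (post, whether the support team replied to it)
history = [
    ("@HP my printer shows an error after the update", True),
    ("#HP laptop battery will not charge, please help", True),
    ("@HP support, my printer driver keeps crashing", True),
    ("#HP Hogwarts is my happy place", False),
    ("Rewatching #HP, Hogwarts forever", False),
    ("That car has 400 HP under the hood", False),
]

# Learn one "pattern" per class by summing term counts. Cosine similarity
# ignores vector length, so the sum behaves the same as an average here.
patterns = {True: Counter(), False: Counter()}
for text, replied in history:
    patterns[replied].update(vectorize(text))


def is_actionable(post: str) -> bool:
    """Classify a new post by which learned pattern it is most similar to."""
    v = vectorize(post)
    return cosine(v, patterns[True]) > cosine(v, patterns[False])


print(is_actionable("@HP my printer will not print"))
print(is_actionable("Missing Hogwarts, rereading #HP tonight"))
```

Neither test post appears in the history, yet the printer complaint lands closer to the "replied to" pattern and the Hogwarts post closer to the "ignored" one. No one had to write a Hogwarts rule.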
Email Similarities
We all like to stay organized. Suppose you’re working on a project called XYZ and want to store all the emails about Project XYZ in a particular folder. That’s a drag to do manually, so you create a rule in your email application that automatically finds those messages and moves them to the appropriate folder. In this kind of example, developing a rule in your email system (similar to using Boolean operators) works just fine.
But maybe the plot thickens. Let’s say two of your colleagues update the signatures they place at the foot of the emails they compose. As part of their departmental responsibilities, they indicate in their signatures that they’re “...working on Project XYZ”. Darn. Since your rule listens for mentions of Project XYZ, you now get all of the incoming email from these two colleagues, not just the messages specific to Project XYZ. No need to worry too much, though: you can go back to the rule and add a condition to exclude emails where the text “working on Project XYZ” is present.
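Written as code, the rule and its exception might look like the sketch below. The function name, phrases, and sample emails (including the colleague "Dana") are hypothetical; mail clients express such rules through their own filter settings rather than Python.

```python
# Hypothetical mail-filter rule expressed as a Python function.
INCLUDE_PHRASE = "project xyz"
# Exception added after two colleagues put "working on Project XYZ" in their signatures.
EXCLUDE_PHRASES = ["working on project xyz"]


def belongs_in_xyz_folder(email_text: str) -> bool:
    text = email_text.lower()
    if INCLUDE_PHRASE not in text:
        return False
    # NOT clause: any exception phrase vetoes the match.
    return not any(phrase in text for phrase in EXCLUDE_PHRASES)


emails = {
    "status update": "Project XYZ milestone moved to Friday.",
    "lunch invite": "Lunch on Thursday?\n-- Dana, working on Project XYZ",
    "budget draft": "Project XYZ budget draft attached.\n-- Dana, working on Project XYZ",
}
for name, text in emails.items():
    print(f"{name}: {belongs_in_xyz_folder(text)}")
```

Note the budget draft: it is genuinely about Project XYZ, but the exception vetoes it because the sender’s signature is present, so recovering it would take yet another exception.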
When the data you’re working with is simple, known, and easily defined, a rules-based approach may work out fine. But to keep precision high, a rule has to be edited every time you encounter an exception such as the “working on Project XYZ” email signature. The problem with a rules-based approach is that it can become complicated, and the scripts can become difficult to maintain when listening for a broad range of ambiguous terms.
Controlling Email Spam
Google entered the market for free web-based email in 2004. The space was already crowded, with offerings from Yahoo, Microsoft (Hotmail), AOL, and many others. At the time, spam was a bigger problem for many email users than it is today. To succeed, Gmail needed to lure users away from entrenched competitors, and one of the differentiators that set Gmail apart was its improved ability to handle spam. Google vastly enhanced email spam filtering by using advanced machine learning methods. When a Gmail user clicks the ‘Report Spam’ button, the Gmail spam algorithm logs the message as spam. This signal is recorded centrally, so that others in the community of Gmail users are less likely to receive that spam message. Importantly, other emails with similar attributes and characteristics can also be marked as spam. (More on Google’s use of machine learning in controlling spam)
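This is not Gmail’s actual filter; it is only a toy illustration of the general idea. Reports from many users are pooled into one shared model, and that model then scores messages nobody has reported by their wording. It assumes a multinomial naive Bayes classifier with Laplace smoothing; the class name `SharedSpamFilter` and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class SharedSpamFilter:
    """Toy multinomial naive Bayes model trained on reports pooled from many users.

    Illustration only: an assumption about the general idea, not Gmail's implementation.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def report(self, text: str, label: str) -> None:
        """Record one user's verdict ('spam' or 'ham') in the shared model."""
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list[str], label: str) -> float:
        counts = self.word_counts[label]
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(counts.values())
        # Class prior with add-one smoothing over the two classes.
        score = math.log((self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2))
        for w in words:
            if w in vocab:  # words never seen in any report are ignored
                # Laplace smoothing keeps unseen (word, class) pairs from zeroing the score.
                score += math.log((counts[w] + 1) / (total + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        words = tokens(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")


shared = SharedSpamFilter()
# Reports from different (hypothetical) users all update the same model.
shared.report("You won a free prize, claim your cash now", "spam")
shared.report("Claim your free cash prize today", "spam")
shared.report("Meeting notes for the project review", "ham")
shared.report("Can we move the review meeting to Friday", "ham")

# Messages nobody has reported yet.
print(shared.is_spam("Free cash prize waiting, claim now"))
print(shared.is_spam("Notes from Friday review meeting"))
```

The first message was never reported, but it shares enough wording with reported spam to be flagged, while the meeting note is not.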
The Future for Social Listening
Going forward, it is all but certain that the volume of traffic on social networks (and, as a consequence, the noise) will continue to increase. In an effort to filter engagement channels, some companies have created custom handles such as @HPSupport. A rules-based approach built around these custom handles may be an adequate way to home in on some of the outreach from customers. However, if companies are really intent on listening to and participating in conversations that carry obligations and opportunities for their brand outside the narrow lens of a custom engagement handle, a rules-based approach is unlikely to be feasible.
A machine learning approach similar to the one used to tackle email spam seems likely to become a necessity. Machine learning doesn’t carry the burden of developing and maintaining complex scripts, and the experience is simpler and potentially more precise. Rule-based methods don't track changes and don't improve based on patterns of usage. The built-in flexibility of machine learning can be supported by a graceful workflow: when team members engage with relevant posts or reject irrelevant ones (effectively marking them as spam), each action teaches the system what should be listened to and what is noise.
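One way to picture this feedback loop is an online perceptron that adjusts per-token weights whenever a team member engages with or rejects a post. This is a minimal sketch under that assumption; `FeedbackFilter`, `record_action`, and the sample posts are hypothetical, and a production system would need richer features and safeguards against noisy feedback.

```python
import re


def features(post: str) -> set[str]:
    return set(re.findall(r"[#@]?\w+", post.lower()))


class FeedbackFilter:
    """Toy online perceptron: each engage/reject action can adjust token weights."""

    def __init__(self) -> None:
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def score(self, post: str) -> float:
        return self.bias + sum(self.weights.get(f, 0.0) for f in features(post))

    def should_listen(self, post: str) -> bool:
        return self.score(post) > 0

    def record_action(self, post: str, engaged: bool) -> None:
        """Engaging marks a post as signal; rejecting marks it as noise."""
        target = 1.0 if engaged else -1.0
        # Perceptron rule: update only when the current prediction is wrong (or undecided).
        if target * self.score(post) <= 0:
            for f in features(post):
                self.weights[f] = self.weights.get(f, 0.0) + target
            self.bias += target


flt = FeedbackFilter()
# Hypothetical sequence of team actions on incoming posts.
actions = [
    ("@HP my printer is jammed again", True),
    ("Reading #HP at Hogwarts tonight", False),
    ("@HP laptop fan is very loud", True),
    ("Hogwarts castle in #HP is magical", False),
]
for post, engaged in actions:
    flt.record_action(post, engaged)

print(flt.should_listen("@HP printer will not turn on"))
print(flt.should_listen("Hogwarts letter day! #HP"))
```

Here the weights change only on the first two actions; by the third, the filter already handles the remaining examples correctly, so the perceptron leaves it alone. The team’s routine clicks are the only training input.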
Author: Steve O’Donoghue, Solariat Inc.
SocialOptimizr (a Solariat product) focuses on machine learning and natural language processing to bring leading-edge social listening, filtering, and prioritization to customers.