Buzz is a hard thing to predict. Many have tried, and so far most have argued that this is not quite possible — see for instance Salganik, Dodds and Watts’ Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Yet a recent study by highly respected social graph researchers may indicate otherwise.
In Can Cascades be Predicted, Cheng, Adamic, Dow, Kleinberg and Leskovec argue that whether a photo’s reshare cascade will keep growing, and how large it will get, can be predicted with up to 80% accuracy. I strongly encourage you to read the whole paper. It is only 11 pages long: a lot compared to a tweet, but nothing compared to the Constitution of India. Should you nonetheless balk at such a feat, you may want to read key passages from the authors’ short Facebook post as well as from the abstract (emphasis mine):
“While there is no silver bullet for creating a photo that will achieve a large number of reshares, we show that it is possible to observe a photo being reshared, and figure out, with increasing confidence, how large it will grow.
On a large sample of photo reshare cascades on Facebook, we find strong performance in predicting whether a cascade will continue to grow in the future.
As in the experiment of Salganik et al., we find that independent resharings of the same photo can generate cascades of very different sizes. But we also show that this observation can be compatible with prediction: after observing small initial portions of these distinct cascades for the same photo, we are able to predict with strong performance which of the cascades will end up being the largest. In other words, our data shows wide variation in cascades for the same content, but also predictability despite this variation.
Rather than attempt to predict aggregate popularity or individual behavior in the next time step, we instead look at whether an information cascade grows over the median size (or doubles in size, as we later show).”
As the passages in emphasis show, this piece of research is not about predicting whether a given piece of content will turn into a viral phenomenon or a meme. It is important that we keep this in mind as many social network pundits, experts and commentators (who like nothing more than to speculate about social networks in their social network posts) will likely turn this very interesting piece of research into something it is definitely not: an algorithm that can predict buzz phenomena.
So what then can we learn from Cheng, Adamic, Dow, Kleinberg and Leskovec? Well, we can certainly start by looking at the five classes of features that help predict whether a cascade (i.e. buzz, whether in its infancy or more developed, but not a simple piece of content) will grow, at what pace, to what extent, following which pattern. These are, in ascending order from the least accurate to the most, (i) the properties of the content, (ii) the features of the original poster, (iii) the features of the resharers, (iv) the structural features of the cascade, (v) the temporal characteristics of the cascade.
Content may be king, as the saying goes, but not when it comes to predicting, in and of itself, how far it will travel on a viral path. Using this class of features alone, predictive accuracy is barely above 50%, meaning that a coin toss would be almost as effective at predicting the viral potential of a given piece of content.
“While content features affected the performance of structural and temporal features, we find that they are weak predictors of how widely disseminated a piece of content would become.”
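The coin-toss comparison follows from how the task is framed: if the question is whether a cascade ends up above the median size, roughly half of all cascades are "large" by construction, so guessing blindly is right about half the time. Here is a minimal sketch of that framing. The cascade sizes and the function names are made up for illustration and do not come from the paper.

```python
from statistics import median


def label_by_median(final_sizes):
    """Label each cascade True if its final size exceeds the median size."""
    cutoff = median(final_sizes)
    return [size > cutoff for size in final_sizes]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always guessing the more common label."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Hypothetical final sizes (number of reshares) for eight cascades.
sizes = [6, 7, 9, 12, 15, 40, 130, 900]
labels = label_by_median(sizes)
baseline = majority_baseline_accuracy(labels)
print(labels, baseline)
```

With balanced labels like these, any feature set has to beat 0.5 to be of any use, which is why an accuracy barely above 50% amounts to very little.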
At the other end of the accuracy spectrum are temporal features, i.e. the speed of the cascade. This is the best proxy for determining whether a nascent viral phenomenon will stay viral and keep growing. The faster content starts to spread, the more likely it is to keep doing so. This class of features alone is 78% accurate at predicting a cascade’s growth. However, it implies that the viral episode has already started…
“Properties related to the “speed” of the cascade were shown to be the most important features in predicting thread length on Facebook, and are a primary mechanism in predicting online content popularity.”
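To make the idea of "speed" concrete, here is a rough sketch of how one might turn the timestamps of a cascade's first few reshares into temporal features. The two features shown (time to the k-th reshare, mean gap between reshares) and their names are my own illustrative assumptions, not necessarily the exact features used in the paper, and the timestamps are invented.

```python
def temporal_features(timestamps, k=5):
    """Speed features from the first k reshare timestamps.

    Timestamps are seconds elapsed since the original post (an assumption
    made for this sketch).
    """
    first_k = sorted(timestamps)[:k]
    if len(first_k) < 2:
        raise ValueError("need at least two reshares")
    # Gaps between consecutive reshares among the first k.
    gaps = [later - earlier for earlier, later in zip(first_k, first_k[1:])]
    return {
        "time_to_kth": first_k[-1],          # how long the first k reshares took
        "mean_gap": sum(gaps) / len(gaps),   # average wait between reshares
    }


# Two hypothetical cascades: one spreading within minutes, one over a day.
fast = temporal_features([30, 60, 95, 120, 150, 400])
slow = temporal_features([600, 4000, 9000, 20000, 86400])
print(fast, slow)
```

Note that such features can only be computed once the first k reshares have happened, which is exactly the caveat above: the cascade has to have started before its speed can say anything.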
Other features, such as the characteristics of the original poster and resharers (i.e. their influence or popularity) and the structure of the cascade (star-shaped, linear, …), fall in between the previous two in how accurately they predict the growth of a cascade. Taken together, the five classes of features yield an 80% accuracy rate.
To put it in a nutshell, Can Cascades Be Predicted is sure to become a seminal piece of research, helping us understand better how viral phenomena evolve and grow, potentially paving the way for ab-initio predictive measures (i.e. before the buzz even starts), but we’re not there yet…
PP (for post post, as in PS for post scriptum)
I chose to write this post as a reaction to Influencia’s article on Can Cascades be Predicted and to the knee-jerk social network mentions thereof by digital marketers of all sorts. I am grateful to the article’s author for forgoing phony experts and, for once (among the marketing trade media), choosing to put forward a very interesting and serious piece of research. However I am, as always, disappointed to see that the article somewhat misleads readers into believing that there is now a formula that can predict buzz, as the excerpts below show (translated from the French, followed by my comments; emphasis mine):
“The scientists have indeed developed a set of variables capable of accurately predicting, in 80% of cases on average, the virality of any event whatsoever.”
Let us recall that the scientific paper deals only with photos (not “any event whatsoever”) and with the continuation or acceleration of a viral phenomenon (a cascade) that is already under way. Indeed, the article’s very last sentence is far more faithful to the study’s cautious conclusions.
“The scientists have admittedly not found THE simple recipe for controlling 100% of what is at stake in virality. But “Cascade” and its indicators are rather valuable for brands that are always in search of performance to win over new customers and retain others.”
Through its title (“predicting the big buzz”) and its semantic, even semiotic, field (“The scientists”, “The software”, “The algorithm”…), Influencia’s article misleads the reader, whose eye darts from one tweet or newsletter to the next. It suggests that, eureka, the magic formula has at last been invented that will let us determine in advance which content will go viral and which will not. And so, the reader will then think, anyone who fails to use this “software” to create surefire viral content (sic) would be ill-advised indeed.
In the end, it is the “cascade” of knee-jerk tweets, more than the article itself, that saddens me. It leads me to conclude that one can indeed predict with a high degree of probability what creates buzz among marketing professionals: a title and a few well-chosen keywords…