Social Media Stock Market Prediction: Not My Cup Of Tea
By Daen de Leon. January 28, 2013, 9:00 AM CST
Butler’s Wharf lies on the south bank of the Thames, just to the east of Tower Bridge. One hundred years ago, tea clippers unloaded crates of tea from China and India, a commodity that made many investors rich and cemented the reputation of the Brits as a nation of tea drinkers, exchanging gossip over the Wedgwood at afternoon gatherings. After falling derelict in the early 1970s, Butler’s Wharf now plays home to up-market cafes and restaurants, with the occasional pub to leaven the heady mix. It’s also home to a new breed of financial wizards skipping tea in favor of listening to gossip.
One of those companies, Derwent Capital Markets (DCM), launched a product two weeks ago that it hopes will turn this nosiness into profit. DCM Dealer is built on algorithms developed from a 2010 paper written by three computer scientists, Johan Bollen, Huina Mao, and Xiao-Jun Zeng. On the face of it, the main result in the paper is astonishing: they claim 86.7 percent accuracy in predicting the rise or fall of the Dow Jones Industrial Average (DJIA) over the 11 months from February to December 2008, based on analysis of the mood of roughly 10 million tweets. Bollen’s team used two mood-tracking tools to do this: OpinionFinder and Google Profile of Mood States (GPOMS). The three scientists trained a neural net with the DJIA prices and the mood dimensions from OpinionFinder and GPOMS. To cross-validate the results, they used their system to predict the public’s response to the 2008 election results and to Thanksgiving.
I’ll let DCM explain what this means:
The paper detailed a study carried out at Indiana University whereby millions of messages on Twitter were analysed and in turn a “mood” or “sentiment” was derived. This sentiment changed over time as the world swung from feeling happy to feeling sad. The academics then found there was a very close correlation between the sentiment on Twitter and the Dow Jones Industrial Average Index (US Stock Market). The eureka moment came when they realised that changes in sentiment on Twitter preceded movements on the Dow Jones by 3 days which effectively meant investors and traders could use the sentiment information to make better informed investment decisions.
Can this really be right? Can it really be that simple? Analyze enough tweets and you get an 86.7 percent accurate prediction of the DJIA over the coming days?
Before you reach for your checkbook, the answer, of course, is: it depends.
There are a number of caveats to bear in mind about the paper and DCM.
Firstly, the paper. The authors are not economists, nor do they have a background in finance. The main evidence of this is their treatment of something called the Efficient Market Hypothesis (EMH). It comes in three forms (weak, semi-strong, and strong), but essentially says that available information is already reflected in stock prices. The difference between the three forms reflects how quickly that information is taken into account, and whether that information is fully disclosed to the public or not. The upshot is that, because prices already incorporate the information, it is very difficult to consistently outperform the stock market.
The paper’s authors disagree. They essentially dismiss the EMH in their opening statements, and are clearly fans of one of the EMH’s rival theories, something called behavioral economics and finance, which strives to take account of the irrationality of markets by rooting finance and economics firmly in human psychology.
While it’s true that the EMH has been notoriously hard to pin down (the three forms have been developed to relax some of the stricter points of the original strong form EMH), it’s also true that information isn’t just news stories or other stock prices — it is, indeed, people’s sentiment, gossip, rumor, and general feeling. There’s a reason that insider trading is popular — you’re using information that’s not available to the public to gain a pricing advantage. To me, that’s a clear demonstration of the semi-strong form of the EMH — that only public information is reflected in stock prices. Arbitrage opportunities exist because information doesn’t travel instantaneously between markets, and pools of dark liquidity take information out of the market completely so as not to move stock prices too suddenly. And yes, Twitter sentiment is also information. All of these are accounted for by the EMH, so to bury it is premature.
Also, while the paper’s statistical approach is sound, its methodology renders that same approach useless, because the cross-validation is done on the same data set used to train the neural net. In other words, asking new questions of the neural net based only on its training set isn’t going to give you new answers. For it to make sense, a completely different data set should have been chosen. Dirk Petzoldt discusses this problem in this blog entry.
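To see why testing on the training data proves so little, here is a minimal sketch. It is not the paper's neural net: a nearest-neighbor "memorizer" stands in for it, and the data, function names, and sizes are all made up for illustration. The features and the up/down labels are pure random noise, so there is nothing real to learn.

```python
import random

def make_noise_data(n, n_features, rng):
    """Random features with random up/down labels: no signal to learn (synthetic)."""
    return [([rng.random() for _ in range(n_features)], rng.choice([0, 1]))
            for _ in range(n)]

def predict_1nn(train, x):
    """Return the label of the training point closest to x (a pure memorizer)."""
    nearest = min(train,
                  key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test rows whose label the memorizer gets right."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(2008)               # fixed seed so the run is repeatable
train = make_noise_data(300, 7, rng)    # hypothetical "mood" features and labels
held_out = make_noise_data(300, 7, rng) # fresh data the model never saw

in_sample = accuracy(train, train)        # every point finds itself
out_of_sample = accuracy(train, held_out) # hovers around a coin flip
print(f"accuracy on training data: {in_sample:.2f}")
print(f"accuracy on held-out data: {out_of_sample:.2f}")
```

The memorizer scores perfectly on the data it was trained on, because every point's nearest neighbor is itself, yet it does no better than chance on fresh data. Only the held-out number says anything about prediction.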
DCM itself is a curious beastie. It was originally set up as a hedge fund management company (with poor timing) in 2008, and its first and only fund (called Twitter Hedge Fund, no less, because of its dependence on social media data analysis as a market indicator) was launched in August 2011. This Twitter fund did rather well in the first month, returning 1.87 percent (whether that’s for the month, or annualized, I can’t find out). Unfortunately, in August 2011, the U.S. lost its AAA credit rating, and the FTSE 100 plummeted by 1,000 points. Two months later and the fund was dead. I presume that 1.87 percent gain got wiped out.
Since then, DCM downsized and moved into smaller, cheaper offices. The company also changed its business model, turning into a trading platform vendor and taking a year to develop DCM Dealer (with help, presumably, from Bollen, Mao, and Zeng). Details are sketchy, apart from the obligatory happy orange website, and so what’s available remains heavy on promise, light on specifics.
I’m sure that Sentiment, as DCM calls it (note the capital), will have advocates on the one hand and detractors on the other, as a market indicator. Many other market indicators exist already, and an entire discipline devoted to generating and interpreting them has sprung up over the years, made easy by the rise of the digital computer. It’s called technical analysis, and an old friend of mine who used to write trading software for Saxo Bank in Copenhagen (coincidentally the bank where the CEO of DCM used to work) recounted a story that tickled me. This friend had a book of technical analysis market indicators he used as a reference, and every now and again, one trader in particular would secretly get his hands on the book. My friend (I’ll call him A) would always know, because the trader would come by the next Monday morning and say, “Hey, A, I noticed we don’t have this form of momentum adjusted breakout Fibonacci cycles in the analysis software.” A would then know that a particularly long week lay ahead.
There’s another way of predicting the future, one that relies on the product that used to be stored at Butler’s Wharf: reading tea leaves. Some people swear by it, others laugh at it. My feeling is that there’s something to the Twitter paper, but given the enormous complexity of decoding social media even as a human being, I’m not sure that software is currently up to the task. While the concept may be sound, I’m giving DCM Dealer a tea leaves rating for the time being.
And if I’m wrong, I’ll buy everyone at DCM a cup of coffee.
About Daen de Leon
An ex-pat Brit, Daen previously lived in Berlin, Copenhagen, and Paris before coming to California in 2009. He is by day a telecommuting regulatory consultant in the bioscience business, and by night an off-grid electronics hobbyist and occasional writer of science fiction. He lives in the backwoods of Northern California in a small cabin next to his parents' house. He shares the cabin with a number of invertebrates of species unknown.