Data analysis and data mining have been applied to traditional retail spaces for the past two decades. In recent years, some companies stepped into that space to cater to publishers and booksellers to benefit marketing and sales. And even more recently, the phrase “machine learning” has entered the conversation in that industry.
In a nutshell, machine learning “lets computers learn without explicit programming. In analysis, the technology uses algorithms that learn from the data, and, in turn, grow and change when exposed to new information, ultimately uncovering those all-important insights.” [Google]
“Computer science isn’t as far removed from the study of literature as you might think.” (Inderjeet Mani)
Could this technology be applied in the entertainment industry, either at the point of acquisition or as enterprise-wide quality control? The results in other industries that have adopted it (such as manufacturing, retail, healthcare, travel, hospitality, financial services, and energy and utilities) are incredibly promising.
In “How Analytics and Machine Learning Help Organizations Reap Competitive Advantage,” MIT Technology Review in partnership with Google Analytics 360 Suite reported that companies in the top third of their respective industries using data-driven decision making were “on average, 5% more productive and 6% more profitable than their competitors.”
So that’s not just an increase in their own profits and productivity due to implementing a new tool. That’s over their direct competition. But what could this technology do that it isn’t already doing for the entertainment industry?
What if instead of waiting on sales reports to identify trends post-publication, content trends could be projected based on historic data?
What if machine learning could help publishers, agents, and studios not just identify consumer trends and habits, but predict content reception?
Consumer data is extensive and detailed, and it can indicate not just what consumers are responding to, but why. Once those similarities are established, couldn’t a computer identify the less obvious but no less resonant comparable qualities, discovering fresh possibilities for titles new and old?
In short, it can.
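As a rough illustration of what finding “comparable qualities” could look like in practice, here is a minimal sketch that ranks titles by how closely their word usage matches a query text, using cosine similarity over simple word counts. The blurbs, the titles, and the function names (`word_counts`, `cosine_similarity`, `most_comparable`) are all hypothetical and made up for this example; a real system would presumably work from full texts and far richer signals.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count each word (a simple bag-of-words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine between two word-count vectors: 0 means no shared words, 1 the same mix."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_comparable(query, catalog):
    """Return catalog titles ordered from most to least similar to the query text."""
    q = word_counts(query)
    scored = [(cosine_similarity(q, word_counts(text)), title)
              for title, text in catalog.items()]
    return [title for _, title in sorted(scored, reverse=True)]

# Hypothetical, made-up blurbs purely for illustration.
catalog = {
    "Title A": "a detective hunts a killer through the rainy city",
    "Title B": "a young wizard learns magic at a hidden school",
    "Title C": "a retired detective returns to the city to catch a killer",
}
ranking = most_comparable("a killer stalks the city and a detective follows", catalog)
print(ranking)
```

Word counts are the crudest possible stand-in for the “why” behind reader response, but the same ranking idea carries over when the counts are replaced with richer features.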
To consider this in earnest, it’s important to understand that while books hold cultural weight, ultimately they are all data. But unlike the consumer data that, even formless, has proven itself so valuable, text has inherent rules and formations. And each deviation from one rule has its own set of rules and acceptable actions.
These rules are grammar, syntax, and archetypes (which are themselves the subject of data scientists’ attentions); those deviations are diction, literary devices, colloquialism and dialect, pastiche and tone. When book clubs discuss the obvious connection of the blue door to the heroine’s inevitable death, they are discussing the pattern identified in a semi-closed system. They are performing one tiny fraction of the computational potential of an algorithm trained on literature.
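To make the idea of text as rule-governed data a little more concrete, the sketch below measures a few surface traits of a passage: how many sentences it has, how long they run on average, and how varied its vocabulary is. The function name `style_features` and the sample passage are invented for illustration, and these three numbers are only an assumption about the kind of trait an algorithm might start from, not a description of any particular product.

```python
import re

def style_features(text):
    """Compute a few simple, countable traits of a passage of prose."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters and apostrophes as words, ignoring case.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Share of distinct words among all words: a rough proxy for diction variety.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }

sample = "The door was blue. She opened it. The door was always blue."
features = style_features(sample)
print(features)
```

A book club notices the recurring blue door by reading closely; a program notices it because “door” and “blue” keep turning up in the counts.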
Considering this perspective, it’s not difficult to imagine how a computer could evaluate and identify traits of a book. But could it identify a bestseller? (The short answer is yes, but it’s quite complicated, according to Jellybooks founder Andrew Rhomberg.)
At its core, that question betrays the understandable fear that a computer will determine what is “good” and cast off the rest. But doesn’t that fear in itself really short-change humans in general? Like any tool, it is as strong as those who wield it, and as multifaceted as they are.
OK, so a computer can do it. But should it?
Many think that, yes, it should. Take book discovery as an example.
Though now a foregone conclusion, book discovery was the concern du jour of data-minded publishers 10-15 years ago. How would we get readers to the books? How would the reader discover the books? As a result, a flurry of startups emerged that gave consumers new ways to discover, consume, recommend, and interact with their favorite stories. It wasn’t that the traditional methods of consumer discovery weren’t working, but they could work better and produce more consistent, profitable results.
The same logic can and should be applied to the systems in the entertainment industry that acquire and produce consumable media. There’s no lack of raw material, sure, but is the current query and acquisition system bereft of room for improvement?
Simply put: A human cannot read as much as a computer, because a human gets tired, distracted, hungry, etc. A computer does not feel, so it does not want or need. It simply does. And where computers finish, we pick up.
Humans are amazingly creative, observant, and resourceful. But at a certain point, your brain just can’t process any more data any more quickly.
A computer doesn’t have that problem. An algorithm doesn’t get a tension headache from the hours spent reading. Machine learning software doesn’t tire of a story it thinks it has heard before and move on; it reads the entire book in a few seconds.
One benefit of a computer doing the analysis is that it can discover the latent aspects of a novel. So, while authors may have their own identifiable style, there are underlying components of a novel that a computer can see better than humans. And what we’re betting on is that underneath the facade of a book lies a story that connects a reader to that text. That’s what we’re uncovering.
So, could such a program recognize the outlandish genius of someone like David Foster Wallace, James Joyce, or Kaye Gibbons? I wouldn’t bet against it in the future, but that’s not entirely the point.
If we only adopt technology that feels familiar, safe, and ultimately just performs parlor tricks, we do ourselves and our industries a disservice. We leave the possibilities of customization and adaptability of machine learning and artificial intelligence behind.
This does not necessarily mean that the goal of machine learning or artificial intelligence is to replicate human creativity (though it might be at Wattpad). The goal is instead to amplify it, to embolden editors, agents, and publishers with more data than they could ingest in a lifetime, distilled into actionable insights.
“In this digital future, using machine learning platforms can provide publishers with opportunities to get real-time information about their readers, figure out what is working in the marketplace, and, perhaps, make the bestseller lists more of an accurate depiction of what readers want to read, not simply what is available,” commented Intellogo founder and CEO Neil Balthaser in an op-ed in Digital Book World.
Imagine a day when we take all our data about what people are reading and provide publishers (and authors) ideas of what people want to read, where to find those audiences, and better ways to reach them. This is the model that the film and television industries are already moving toward—with the help of Netflix and Amazon—so why shouldn’t book publishing take advantage of this market information? This type of decision support has not been possible up to this point, and publishers have often published books blindly, hoping that they would find the right audience and sell well.
Though ‘big data’ can be a taboo subject when we talk about the romance of publishing, there are undeniable benefits to be had from using platforms that give publishers and authors information from which they can make informed decisions on how to invest their time and money.
Distinguished Google engineer Sagnik Nandy explained to MIT Technology Review that traditional analytics relied on the idea that people would access tools already knowing which questions to ask of the data.
“But today, everything is changing so fast as businesses evolve, and there’s a lot going on that you might not be able to see,” Nandy noted.
The beauty of machine learning, as Nandy put it, is that it extracts “all the information you’re not asking about. Once you have that information, you can generate insights even before a question is asked. That can be a huge competitive advantage.”
Whoever you are, your competitors will implement machine learning and will take that advantage. The question entertainment companies should be asking themselves isn’t “Will this threaten or cheapen art?” but “How can I implement this yesterday?”
In the end, this is just one more tool to add to the creative, entrepreneurial arsenal. As computational linguist Inderjeet Mani so eloquently described in Aeon,
“Those who resist the temptation to unleash the capabilities of machines will have to content themselves with the pleasures afforded by smaller-scale, and fewer, discoveries. While critics and book reviewers may continue to be an essential part of public cultural life, literary theorists who do not embrace AI will be at risk of becoming an exotic species – like the librarians who once used index cards to search for information.”
“Writing the Book on Artificial Intelligence: LBF Quantum’s Nick Bostrom” by Mark Piesing, Publishing Perspectives.
“Machine Learning and Bestseller Prediction: More Than Words Can Say” by Chris Sim, Digital Book World.
“Quantifying the Weepy Bestseller” by Andrew Piper and Richard Jean So, New Republic.