IBPA On AUTHORS.me: ‘Finding Real Creativity With Artificial Intelligence’

AUTHORS.me is a technology platform that uses data from bestselling books to evaluate content and connect writers with publishers.

If you’re like me, when you hear the phrase “intelligent machine learning,” you feel the opposite of intelligent. Maybe you picture a robot, but then you move on. It’s definitely not a traditional publishing term. So when I heard about AUTHORS.me, a platform created to modernize the acquisition process in the publishing industry through the use of intelligent machine learning, I thought: “OK, what is it? And, more importantly, does it work?”

Creating a Baseline

Intelligent machine learning is when a computer learns without being explicitly programmed. You might remember Watson, the computer that defeated two human contestants on Jeopardy! Watson was an example of machine learning. Instead of responding to 1980s baseball trivia in the form of a question, AUTHORS.me uses machine learning to parse out insights in creative work in order to identify the appeal of that work.

Read more at Independent Book Publishers Association.

Machine Learning & Essential, Actionable Insights For The Publishing Industry

What we can learn from advanced algorithms and what they hold for the future
BY MONICA LANDERS, CEO OF AUTHORS.ME

We began this company as a standardized solution to the laborious and inefficient methods of the traditional query process, which is often painful for individual authors as well as publishers and studios. We’ve evolved this platform into a breeding ground for dynamic, transformational publishing technology that benefits every part of the industry. More than two years out, we have developed exciting technology that forecasts successful projects. After many conversations with industry professionals, we are more confident than ever that the industry can be changed for the better through technology and the essential insights it stands to receive.

Data-Driven Operations and Collection
Our platform is not only robust, it’s extremely effective.

Without getting into how the platform works (you can learn that here, here, and here), let’s look at the state of the industry’s submissions and publication statistics. According to writer Joseph Epstein, at any given moment 200 million Americans have a book they want to publish. Digital Book World surveyed writers and discovered that more than 60% submitted their work to a publisher or agent the previous year.

From that, we can estimate that as many as 125 million writers submit manuscripts to publishers and agents in the US every year (though anecdotal evidence from agents may suggest more like 20 million submissions every year [3]). This means each publisher or agent is receiving between 3,000 and 20,000 submissions a year [1]. So, the likelihood that a given submission will be published is between just 0.25% [2] at the conservative end and 15% at the optimistic end and, more importantly, the likelihood that a manuscript will be rejected or ignored is up to 99.76%.
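For readers who want to retrace part of that arithmetic, here is a minimal sketch in Python. It uses only the figures quoted in this post and its footnotes, and it assumes (as the estimate above does) one submission per writer and an even spread of submissions across publishers and agents. The function and variable names are made up for illustration; this is not how the platform computes anything.

```python
# Figures quoted in this post and its footnotes.
PUBLISHERS_AND_AGENTS = 6_080        # traditional publishers and agents in the US
BOOKS_PUBLISHED_2016 = 311_723       # traditionally published books, 2016
SUBMISSIONS_LOW = 20_000_000         # anecdotal estimate from agents
SUBMISSIONS_HIGH = 125_000_000       # survey-based estimate


def submissions_per_recipient(submissions, recipients=PUBLISHERS_AND_AGENTS):
    """Average submissions per publisher or agent, assuming an even spread."""
    return submissions / recipients


def publication_rate_percent(published, submissions):
    """Share of submissions that end up published, as a percentage."""
    return 100 * published / submissions


low = submissions_per_recipient(SUBMISSIONS_LOW)
high = submissions_per_recipient(SUBMISSIONS_HIGH)
conservative_rate = publication_rate_percent(BOOKS_PUBLISHED_2016, SUBMISSIONS_HIGH)

print(f"Submissions per publisher/agent: {low:,.0f} to {high:,.0f}")
print(f"Conservative publication rate: {conservative_rate:.2f}%")
```

Under those assumptions, the per-recipient range lands at roughly 3,000 to 20,000 a year and the conservative publication rate at about 0.25%.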

It’s a wonder anyone tries at all. But, the fact that they do means that there is real worth in trying to make the system work better.

With the AUTHORS.me platform, writers are 7 times more likely to be accepted and 13 times more likely to get positive, forward movement for their manuscript.  

For many writers, the most frustrating part about the submissions process isn’t being declined. It’s not knowing where you stand or what is going on. While it is an ongoing process to get publishers and agents to use our workflow statuses to accurately represent when they review work, 39% of writers who submit through us know concretely that their work has been declined and just 56% in the lifetime of the platform are awaiting review. Considering that lifetime submissions on the platform nearly doubled in the past three months, that is a true improvement to the norm.

 

Transformational, Dynamic Development
Algorithm Progress & Essential Insights

For the past year and a half, our developers have been honing and deepening a patent-pending algorithm that delivers the probability an individual manuscript could be a bestseller. The underlying goal is to offer a product that helps publishers, agents, and production companies identify and act on lucrative properties more quickly and with increased acuity. Our belief is that this technology offers the industry transformational power driven by actionable insights.

As the work went on, we discovered that beyond just that singular determination, the program was able to identify strengths and weaknesses within a given piece of writing that could, by an editor or a writer, be turned into essential, actionable insights that both expedite and strengthen the editing process and go-to-market plan.

For example, a report may detect room for improvement in areas such as redundant phrasing, incomparable constructions, or explicit language use. Editing with specific actions or recommendations is far easier and less overwhelming. A manuscript that seemed like it just wasn’t working now has a dynamic road map for revision.

Likewise, the sentiment analysis and comparable literary archetype can, to an industry professional, become a keen market insight that allows for a faster, more objective method of finding comparable titles and, informed with a title’s less obvious but no less essential common characteristics, possibly expand the target audience. In an industry with a reputation for homogeneity both in representation and delivery, these kinds of tools bolster objectivity and in turn create a more diverse landscape.

With these insights in mind, we launched the first iteration of our Intelligent Editorial Analysis Reports in partnership with BookLife, a Publishers Weekly website that seeks to provide self-published authors with resources, community, and platform elevation.

The report uses the technology we have been developing for publishers and enterprise entertainment companies and delivers digestible, actionable insights on an individual manuscript. Anyone can upload a manuscript and receive feedback on elements of their writing from style and grammar to syntax and literary device implementation. It points out areas for potential revision as well as commendations for markers of “good” writing. It shows the writer the manuscript’s literary archetype based on sentiment analysis, and it also delivers a numerical evaluation of their manuscript in comparison to best sellers.

The road to get here was full of curious, fascinating experiments and realizations. Let’s look at a few of them.

Archetypes

We’ve analyzed thousands of books—bestsellers, mid-list titles, backlist classics, self-published books, and unpublished manuscripts—to develop and improve the algorithm. One of the measurement points is a sentiment analysis, which when translated into a plotted arc resembles the narrative arc of the story in question. In performing these thousands of calculations, we discovered that there are measurable differences in manuscripts down to the tenth decimal point.

That is how unique each piece of text is, and how quickly and easily a computer can prove it. Within those fractions of variation exist essential insights into tone, character, style — the possibilities for measurement and action are boundless and thrilling to data scientists and forward-thinking literary analysts alike.

Sentiment Analysis — Part of a Whole

Many teams are working on understanding NLP better, and we’ve been able to incorporate some of the smartest APIs available, such as IBM’s Watson program, into our systems training. In one of these tests, we asked Watson to analyze the sentiment arc for particular parts of a book, specifically different characters and settings. (It’s fascinating but not altogether surprising that an element can have a narrative arc, but that’s a story for another day.)

When we ran these results back through our own program and analyzed bestsellers, we found that the overall sentiment arc plays a much larger role in determining a title’s comparability to the standard bestseller profile than that of any one part of the book. Or, in simpler terms: a book is more than the sum of its parts. The romantic part of me somehow thinks we all knew this already, but this is documented, objective proof to that effect.

Data & Publishing
Finding Meaning in the Numbers

The larger point is that in all of this work, we are discovering objective key performance indicators of raw text that transform previously elusive, ephemeral qualities of writing into quantifiable, measurable, and meaningful data points.

With this information, editors and writers alike can optimize their own individual approach to their work. The industry and community at large can harness the raw power of Big Data to stake claim to their creativity and carve out pieces of the market that fit best, not just fit now.

We’re seeing that the increased prevalence and use of data in publishing doesn’t have to mean a withering competitive landscape, but instead a richer, more vibrant one where the bar is continuously raised, met, and transformed altogether.


[1] There are roughly 6,080 traditional publishers and agents in the US. [2014 SUSB Annual Data Tables by Establishment Industry]

[2] In 2016, 311,723 books were traditionally published in the US. [International Publishers Association]

[3] In our initial R&D, polled agents and editors who accepted unsolicited work reported an average of 100 submissions/week.

Who Does It Better, Humans or Machines?

Machine Learning and the Future of Creative Development

Data analysis and data mining have been applied to traditional retail spaces for the past two decades. In recent years, some companies stepped into that space to cater to publishers and booksellers to benefit marketing and sales. And even more recently, the phrase “machine learning” has entered the conversation in that industry.

In a nutshell, machine learning “lets computers learn without explicit programming. In analysis, the technology uses algorithms that learn from the data, and, in turn, grow and change when exposed to new information, ultimately uncovering those all-important insights.” [Google]

“Computer science isn’t as far removed from the study of literature as you might think.” (Inderjeet Mani)

Could this technology be applied to the entertainment industry at the point of acquisition or as an enterprise quality control? The results across other industries that have adopted it (like manufacturing, retail, healthcare, travel, hospitality, financial services, and energy and utilities) are incredibly promising.

In “How Analytics and Machine Learning Help Organizations Reap Competitive Advantage,” MIT Technology Review in partnership with Google Analytics 360 Suite reported that companies in the top third of their respective industries using data-driven decision making were “on average, 5% more productive and 6% more profitable than their competitors.”

So that’s not just an increase in their own profits and productivity due to implementing a new tool. That’s over their direct competition. But what could this technology do that it isn’t already doing for the entertainment industry?  

What if instead of waiting on sales reports to identify trends post-publication, content trends could be projected based on historic data?

What if machine learning could help publishers, agents, and studios not just identify consumer trends and habits, but predict content reception?

Consumer data is extensive, detailed, and can indicate not just what consumers are responding to, but why. Once those similarities are established, couldn’t a computer identify the less obvious but no less resonant comparable qualities, discovering fresh possibilities for titles new and old?

In short, it can.

 

To consider this in earnest, it’s important to understand that while books hold cultural weight, ultimately they are all data. But unlike the consumer data that, even formless, has proven itself so valuable, text has inherent rules and formations. And each deviation from one rule has its own set of rules and acceptable actions.

These rules are grammar, syntax, and archetypes (which are themselves the subject of data scientists’ attentions); those deviations are diction, literary devices, colloquialism and dialect, pastiche and tone. When book clubs discuss the obvious connection of the blue door to the heroine’s inevitable death, they are discussing the pattern identified in a semi-closed system. They are performing one tiny fraction of the computational potential of an algorithm trained on literature.  

Considering this perspective, it’s not difficult to imagine how a computer could evaluate and identify traits of a book. But could it identify a bestseller? (The short answer is yes, but it’s quite complicated, according to Jellybooks founder Andrew Rhomberg.)

At its core, that question betrays the understandable fear that a computer will determine what is “good” and cast off the rest. But doesn’t that fear in itself really short-change humans in general? Like any tool, it is as strong as those who wield it, and as multifaceted.

OK, so a computer can do it. But should it?

 

Many think, yes, it should. Take book discovery for an example.

Though now a foregone conclusion, book discovery was the concern du jour of data-minded publishers 10-15 years ago. How would we get readers to the books? How would the reader discover the books? As a result, a flurry of startups emerged that gave consumers new ways to discover, consume, recommend, and interact with their favorite stories. It wasn’t that the traditional methods of consumer discovery weren’t working, but they could work better and produce more consistent, profitable results.

The same logic can and should be applied to the systems in the entertainment industry that acquire and produce consumable media. There’s no lack of raw material, sure, but is the current query and acquisition system bereft of room for improvement?

No.

Simply put: A human cannot read as much as a computer, because a human gets tired, distracted, hungry, etc. A computer does not feel, so it does not want or need. It simply does. And where computers finish, we pick up.

Humans are amazingly creative, observant, and resourceful. But at a certain point, your brain just can’t process any more data any more quickly.

A computer doesn’t have that problem. An algorithm doesn’t get a tension headache from the hours spent reading. Machine learning software doesn’t tire of reading a story it thinks it has heard before and move on; it reads the entire book in a few seconds.

“Rather than just reporting information and telling you what’s wrong, machine learning technology can help you fix it. [And] help you do more of what’s working, and do it automatically.”

One benefit of a computer doing the analysis is that it can discover the latent aspects of a novel. So, while authors may have their own identifiable style, there are underlying components of a novel that a computer can see better than humans. And what we’re betting on is that underneath the facade of a book lies a story that connects a reader to that text. That’s what we’re uncovering.

So, could such a program recognize the outlandish genius of someone like David Foster Wallace, James Joyce, or Kaye Gibbons? I wouldn’t bet against it in the future; but that’s not entirely the point.

If we only adopt technology that feels familiar, safe, and ultimately just performs parlor tricks, we do ourselves and our industries a disservice.  We leave the possibilities of customization and adaptability of machine learning and artificial intelligence behind.

This does not necessarily mean that the goal of machine learning or artificial intelligence is to replicate human creativity (though it might be at Wattpad). Instead, the goal is to amplify it: to embolden editors, agents, and publishers with more data than they could ingest in a lifetime, distilled into actionable insights.

“In this digital future, using machine learning platforms can provide publishers with opportunities to get real-time information about their readers, figure out what is working in the marketplace, and, perhaps, make the bestseller lists more of an accurate depiction of what readers want to read, not simply what is available,” commented Intellogo founder and CEO Neil Balthaser in an op-ed in Digital Book World.

Imagine a day when we take all our data about what people are reading and provide publishers (and authors) ideas of what people want to read, where to find those audiences, and better ways to reach them. This is the model that the film and television industries are already moving toward—with the help of Netflix and Amazon—so why shouldn’t book publishing take advantage of this market information? This type of decision support has not been possible up to this point, and publishers have often published books blindly, hoping that they would find the right audience and sell well.

Though ‘big data’ can be a taboo subject when we talk about the romance of publishing, there are undeniable benefits to be had from using platforms that give publishers and authors information from which they can make informed decisions on how to invest their time and money.

Distinguished Google engineer Sagnik Nandy explained to the MIT Technology Review that traditional analytics relied on the idea that people would access tools and already know what questions should be asked, based on the data.

“But today, everything is changing so fast as businesses evolve, and there’s a lot going on that you might not be able to see,” Nandy noted.

The beauty of machine learning, as Nandy put it, is that it extracts “all the information you’re not asking about. Once you have that information, you can generate insights even before a question is asked. That can be a huge competitive advantage.”

Whoever you are, your competitors will implement machine learning and take the advantage. The question entertainment companies should be asking themselves isn’t “Will this threaten or cheapen art?” but “How can I implement this yesterday?”

In the end, this is just one more tool to add to the creative, entrepreneurial arsenal. As  computational linguist Inderjeet Mani so eloquently described in Aeon,

Those who resist the temptation to unleash the capabilities of machines will have to content themselves with the pleasures afforded by smaller-scale, and fewer, discoveries. While critics and book reviewers may continue to be an essential part of public cultural life, literary theorists who do not embrace AI will be at risk of becoming an exotic species – like the librarians who once used index cards to search for information.


Further Reading

“Writing the Book on Artificial Intelligence: LBF Quantum’s Nick Bostrom” by Mark Piesing, Publishing Perspectives.

“Machine Learning and Bestseller Prediction: More Than Words Can Say” by Chris Sim, Digital Book World.

“Quantifying the Weepy Bestseller” by Andrew Piper and Richard Jean So, New Republic.

Sentiment Analysis & Book Publishing

A few weeks ago, I talked to our developers about a phrase I heard them throwing around a lot: Sentiment Analysis. Once they finished their explanation, I immediately asked them to do a write up for the blog. This is fascinating! The people must know!

And…they gave me a blank stare, a kind smile and then promptly went back to work (as they should). So, with their guidance and fact checking, I’ve tried to translate their detailed, data-rich reports and updates for your reading pleasure.

So, to start off, what exactly is Sentiment Analysis and why are we talking about it? Simply put, sentiment analysis is one way of looking at books and is one of the analytic methods we use to analyze manuscripts. Technology can actually interpret the very life and breath within a manuscript.

How does that work?  

Sentiment Analysis 101

At a glance, sentiment analysis is fairly straightforward: text is analyzed and, using natural language processing, each part of the text is categorized as positive (happy statements) or negative (sad or angry statements). Within each language, words can be classified as positive (elated, kiss, jump), negative (smash, kill, cry), or neutral (the, a, road). Take these all together and graph the results, and you can see the emotional arc of a text mapped out in a physical form, called a sentiment map.
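To make that idea concrete, here is a minimal sketch in Python of a word-list approach to building a sentiment arc. It is a toy illustration only, not our production method: the word lists contain just the example words from the paragraph above, and the function names and the choice to split the text into equal chunks are assumptions made for the example.

```python
import re

# Toy lexicon built from the example words above (an assumption for illustration;
# a real lexicon would contain thousands of scored words).
POSITIVE = {"elated", "kiss", "jump"}
NEGATIVE = {"smash", "kill", "cry"}


def score_words(words):
    """Score a list of words: (positive count - negative count) / total words.

    Neutral words count toward the total but add nothing to the score.
    Returns 0.0 for an empty list.
    """
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)


def sentiment_arc(text, n_chunks=4):
    """Split the text into roughly equal chunks and score each one in order.

    The resulting list of scores is the series you would plot to get a
    sentiment map.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Never ask for more chunks than there are words, and always return
    # at least one point.
    n_chunks = max(1, min(n_chunks, len(words)))
    size = len(words) / n_chunks
    return [
        score_words(words[int(i * size):int((i + 1) * size)])
        for i in range(n_chunks)
    ]


sample = "Elated kiss jump the road smash kill cry"
print(sentiment_arc(sample, n_chunks=2))  # prints [0.75, -0.75]
```

Plot those scores from left to right and the first half of the sample sits well above neutral while the second half drops below it: a very small sentiment map.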

Recent Sentiment Analysis Study —  Best Sellers

If you’re still with me, maybe you’re thinking: what can the sentiment map tell us about plot? We mapped out three extremely popular best sellers (Fifty Shades of Grey, The Girl With the Dragon Tattoo, and Gone Girl) to illustrate how these novels are constructed and why some books are said to be more surprising than others.

All three of these books buck the conventional trend of a high, happy opening and a high, happy, neatly wrapped up ending: they all have endings that are dramatically lower in sentiment than the highest point of the book or even the beginning. And, looking at the plot points for each, the graphs make sense.
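The shape described here is easy to test for once a book has been reduced to a list of sentiment scores. The sketch below, in Python, checks whether an arc’s ending sits below its peak and below its opening. The function name is made up for this example, and the two sample arcs are invented numbers, not measurements from any of the three novels.

```python
def ending_vs_peak_and_opening(arc):
    """Return (ends_below_peak, ends_below_opening) for a list of sentiment scores."""
    if len(arc) < 2:
        raise ValueError("an arc needs at least two points")
    ending = arc[-1]
    return ending < max(arc), ending < arc[0]


# Hypothetical arcs, invented for illustration only.
downbeat_arc = [0.2, 0.5, 0.1, 0.3, -0.4]   # ends well below its peak and opening
conventional_arc = [0.4, -0.2, 0.1, 0.5]    # ends on its highest point

print(ending_vs_peak_and_opening(downbeat_arc))      # prints (True, True)
print(ending_vs_peak_and_opening(conventional_arc))  # prints (False, False)
```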

*Spoiler Alert for all three novels*

1. Fifty Shades of Grey by E.L. James

[Figure: Sentiment analysis of Fifty Shades of Grey]

 

Unlike most romances, the end of Fifty Shades is not happy.  You can see the sentiment taking a sharp decline in the last few pages as the book ends with the main character crying, swearing she never wants to see her lover again.

2. The Girl With the Dragon Tattoo by Stieg Larsson

[Figure: Sentiment analysis of The Girl With the Dragon Tattoo]

The major dip around three-quarters of the way through The Girl With the Dragon Tattoo is when Mikael is trapped by Martin Vanger and almost dies. The sentiment rises as Lisbeth frees him, and continues to rise as Mikael publishes the exposé on Wennerström. The sentiment heads back downward through the end of the book as Lisbeth goes to tell Mikael she loves him, only to find him with Erika Berger. (Poor Lisbeth 🙁 hasn’t she been through enough?!)

3. Gone Girl by Gillian Flynn

[Figure: Sentiment analysis of Gone Girl]

Arguably the progenitor of a recent trend in unreliable, potentially psychotic female central characters, Gone Girl has a graph that is easy to follow. Consider: in the first half, Nick is looking for his missing, possibly dead wife Amy. During that time, it’s revealed how much Nick cheated on her (a LOT) and how seriously fatal the state of their marriage is. Meanwhile, Nick is named the number one suspect in her disappearance. So, the general downward spiral, errr slope, of the graph makes sense. The brief, sharp climb around point 60 is when Amy comes back and it turns out Nick won’t be going to jail, but the sentiment just plummets right on down because Nick is still unhappy, Amy is still terrifying and dangerous, and life does not seem to actually be getting better.

The final, brief peak at the end is when Amy tells Nick she’s pregnant, but even that is barely above neutral. In this world, news of Amy and Nick spawning is not really good news, and the sentiment analysis confirms that. And, additionally, this bit of information may give us insight to why so many people read this book and thought “WTF IS UP WITH THAT ENDING?!” It’s because most novels don’t end with a fairly neutral ending; so the reader is left feeling unsettled, as if  there is something missing that they can never recoup.

 

Had I convinced my technology-minded colleagues to write this blog, it may have ended with a discussion of linear regression, decision boundaries, sentiment vectors, and macro-arcs. But since my background is in publishing, I will end more philosophically. Thinking about editing and reviewing, this kind of technology is so exciting. It gives an editor another way to explain the pacing of a book to their writer. It gives a publisher the ability to look at the breadth of their work to see what kind of brand and niche they are establishing and where the holes may be that they could fill. It gives writers an ability to objectively see if the impact they are trying to achieve is actually coming across. And these are just a few of the possibilities. Sentiment Analysis and other machine learning actions are just additional tools in the hands of smart literary professionals. It’s a way to analyze across books and within books. And a way to look at thousands of books in the same time you or I can review one or two.

 

More on Sentiment Analysis:

 ACM Digital Library

Mashable