How Keywords, Machine Learning Unlock Book Discovery

Jim Bryant argues that advances in natural language processing and machine learning, applied to keyword extraction, may unlock a revolution in book discovery.

Leading publishing industry analysts began stressing the urgency of developing better “book discovery” practices more than ten years ago. Several years ago, the issue was named as one of the biggest challenges facing the future of the publishing industry. Today, no one in the publishing world seriously debates the relevance of “book discovery.”

The challenge of “book discovery” stems from a well-accepted fact: never before in history have so many books been available to so many readers.

  • Hundreds of thousands of backlist titles are being released by publishers.
  • Hundreds of thousands of self-published books are being offered by ebook retailers and self-publishing platforms.
  • Hundreds of thousands of titles from foreign publishers are making their way into domestic supply chains.

All of this is happening, of course, because of the widespread adoption of ebooks and the convenience of being able to discover and order a book online and begin reading it instantly on a mobile device.

One new solution to book discovery is the use of keywords extracted from the text of a book. Typically, the approach is to algorithmically deconstruct each sentence and identify each word by part of speech. Each word can then be assigned a value based on its frequency of use within the book versus its frequency of use in the English language. Keyword extraction can be further refined to identify people, places, and a wide range of other useful entities, such as the presence of SAT words (the vocabulary students are expected to master before taking the test) or perhaps even the presence of profanity. Using keywords and sentence structure, it is also possible to algorithmically measure the complexity of the story and estimate the average reading level required to understand the book.
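The frequency comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual method: the function names, the `min_count` cutoff, and the background frequency table are all assumptions made for the example, and the background numbers are placeholders rather than measurements from a real English corpus. Each word is scored by the log of its rate in the book divided by its rate in general English, so words that are unusually common in the book rise to the top.

```python
import math
import re
from collections import Counter

# ASSUMPTION: illustrative placeholder rates (occurrences per million words
# of general English). These are NOT measured values from a real corpus.
BACKGROUND_PER_MILLION = {
    "the": 50000.0,
    "and": 25000.0,
    "each": 500.0,
    "plan": 100.0,
    "count": 100.0,
    "exercise": 50.0,
    "diet": 20.0,
    "calorie": 5.0,
}
# ASSUMPTION: rate used for any word missing from the table above.
DEFAULT_PER_MILLION = 1.0


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_scores(text, min_count=1):
    """Score each word by log(rate in this text / rate in general English)."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for word, count in counts.items():
        if count < min_count:
            continue  # skip words too rare in the text to be meaningful
        book_rate = count / total * 1_000_000
        general_rate = BACKGROUND_PER_MILLION.get(word, DEFAULT_PER_MILLION)
        scores[word] = math.log(book_rate / general_rate)
    return scores


def top_keywords(text, n=5, min_count=1):
    """Return the n highest-scoring words, breaking ties alphabetically."""
    scores = keyword_scores(text, min_count=min_count)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


if __name__ == "__main__":
    sample = (
        "The calorie count matters. Exercise burns each calorie, "
        "and the exercise plan helps the diet."
    )
    print(top_keywords(sample, n=2, min_count=2))
```

A production system would replace the placeholder table with rates from a large reference corpus and would likely filter by part of speech first, as the paragraph above suggests. The `min_count` cutoff is one simple way to keep words that appear only once from dominating the ranking.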

Here is an example of how useful keyword extraction can be. Consider The Mayo Clinic Diet. It is easy to see from the word cloud generated below that two of the keywords are calorie and exercise. But these two words don’t appear in the title or in the brief description. Great books like this one can be discovered more easily when such keywords are integrated into the search processes deployed by ebook retailers and libraries.