Reading Between the Lines

Using review data to understand book readers’ reactions

Henry Chapman, Research and Insights Analyst

Infegy is best known for broad, in-depth analysis of social media datasets. However, we can also apply our decade-plus of experience with natural language processing to other kinds of unstructured text. Infegy can also work with other publicly available internet data, like the data used in this blog post.

To show you how this works, we'll analyze Goodreads reviews of The Women, currently the most popular book in the United States (March 2024). The Women tells the story of a woman from California who joins the military as a nurse right after the start of the Vietnam War. Since its release in early February, it has received rave reviews from critics, who praised its depiction of post-traumatic stress and its treatment of complicated, controversial subject matter. Along with Amazon book reviews, Goodreads is the go-to spot on the internet for authors and publishers to see how their books are landing with actual readers, so it's the perfect place to get a perspective on what people think of the book.

A look at our dataset

First, let's take a look at our dataset. To conduct this analysis, we collected all 24,096 reviews that people left on Goodreads about The Women.

Figure 1: Total reviews we collected from Goodreads (Jan. 2024 through March 2024); Goodreads review data.

While the Goodreads API gives many different data points you could analyze, we collected these:

  • Name: Name of the reviewer
  • isAuthor: Whether the reviewer is also an author
  • followersCount: How many followers the reviewer has
  • textReviewsCount: How many reviews the reviewer has written
  • createdAt: When the reviewer wrote the review
  • rating: How the reviewer rated the book (1-5)
  • Text: The text content of the review
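To make the shape of one record concrete, here is a minimal sketch of how a single review could be represented in Python. The field names on the left of each mapping come from the list above; the snake_case attribute names, the `Review` class, and the `parse_review` helper are illustrative assumptions, as is the guess that `createdAt` arrives as an ISO 8601 string. The example record at the bottom is made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Review:
    """One Goodreads review, mirroring the fields listed above."""
    name: str                # Name
    is_author: bool          # isAuthor
    followers_count: int     # followersCount
    text_reviews_count: int  # textReviewsCount
    created_at: datetime     # createdAt
    rating: int              # rating, 1-5 stars
    text: str                # Text


def parse_review(raw: dict) -> Review:
    """Convert one raw record (a dict keyed by the field names above).

    Assumption: createdAt is an ISO 8601 string. Adjust the parsing
    if your export uses a different format.
    """
    rating = int(raw["rating"])
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    return Review(
        name=raw["Name"],
        is_author=bool(raw["isAuthor"]),
        followers_count=int(raw["followersCount"]),
        text_reviews_count=int(raw["textReviewsCount"]),
        created_at=datetime.fromisoformat(raw["createdAt"]),
        rating=rating,
        text=raw["Text"],
    )


# Made-up record for illustration only, not taken from the dataset.
example = parse_review({
    "Name": "Example Reader",
    "isAuthor": False,
    "followersCount": 120,
    "textReviewsCount": 45,
    "createdAt": "2024-02-10T09:30:00",
    "rating": 5,
    "Text": "A moving story.",
})
print(example)
```

Validating the rating at parse time keeps out-of-range values from skewing any averages computed later.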

Review-based sentiment analysis

We imported all 24,096 reviews into Infegy Starscape, creating a custom dataset. As previously mentioned, while we built our company reputation on analyzing social data, our AI-based natural language processing is adept at handling all sorts of unstructured text.

Comparing reviewer scores with Infegy’s sentiment analysis

Let's look at what reviewers thought of the book. Figure 2 presents the distribution of book reviewers' 1-5 star ratings. The Women was a resounding success on Goodreads, with 93.6% of reviewers awarding the book either 4 or 5 stars.

Figure 2: Distribution showing how common each type of review was within our dataset (Jan. 2024 through March 2024); Goodreads review data.
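A figure like the 4-or-5-star share is a simple proportion over the `rating` field. The sketch below shows one way to compute the rating distribution and that share. The function names are our own, and the sample ratings are made up for illustration; they are not the Goodreads data.

```python
from collections import Counter


def rating_distribution(ratings):
    """Return {stars: share of reviews} for star ratings 1-5."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings supplied")
    return {stars: counts.get(stars, 0) / total for stars in range(1, 6)}


def top_box_share(ratings, threshold=4):
    """Share of ratings at or above `threshold` (4 means '4 or 5 stars')."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(1 for r in ratings if r >= threshold) / len(ratings)


# Made-up ratings for illustration only.
sample = [5, 5, 4, 5, 3, 4, 5, 2, 5, 4]
print(rating_distribution(sample))
print(f"{top_box_share(sample):.1%} of sample reviews are 4 or 5 stars")
```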

Now that we know how reviewers scored the book, let's look at how Infegy's sentiment analysis scored the reviews. You'll note a similarly skewed distribution: we classified most reviews as positive, with a small subset being neutral. This makes a lot of sense. When we train our sentiment analysis engine, we use all sorts of customer reviews (movie reviews, product reviews, book reviews), so our engine is trained to work well on data just like this.

Figure 3: Infegy's sentiment distribution of Goodreads reviews of The Women (Jan. 2024 through March 2024); Goodreads review data.

Diving into the substance of what reviewers said

After reviewing the book's overall sentiment, let's dive into what reviewers actually said. To generate Figure 4, we categorized all 24,096 reviews as positive or negative, then aggregated the most meaningful nouns, adjectives, and verbs that appeared. On the positive end, we found that reviewers loved author Kristin Hannah's take on PTSD and the main character, Frankie. On the negative side, reviewers thought Hannah relied on too many cliches and found the novel repetitive. Negative reviewers also seemed to dislike the Barb character. Remember that negative reviews only comprised ~10% of the dataset, so be sure to take their criticism within the appropriate context. Accurate, aggregated topic analysis is crucial for authors and publishing houses looking for feedback on what made a book successful or resonate with an audience.

Figure 4: Top positive and negative topics Goodreads reviewers wrote about when discussing The Women (Jan. 2024 through March 2024); Goodreads review data.
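As a rough illustration of the "label, then aggregate" idea, the sketch below counts the most frequent non-stopword tokens within each sentiment group. This is plain word-frequency counting, a crude stand-in rather than Infegy's topic analysis, which works on part-of-speech and meaning rather than raw tokens. The stopword list, the `top_terms` function, and the sample reviews are all invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "it", "i", "too", "in", "this"}


def top_terms(labeled_reviews, n=3):
    """Return the n most frequent non-stopword tokens per sentiment label.

    `labeled_reviews` is an iterable of (label, text) pairs, where the label
    comes from whatever sentiment classifier you already have.
    """
    counts = {}
    for label, text in labeled_reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(label, Counter()).update(
            t for t in tokens if t not in STOPWORDS
        )
    return {label: c.most_common(n) for label, c in counts.items()}


# Invented review snippets for illustration only.
sample = [
    ("positive", "Frankie was a moving character"),
    ("positive", "Loved Frankie and the story"),
    ("negative", "Repetitive and full of cliches"),
    ("negative", "Too repetitive"),
]
print(top_terms(sample, n=1))
```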

Trend-based estimation

Now that we have a better understanding of which book-specific topics resonated with readers, we'll walk through some trend-based analysis using the fields we collected from Goodreads. This type of analysis can be crucial if you work at a publishing house. If you can track how reviewer metrics change across a successful book release, you can understand which reviewers you need to get your book in front of to improve the odds of a successful release.

Average follower count vs. review date

First, let's look at how the average reviewer follower count changes before and after the book's release. The Women was released on February 6, 2024. You'll note that the average follower count of reviewers surges just before the release date. These accounts represent the big dogs within the Goodreads community: the highly influential reviewers who can make or break a book release depending on how their initial reviews land. Pretty quickly after the release, however, the average follower count drops substantially as more ordinary reviewers take over. If you're a publisher, you must get your book in front of the right people well before the release date so that they have time to write influential reviews of your book.

Figure 5: Changing follower count vs. review date (Jan. 2024 through March 2024); Goodreads review data.
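A trend line like this comes down to grouping reviews by date and averaging a field within each group. Here is a minimal sketch using only the standard library. It assumes each record is a dict whose `createdAt` value has already been parsed into a `date`; the `daily_mean` helper is our own name, and the sample numbers are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean(records, field):
    """Average `field` per review date, returned in chronological order.

    Assumption: each record is a dict with a `createdAt` date object
    and a numeric value under `field`.
    """
    by_day = defaultdict(list)
    for rec in records:
        by_day[rec["createdAt"]].append(rec[field])
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up records for illustration only.
sample = [
    {"createdAt": date(2024, 2, 1), "followersCount": 1000},
    {"createdAt": date(2024, 2, 1), "followersCount": 3000},
    {"createdAt": date(2024, 2, 10), "followersCount": 50},
    {"createdAt": date(2024, 2, 10), "followersCount": 150},
]
for day, avg in daily_mean(sample, "followersCount").items():
    print(day.isoformat(), avg)
```

Because a handful of very large accounts can dominate a daily mean, it may be worth plotting the median alongside it when review volume on a given day is low.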

Average number of reviews written vs. date

Let's look at reviewer type from a different angle, one that reinforces our hypothesis that you must get your book in front of the right reviewers to succeed. In Figure 6, we're looking at the average number of reviews that reviewers of The Women have written, and how that average changes around the book's release date. You'll notice a similar trend to what we saw before: before the release date, the average account reviewing The Women had written 3x as many reviews as the average account reviewing it after release. This finding reinforces our position that you need to target those more prominent Goodreads reviewers to get your book seen.

Figure 6: Changing average number of reviews versus release date (Jan. 2024 through March 2024); Goodreads review data.
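A pre-release versus post-release comparison like the 3x figure can be expressed as a ratio of two means. The sketch below shows one way to compute it. The release date is the one stated in this post; the `pre_post_ratio` helper, the assumption that `createdAt` is already a `date`, and the sample values are ours and purely illustrative.

```python
from datetime import date
from statistics import mean

RELEASE_DATE = date(2024, 2, 6)  # release date stated in this post


def pre_post_ratio(records, field, release=RELEASE_DATE):
    """Mean of `field` before release divided by the mean on or after release."""
    pre = [r[field] for r in records if r["createdAt"] < release]
    post = [r[field] for r in records if r["createdAt"] >= release]
    if not pre or not post:
        raise ValueError("need reviews on both sides of the release date")
    post_mean = mean(post)
    if post_mean == 0:
        raise ValueError("post-release mean is zero; ratio undefined")
    return mean(pre) / post_mean


# Made-up records for illustration only.
sample = [
    {"createdAt": date(2024, 1, 20), "textReviewsCount": 300},
    {"createdAt": date(2024, 2, 2), "textReviewsCount": 600},
    {"createdAt": date(2024, 2, 6), "textReviewsCount": 100},
    {"createdAt": date(2024, 3, 1), "textReviewsCount": 200},
]
print(pre_post_ratio(sample, "textReviewsCount"))
```

Reviews posted on the release date itself are counted as post-release here; that boundary is a judgment call you may want to change.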

Takeaways for your brand

We used Infegy Starscape with custom datasets to illuminate the overwhelmingly positive reception of The Women, with a striking 93.6% of reviewers rating it 4 or 5 stars. This data, complemented by trend-based analysis of reviewer metrics such as follower count and number of reviews written, underscores the importance of engaging influential reviewers before a book's release. We offer actionable intelligence for publishers aiming to maximize the impact and success of future book releases. Through strategic engagement with key reviewers, publishers can significantly enhance the visibility and reception of their books in highly competitive markets.
