Choosing a Sentiment Analysis Tool in 2023

Published 23rd November, 2023 by Claire McGregor

To say that there are a lot of sentiment analysis tools on the market would be an understatement. Which service is the best for you? How can you be sure it will fulfill your use case? What should you be paying for a sentiment analysis tool?

In this guide we will walk through the key considerations for choosing the right tool:

What data was the sentiment analysis algorithm trained on?
How does the tool handle indeterminate sentiment?
Does the sentiment analysis tool allow manual recategorization?
How will you test the tool?
Why wouldn't I just use ChatGPT for sentiment analysis?
Lastly, consider the cost

Want to analyze your app review sentiment in minutes?

Join over 25% of the Fortune 100 and 35% of the top charting app developers using Appbot.

Try Appbot, free for 14 days →

I have been working on sentiment analysis for app reviews and other forms of customer feedback for over 9 years. During that time I've answered hundreds of questions and support enquiries about how sentiment analysis works and whether it will fit a particular use case. My team and I have tested dozens of other sentiment analysis tools to understand their strengths, weaknesses and pricing models. These questions and experiences informed how Appbot approached sentiment analysis for app reviews, and they're also the reason why we've decided to focus exclusively on sentiment analysis for mobile apps, rather than taking the seemingly obvious path of expanding to analyze other types of feedback.

In this post I've compiled the most common questions we are asked about sentiment analysis, as well as ones that you might not think to ask until after you've purchased a sentiment analysis tool. I hope you find it helpful - please share this post if you do.

Before we get into the considerations, it helps to think through one question: what data are you going to analyze? If you want to perform sentiment analysis on several different data sources, do you have budget for a separate tool for each source, or do you need a one-size-fits-all solution? It is important to be clear on these answers before you begin to evaluate possible solutions.

What data was the sentiment analysis algorithm trained on?

Many sentiment analysis tools use a machine learning algorithm that was trained on a particular data set, much as ChatGPT was trained on a very large body of text, much of it drawn from the internet. Unlike ChatGPT, however, most sentiment algorithms are trained on much smaller data sets.

There are two important components to understand when you ask this question:

1. How MUCH data was in the training data set?

You want the answer to this question to be as large as possible: ideally hundreds of millions of data points. Without getting too detailed and nerdy, a large data set matters because it makes it far more likely that sound data science principles were applied when tuning the output of the algorithm.

2. What TYPE of data was it?

For this question, look for an algorithm that was trained on the same type of data you will use it on, or as close to that as possible. When considering whether two types of data are similar it might help to think in terms of:

The usual length (in words or characters) of each single data point
Any nuances in how language is used in that type of data point. For example, in app reviews users often use odd grammar with a lot of abbreviations, extra punctuation and even emoji. Therefore, if I want a sentiment analysis tool for app reviews, I will need to find tools trained on a data set that shares those characteristics.
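One rough way to compare two types of data on the points above is to profile a sample of each. The following is a minimal sketch, not how any particular tool works: the function name `profile_texts`, the two regular expressions and the sample texts are all assumptions made up for illustration, and the emoji pattern only approximates the full emoji range.

```python
import re
from statistics import median

# Approximate emoji matcher covering common emoji blocks (an assumption,
# not a complete Unicode emoji test).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Runs of two or more punctuation marks, such as "!!!" or "??".
REPEATED_PUNCT_RE = re.compile(r"[!?.]{2,}")


def profile_texts(texts):
    """Summarise typical length and informal-language markers for a sample."""
    if not texts:
        raise ValueError("texts must not be empty")
    n = len(texts)
    return {
        "median_words": median(len(t.split()) for t in texts),
        "share_with_emoji": sum(bool(EMOJI_RE.search(t)) for t in texts) / n,
        "share_with_repeated_punct": sum(
            bool(REPEATED_PUNCT_RE.search(t)) for t in texts
        ) / n,
    }


# Made-up samples, for illustration only.
app_reviews = ["luv it!!! 😍", "crashes on login", "it's OK", "pls fix asap!!"]
support_emails = [
    "Hello, I am writing to report that the export feature has stopped "
    "working since the last update.",
    "Could you please confirm whether my subscription renews automatically "
    "next month?",
]

print(profile_texts(app_reviews))
print(profile_texts(support_emails))
```

If the two profiles look very different, that is a hint (not proof) that an algorithm trained on one type of data may struggle with the other.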

Beware of choosing a sentiment analysis tool that claims that it will be trained on your own data. With a few exceptions in extremely large organizations, the data sets are almost always too small to make this approach viable. If this is the answer you receive, ask for more detail about how large your sample set of data will need to be, and what insight you will be given into how the output of the algorithm is evaluated once the training period is complete.

Ideally, you'll select a tool where the sentiment analysis algorithm was trained on data the same as or similar to the data you want to analyze. I learnt this lesson a very painful (read: expensive) way…

One of the most costly product experiments we have ever run at Appbot was to expand the types of data you could feed into Appbot to be analyzed. This meant that you could analyze just about any type of text you wanted. Our intention was that it would be used only for various short-length forms of customer feedback, like call centre transcripts, NPS survey comments or support email content.

Unfortunately, we received disappointing feedback about the quality of the analysis for any data source other than app reviews. To make matters worse, the only way to improve performance for other data types was to dilute the robust performance of the algorithm for app reviews, our main use case. This showed me and our team just how challenging it is to build a truly excellent, generic sentiment algorithm, and is one of the reasons we believe that a narrow focus can deliver much more reliable, accurate and, therefore, useful results.

How does the tool handle indeterminate sentiment?

Indeterminate sentiment is when a piece of text does not clearly fall into the positive or negative category. This might be because:

it is a very short piece of text (for example, lots of app reviews just say something like “it's OK”), or
part of the text indicates positive sentiment and part indicates negative.

Some sentiment analysis tools are very opinionated. They can only output positive and negative results, and force each data point into one category or the other.

Others will offer three categories: positive, negative and neutral. In most cases the algorithm will use the neutral category whenever it is uncertain about the result. In our experience this is better than a binary positive/negative output, provided the neutral result is only returned occasionally.

Some sentiment analysis tools will take it one step further, and identify data points where both positive and negative sentiment are detected. This might be labelled as “unclassified” or “mixed” sentiment.

In most cases you will be looking for a tool that can return a positive or negative result the vast majority of the time, but will flag the rare occasions when the result is uncertain. This will be more likely if you have chosen a tool that supports at least 3 categories of results, and if the algorithm was trained on the same or similar data to the data you want to analyze.
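To make the difference between these output styles concrete, here is a minimal sketch of a four-way labelling rule. It assumes a hypothetical tool that returns separate positive and negative scores between 0 and 1 for each text; the function names and the 0.6 threshold are illustrative assumptions, not how Appbot or any other product actually works.

```python
def label_sentiment(pos_score, neg_score, threshold=0.6):
    """Map hypothetical positive/negative scores (0..1) to one of four labels."""
    is_pos = pos_score >= threshold
    is_neg = neg_score >= threshold
    if is_pos and is_neg:
        return "mixed"      # both sentiments detected in the same text
    if is_pos:
        return "positive"
    if is_neg:
        return "negative"
    return "neutral"        # neither score is strong enough to commit


def uncertain_share(labels):
    """Share of results that are neutral or mixed rather than a clear call."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(label in ("neutral", "mixed") for label in labels) / len(labels)


# Made-up scores, for illustration only.
scores = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.7), (0.3, 0.2)]
labels = [label_sentiment(p, n) for p, n in scores]
print(labels)
print(uncertain_share(labels))
```

When testing a real tool, a check like `uncertain_share` on your own data is one way to see whether neutral or mixed results really are the exception rather than the rule.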

Does the sentiment analysis tool allow manual recategorization?

This one is a little controversial, because a large part of the motivation to use machines for sentiment classification is usually to remove the subjectivity of human classification. Despite this, the option to reclassify sentiment was one of the most popular feature requests we received at Appbot - until we built it!

The demand for manual reclassification is easy to understand when you consider that no machine learning tool is 100% accurate. When we built Appbot's sentiment analysis algorithm our research indicated that the benchmark for a world class level of accuracy for an ML sentiment analysis tool was around 85%. Because we focus only on app reviews we were able to improve on that, and achieve an accuracy of around 93% - still not perfect, even though we had hundreds of millions of data points in our training data and only a single data type to understand.

It's inevitable that you will disagree with the machine output sometimes, whenever you perform any kind of classification exercise. Internally, it's up to you and your team to decide on a strategy to handle those scenarios, if indeed you feel that there is a need for it (hint: you probably will). Here are a couple of things to consider when deciding how to handle disagreements between the humans and the algorithm:

If your data volume is low (e.g. fewer than 3 app reviews per day, or around 100 per month) you will need to report on periods of several months if you decide to allow some reclassification. Otherwise you may find that the human alterations skew the results one way or the other.
You may wish to consider having one or a few nominated individuals who are allowed to reclassify. This minimizes the subjective variation in classifications between different people.
Have a policy about when to reclassify, with some guidelines. For example, you might allow a more generous number of reclassifications into the neutral category, but limit the number that can be altered to positive or negative, or you might have a “two sets of eyes” policy before reclassifying a data point.
You may wish to keep a log of reclassifications, so you preserve what the machine's assessment was in case it's needed.

Essentially, you want to ensure any reclassification is performed as consistently as possible and as infrequently as possible. Also, be sure that you have enough data that reclassification does not undermine the statistical significance of the result. If you find that you want to reclassify more than about 5% of the data points you may wish to look for a different sentiment analysis tool.
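The log and the 5% guideline described above could be tracked with something as simple as the sketch below. The `Reclassification` record, its field names and the helper functions are hypothetical, chosen only to illustrate the idea of preserving the machine's assessment, recording a second approver, and checking how often humans override the algorithm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reclassification:
    """One human override, keeping the machine's original assessment."""
    review_id: str
    machine_label: str
    human_label: str
    changed_by: str
    approved_by: str  # second reviewer, for a "two sets of eyes" policy
    changed_on: date


def reclassification_rate(log, total_reviews):
    """Share of reviews in the reporting period that were overridden."""
    if total_reviews <= 0:
        raise ValueError("total_reviews must be positive")
    # Count each review once, even if it was reclassified more than once.
    return len({entry.review_id for entry in log}) / total_reviews


def exceeds_guideline(log, total_reviews, limit=0.05):
    """True if overrides go beyond the rough 5% guideline."""
    return reclassification_rate(log, total_reviews) > limit


# Made-up entries, for illustration only.
log = [
    Reclassification("r-101", "negative", "neutral", "sam", "alex", date(2023, 11, 1)),
    Reclassification("r-202", "positive", "neutral", "sam", "alex", date(2023, 11, 9)),
]
print(reclassification_rate(log, total_reviews=100))
print(exceeds_guideline(log, total_reviews=100))
```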

How will you test the tool?

An opportunity to test is important so that you can see how useful a specific tool's sentiment analysis is when used on your own data, as illustrated by the points about reclassification above.

Some services are entirely pay-to-play, meaning that you won't be using the sentiment analysis tool in a real-world setting until after you have purchased the subscription or software product. Many will offer you an opportunity to see your own data in a demo setting. Be careful here. Unless you're adding the data into the service yourself, you won't have insight into how the data was handled by the sales people before you saw the results. If you do provide some test data for a sales demo, be sure to ask exactly how much manual effort was involved and whether the consultant can share with you the exact steps taken to achieve the result demonstrated.

If you'll be loading various different types of data into the sentiment analysis tool then make sure you test each different type of data and carefully review the results. In general, a tool that has been developed or trained on a particular type of data will perform better on that specific type of data than it will on other forms. We illustrated this in the first point above. A tool like Appbot is trained on app reviews. Our customers tell us it is excellent at analyzing those, but we found out the hard way that it was far less impressive on other forms of user feedback - even the ones we thought were similar to app reviews.

Our experience has been that you can often get a better result with a few purpose-built tools than with one generic one.
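One simple way to run such a test is to hand-label a small sample from each data source, run the same sample through the tool, and compare accuracy per source. The sketch below assumes you have already collected the tool's labels; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_source(samples):
    """samples: iterable of (source, human_label, tool_label) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for source, human_label, tool_label in samples:
        totals[source] += 1
        if human_label == tool_label:
            correct[source] += 1
    return {source: correct[source] / totals[source] for source in totals}


# Made-up rows, for illustration only.
samples = [
    ("app_review", "positive", "positive"),
    ("app_review", "negative", "negative"),
    ("app_review", "neutral", "positive"),
    ("app_review", "negative", "negative"),
    ("nps_comment", "positive", "negative"),
    ("nps_comment", "negative", "negative"),
]
print(accuracy_by_source(samples))
```

A real test needs a far larger sample than this for the percentages to mean anything, but the comparison per source is the part that matters.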

Why wouldn't I just use ChatGPT for sentiment analysis?

This is a question I've been asked many times since generative AI hit the mainstream. My team and I have tested this extensively, and the results have been… well, very disappointing. ChatGPT is not built with this use case in mind. Whilst it is truly phenomenal at summarizing huge bodies of knowledge, providing instructions, and answering questions, it isn't (currently) designed to analyze a large volume of data and provide an opinion about that data. We found that we got a different result from the same set of data each time we passed it in, that it could be frustratingly slow, and that ultimately the variation in results meant we could not be confident enough to rely on the classification it provided.

Of course, we suggest that you test it yourself if you're curious (or just because it's fun). If you manage to get a solid output repeatedly we'd love to hear how you did it!
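If you do run your own test, one way to quantify the variation is to classify the same texts several times and measure how often the runs agree. This is a minimal sketch under the assumption that you have already saved the labels from each run, in the same order; it does not call ChatGPT or any other API, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def agreement_rates(runs):
    """runs: one list of labels per repeated run over the same texts.

    Returns, for each text, the share of runs that agree with the most
    common label for that text.
    """
    if not runs or not runs[0]:
        raise ValueError("runs must contain at least one non-empty run")
    if any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("every run must cover the same texts")
    rates = []
    for labels in zip(*runs):  # all labels given to one text across runs
        most_common_count = Counter(labels).most_common(1)[0][1]
        rates.append(most_common_count / len(runs))
    return rates


def share_fully_stable(runs):
    """Share of texts that received the same label in every run."""
    rates = agreement_rates(runs)
    return sum(rate == 1.0 for rate in rates) / len(rates)


# Made-up labels from three repeated runs over four texts.
runs = [
    ["positive", "negative", "neutral", "positive"],
    ["positive", "negative", "positive", "positive"],
    ["positive", "neutral", "negative", "positive"],
]
print(agreement_rates(runs))
print(share_fully_stable(runs))
```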

Lastly, consider the cost

This is very individual, so I'll be very brief here. After testing, consider how much time the analysis tool will save you, how much manual intervention you envisage for results validation and reporting, and weigh those things against the price tag. Only you will know if it represents good value for your business.
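As a back-of-the-envelope illustration of that weighing-up, the sketch below nets the hours saved against the hours spent validating results and the price of the tool. The function name, parameters and figures are all hypothetical placeholders; substitute your own estimates.

```python
def monthly_net_value(hours_saved, hours_validating, hourly_cost, tool_price):
    """Rough monthly value of a tool after validation effort and price."""
    return (hours_saved - hours_validating) * hourly_cost - tool_price


# Placeholder figures, for illustration only.
print(monthly_net_value(hours_saved=20, hours_validating=4,
                        hourly_cost=50.0, tool_price=300.0))
```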

Wrapping up

There are a lot of things to consider when choosing a sentiment analysis tool. Be sure to ask what data the algorithm was trained on, how it handles scenarios where it is unsure about classification and whether you can manually reclassify data points. We suggest looking for a tool that you can test yourself, under real-world conditions, without interference from sales people. And finally, remember to factor in the value equation.

We hope that you have found this guide useful. If you have any questions, feel free to email support@appbot.co, or ping us on X @appbotX.

About The Author

Claire is the Co-founder & Co-CEO of Appbot. Claire has been a product manager and marketer of digital products, from mobile apps to e-commerce sites and SaaS products for the past 15 years. She's led marketing teams to build multi-million dollar revenues and is passionate about growth and conversion optimization. Claire loves to work directly with the world's top app companies delivering tools to help them improve their apps. You can connect with her on LinkedIn.
