WordStat by Provalis Research

Version 8 - New Features

Fast and precise processing of large amounts of unstructured information.

WordStat is now available as a stand-alone text mining platform

With WordStat 8, users can import data and analyse it directly in WordStat without purchasing QDA Miner, SimStat or other products. This reduces complexity and shortens the learning curve, as users can now create their projects directly in WordStat.

WordStat 8 is also still integrated with QDA Miner, SimStat and Stata. If you need to manage your data in one of these packages or use their features, you can still do so and call WordStat for text mining and analysis of unstructured data.

More data sources

You can now create projects in WordStat 8 from more data sources:

  • Documents: MS Word, RTF, PDF, HTML, etc.
  • Data files: Excel, CSV, Stata, etc.
  • Web survey platforms: SurveyMonkey, Qualtrics, SurveyGizmo, etc.
  • Reference management tools: EndNote, Zotero, Mendeley
  • Social media services: Twitter, Facebook, Reddit, RSS feeds, YouTube
  • Email platforms: Outlook, Gmail, Hotmail, Mbox, and EML format

Explorer Mode

The new Explorer mode allows users with little text mining experience to quickly and easily extract meaning from large amounts of text data.

  • Identify the most frequent words and phrases.
  • Extract the most salient topics in your documents.
  • Use the improved topic modeling tool of WordStat 8.
  • Switch between Explorer and Expert mode to access all of WordStat's features, including content analysis dictionaries, crosstabs, and co-occurrence analysis.
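
To make the first bullet concrete, here is a minimal Python sketch of counting frequent words and two-word phrases. It is an illustration only, not WordStat's own method: the tokenizer, the stopword list and the sample documents are assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(texts, n=5, stopwords=frozenset()):
    """Count word frequencies across all texts, ignoring stopwords."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)

def top_phrases(texts, size=2, n=5):
    """Count contiguous word sequences (n-grams) of the given size."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(
            " ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)
        )
    return counts.most_common(n)

# Hypothetical sample documents.
docs = [
    "Customer service was slow, but customer service staff were polite.",
    "The customer service line was busy.",
]
print(top_words(docs, n=3, stopwords={"the", "was", "but", "were"}))
print(top_phrases(docs, n=1))
```

A real project would use a fuller stopword list and would usually filter phrases that begin or end with a stopword.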

Improved topic modelling

Advanced Topic Modelling

Numerous improvements such as:

  • An additional extraction algorithm (NNMF) for faster topic extraction; and
  • An innovative topic enrichment process. This technique moves beyond the “bag-of-words” solution typical of traditional topic modeling by automatically selecting related phrases and providing suggestions for additional expressions, potential exceptions and spelling corrections.

All these innovations should lead to a more precise and comprehensive measurement of salient topics in your text collection.
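
For readers unfamiliar with the technique, the sketch below shows non-negative matrix factorization (which NNMF presumably abbreviates) applied to a tiny document-by-term count matrix. It uses the classic multiplicative update rules in plain Python and is not WordStat's implementation; the matrix, the term list and all function names are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the quantity the updates reduce."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h

# Hypothetical counts: four documents, six terms.
terms = ["price", "cost", "cheap", "staff", "friendly", "helpful"]
counts = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 1, 2],
]
w, h = nmf(counts, k=2)
for topic, weights in enumerate(h):
    ranked = sorted(zip(weights, terms), reverse=True)[:3]
    print(topic, [term for _, term in ranked])
```

Each row of H weights the terms of one topic, and each row of W says how strongly a document expresses each topic. Production code would rely on an optimized numerical library instead of nested lists.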

New and improved graphic displays

WordStat 8 has several new graphic displays to help you better understand the results of your data analysis. These include improved, interactive word clouds, donut charts, and radar charts.

Deviation table

New in 8.0.7

This feature was added after the initial release of WordStat 8, so you need WordStat 8.0.7 or later to access it. The Deviation Table shows which words and phrases are used more or less often when compared across other variables. You first need to activate the crosstab button to see the icon. Right-clicking the table gives access to KWIC and Delete, and lets you save the table as tab-delimited text, HTML or a bitmap.
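
The page does not say how the deviations are calculated. One plausible reading, sketched below in Python, is the difference between a word's share of all words in one group and its share in the whole collection. The formula, the function name and the sample counts are assumptions for illustration, not WordStat's documented method.

```python
from collections import Counter

def deviation_table(counts_by_group):
    """Return {word: {group: percentage-point deviation from the overall rate}}."""
    totals = {g: sum(c.values()) for g, c in counts_by_group.items()}
    overall = Counter()
    for c in counts_by_group.values():
        overall.update(c)
    grand_total = sum(totals.values())
    table = {}
    for word, n in overall.items():
        overall_rate = 100.0 * n / grand_total
        table[word] = {
            # "or 1" avoids dividing by zero for a group with no words.
            g: 100.0 * counts_by_group[g].get(word, 0) / (totals[g] or 1) - overall_rate
            for g in counts_by_group
        }
    return table

# Hypothetical word counts for two groups of respondents.
counts = {
    "group_a": Counter({"price": 30, "quality": 70}),
    "group_b": Counter({"price": 10, "quality": 90}),
}
for word, row in deviation_table(counts).items():
    print(word, row)
```

A positive value means the word is over-represented in that group relative to the collection as a whole, and a negative value means it is under-represented.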

Export to Tableau

With a simple click, you can also export your results to Tableau Software to use its advanced interactive data visualization tools.

New look interface

Access and extract valuable insights quickly with the new interface.


WordStat 7 interface

WordStat 8

WordStat 8 interface

Better precision in text search. More accurate results.

New features and improvements to categorisation dictionaries help you be more precise in text search.

New categorisation features
  • Case-sensitive entries: the categorisation dictionaries and the exclusion list now support case-sensitive entries to disambiguate words such as “Bill” and “bill”, “Buck” and “buck” or “us” and “US”.
  • Regular Expression (Regex) Searches: The new Regular Expression Editor allows you to create your own Regex formulas to quickly extract specific information from your text data such as email addresses or postal codes.
  • New Substitution process: An improved substitution process separates substitution from lemmatization, allowing you to easily track substitutions and keep your content dictionary free of misspelled words.
  • Saving Exclusion and substitution lists: These lists, along with the categorisation dictionary, can now be saved into a categorization model file. This file can be used in other WordStat projects as well as in QDA Miner, WordStat Document Explorer, or the WordStat software developer kit (SDK).
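
As an illustration of the first two items, the Python sketch below extracts email addresses and postal codes with regular expressions and counts a dictionary entry with and without case sensitivity. The patterns are simplified assumptions (the postal code pattern covers only the Canadian format), the helper names are hypothetical, and the regex dialect accepted by WordStat's editor may differ from Python's.

```python
import re

# Simplified email pattern; it does not cover every valid address.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Canadian postal code such as "H2X 1Y4"; other countries need other patterns.
POSTAL_CODE = re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b")

def extract(pattern, text):
    """Return every non-overlapping match of the pattern in the text."""
    return pattern.findall(text)

def count_entry(entry, text, case_sensitive=False):
    """Count whole-word occurrences of a dictionary entry."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return len(re.findall(r"\b" + re.escape(entry) + r"\b", text, flags))

# Hypothetical sample text.
text = "Contact us at info@example.com or write to the US office, Montreal H2X 1Y4."
print(extract(EMAIL, text))
print(extract(POSTAL_CODE, text))
print(count_entry("US", text, case_sensitive=True))
print(count_entry("US", text, case_sensitive=False))
```

With case sensitivity on, only the country abbreviation is counted. With it off, the pronoun “us” is counted as well, which is the ambiguity the case-sensitive entries are meant to remove.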

Numerical transformation


A new numerical transformation dialog box allows you to compute numerical variables from other variables with up to 50 transformation functions, including trigonometric, statistical, and random number functions. Conditional transformations can also be performed using an IF-THEN-ELSE logical structure.
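
The idea of deriving a variable from others, with or without a condition, can be sketched in Python as follows. This mirrors the concept only and not the syntax of the WordStat dialog box; the variable names and the `transform` helper are hypothetical.

```python
import math

def transform(rows, new_var, func):
    """Return copies of the rows with new_var computed from each row's variables."""
    return [{**row, new_var: func(row)} for row in rows]

# Hypothetical cases with two numerical variables.
rows = [{"age": 25, "income": 40000}, {"age": 67, "income": 18000}]

# Plain transformation: a base-10 logarithm of income.
rows = transform(rows, "log_income", lambda r: math.log10(r["income"]))

# Conditional transformation: IF age >= 65 THEN 1 ELSE 0.
rows = transform(rows, "senior", lambda r: 1 if r["age"] >= 65 else 0)

for row in rows:
    print(row)
```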

Numerical transformation

Transform text using Python scripts

WordStat 8 makes it possible for NLP data scientists to use Python scripts, with the full range of Python's open-source libraries, to preprocess or transform text documents for analysis in WordStat. This new feature increases the flexibility of WordStat and lets users apply their Python programming skills.
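
The page does not describe how WordStat passes documents to a script, so the sketch below shows only the kind of text clean-up such a script might perform, using the Python standard library. The `preprocess` function and its steps are assumptions chosen for illustration.

```python
import re
import unicodedata

URL = re.compile(r"https?://\S+")

def preprocess(text):
    """Normalize Unicode forms, drop URLs and collapse runs of whitespace."""
    # NFKC folds compatibility characters, e.g. the "fi" ligature becomes "fi".
    text = unicodedata.normalize("NFKC", text)
    text = URL.sub(" ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

# Hypothetical raw document.
raw = "Read  this:\nhttps://example.com/a now"
print(preprocess(raw))
```

The same function could call third-party libraries for tasks such as language detection or named-entity tagging before the text is handed over for analysis.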


A new "binning" feature allows you to transform continuous values into a smaller number of distinct categories or bins. It can be used to reduce the effect of numerical outliers or abnormal distributions, or to convert a continuous numerical variable into an ordinal one. It is especially useful for creating graphical displays of comparisons when the numerical variable has too many distinct values for meaningful analysis.
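
Two common binning strategies are sketched below in Python: equal-width bins, which split the value range into intervals of the same size, and equal-frequency bins, which put roughly the same number of cases in each bin. Which strategies WordStat offers is not stated on this page; the function names and sample values are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 over equal-sized intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]  # all values identical
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign bin indices by rank so bins hold about the same number of cases.

    Tied values may fall into different bins in this simple version.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins

# Hypothetical ages with one outlier (90).
ages = [18, 22, 35, 41, 58, 90]
print(equal_width_bins(ages, 3))
print(equal_frequency_bins(ages, 3))
```

With the outlier present, equal-width binning crowds most cases into the first bin, while equal-frequency binning spreads them evenly, which is one way binning can reduce the effect of outliers.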

Analyse emojis

Emojis have become ubiquitous in social media, text messaging and emails. They are used to represent an object, express an idea or sentiment, or add a nuance to a written message. Emojis are often an integral part of the message and can hardly be ignored. WordStat 8.0 can transform emojis into their text representation, allowing you to analyze them either on their own or as part of the whole message.
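
A minimal Python sketch of the idea replaces each emoji with a token built from its official Unicode character name. The `:name:` token format and the `emoji_to_text` function are assumptions for this example; the text representation WordStat actually produces may look different.

```python
import unicodedata

def emoji_to_text(text):
    """Replace symbol characters such as emoji with a :unicode_name: token."""
    out = []
    for ch in text:
        # Assumption: category "So" (Symbol, other) covers most single-character
        # emoji. Multi-character sequences (skin tones, flags) are not handled.
        name = unicodedata.name(ch, "") if unicodedata.category(ch) == "So" else ""
        if name:
            out.append(" :" + name.lower().replace(" ", "_") + ": ")
        else:
            out.append(ch)
    # Collapse the extra spaces added around tokens (this also flattens newlines).
    return " ".join("".join(out).split())

# "\U0001F600" is the grinning face emoji.
print(emoji_to_text("Great service \U0001F600"))
```

Once converted, the tokens can be counted, categorised or cross-tabulated like any other word.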

WordStat 8 can analyse emojis

Explore your documents from Windows Explorer

The new Document Explorer tool allows users to quickly explore the content of their documents from Windows Explorer without the need to import documents or create a project. Just select the documents you would like to explore or the folder containing them, right-click and select Explore to quickly identify the most frequent words and phrases and where they are in your documents. With a simple right-click, you can also perform a semantic search on your documents using an existing categorization dictionary or classify documents using a prediction model in WordStat.

WordStat 8: Improved performance and precision, a more flexible approach and enhanced usability.