keywords - tPMHighlighter - a tool to help you gain insights into text

tPMHighlighter
Keywords
What are keywords and key keywords?
Keywords for a text are words which occur within the text more frequently than would be expected based on their frequency in the ready-made reference corpus. Key keywords are words which are key in two or more of the texts in your corpus. Keywords can also be generated at the corpus level by comparing the frequencies of words across all the texts in the corpus against their frequencies in the reference corpus.
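To make "more frequently than would be expected" more concrete, here is the standard two-corpus log-likelihood formula commonly used in corpus linguistics. It is offered only as an illustration; tPMHighlighter's exact calculation is described in Jeaco (2020a) and may differ in detail. The symbols are introduced here for illustration: $a$ is the frequency of a word in the text (or corpus), $b$ its frequency in the reference corpus, $c$ the number of running words in the text and $d$ the number of running words in the reference corpus.

```latex
E_1 = \frac{c\,(a+b)}{c+d}, \qquad E_2 = \frac{d\,(a+b)}{c+d}

LL = 2\left( a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2} \right)
\qquad \text{(a term with zero observed frequency contributes } 0\text{)}
```

Here $E_1$ and $E_2$ are the frequencies expected if the word were equally common in both. In this formulation, a word is a candidate (positive) keyword when its relative frequency in the text is higher than in the reference corpus ($a/c > b/d$), and larger values of $LL$ indicate stronger evidence that the difference is not just chance.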

In the screenshot, a corpus of 4 texts is loaded, and the keywords for the whole corpus are used to highlight one particular text.
As with the Vocabulary Profile screen, words of different rankings can be highlighted in different colours or in different shades of one colour.
For Key Keywords, your corpus must contain at least two texts.
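Because key keywords depend on how many texts a word is key in, the idea can be sketched as a simple tally across per-text keyword lists. This is a minimal Python illustration, not tPMHighlighter's own code: the function name `key_keywords`, the input format (a dictionary mapping hypothetical text names to keyword lists) and the ordering of the result are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def key_keywords(keywords_per_text: Dict[str, List[str]], min_texts: int = 2) -> List[str]:
    """Return words that are keywords in at least `min_texts` texts (hypothetical helper)."""
    if len(keywords_per_text) < 2:
        raise ValueError("a corpus of at least two texts is needed for key keywords")
    # Count each word once per text, so duplicates within one list don't inflate the tally.
    text_counts = Counter(
        word for keywords in keywords_per_text.values() for word in set(keywords)
    )
    shortlisted = [word for word, n in text_counts.items() if n >= min_texts]
    # Assumption: order by number of texts (most first), then alphabetically.
    return sorted(shortlisted, key=lambda word: (-text_counts[word], word))


# Example with three hypothetical texts and their keyword lists
texts = {
    "essay1": ["corpus", "keyword", "frequency"],
    "essay2": ["corpus", "collocation"],
    "essay3": ["keyword", "corpus", "concordance"],
}
print(key_keywords(texts))  # ['corpus', 'keyword']
```

The check at the top mirrors the requirement that key keywords need at least two texts; with only one text, no word could be key in "two or more" of them.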

How does tPMHighlighter work with keywords under the hood?
When the corpus is built, the wordlist from your corpus is sent to The Prime Machine, and words which occur two or more times in your text or corpus are shortlisted. tPMHighlighter takes the length of the text and the frequency of each item and compares them against the size of the reference corpus and the frequency of the item there, using the log-likelihood metric. Lists of keywords are generated on your computer for each text and for your corpus overall. The keywords for each text are then used to create a list of key keywords by shortlisting the text-level keywords which occur in two or more texts.
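The shortlisting and log-likelihood steps above can be sketched roughly as follows. This is a minimal illustration assuming the standard two-corpus log-likelihood formula and that reference-corpus frequencies are already available as a dictionary; the function names, the `reference_freqs` mapping and the decision to keep only words that are relatively more frequent than in the reference corpus are assumptions, not tPMHighlighter's actual implementation (see Jeaco (2020a) for the real calculation).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood (G2).

    a: frequency of the word in the text, b: its frequency in the reference corpus,
    c: number of tokens in the text,      d: number of tokens in the reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    total = 0.0
    if a > 0:  # a zero observed frequency contributes nothing
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def rank_keywords(tokens: List[str], reference_freqs: Dict[str, int],
                  reference_size: int, min_freq: int = 2) -> List[Tuple[str, float]]:
    """Return (word, log-likelihood) pairs for a text, strongest keyword first."""
    counts = Counter(tokens)
    text_size = len(tokens)
    scored = []
    for word, a in counts.items():
        if a < min_freq:  # shortlist: the word must occur at least twice
            continue
        b = reference_freqs.get(word, 0)
        # Keep only words relatively more frequent in the text than in the reference.
        if a / text_size <= b / reference_size:
            continue
        scored.append((word, log_likelihood(a, b, text_size, reference_size)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Example with a toy text and made-up reference-corpus counts
tokens = "the corpus and the keyword and the corpus".split()
reference = {"the": 60000, "and": 30000, "corpus": 5}
ranked = rank_keywords(tokens, reference, reference_size=1_000_000)
print([word for word, _ in ranked])  # ['corpus', 'the', 'and']
```

In this sketch "keyword" is dropped because it occurs only once. A corpus-level list could be produced the same way by passing the tokens of all texts combined.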
When a text is displayed, each word can be given a colour to represent its ranking in the keyword or key keyword lists. Words in the text which match the list will be highlighted if their ranking is at or above the slider setting (a setting of 10 highlights only the top 10 ranked items). The multicoloured button works through the list of highlighting colours, highlighting the words in each set of 10 rankings with a different colour.
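The ranking-to-colour behaviour described above might be sketched like this. It is only an illustration under stated assumptions: `highlight_colour`, the palette, the default single colour and the example ranks are hypothetical; rank 1 is taken to be the strongest keyword; and when there are more bands of 10 than colours, the sketch simply reuses colours from the start of the palette.

```python
from typing import Dict, List, Optional


def highlight_colour(word: str, ranks: Dict[str, int], slider: int,
                     palette: List[str], multicolour: bool = False,
                     single_colour: str = "yellow") -> Optional[str]:
    """Return the highlight colour for `word`, or None if it is not highlighted.

    `ranks` maps each keyword to its rank (1 = strongest keyword).
    `slider` is the cut-off: only ranks 1..slider are highlighted.
    """
    rank = ranks.get(word)
    if rank is None or rank > slider:
        return None  # not in the list, or ranked below the slider cut-off
    if not multicolour:
        return single_colour  # one colour for every highlighted word
    band = (rank - 1) // 10  # ranks 1-10 -> band 0, 11-20 -> band 1, ...
    # Assumption: reuse colours from the start if there are more bands than colours.
    return palette[band % len(palette)]


# Example with hypothetical ranks and palette
ranks = {"corpus": 1, "keyword": 12, "frequency": 25}
palette = ["yellow", "lightgreen", "lightblue"]
for token in ["the", "corpus", "keyword", "frequency"]:
    print(token, highlight_colour(token, ranks, slider=20, palette=palette, multicolour=True))
# the None
# corpus yellow
# keyword lightgreen
# frequency None
```

Raising the slider to 30 in this example would also highlight "frequency", in the third colour, since rank 25 falls in the band for ranks 21 to 30.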
For information about how keywords are calculated, see Jeaco (2020a).
