
Tips.Net > WordTips Home > Macros > VBA Examples > Determining Word Frequency
Summary: How to construct a word frequency list. (This tip works with Microsoft Word 97, Word 2000, Word 2002, and Word 2003.)
As you are analyzing your documents, you may wonder if there is a way to create a word frequency list. In other words, you may want to generate a list of every unique word in your document, along with the number of times it appears.
Unfortunately, Word doesn't include such a feature. You can, however, create your own using a macro. The following VBA macro is an example:
Sub WordFrequency()
Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for unique words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j As Integer 'Temporary variables
Dim k As Integer '
Dim l As Integer '
Dim Temp As Integer '
Dim tword As String '
' Set up excluded words
Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]"
' Find out how to sort
ByFreq = True
ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "WORD")
If ans = "" Then End
If UCase(ans) = "WORD" Then
ByFreq = False
End If
Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(LCase(aword))
If SingleWord < "a" Or SingleWord > "z" Then SingleWord = "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list?
If Len(SingleWord) > 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded. _
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword
' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or
(ByFreq And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j
' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Trim(Str(Freq(j))) & vbTab & Words(j) & vbCrLf
Next j
End With
System.Cursor = wdCursorNormal
j = MsgBox("There were " & Trim(Str(WordNum)) & _
" different words ", vbOKOnly, "Finished")
End Sub
When you open a document and run this macro, you are asked if you want to create a list sorted by word or by frequency. If you choose word, then the resulting list is shown in alphabetical order. If you choose frequency, then the resulting list is in descending order based on how many times the word appeared in the document.
While the macro is running, the status bar indicates what is happening. Depending on the size of your document and the speed of your computer, the macro may take a while to complete. (I ran it with a 719-page document with over 349,000 words and it took about five minutes to complete.)
Note that there is a line in the macro that sets a value in the Excludes string. This string contains words that the macro will ignore when putting together the word list. If you want to add words to the list, simply add them to the string, between [square brackets]. Also, make sure the exclusion words are in lowercase.
Tip #879 applies to Microsoft Word versions: 97 2000 2002 2003
Great Idea! Word is a tool to get what you really want—printed output. This means you need to make sure that Word works as well as possible with your printer, whether it is sitting on your desk or in a room down the hall.
Check out WordTips: Printing and Printers today!
Have thousands of WordTips at your fingertips, on your own system. Answer your own questions or help support others. (more information...)
Ask a Word Question
Make a Comment
Beauty Tips
Car Tips
Cleaning Tips
College Tips
Cooking Tips
Excel2007 Tips
ExcelTips
Family Tips
Gardening Tips
Health Tips
Home Tips
Money Tips
Organizing Tips
Pest Tips
Pet Tips
Word2007 Tips
WordTips