
Word Count in Python

This article is all about word count in Python. In our last article, I explained word count in Pig, but Pig has some limitations when dealing with files, and we may need to write UDFs to work around them.

Those limitations can be avoided in Python. I will show you how to count the words in a file in Python easily. It is a simple program that you can run in any Python editor.


This assumes you have already installed Python on your system and have a sample file whose words you want to count.

If you don't have a sample file, we recommend downloading the file below. We use it in the examples.

Sample File Download

Python word count example

First, open the file and store the file object in a variable, like this:

file = open('filepath')
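A common alternative to a bare open() call is the with statement, which closes the file automatically when the block ends, even if an error occurs. The sketch below assumes a UTF-8 text file. To keep it self-contained, it first writes a small temporary file with made-up sample text and deletes it afterwards. That part is only scaffolding for the example.

```python
import os
import tempfile

# Scaffolding: create a throwaway file with made-up text so the example runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("the quick brown fox jumps over the lazy dog")
    path = tmp.name

# The with statement closes the file automatically when the block ends
with open(path, encoding="utf-8") as file:
    text = file.read()

print(file.closed)  # True: the file was closed on leaving the block
os.remove(path)     # clean up the temporary file
```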

Now for the word count logic: for every word in the file, check whether it is already in our dictionary of counts. If it is, increase its count by 1. If it is not, add it with a count of 1.
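You can try the same logic on a plain string before involving a file. This is a minimal sketch. `count_words` is a hypothetical helper name used only for illustration and does not appear in the original code.

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count (hypothetical helper)."""
    wordcount = {}
    for word in text.split():
        if word in wordcount:
            wordcount[word] += 1   # seen before: increase the count
        else:
            wordcount[word] = 1    # first occurrence: start at 1
    return wordcount

print(count_words("to be or not to be"))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Because dictionaries keep insertion order (Python 3.7+), the words appear in the order they were first seen.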

Below is the complete word count code, which you can run directly in your Python editor. Just change the file path.

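# Count how many times each whitespace-separated word appears in the file
wordcount = {}
with open('/C:sentimentdata') as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1   # first time we see this word
        else:
            wordcount[word] += 1  # word seen before: increase its count
for k, v in wordcount.items():
    print(k, v)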

Running it prints each word in the file followed by the number of times it occurs. Hope this helps!
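Note that split() only breaks text on whitespace, so "Python", "python," and "PYTHON!" are counted as three different words. If you want them treated as one word, one option is to lowercase each word and strip surrounding punctuation before counting. This is an assumption about what should count as a "word" and is not part of the original code. The sketch also uses dict.get(word, 0), which replaces the if/else check with a single line. `normalize` is a hypothetical helper.

```python
import string

def normalize(word):
    """Lowercase a word and strip leading/trailing punctuation (hypothetical helper)."""
    return word.lower().strip(string.punctuation)

text = "Python is fun. python, PYTHON!"
wordcount = {}
for raw in text.split():
    word = normalize(raw)
    if not word:          # skip tokens made only of punctuation, e.g. "--"
        continue
    wordcount[word] = wordcount.get(word, 0) + 1   # get() returns 0 for unseen words

print(wordcount)
# {'python': 3, 'is': 1, 'fun': 1}
```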

Now suppose you need to find the top 5 words in this list. What would you do?

Let's see how to find the top 5 words in Python.

Top 5 Words in a file in Python

In the section above, we already found the count of each word. Now we just need to find the 5 most frequent words.

All you need to do is sort the result of the first section by count in descending order and take the first 5 entries. Here is the updated code:
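To see what sorted() does in this step, it helps to run it on a small dictionary of counts. The numbers below are made up for illustration. sorted() turns the (word, count) pairs into a list ordered by the count. Because Python's sort is stable, even with reverse=True, words with equal counts keep their original order.

```python
wordcount = {"to": 2, "be": 2, "or": 1, "not": 1, "that": 3}

# Sort (word, count) pairs by count, highest first; ties keep their original order
ranked = sorted(wordcount.items(), key=lambda pair: pair[1], reverse=True)
print(ranked[:2])
# [('that', 3), ('to', 2)]
```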

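# Count each word, then sort by count (highest first) and print the top 5
wordcount = {}
with open('/C:sentimentdata') as file:
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# sorted() returns a list of (word, count) tuples, ordered by count descending
top_words = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
for k, v in top_words[:5]:
    print(k, v)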
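Sorting the whole dictionary just to keep 5 entries does extra work when the file has a large vocabulary. As an alternative that is not part of the original article, the standard library's heapq.nlargest picks the top n items directly. The Python documentation describes it as equivalent to sorted(iterable, key=key, reverse=True)[:n]. The counts below are made up for illustration.

```python
import heapq

wordcount = {"to": 2, "be": 2, "or": 1, "not": 1, "that": 3, "question": 1}

# Keep only the 3 (word, count) pairs with the highest counts, without a full sort
top3 = heapq.nlargest(3, wordcount.items(), key=lambda pair: pair[1])
print(top3)
# [('that', 3), ('to', 2), ('be', 2)]
```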

If you want to simplify this code even further, Python's collections.Counter can do both the counting and the ranking for you:

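from collections import Counter

with open('file') as file:
    # Counter maps each word to the number of times it occurs
    wordcount = Counter(file.read().split())
# most_common(5) returns the 5 (word, count) pairs with the highest counts
for k, v in wordcount.most_common(5):
    print(k, v)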
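Counter is a dict subclass, so you can check its behavior on a plain string as well. The sentence below is only sample text. Two details are worth knowing. First, looking up a word that never occurred returns 0 instead of raising KeyError. Second, most_common() orders words with equal counts by the order in which they were first encountered.

```python
from collections import Counter

wordcount = Counter("to be or not to be that is the question".split())

print(wordcount["to"])           # 2
print(wordcount["missing"])      # 0: unseen words count as 0, no KeyError
print(wordcount.most_common(2))  # [('to', 2), ('be', 2)]
```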

And you are done. That covers word count in Python and finding the top 5 words in a file.

Try these out and let us know how they worked. If you run into any issues, please share them.

 
