Create a Python script named character_counter.py.
Import the necessary libraries.
import sys
from collections import Counter
import re
import csv
sys is used to read the input file name from the command line.
Counter (from the collections module) is used to count the characters.
re (regular expressions) is used to clean the text.
csv (comma-separated values) is used to write the counts to a CSV file.
Next, define a main function that opens and reads the input text file.
def main(input_file):
    with open(input_file, 'r') as f:
        text = f.read()
A regular expression is used to clean the text.
In this case, everything except letters (A-Z, a-z) and digits (0-9) is removed.
    clean = re.sub(r'[^A-Za-z0-9]+', '', text)
The expression above can be changed depending on which characters should be counted.
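To see what the cleaning step does, and how changing the pattern changes what gets counted, here is a small sketch on a made-up string. The letters-only, lowercased variant is just one illustrative alternative, not part of the original script.

```python
import re

# Made-up sample text, used only for illustration.
sample = "Hello, World! 123"

# Pattern from the tutorial: remove every run of characters that is not
# an ASCII letter or digit.
letters_and_digits = re.sub(r'[^A-Za-z0-9]+', '', sample)

# Illustrative variant: keep letters only, then lowercase them so that
# 'H' and 'h' are counted as the same character.
letters_only = re.sub(r'[^A-Za-z]+', '', sample).lower()

print(letters_and_digits)  # HelloWorld123
print(letters_only)        # helloworld
```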
To count the characters, pass the cleaned text to Counter, which maps each character to the number of times it occurs.
    counts = Counter(clean)
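A quick check of how Counter behaves on an already-cleaned string (the input here is made up for illustration):

```python
from collections import Counter

# Made-up, already-cleaned input.
counts = Counter("HelloWorld123")

print(counts['l'])            # 3
print(counts['z'])            # 0 (characters that never occur count as zero)
print(counts.most_common(2))  # [('l', 3), ('o', 2)]
```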
Next, the counts need to be written to an output file.
    # Python 3: open in text mode with newline='' ('wb' makes csv.writer raise a TypeError)
    with open('output_file.csv', 'w', newline='') as m:
        w = csv.writer(m)
        w.writerows(counts.items())
This opens output_file.csv (creating it if it does not exist in the folder, or overwriting it if it does) and writes one row per character: the character followed by its count.
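The sketch below checks the corrected file-writing step in Python 3 by writing to a temporary folder and reading the rows back. The header row and the alphabetical sorting are optional additions, not part of the original script, and tempfile is used only so the example does not touch the current folder.

```python
import csv
import os
import tempfile
from collections import Counter

counts = Counter("banana")  # made-up input: a=3, b=1, n=2

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'output_file.csv')
    # Text mode 'w' with newline='' is what csv.writer expects in Python 3.
    with open(path, 'w', newline='') as m:
        w = csv.writer(m)
        w.writerow(['character', 'count'])   # optional header row
        w.writerows(sorted(counts.items()))  # one row per character, alphabetical
    # Read the file back to confirm what was written.
    with open(path, newline='') as m:
        rows = list(csv.reader(m))

print(rows)  # [['character', 'count'], ['a', '3'], ['b', '1'], ['n', '2']]
```

Note that csv.reader returns every field as a string, so the counts come back as '3', '1' and '2'.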
Finally, when the script is run directly, pass the first command-line argument (the input file name) to main.
if __name__ == '__main__':
    main(sys.argv[1])
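As written, main(sys.argv[1]) fails with an IndexError if the script is run without a file name. A minimal guard, sketched here with a hypothetical helper named get_input_file (not part of the original script), could look like this:

```python
import sys


def get_input_file(argv):
    """Hypothetical helper: return the input file name from argv, or exit with a usage message."""
    # argv[0] is the script name, so exactly one extra argument is expected.
    if len(argv) != 2:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("usage: python character_counter.py input_file.txt")
    return argv[1]


# In character_counter.py this would replace main(sys.argv[1]):
# if __name__ == '__main__':
#     main(get_input_file(sys.argv))
print(get_input_file(['character_counter.py', 'input_file.txt']))
```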
The whole code should look something like this:
import sys
from collections import Counter
import re
import csv

def main(input_file):
    with open(input_file, 'r') as f:
        text = f.read()
    clean = re.sub(r'[^A-Za-z0-9]+', '', text)  # Keep only a-z, A-Z and 0-9 characters.
    counts = Counter(clean)
    with open('output_file.csv', 'w', newline='') as m:
        w = csv.writer(m)
        w.writerows(counts.items())  # Write each (character, count) pair as a CSV row.

if __name__ == '__main__':
    main(sys.argv[1])
After the script has run, the folder should contain three files:
character_counter.py
input_file.txt (containing some text)
output_file.csv (created by the script)
Navigate to the folder and run the following from the command line:
python character_counter.py input_file.txt
An example of output_file.csv:
W,4
V,5
Y,8
X,2
a,183
c,59
b,13
e,154
Here is what I came up with after reading a sample text file.
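Counter does not write the rows in frequency order, so to list the most frequent characters first, the CSV can be read back and sorted. This sketch parses the example rows above from an in-memory string standing in for output_file.csv; with the real file, open('output_file.csv', newline='') would be used instead.

```python
import csv
import io

# The example rows above, standing in for the contents of output_file.csv.
example = "W,4\nV,5\nY,8\nX,2\na,183\nc,59\nb,13\ne,154\n"

# Convert each row to a (character, count) pair with an integer count.
rows = [(char, int(count)) for char, count in csv.reader(io.StringIO(example))]
rows.sort(key=lambda row: row[1], reverse=True)  # highest count first

for char, count in rows[:3]:
    print(char, count)
```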