Word Frequency from a CSV Column in Python
I have a .csv file with a column of messages I have collected, and I wish to get a word frequency list of every word in that column. Here is what I have so far. I am not sure where I have made a mistake, so any help would be appreciated. Edit: the expected output is to write the entire list of words and their counts (without duplicates) out to another .csv file.
    import csv
    from collections import Counter
    from collections import defaultdict

    output_file = 'comments_word_freqency.csv'
    input_stream = open('comments.csv')
    reader = csv.reader(input_stream, delimiter=',')
    reader.next()  # skip header
    csvrow = [row[3] for row in reader]  # get the fourth column
    with open(output_file, 'rb') as csvfile:
        for row in reader:
            freq_dict = defaultdict(int)  # the "int" part means that the values of the dictionary are integers
            for line in csvrow:
                words = line.split(" ")
                for word in words:
                    word = word.lower()  # ignores case type
                    freq_dict[word] += 1

            writer = csv.writer(open(output_file, "wb+"))  # this is what lets us write the csv file
            for key, value in freq_dict.items():  # iterates through the dictionary and writes each pair on its own line
                writer.writerow([key, value])
The code you uploaded is all over the place, but I think this is what you're getting at. It returns a list of each word and the number of times it appeared in the original file.
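    import csv

    words = []
    # Read the input file; 'rb' and reader.next() are the Python 2 idioms used in the question.
    with open('comments.csv', 'rb') as csvfile:
        reader = csv.reader(csvfile)
        reader.next()  # skip the header row
        for row in reader:
            csv_words = row[3].split(" ")  # fourth column, split on spaces
            for i in csv_words:
                words.append(i)

    # Pair each word with the number of times it occurs in the whole list.
    words_counted = []
    for i in words:
        x = words.count(i)
        words_counted.append((i, x))

    # write this to a csv file
    with open('output.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(words_counted)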
Then, to get rid of the duplicates in the list, call set() on it before passing it to writerows():

    set(words_counted)
Your output will look like this:

    'this', 2
    'is', 1
    'your', 3
    'output', 5
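For comparison, the same count can be sketched with collections.Counter, which the question already imports. This is not the code from the answer above, only one possible variant. It assumes Python 3, a header row, and the messages in the fourth column. The function names count_words and write_word_counts are made up for illustration. Unlike the answer, it lowercases the words (as the question does) and splits on any whitespace rather than on single spaces.

```python
import csv
from collections import Counter


def count_words(rows, column=3):
    """Count lowercase, whitespace-separated words in one column of the rows."""
    counts = Counter()
    for row in rows:
        if len(row) > column:  # skip blank or short rows
            counts.update(row[column].lower().split())
    return counts


def write_word_counts(input_path, output_path, column=3):
    """Count the words in one column of input_path and write word,count rows."""
    with open(input_path, newline="") as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row (assumed to be present)
        counts = count_words(reader, column)
    with open(output_path, "w", newline="") as dst:
        # most_common() yields each distinct word once, highest count first
        csv.writer(dst).writerows(counts.most_common())
    return counts
```

Calling write_word_counts('comments.csv', 'output.csv') would then write one row per distinct word. Because a Counter keeps a single entry per word, no separate set() step is needed, and each word is counted in one pass instead of calling words.count() once per word.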