N grams module

class n_grams.Text(filename)

A class for analysing texts. Provides methods for calculating statistics about a text, such as its n-grams and most common words.

average_word_length()

Return the mean, median and mode word length. Only words are included in the calculation; numbers are ignored.

float tuple
Mean, median and mode word length
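The calculation described above can be sketched with the standard library. This is a minimal illustration, not the module's own code: the function name word_length_stats is hypothetical, and the use of str.isalpha() to exclude numbers is an assumption about how "words" are distinguished from numbers.

```python
import statistics


def word_length_stats(words):
    """Return (mean, median, mode) of word lengths as floats.

    Hypothetical helper: tokens that are not purely alphabetic
    (for example "42") are skipped, which is an assumed filter.
    """
    lengths = [len(w) for w in words if w.isalpha()]
    # statistics.mode returns the first mode encountered on ties (Python 3.8+).
    return (
        float(statistics.mean(lengths)),
        float(statistics.median(lengths)),
        float(statistics.mode(lengths)),
    )


stats = word_length_stats(["the", "cat", "sat", "on", "42", "mats"])
print(stats)
```

An empty word list raises statistics.StatisticsError in this sketch, so a caller would need to handle that case.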

common_words(n=10)

Return the n most common words in the text. Only looks for words with 3 or more letters and ignores a given set of very common words.

n : integer, optional
Number of words to return. Default is 10.
string list
Most common words in text (with most common first)
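A rough sketch of this kind of filtering and ranking, using collections.Counter. The function name most_common_words and the STOP_WORDS set are hypothetical; the set of very common words that the module actually ignores is not listed in this reference.

```python
from collections import Counter

# Hypothetical stop-word set, for illustration only.
STOP_WORDS = {"the", "and", "that", "with", "for"}


def most_common_words(words, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words, most common first.

    Words shorter than 3 letters and words in stop_words are skipped.
    """
    counts = Counter(
        w for w in words if len(w) >= 3 and w not in stop_words
    )
    return [word for word, _ in counts.most_common(n)]


words = "the cat and the dog saw the cat chase a cat".split()
top = most_common_words(words, n=2)
print(top)
```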

find_ngrams(n)

Find the n-grams of the Text object and return them as a dictionary.

n : integer
Length of n-grams to construct
output : dictionary
Dictionary of n-grams in the text. The keys are the n-grams; the values are the frequencies with which they appear in the text.
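To illustrate the shape of such a dictionary, here is a minimal sketch that counts n-grams over a list of words. It rests on assumptions: that the n-grams are word-level rather than character-level, that keys are tuples of words, and that "frequency" means a raw count. The function name count_ngrams is hypothetical.

```python
from collections import Counter


def count_ngrams(words, n):
    """Count word n-grams; keys are tuples of n consecutive words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Zipping n shifted copies of the list yields each run of n words.
    grams = zip(*(words[i:] for i in range(n)))
    return dict(Counter(grams))


words = "to be or not to be".split()
bigrams = count_ngrams(words, 2)
print(bigrams)
```

A list of k words yields k - n + 1 n-grams in total, so the counts here sum to 5.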

longest_words(n=10)

Return the n longest words in the text.

n : integer, optional
Number of words to return. Default is 10.
string list
The n longest words in the text (sorted with longest first)
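One possible way to produce such a list is sketched below. Two choices here are assumptions rather than documented behaviour: repeated words are collapsed to one entry, and ties in length are broken alphabetically. The function name n_longest_words is hypothetical.

```python
def n_longest_words(words, n=10):
    """Return the n longest distinct words, longest first."""
    unique = set(words)
    # Sort by descending length, then alphabetically for a stable result.
    return sorted(unique, key=lambda w: (-len(w), w))[:n]


longest = n_longest_words(
    ["a", "tree", "forest", "tree", "leaf", "bark"], n=3
)
print(longest)
```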

static read_text(filename)

Read the text from the file ‘filename’ and return its words in lowercase.

filename : string
Name and path of file to be analysed.
string list
List of words in text, with all punctuation stripped and all text set to lowercase
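The steps described above (read, strip punctuation, lowercase, split into words) might look like the following sketch. The function name read_words is hypothetical, and the UTF-8 encoding, whitespace splitting and use of string.punctuation (ASCII punctuation only) are assumptions, not details taken from the module.

```python
import os
import string
import tempfile


def read_words(filename):
    """Read a file and return its words, lowercased, punctuation removed."""
    with open(filename, encoding="utf-8") as handle:
        text = handle.read()
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower().split()


# Demonstrate on a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Hello, World! It's a test.")
words = read_words(tmp.name)
os.remove(tmp.name)
print(words)
```

Because the apostrophe is deleted rather than replaced by a space, "It's" becomes "its" in this sketch.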

text_report()

Print a report of the text, giving information about several metrics.

word_count()

Return the number of words in the text.

integer
Number of words in text