N grams module
-
class n_grams.Text(filename)
A class for analysing texts. Provides methods for calculating various statistics, including finding n-grams and common words.
-
average_word_length()
Return the mean, median and mode word length. Only words are included in the calculation (i.e. numbers are excluded).
- float tuple
- Mean, median and mode word length
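The calculation described above can be sketched with the standard library. This is a minimal illustration, not the module's own code: the standalone function takes a list of words as input, and it assumes "no numbers" means that only purely alphabetic tokens are counted.

```python
from statistics import mean, median, mode


def average_word_length(words):
    """Return (mean, median, mode) word length as floats.

    Assumption: tokens containing non-alphabetic characters (such as
    numbers) are skipped; the module may filter differently.
    """
    lengths = [len(word) for word in words if word.isalpha()]
    return float(mean(lengths)), float(median(lengths)), float(mode(lengths))


words = ["the", "cat", "sat", "on", "the", "mat", "in", "1999"]
mean_len, median_len, mode_len = average_word_length(words)
print(median_len, mode_len)  # 3.0 3.0
```

The token "1999" is dropped before the statistics are computed, so it does not affect the result.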
-
common_words(n=10)
Return the n most common words in the text. Only words with 3 or more letters are considered, and a given set of very common words is ignored.
- n : integer, optional
- Number of words to return. Default is 10.
- string list
- Most common words in text (with most common first)
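One way to get this behaviour is with `collections.Counter`. The sketch below is illustrative only: the `STOP_WORDS` set is a hypothetical stand-in for the module's set of very common words, which this reference does not list, and the function takes a word list directly rather than being a method.

```python
from collections import Counter

# Hypothetical stop-word set; the module's actual set is not documented here.
STOP_WORDS = {"the", "and", "that", "with", "for", "was"}


def common_words(words, n=10):
    """Return the n most common words, most common first.

    Words shorter than 3 letters and words in STOP_WORDS are ignored.
    """
    counts = Counter(
        word for word in words if len(word) >= 3 and word not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(n)]


words = "the cat and the dog saw the cat and a bird saw the cat".split()
print(common_words(words, n=2))  # ['cat', 'saw']
```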
-
find_ngrams(n)
Find the n-grams of the Text object and return them as a dictionary.
- n : integer
- Length of n-grams to construct
- output : dictionary
- Dictionary of n-grams in the text. The keys are the n-grams and the values are the frequencies with which they appear in the text.
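As a rough illustration of the returned dictionary, the sketch below builds word-level n-grams from a list of words. It makes two assumptions not confirmed by this reference: that the n-grams are sequences of words (rather than characters), and that each key is a tuple of words with a raw count as its value.

```python
from collections import Counter


def find_ngrams(words, n):
    """Return a dict mapping each word n-gram (a tuple) to its count."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Zipping n staggered slices of the list yields every run of n
    # consecutive words.
    ngrams = zip(*(words[i:] for i in range(n)))
    return dict(Counter(ngrams))


words = "to be or not to be".split()
print(find_ngrams(words, 2))
# {('to', 'be'): 2, ('be', 'or'): 1, ('or', 'not'): 1, ('not', 'to'): 1}
```

If the text has fewer than n words, the result is an empty dictionary.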
-
longest_words(n=10)
Return the n longest words in the text.
- n : integer, optional
- Number of words to return. Default is 10.
- string list
- The n longest words in the text (sorted with longest first)
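A minimal sketch of this selection follows. It assumes that duplicate words are collapsed before sorting and that ties in length are broken alphabetically; neither detail is stated in this reference, so the module may behave differently.

```python
def longest_words(words, n=10):
    """Return the n longest distinct words, longest first."""
    unique_words = set(words)
    # Sort by descending length, then alphabetically so that ties are
    # resolved deterministically.
    return sorted(unique_words, key=lambda word: (-len(word), word))[:n]


words = ["a", "quick", "brown", "fox", "jumps", "over", "lazy", "dogs", "quick"]
print(longest_words(words, n=3))  # ['brown', 'jumps', 'quick']
```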
-
static read_text(filename)
Read the text from the file ‘filename’ and return it as a list of lowercase words.
- filename : string
- Name and path of file to be analysed.
- string list
- List of words in text, with all punctuation stripped and all text set to lowercase
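The reading step might look roughly like the sketch below. It assumes a UTF-8 encoded file and treats "punctuation" as the ASCII characters in `string.punctuation`; the module's own handling of encodings and non-ASCII punctuation is not documented here.

```python
import os
import string
import tempfile


def read_text(filename):
    """Return the file's words, lowercased and with punctuation removed."""
    with open(filename, encoding="utf-8") as handle:
        text = handle.read().lower()
    # Delete every ASCII punctuation character, then split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


# Demonstrate on a small temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Hello, World! It's a test.")
sample_words = read_text(tmp.name)
os.remove(tmp.name)
print(sample_words)  # ['hello', 'world', 'its', 'a', 'test']
```

Note that stripping the apostrophe turns "It's" into "its", which is a side effect of removing all punctuation.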
-
text_report()
Print a report on the text, giving information about various metrics.
-
word_count()
Return the number of words in the text.
- integer
- Number of words in the text
-