1. Functions Defined for NLTK's Frequency Distributions
Example | Description |
---|---|
fdist = FreqDist(samples) | create a frequency distribution containing the given samples |
fdist[sample] += 1 | increment the count for this sample |
fdist['monstrous'] | count of the number of times a given sample occurred |
fdist.freq('monstrous') | relative frequency of a given sample (its count divided by fdist.N()) |
fdist.N() | total number of sample outcomes counted (the sum of all counts) |
fdist.most_common(n) | the n most common samples and their frequencies |
for sample in fdist: | iterate over the samples |
fdist.max() | sample with the greatest count |
fdist.tabulate() | tabulate the frequency distribution |
fdist.plot() | graphical plot of the frequency distribution |
fdist.plot(cumulative=True) | cumulative plot of the frequency distribution |
fdist1 |= fdist2 | update fdist1 with counts from fdist2, keeping the larger count for each sample |
fdist1 < fdist2 | test if samples in fdist1 occur less frequently than in fdist2 |
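
The calls in the table above can be tried on a small list. The following is a minimal sketch, assuming NLTK 3 is installed; the sample words and variable names are made up for illustration and do not come from any corpus.

```python
from nltk import FreqDist

# Hypothetical toy samples (not taken from a real corpus).
samples = ["the", "whale", "the", "sea", "the", "whale"]

fdist = FreqDist(samples)        # count every sample
fdist["the"] += 1                # increment one count by hand

count_the = fdist["the"]         # raw count of 'the'
total = fdist.N()                # sum of all counts
freq_the = fdist.freq("the")     # count_the / total
top_two = fdist.most_common(2)   # list of (sample, count) pairs
most_frequent = fdist.max()      # sample with the greatest count

# FreqDist inherits |= from collections.Counter, which keeps the
# larger of the two counts for each sample instead of adding them.
merged = FreqDist("aab")         # a: 2, b: 1
merged |= FreqDist("abbb")       # a: 1, b: 3

print(count_the, total, top_two, most_frequent, dict(merged))
```

To add counts together instead of taking the maximum, `fdist1 += fdist2` or `fdist1.update(fdist2)` can be used, both of which also come from `Counter`.
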
2. Some Word Comparison Operators
Function | Meaning |
---|---|
s.startswith(t) | test if s starts with t |
s.endswith(t) | test if s ends with t |
t in s | test if t is a substring of s |
s.islower() | test if s contains cased characters and all are lowercase |
s.isupper() | test if s contains cased characters and all are uppercase |
s.isalpha() | test if s is non-empty and all characters in s are alphabetic |
s.isalnum() | test if s is non-empty and all characters in s are alphanumeric |
s.isdigit() | test if s is non-empty and all characters in s are digits |
s.istitle() | test if s contains cased characters and is titlecased (i.e. all words in s have initial capitals) |
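
The tests in the table above are ordinary Python string methods, so they can be checked without NLTK. This sketch uses a made-up word list and a hypothetical helper, `passed_tests`, to report which tests each string satisfies.

```python
# Hypothetical word list chosen to exercise each test in the table.
words = ["Moby", "whale", "WHALE", "1851", "sea-going", ""]


def passed_tests(s):
    """Return the names of the string tests from the table that s satisfies."""
    results = {
        "islower": s.islower(),
        "isupper": s.isupper(),
        "isalpha": s.isalpha(),
        "isalnum": s.isalnum(),
        "isdigit": s.isdigit(),
        "istitle": s.istitle(),
    }
    return [name for name, ok in results.items() if ok]


for w in words:
    print(repr(w), passed_tests(w))

# The first three rows of the table are typically used as filters.
wh_words = [w for w in words if w.startswith("wh")]
ing_words = [w for w in words if w.endswith("ing")]
has_al = [w for w in words if "al" in w.lower()]
```

The empty string and `"1851"` fail both `islower()` and `isupper()` because they contain no cased characters, which is why the table states that condition explicitly.
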
3. Basic Corpus Functionality Defined in NLTK
Example | Description |
---|---|
fileids() | the files of the corpus |
fileids([categories]) | the files of the corpus corresponding to these categories |
categories() | the categories of the corpus |
categories([fileids]) | the categories of the corpus corresponding to these files |
raw() | the raw content of the corpus |
raw(fileids=[f1,f2,f3]) | the raw content of the specified files |
raw(categories=[c1,c2]) | the raw content of the specified categories |
words() | the words of the whole corpus |
words(fileids=[f1,f2,f3]) | the words of the specified fileids |
words(categories=[c1,c2]) | the words of the specified categories |
sents() | the sentences of the whole corpus |
sents(fileids=[f1,f2,f3]) | the sentences of the specified fileids |
sents(categories=[c1,c2]) | the sentences of the specified categories |
abspath(fileid) | the location of the given file on disk |
encoding(fileid) | the encoding of the file (if known) |
open(fileid) | open a stream for reading the given corpus file |
root | the path to the root of the locally installed corpus |
readme() | the contents of the README file of the corpus |
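
The corpus methods above can be tried without downloading any data by pointing a reader at a throwaway directory. This is a sketch only: the two files, their text, and the choice of `CategorizedPlaintextCorpusReader` with a directory-based `cat_pattern` are assumptions made for illustration, and `sents()` is left out because it needs NLTK's sentence tokenizer data.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Hypothetical two-file corpus; the category is the enclosing directory name.
docs = {
    "news/a.txt": "The market rose today.",
    "fiction/b.txt": "The whale dived deep.",
}

with tempfile.TemporaryDirectory() as root:
    for fileid, text in docs.items():
        path = os.path.join(root, *fileid.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf8") as f:
            f.write(text)

    corpus = CategorizedPlaintextCorpusReader(
        root, r".*\.txt", cat_pattern=r"(\w+)/.*"
    )

    all_files = corpus.fileids()
    news_files = corpus.fileids(categories=["news"])
    cats = corpus.categories()
    cats_of_b = corpus.categories(fileids=["fiction/b.txt"])
    raw_news = corpus.raw(categories=["news"])
    fiction_words = list(corpus.words(fileids=["fiction/b.txt"]))
    location = str(corpus.abspath("news/a.txt"))

print(all_files, cats, fiction_words)
```

The corpora bundled with NLTK, such as `nltk.corpus.brown`, expose the same methods once their data has been downloaded.
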
4. NLTK's Conditional Frequency Distributions
Example | Description |
---|---|
cfdist = ConditionalFreqDist(pairs) | create a conditional frequency distribution from a list of pairs |
cfdist.conditions() | the conditions |
cfdist[condition] | the frequency distribution for this condition |
cfdist[condition][sample] | count of the given sample for this condition |
cfdist.tabulate() | tabulate the conditional frequency distribution |
cfdist.tabulate(samples, conditions) | tabulation limited to the specified samples and conditions |
cfdist.plot() | graphical plot of the conditional frequency distribution |
cfdist.plot(samples, conditions) | graphical plot limited to the specified samples and conditions |
cfdist1 < cfdist2 | test if samples in cfdist1 occur less frequently than in cfdist2 |
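
A conditional frequency distribution can be sketched the same way. The (condition, sample) pairs below are invented for illustration, with a genre as the condition and a word as the sample; in this sketch `samples` and `conditions` are passed to `tabulate` as keyword arguments.

```python
from nltk import ConditionalFreqDist

# Hypothetical (condition, sample) pairs: a genre paired with a word.
pairs = [
    ("news", "the"), ("news", "market"), ("news", "the"),
    ("romance", "the"), ("romance", "heart"),
    ("romance", "heart"), ("romance", "heart"),
]

cfdist = ConditionalFreqDist(pairs)

conditions = sorted(cfdist.conditions())   # the distinct conditions
news_fdist = cfdist["news"]                # a FreqDist for one condition
news_the = cfdist["news"]["the"]           # count of 'the' under 'news'
romance_top = cfdist["romance"].max()      # most frequent romance sample

# Print a table restricted to the chosen samples and conditions.
cfdist.tabulate(conditions=["news", "romance"], samples=["the", "heart"])
```

Each value of `cfdist` is an ordinary `FreqDist`, so every method from the first table applies to `cfdist[condition]`.
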