NLTK——經常使用函數

1.Functions Defined for NLTK's Frequency Distributionsgit

 

Example Description
fdist = FreqDist(samples) create a frequency distribution containing the given samples
fdist[sample] += 1 increment the count for this sample
fdist['monstrous'] count of the number of times a given sample occurred
fdist.freq('monstrous') frequency of a given sample
fdist.N() total number of samples
fdist.most_common(n) the n most common samples and their frequencies
for sample in fdist: iterate over the samples
fdist.max() sample with the greatest count
fdist.tabulate() tabulate the frequency distribution
fdist.plot() graphical plot of the frequency distribution
fdist.plot(cumulative=True) cumulative plot of the frequency distribution
fdist1 |= fdist2 update fdist1 with counts from fdist2
fdist1 < fdist2 test if samples in fdist1 occur less frequently than in fdist2

 

2.Some Word Comparison Operatorsapi

 

 

Function Meaning
s.startswith(t) test if s starts with t
s.endswith(t) test if s ends with t
t in s test if t is a substring of s
s.islower() test if s contains cased characters and all are lowercase
s.isupper() test if s contains cased characters and all are uppercase
s.isalpha() test if s is non-empty and all characters in s are alphabetic
s.isalnum() test if s is non-empty and all characters in s are alphanumeric
s.isdigit() test if s is non-empty and all characters in s are digits
s.istitle() test if s contains cased characters and is titlecased (i.e. all words in s have initial capitals)

 

3.Basic Corpus Functionality defined in NLTKless

Example Description
fileids() the files of the corpus
fileids([categories]) the files of the corpus corresponding to these categories
categories() the categories of the corpus
categories([fileids]) the categories of the corpus corresponding to these files
raw() the raw content of the corpus
raw(fileids=[f1,f2,f3]) the raw content of the specified files
raw(categories=[c1,c2]) the raw content of the specified categories
words() the words of the whole corpus
words(fileids=[f1,f2,f3]) the words of the specified fileids
words(categories=[c1,c2]) the words of the specified categories
sents() the sentences of the whole corpus
sents(fileids=[f1,f2,f3]) the sentences of the specified fileids
sents(categories=[c1,c2]) the sentences of the specified categories
abspath(fileid) the location of the given file on disk
encoding(fileid) the encoding of the file (if known)
open(fileid) open a stream for reading the given corpus file
root if the path to the root of locally installed corpus
readme() the contents of the README file of the corpus

4.NLTK's Conditional Frequency Distributionsthis

Example Description
cfdist = ConditionalFreqDist(pairs) create a conditional frequency distribution from a list of pairs
cfdist.conditions() the conditions
cfdist[condition] the frequency distribution for this condition
cfdist[condition][sample] frequency for the given sample for this condition
cfdist.tabulate() tabulate the conditional frequency distribution
cfdist.tabulate(samples, conditions) tabulation limited to the specified samples and conditions
cfdist.plot() graphical plot of the conditional frequency distribution
cfdist.plot(samples, conditions) graphical plot limited to the specified samples and conditions
cfdist1 < cfdist2 test if samples in cfdist1 occur less frequently than in cfdist2
相關文章
相關標籤/搜索