NLP: NLTK Study Notes, Part 1

import nltk

In [3]:

nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

Out[3]:

True

In [8]:

# Searching text
from nltk.book import *
text1.concordance('monstrous')  # show every occurrence of a word together with its surrounding context
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
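
The concordance view above is a keyword-in-context listing. The block below is a minimal sketch of that idea in plain Python, not NLTK's implementation: the function name `concordance`, the `width` parameter (counted in tokens rather than characters), and the sample sentence are all assumptions made for illustration. Matching is case-insensitive, in line with the output above, where both "monstrous" and "Monstrous" are listed.

```python
def concordance(tokens, word, width=3):
    """Return (left, match, right) context triples for every occurrence of word."""
    target = word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Take up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample sentence, not taken from the NLTK corpora.
sample = "The monstrous whale rose and the Monstrous wave fell on the ship".split()
concordance_hits = concordance(sample, "monstrous")
for left, match, right in concordance_hits:
    print(f"{left:>20} {match} {right}")
```

Right-aligning the left context keeps the matched word in one column, which is what makes a concordance easy to scan.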

In [9]:

text2.concordance('affection')
text3.concordance('lived')
text5.concordance('lol')
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This
opinion . But by an appeal to her affection for her mother , by representing t
every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if
Displaying 25 of 38 matches:
ay when they were created . And Adam lived an hundred and thirty years , and be
ughters : And all the days that Adam lived were nine hundred and thirty yea and
nd thirty yea and he died . And Seth lived an hundred and five years , and bega
ve years , and begat Enos : And Seth lived after he begat Enos eight hundred an
welve years : and he died . And Enos lived ninety years , and begat Cainan : An
years , and begat Cainan : And Enos lived after he begat Cainan eight hundred
ive years : and he died . And Cainan lived seventy years and begat Mahalaleel :
rs and begat Mahalaleel : And Cainan lived after he begat Mahalaleel eight hund
years : and he died . And Mahalaleel lived sixty and five years , and begat Jar
s , and begat Jared : And Mahalaleel lived after he begat Jared eight hundred a
and five yea and he died . And Jared lived an hundred sixty and two years , and
o years , and he begat Eno And Jared lived after he begat Enoch eight hundred y
and two yea and he died . And Enoch lived sixty and five years , and begat Met
; for God took him . And Methuselah lived an hundred eighty and seven years ,
, and begat Lamech . And Methuselah lived after he begat Lamech seven hundred
nd nine yea and he died . And Lamech lived an hundred eighty and two years , an
ch the LORD hath cursed . And Lamech lived after he begat Noah five hundred nin
naan shall be his servant . And Noah lived after the flood three hundred and fi
xad two years after the flo And Shem lived after he begat Arphaxad five hundred
at sons and daughters . And Arphaxad lived five and thirty years , and begat Sa
ars , and begat Salah : And Arphaxad lived after he begat Salah four hundred an
begat sons and daughters . And Salah lived thirty years , and begat Eber : And
y years , and begat Eber : And Salah lived after he begat Eber four hundred and
begat sons and daughters . And Eber lived four and thirty years , and begat Pe
y years , and begat Peleg : And Eber lived after he begat Peleg four hundred an
Displaying 25 of 822 matches:
ast PART 24 / m boo . 26 / m and sexy lol U115 boo . JOIN PART he drew a girl w
ope he didnt draw a penis PART ewwwww lol & a head between her legs JOIN JOIN s
a bowl i got a blunt an a bong ...... lol JOIN well , glad it worked out my cha
e " PART Hi U121 in ny . ACTION would lol @ U121 . . . but appearently she does
30 make sure u buy a nice ring for U6 lol U7 Hi U115 . ACTION isnt falling for
didnt ya hear !!!! PART JOIN geeshhh lol U6 PART hes deaf ppl here dont get it
es nobody here i wanna misbeahve with lol JOIN so read it . thanks U7 .. Im hap
ies want to chat can i talk to him !! lol U121 !!! forwards too lol JOIN ALL PE
k to him !! lol U121 !!! forwards too lol JOIN ALL PErvs ... redirect to U121 '
loves ME the most i love myself JOIN lol U44 how do u know that what ? jerkett
ng wrong ... i can see it in his eyes lol U20 = fiance Jerketts lmao wtf yah I
cooler by the minute what 'd I miss ? lol noo there too much work ! why not ??
that mean I want you ? U6 hello room lol U83 and this .. has been the grammar
the rule he 's in PM land now though lol ah ok i wont bug em then someone wann
flight to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 80
ht to hell :) lmao bbl maybe PART LOL lol U7 it was me , U83 hahah U83 ! 808265
082653953 K-Fed got his ass kicked .. Lol . ACTION laughs . i got a first class
. i got a first class ticket to hell lol U7 JOIN any texas girls in here ? any
. whats up U155 i was only kidding . lol he 's a douchebag . Poor U121 i 'm bo
??? sits with U30 Cum to my shower . lol U121 . ACTION U1370 watches his nads
ur nad with a stick . ca u U23 ewwww lol *sniffs* ewwwwww PART U115 ! owww spl
ACTION is resisting . ur female right lol U115 beeeeehave Remember the LAst tim
pm's me . charge that is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLO
is 1.99 / min . lol @ innocent hahah lol .... yeah LOLOLOLLL U12 thats not nic
s . lmao no U115 Check my record . :) Lol lick em U7 U23 how old r u lol Way to

In [11]:

text1.similar('monstrous')  # find words that occur in similar contexts
text2.similar('monstrous')  # same query against text2; the two authors use the word very differently
imperial reliable mouldy domineering maddens delightfully gamesome
loving impalpable passing modifies mean lazy subtly doleful few
trustworthy tyrannical mystifying lamentable
very exceedingly heartily so a remarkably extremely as vast amazingly
great good sweet

In [14]:

text2.common_contexts(['monstrous','very'])  # contexts shared by both words; the underscore marks the slot that either word can fill
is_pretty am_glad a_pretty be_glad a_lucky
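
Both `similar` and `common_contexts` rest on the same idea: a word's context is the pair (previous word, next word). The sketch below illustrates that idea in plain Python under simplifying assumptions. It ranks similar words by the number of distinct contexts they share, which is not necessarily how NLTK scores them. The helper names and the sample sentence are hypothetical.

```python
from collections import Counter, defaultdict

def build_contexts(tokens):
    """Map each lowercased word to a Counter of its (previous, next) contexts."""
    words = [t.lower() for t in tokens]
    contexts = defaultdict(Counter)
    for i in range(1, len(words) - 1):
        contexts[words[i]][(words[i - 1], words[i + 1])] += 1
    return contexts

def common_contexts(contexts, words):
    """Return the sorted contexts shared by every word in words, as 'prev_next'."""
    shared = set.intersection(*(set(contexts[w.lower()]) for w in words))
    return sorted(f"{prev}_{nxt}" for prev, nxt in shared)

def similar(contexts, word):
    """Rank other words by how many distinct contexts they share with word."""
    target = set(contexts[word.lower()])
    scores = Counter()
    for other, ctx in contexts.items():
        overlap = len(target & set(ctx))
        if other != word.lower() and overlap:
            scores[other] = overlap
    return [w for w, _ in scores.most_common()]

# Hypothetical sample text, not taken from the NLTK corpora.
sample = ("she is very pretty and he is monstrous pretty but "
          "I am very glad and I am monstrous glad").split()
ctx = build_contexts(sample)
shared_contexts = common_contexts(ctx, ["monstrous", "very"])
similar_words = similar(ctx, "monstrous")
print(shared_contexts, similar_words)
```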

In [15]:

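text4.dispersion_plot(['citizens','democracy','freedom','duties','America'])  # plot where each word occurs across the text; text4 is the US presidential Inaugural Address Corpus, so the plot shows how word usage changes over time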

[Figure: lexical dispersion plot for text4 (text.4_演讲.png)]
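
A dispersion plot draws one mark per occurrence, with the token position on the x-axis and one row per word. The sketch below computes only those positions and does not draw anything. The function name `dispersion_offsets` and the sample tokens are hypothetical, and matching is case-sensitive here as an assumption of the sketch.

```python
def dispersion_offsets(tokens, words):
    """Return {word: [token positions]}, the x-coordinates a dispersion plot would mark."""
    targets = set(words)
    offsets = {w: [] for w in words}
    for position, tok in enumerate(tokens):
        if tok in targets:
            offsets[tok].append(position)
    return offsets

# Hypothetical sample tokens, not taken from the Inaugural Address Corpus.
sample = "freedom and duties ; citizens value freedom ; America needs citizens".split()
offsets = dispersion_offsets(sample, ["citizens", "freedom", "America"])
print(offsets)
```

Because the corpus is ordered by time, a position near the start of the token list corresponds to an early address and a position near the end to a recent one.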


In [17]:

# text1.generate()  # generates random text in the style of the corpus; not available in the NLTK version used for these notes

In [18]:

## Counting vocabulary

In [19]:

len(text1)  # number of tokens in the text (words and punctuation marks)

Out[19]:

260819

In [20]:

sorted(set(text2))  # sorted vocabulary of the text; a set holds no duplicate elements

Out[20]:

['!',
'!"',
'!"--',
"!'",
'!\'"',
'!--',
'!--"',
'"',
'"\'',
'"--',
'&',
"'",
"',",
"'--",
'(',
')',
'),',
')--',
'***',
',',
',"',
',"--',
",'",
',)',
',-',
',--',
',--"',
'-',
'--',
'--"',
"--'",
'--(',
'--,',
'-?',
'-?"',
'.',
'."',
'."--',
".'",
'.\'"',
".'--",
'.)',
'.)--',
'.--',
'.--"',
'...',
'..."',
'.]',
'1',
'10',
'11',
'12',
'13',
'14',
'15',
'16',
'17',
'18',
'1811',
'19',
'2',
'20',
'200',
'21',
'22',
'23',
'24',
'25',
'26',
'27',
'28',
'29',
'3',
'30',
'31',
'32',
'33',
'34',
'35',
'36',
'37',
'38',
'39',
'4',
'40',
'41',
'42',
'43',
'44',
'45',
'46',
'47',
'48',
'49',
'5',
'50',
'6',
'7',
'7000L',
'8',
'9',
':',
':--',
':--"',
';',
';"',
';"--',
";'",
';--',
';--"',
'>',
'?',
'?"',
'?"--',
"?'",
'?)',
'?--',
'?--"',
'A',
'ALL',
'ALMOST',
'ALWAYS',
'AM',
'ANY',
'ARE',
'Abbeyland',
'About',
'Absence',
'Abundance',
'Add',
'Affecting',
'After',
'Again',
'Against',
'Ah',
'All',
'Allenham',
'Allow',
'Almost',
'Altogether',
'Am',
'Among',
'Amongst',
'An',
'And',
'Annamaria',
'Anne',
'Another',
'Anxiety',
'April',
'Are',
'As',
'Ashamed',
'Astonished',
'Astonishment',
'At',
'Austen',
'Avignon',
'Ay',
'Aye',
'BEEN',
'BOTH',
'Bad',
'Bartlett',
'Barton',
'Bath',
'Beautifully',
'Because',
'Before',
'Being',
'Believe',
'Benevolent',
'Berkeley',
'Besides',
'Betty',
'Between',
'Beyond',
'Biddy',
'Bishop',
'Bond',
'Bonomi',
'Born',
'Brandon',
'Bristol',
'Brown',
'Building',
'Buildings',
'Burgess',
'Business',
'But',
'By',
'CAN',
'CATCHING',
'CHAPTER',
'COULD',
'Can',
'Careless',
'Careys',
'Cartwright',
'Casino',
'Cassino',
'Certainly',
'Chagrined',
'Charlotte',
'Choice',
'Christian',
'Christmas',
'Civil',
'Clarke',
'Cleveland',
'Cold',
'Colonel',
'Columella',
'Combe',
'Come',
'Common',
'Comparisons',
'Concealing',
'Concern',
'Conduit',
'Confess',
'Consider',
'Considering',
'Constantia',
'Continual',
'Conversation',
'Cottage',
'Could',
'Court',
'Courtland',
'Cowper',
'Cross',
'Cruel',
'D',
'DEAR',
'DID',
'DO',
'DOES',
'DRAW',
'Dartford',
'Dashwood',
'Dashwoods',
'Davies',
'Dawlish',
'Dear',
'Dearest',
'Delaford',
'Dennison',
'Depend',
'Design',
'Determined',
'Devonshire',
'Did',
'Disappointed',
'Disappointment',
'Do',
'Doctor',
'Does',
'Domestic',
'Don',
'Donavan',
'Dorsetshire',
'Down',
'Dr',
'Drury',
'Dullness',
'During',
'Duty',
'EDWARD',
'ELINOR',
'END',
'ESTEEM',
'Each',
'Eager',
'Early',
'East',
'Easter',
'Edward',
'Elinor',
'Eliza',
'Elliott',
'Ellison',
'Ellisons',
'Encouraged',
'Engaged',
'Engagement',
'England',
'Epicurism',
'Esq',
'Esteem',
'Even',
'Every',
'Everybody',
'Excellent',
'Exchange',
'Excuse',
'Exert',
'Exeter',
'Extend',
'Extravagance',
'F',
'FAITH',
'FERRARS',
'Fanny',
'Far',
'Farm',
'February',
'Ferrars',
'Few',
'Fifteen',
'Fifty',
'Five',
'Folly',
'For',
'Forgive',
'Fortunately',
'Four',
'Friday',
'From',
'Frosts',
'GAUCHERIE',
'Gardens',
'Gentleman',
'Get',
'Gibson',
'Gilberts',
'Go',
'God',
'Godby',
'Going',
'Gone',
'Good',
'Gracious',
'Grandeur',
'Gray',
'Greatness',
'Grecian',
'Grey',
'HAD',
'HAS',
'HE',
'HER',
'HERS',
'HIM',
'HIS',
'Had',
'Half',
'Hamlet',
'Hanger',
'Hanover',
'Happy',
'Harley',
'Harris',
'Harry',
'Has',
'Have',
'Having',
'He',
'Heaven',
'Henry',
'Henshawe',
'Her',
'Here',
'High',
'His',
'Hitherto',
'Holborn',
'Holburn',
'Hon',
'Honiton',
'Hope',
'Hour',
'House',
'How',
'However',
'Hum',
'Hunters',
'Hush',
'I',
'II',
'IN',
'INconvenience',
'IS',
'If',
'Imagine',
'Impatient',
'Impossible',
'Improve',
'Impudence',
'In',
'Indeed',
'Indies',
'Infirmity',
'Inn',
'Instead',
'Invited',
'Is',
'It',
'JOHN',
'James',
'Jane',
'January',
'Jenning',
'Jennings',
'John',
'Just',
'KNEW',
'Kensington',
'Kingham',
'Know',
'Knowing',
'L',
'LESS',
'LET',
'LONG',
'LOOK',
'LOOKED',
'LUCY',
'La',
'Lady',
'Ladyship',
'Lane',
'Last',
'Laughing',
'Law',
'Let',
'Life',
'Like',
'Little',
'Lodging',
'Lombardy',
'London',
'Long',
'Longstaple',
'Look',
'Lord',
'Luckily',
'Lucy',
'M',
'MADAM',
'MAY',
'ME',
'MIND',
'MONTH',
'MUST',
'MY',
'Ma',
'Mab',
'Madam',
'Magna',
'Mall',
'Mama',
'Mamma',
'Mansion',
'Many',
'March',
'Margaret',
'Marianne',
'Marlborough',
'Martha',
'Mary',
'Master',
'May',
'Me',
'Men',
'Michaelmas',
'Mid',
'Middleton',
'Middletons',
'Midsummer',
'Mind',
'Mine',
'Misery',
'Miss',
'Misses',
'Mistress',
'Monday',
'Months',
'More',
'Morton',
'Most',
'Mr',
'Mrs',
'Much',
'Music',
'Must',
'My',
'NOT',
'NOW',
'Nancy',
'Nay',
'Neither',
'Never',
'New',
'Newton',
'No',
'Nobody',
'None',
'Nor',
'Norfolk',
'Norland',
'Not',
'Nothing',
'November',
'Now',
'OCCASION',
'ONCE',
'ONE',
'OUGHT',
'OWN',
'October',
'Of',
'Offended',
'Oh',
'On',
'Once',
'One',
'Only',
'Opportunity',
'Opposition',
'Or',
'Other',
'Others',
'Our',
'Oxford',
'P',
'PARTIES',
'Pall',
'Palmer',
'Palmers',
'Pardon',
'Park',
'Parliament',
'Parrys',
'Parsonage',
'Perhaps',
'Pity',
'Please',
'Pleased',
'Plymouth',
'Poor',
'Pope',
'Portman',
'Pratt',
'Pray',
'Precious',
'Preparation',
'Prescriptions',
'Priory',
'Queen',
'Quite',
'REALLY',
'ROBERT',
'Rather',
'Reading',
'Really',
'Recollecting',
'Reflection',
'Regard',
'Relate',
'Remember',
'Reserved',
'Restless',
'Richard',
'Richardson',
'Richardsons',
'Robert',
'Rose',
'S',
'SHALL',
'SHE',
'SHOULD',
'SIR',
'SOMETIMES',
'STILL',
'Sackville',
'Sally',
'Sandersons',
'Saturday',
'Scarcely',
'Scotland',
'Scott',
'Secrecy',
'Selfish',
'Sense',
'Sensibility',
'September',
'Seven',
'Shakespeare',
'Shall',
'Sharpe',
'She',
'Short',
'Should',
'Shyness',
'Simpson',
'Since',
'Sincerely',
'Sir',
'Sit',
'Smith',
'So',
'Some',
'Somehow',
'Somerset',
'Somersetshire',
'Something',
'Sometimes',
'Soon',
'Sophia',
'Sparks',
'Square',
'St',
'Stanhill',
'Steele',
'Steeles',
'Still',
'Strange',
'Street',
'Streets',
'Such',
'Sunday',
'Supported',
'Supposing',
'Sure',
'Surely',
'Surprised',
'Sussex',
'THAT',
'THE',
'THEIR',
'THEM',
'THEN',
'THERE',
'THESE',
'THEY',
'THIS',
'THREE',
'TIME',
'TOLD',
'TRIED',
'TWICE',
'TWO',
'Take',
'Taylor',
'Tell',
'Temple',
'Thank',
'That',
'The',
'Their',
'Then',
'There',
'These',
'They',
'Think',
'This',
'Thomas',
'Thomson',
'Those',
'Though',
'Three',
'Thunderbolts',
'Thursday',
'Thus',
'Till',
'Time',
'Tis',
'To',
'Towards',
'Truth',
'Tuesday',
'Twice',
'Twill',
'Two',
'US',
'Unaccountable',
'Undoubtedly',
'Ungracious',
'Upon',
'Use',
'VERY',
'Valley',
'Vanity',
'Very',
'Volume',
'W',
'WAS',
'WE',
'WERE',
'WHAT',
'WHERE',
'WILL',
'WILLOUGHBY',
'WITHOUT',
'WORD',
'WOULD',
'Wait',
'Walker',
'Want',
'Was',
'Watched',
'We',
'Wednesday',
'Well',
'Were',
'Westminster',
'Westons',
'Weymouth',
'What',
'Whatever',
'When',
'Whenever',
'Where',
'Whereas',
'Wherever',
'Whether',
'Which',
'While',
'Whitakers',
'Whitwell',
'Who',
'Whoever',
'Whom',
'Why',
'Will',
'William',
'Williams',
'Willing',
'Willoughby',
'Willoughbys',
'With',
'Within',
'Without',
'Would',
'Writing',
'YOU',
'YOUR',
'Yes',
'Yet',
'You',
'Your',
'[',
']',
'a',
'abandoned',
'abatement',
'abhor',
'abhorred',
'abhorrence',
'abilities',
'ability',
'able',
'ablest',
'abode',
'abominably',
'abounded',
'about',
'above',
'abridge',
'abridgement',
'abroad',
'abruptly',
'abruptness',
'absence',
'absent',
'absolute',
'absolutely',
'abstracted',
'abstraction',
'abstruse',
'absurd',
'absurdity',
'abundance',
'abundantly',
'abuse',
'abused',
'abuses',
'acacia',
'accelerate',
'accent',
'accents',
'accept',
'acceptable',
'acceptably',
'acceptance',
'accepted',
'accepting',
'accident',
'accidental',
'accidentally',
'accidently',
'accommodate',
'accommodating',
'accommodation',
'accommodations',
'accompanied',
'accompany',
'accomplished',
'accomplishment',
'accordant',
'according',
'accordingly',
'accosted',
'account',
'accounted',
'accounts',
'accrue',
'accurately',
'accusation',
'accuse',
'accustom',
'accustomary',
'aches',
'aching',
'acknowledge',
'acknowledged',
'acknowledging',
'acknowledgment',
'acknowledgments',
'acquaintance',
'acquaintances',
'acquainted',
'acquiesced',
'acquiescence',
'acquired',
'acquisition',
'acquit',
'acquitted',
'acquitting',
'across',
'act',
'acted',
'acting',
'action',
'actions',
'active',
'acts',
'actual',
'actually',
'acute',
'acutely',
'acuteness',
'adapted',
'add',
'added',
'adding',
'addition',
'additional',
'additions',
'address',
'addressed',
'addresses',
'addressing',
'adequate',
'adhering',
'adieu',
'adieus',
'adjoining',
'adjusting',
'administer',
'administering',
'admirable',
'admiration',
'admire',
'admired',
'admirers',
'admires',
'admiring',
'admission',
'admit',
'admittance',
'admitted',
'admitting',
'adopt',
'adopted',
'adorned',
'advance',
'advanced',
'advancement',
'advances',
'advancing',
'advantage',
'advantageous',
'advantages',
'advice',
'advisable',
'advise',
'advised',
'affability',
'affable',
'affair',
'affairs',
'affect',
'affectation',
'affected',
'affectedly',
'affecting',
'affection',
'affectionate',
'affectionately',
'affections',
'affects',
'affirmative',
'affixed',
'afflict',
'afflicted',
'afflicting',
'affliction',
'afflictions',
'affluence',
'afford',
'afforded',
'affording',
'affront',
'affronting',
'afraid',
'after',
'afternoon',
'afterward',
'afterwards',
'again',
'against',
'age',
'ages',
'aggrandizement',
'aggravation',
'agitate',
'agitated',
'agitation',
'ago',
'agonies',
'agony',
'agree',
'agreeable',
'agreed',
'agreeing',
'agreement',
'aid',
'ailment',
'ailments',
'aim',
'aimed',
'air',
'alacrity',
'alarm',
'alarmed',
'alarming',
'alarms',
'alienated',
'alighted',
'alike',
'alive',
'all',
'alleged',
'alleviation',
'allow',
'allowable',
'allowance',
'allowances',
'allowed',
'allowing',
'alloy',
'alluded',
'allusion',
'almost',
'alone',
'along',
'aloud',
'alphabet',
'already',
'also',
'altar',
'alter',
'alteration',
'alterations',
'altered',
'altering',
'alternately',
'alternative',
'although',
'altogether',
'always',
'am',
'amazement',
'amazing',
'amazingly',
'ambition',
'amended',
'amendment',
'amends',
'amiable',
'amiably',
'amidst',
'amiss',
'among',
'amongst',
'amount',
'amounted',
'ample',
...]
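
The listing above counts punctuation, numbers, and case variants ('A' and 'a', 'ALL', 'All', and 'all') as separate vocabulary items. A minimal sketch of one way to normalize the vocabulary, assuming that only alphabetic tokens should count and that case should be ignored. The function name is hypothetical, and the sample tokens are picked from the output above.

```python
def normalized_vocabulary(tokens):
    """Return the sorted set of lowercased, purely alphabetic tokens."""
    return sorted({tok.lower() for tok in tokens if tok.isalpha()})

# A few tokens copied from the sorted(set(text2)) output above.
sample = ["A", "a", "ALL", "All", "all", "!", "--", "1811", "7000L", "Abbeyland"]
vocab = normalized_vocabulary(sample)
print(len(set(sample)), len(vocab))
print(vocab)
```

Whether to fold case depends on the task: for vocabulary size it removes noise, but it also merges proper nouns with common words.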

In [21]:

len(set(text1))  # number of distinct tokens after removing duplicates

Out[21]:

19317

In [26]:

(len(text1)-len(set(text1)))/len(text1)  # custom measure: fraction of tokens that repeat an earlier token

Out[26]:

0.9259371441497743

In [27]:

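# Average number of times each distinct token is used
from __future__ import division  # only needed on Python 2, where / on two ints truncates
len(text1)/len(set(text1))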

Out[27]:

13.502044830977896
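
The repetition rate and the average uses per token are two views of the same pair of numbers. The short check below recomputes both from the counts reported in Out[19] and Out[21], with no NLTK needed. The variable names are chosen here for illustration.

```python
import math

total_tokens = 260819     # len(text1), from Out[19]
distinct_tokens = 19317   # len(set(text1)), from Out[21]

repeat_rate = (total_tokens - distinct_tokens) / total_tokens
uses_per_token = total_tokens / distinct_tokens

# Algebraically, (N - V) / N == 1 - V / N == 1 - 1 / (N / V).
assert math.isclose(repeat_rate, 1 - 1 / uses_per_token)
print(repeat_rate, uses_per_token)
```

So a text in which each distinct token is used about 13.5 times on average is one in which roughly 92.6% of the tokens have already appeared earlier.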

In [34]:

# Count how many times a word occurs
text3.count('smote')

Out[34]:

5

In [35]:

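100*text4.count('a')/len(text2)  # keyword density; note that the count comes from text4 but the denominator is len(text2), so for the true share of 'a' in text4 the denominator should be len(text4)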

Out[35]:

1.5073176244561226

In [36]:

# Custom function: average number of uses per distinct token
def lexical_diversity(text):
    return len(text)/len(set(text))
# Percentage of total that count represents
def percentage(count,total):
    return 100*count/total

In [37]:

lexical_diversity(text1)

Out[37]:

13.502044830977896

In [38]:

lexical_diversity(text5)

Out[38]:

7.420046158918563

In [40]:

percentage(5,8)

Out[40]:

62.5

In [41]:

percentage(text1.count('a'),len(text1))

Out[41]:

1.7517895552087845
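
Calling `count` once per word rescans the whole text each time. When percentages are needed for several words, one pass with `collections.Counter` is enough. This is a sketch under stated assumptions: the function name `percentages` and the sample tokens are hypothetical, and the token list is assumed to be non-empty.

```python
from collections import Counter

def percentages(tokens, words):
    """Return {word: percent of all tokens} using a single counting pass."""
    counts = Counter(tokens)   # missing words count as 0
    total = len(tokens)        # assumed to be greater than 0
    return {w: 100 * counts[w] / total for w in words}

# Hypothetical sample tokens, not taken from the NLTK corpora.
sample = "a whale is a whale and a ship is a ship".split()
shares = percentages(sample, ["a", "whale", "the"])
print(shares)
```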
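This article was written by ID王大伟 and is licensed under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license.
Contact the author before reposting or quoting, credit the author, and cite the source of the article.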