Python Data Science (3) - Python and Data Science Applications (Ⅲ)


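Series index:

Python Data Science (1) - Python and Data Science Applications (Ⅰ)

Python Data Science (2) - Python and Data Science Applications (Ⅱ)

Python Data Science (3) - Python and Data Science Applications (Ⅲ)

Python Data Science (4) - Data Collection Series

Python Data Science (5) - Data Processing and Data Acquisition

Python Data Science (6) - Data Cleaning (Ⅰ)

Python Data Science (7) - Data Cleaning (Ⅱ)

Python Data Science (8) - Data Exploration and Data Visualization

Python Data Science (9) - Plotting Statistical Charts with Pandas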

1. Counting the words in a text with Python

speech_text = '''
I love you, Not for what you are, But for what I am When I am with you. I love you, Not
only for what You have made of yourself, But for what You are making of me. I love
you For the part of me That you bring out; I love you For putting your hand Into my
heaped-up heart And passing over All the foolish, weak things That you can’t
help Dimly seeing there, And for drawing out Into the light All the beautiful
belongings That no one else had looked Quite far enough to find. I love you because
you Are helping me to make Of the lumber of my life Not a tavern But a temple; Out of
the works Of my every day Not a reproach But a song. I love you Because you have
done More than any creed Could have done To make me good And more than any
fate Could have done To make me happy. You have done it Without a touch, Without a
word, Without a sign. You have done it By being yourself. Perhaps that is what Being a
friend means, After all.
'''

# Split the text on whitespace to get a list of words
speech = speech_text.split()

# Count how many times each word occurs
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

dic.items()
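One thing worth noting about split(): it only splits on whitespace, so punctuation stays attached, and "you," and "you" end up counted as different words. Below is a minimal sketch of one way around this, assuming a simple regular expression is an acceptable tokenizer for English text. The helper name count_words and the sample sentence are made up for illustration.

```python
import re

def count_words(text):
    """Count words in text, ignoring case and surrounding punctuation."""
    # Assumption: a word is a run of letters, optionally with an inner
    # apostrophe (straight or curly), e.g. "can't".
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    counts = {}
    for word in words:
        # dict.get returns 0 for a word seen for the first time
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "I love you, Not for what you are, But for what I am When I am with you."
counts = count_words(sample)
print(counts["you"])  # 3
print(counts["i"])    # 3
```

With plain split(), the same sentence would yield "you,", "you" and "you." as three separate keys.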

nltk kept raising errors when I used it. Running the two lines below downloads the data that nltk needs:

import nltk
nltk.download()

This opens the nltk downloader window, from which the data packages can be downloaded.


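If the download completes this way, skip the next step.

I tried many times and the download failed every time, so here is a second method.
Download a prepackaged archive directly. Download link 1: cloud drive, password znx7. Unzip the downloaded nltk_data.zip into the root of the C: drive; this is the safest location and keeps nltk from failing to find the data. Download link 2: cloud drive, password 4cp3.

Thanks to V_can's post "Python and Natural Language Processing, Part 1: Getting Started with NLTK, Environment Setup" for providing the package.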
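If the archive was unpacked somewhere other than the root of C:, nltk may still fail to find it. The sketch below shows one way to register a data directory and check that the stop word corpus is visible. It relies on nltk.data.path and nltk.data.find. The function name use_local_nltk_data is a placeholder, and nltk is imported inside the function only so that defining it does not require nltk to be installed.

```python
def use_local_nltk_data(data_dir):
    """Register data_dir with nltk and report whether stopwords are found."""
    import nltk  # imported lazily; calling this function requires nltk

    # nltk.data.path is the list of directories nltk searches for data
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    try:
        # Raises LookupError when the resource is in none of those directories
        nltk.data.find('corpora/stopwords')
        return True
    except LookupError:
        return False
```

For example, use_local_nltk_data(r'C:\nltk_data') should return True once the unpacked folder contains corpora/stopwords.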

Removing stop words
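Stop words are very common words such as "the" or "of" that say little about a text. Here is a minimal sketch of filtering them out of a word-count dictionary. The tiny stop word set is a hand-written stand-in (an assumption for illustration), not nltk's actual list, and the counts are made-up sample values.

```python
# Hypothetical word counts, in the shape produced by the counting loop
word_counts = {"i": 3, "love": 2, "you": 4, "the": 5, "temple": 1, "of": 3}

# Hand-written stand-in for a real stop word list (assumption)
stop_words = {"i", "you", "the", "of", "a", "and"}

# Keep only the words that are not stop words
filtered = {w: n for w, n in word_counts.items() if w not in stop_words}

# Sort by count, most frequent first
ranked = sorted(filtered.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('love', 2), ('temple', 1)]
```

Using a set for the stop words keeps each membership test cheap, which matters once the list grows to the size of a real corpus.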

2. The second method: using Counter from Python's standard-library collections module

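# Code for the second method
from collections import Counter
from nltk.corpus import stopwords

c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Remove the stop words, then look at the top ten again
stop_words = stopwords.words('english')
for sw in stop_words:
    del c[sw]
print(c.most_common(10))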

Counter is a subclass of dict that makes counting convenient.
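Because Counter is a dict subclass, it supports the usual dictionary operations plus a few counting helpers. Here is a short sketch with a made-up word list, showing behaviour described in the standard library documentation:

```python
from collections import Counter

words = ["love", "you", "love", "me", "you", "love"]
c = Counter(words)

print(isinstance(c, dict))  # True: Counter is a dict subclass
print(c["love"])            # 3
print(c["missing"])         # 0: missing keys count as zero, no KeyError
print(c.most_common(2))     # [('love', 3), ('you', 2)]

# Deleting a key that is absent is silently ignored by Counter,
# which is why looping over stop words with `del` is safe.
del c["absent"]

# update() adds to the existing counts instead of replacing them
c.update(["me", "me"])
print(c["me"])              # 3
```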

  • The complete code

speech_text = '''
I love you,
Not for what you are,
But for what I am When I am with you.
I love you,
Not only for what You have made of yourself,
But for what You are making of me.
I love you For the part of me That you bring out;
I love you For putting your hand Into my heaped-up heart And passing over All the foolish,
weak things That you can’t help Dimly seeing there,
And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find.
I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple;
Out of the works Of my every day Not a reproach But a song.
I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy.
You have done it Without a touch,
Without a word,
Without a sign.
You have done it By being yourself.
Perhaps that is what Being a friend means,
After all.
'''

# Lowercase everything so that "You" and "you" count as the same word
speech = speech_text.lower().split()
print(speech)

# Method 1: count with a plain dict
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

# Sort the (word, count) pairs by count, highest first
import operator
swd = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
print(swd)

# Stop word handling
from nltk.corpus import stopwords
stop_words = stopwords.words('english')

for k, v in swd:
    if k not in stop_words:
        print(k, v)


# Method 2: count with Counter
from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

for sw in stop_words:
    del c[sw]
print(c.most_common(10))

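These two methods make it easy to see why Python is used more and more in data analysis and scientific computing: besides the strengths of the language itself, it has a large number of libraries that are easy to use.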


Life is short, so why not sing it in Python?

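This article was written by 一只写程序的猿 and is licensed under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license.
Contact the author before reposting or quoting, and credit the author and cite the source.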