Python Data Science (3): Python and Data Science Applications (Ⅲ)

Published: 2017-10-31

Data science

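Series links:

Python Data Science (1): Python and Data Science Applications (Ⅰ)

Python Data Science (2): Python and Data Science Applications (Ⅱ)

Python Data Science (3): Python and Data Science Applications (Ⅲ)

Python Data Science (4): Data Collection Series

Python Data Science (5): Data Processing and Data Acquisition

Python Data Science (6): Data Cleaning (Ⅰ)

Python Data Science (7): Data Cleaning (Ⅱ)

Python Data Science (8): Data Exploration and Data Visualization

Python Data Science (9): Plotting Statistical Charts with Pandas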

1. Counting the words in a text with Python

speech_text = '''
I love you,
Not for what you are,
But for what I am When I am with you.
I love you,
Not only for what You have made of yourself,
But for what You are making of me.
I love you For the part of me That you bring out;
I love you For putting your hand Into my heaped-up heart And passing over All the foolish,
weak things That you can’t help Dimly seeing there,
And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find.
I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple;
Out of the works Of my every day Not a reproach But a song.
I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy.
You have done it Without a touch,
Without a word,
Without a sign.
You have done it By being yourself.
Perhaps that is what Being a friend means,
After all.
'''



# Split the text into words on whitespace
speech = speech_text.split()

# Count how many times each word appears
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

print(dic.items())
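One thing to watch for: `split()` only breaks on whitespace, so punctuation stays attached and `you,` is counted separately from `you`. Below is a minimal sketch of one way around this, assuming a simple regular expression is good enough for English text. The `tokenize` helper and the sample sentence are illustrative and not part of the original post.

```python
import re

def tokenize(text):
    # Keep runs of letters, allowing apostrophes and hyphens inside a word
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())

sample = "I love you, Not for what you are, But for what I am"

counts = {}
for word in tokenize(sample):
    # dict.get returns 0 for a word we have not seen yet
    counts[word] = counts.get(word, 0) + 1

print(counts["you"])                 # 2: "you," and "you" are now the same word
print(sample.split().count("you"))   # 1: plain split() misses "you,"
```

Whether hyphenated words such as `heaped-up` should count as one word or two is a design choice; this sketch keeps them together.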

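I kept running into errors when using nltk. The following two lines open the NLTK downloader so that the required data can be installed: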

import nltk

nltk.download()

This opens the NLTK downloader window, where the data can be downloaded.

[Screenshot: download in progress]

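If the download completes this way, skip the next step.

It failed for me every time I tried, so here is a second method: download a pre-packaged archive directly. Download link 1: cloud drive, password znx7. Extract the downloaded nltk_data.zip to the root of the C: drive; this is the safest location and keeps NLTK from failing to find the data. Download link 2: cloud drive, password 4cp3.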
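NLTK searches the directories listed in `nltk.data.path`, so data extracted somewhere other than the C: drive can still be found by adding that directory to the list. The sketch below is hedged: the `find_nltk_data` helper and the candidate locations are hypothetical, and it assumes the archive extracts into an `nltk_data` folder that contains a `corpora` subfolder.

```python
import os

def find_nltk_data(candidates):
    """Return the first directory that contains a corpora folder, else None."""
    for path in candidates:
        if os.path.isdir(os.path.join(path, "corpora")):
            return path
    return None

# Hypothetical locations; adjust to wherever nltk_data.zip was extracted
candidates = [r"C:\nltk_data", os.path.expanduser("~/nltk_data")]
data_dir = find_nltk_data(candidates)

try:
    import nltk
    if data_dir is not None and data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)  # tell NLTK where to look
except ImportError:
    print("nltk is not installed; install it first, for example with pip")
```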

Thanks to V_can's post "Python与自然语言处理_第一期_NLTK入门之环境搭建" for providing the package.

Removing stop words
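The idea of stop-word removal can be tried without any NLTK data. Here is a minimal sketch, assuming a tiny hand-written stop-word set (NLTK's English list is much longer) and an illustrative sample sentence:

```python
# A small, hand-picked stop-word set used only for illustration
stop_words = {"i", "you", "for", "what", "not", "but", "am", "are"}

words = "i love you not for what you are but for what i am".split()

# Count every word first
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Keep only the words that are not stop words
filtered = {w: n for w, n in counts.items() if w not in stop_words}
print(filtered)  # {'love': 1}
```

Using a set for the stop words keeps each membership test fast, which matters more as the text grows.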

2. The second method: using Counter from Python's standard library

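# Code for this method
from collections import Counter
from nltk.corpus import stopwords

c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Remove the stop words, then list the top ten again
stop_words = stopwords.words('english')
for sw in stop_words:
    del c[sw]
print(c.most_common(10))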

Counter is a subclass of dict designed for convenient counting.
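Because `Counter` is a `dict` subclass, the usual dictionary operations work, with a few conveniences on top. A short sketch on a made-up word list:

```python
from collections import Counter

c = Counter(["love", "you", "love", "me", "you", "love"])

print(c["love"])         # ordinary dict-style lookup
print(c["missing"])      # a missing key counts as 0 instead of raising KeyError
print(c.most_common(2))  # the two most frequent words, highest count first

c.update(["me", "me"])   # add more observations to the existing counts
del c["nobody"]          # deleting an absent key is silently ignored
print(c["me"])
```

The tolerant `del` is why the loop that deletes every stop word from the counter does not fail on stop words that never appeared in the text.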

The complete code:



speech_text = '''
I love you,
Not for what you are,
But for what I am When I am with you.
I love you,
Not only for what You have made of yourself,
But for what You are making of me.
I love you For the part of me That you bring out;
I love you For putting your hand Into my heaped-up heart And passing over All the foolish,
weak things That you can’t help Dimly seeing there,
And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find.
I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple;
Out of the works Of my every day Not a reproach But a song.
I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy.
You have done it Without a touch,
Without a word,
Without a sign.
You have done it By being yourself.
Perhaps that is what Being a friend means,
After all.
'''



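# Lowercase everything so that "You" and "you" are counted together
speech = speech_text.lower().split()
print(speech)

# Method 1: count with a plain dictionary
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

# Sort the (word, count) pairs by count, highest first
import operator
swd = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
print(swd)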



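# Stop-word handling (the corpus file is named 'english', in lowercase)
from nltk.corpus import stopwords
stop_words = stopwords.words('english')

# Print only the words that are not stop words
for k, v in swd:
    if k not in stop_words:
        print(k, v)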





# Method 2: count with Counter
from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Drop the stop words and list the top ten again
for sw in stop_words:
    del c[sw]
print(c.most_common(10))
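The call to `lower()` in the complete code matters because counting is case-sensitive. A quick sketch on a made-up sentence shows the difference:

```python
from collections import Counter

text = "You have done it You are making of me I love you"

case_sensitive = Counter(text.split())       # "You" and "you" are different keys
case_folded = Counter(text.lower().split())  # everything is lowercased first

print(case_sensitive["you"], case_sensitive["You"])  # 1 2
print(case_folded["you"])                            # 3
```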

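These two methods make it easy to see why Python is used more and more in data analysis and scientific computing: beyond the strengths of the language itself, it has a large set of easy-to-use libraries, both standard and third-party.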

Life is short, so why not sing it in Python?
