Text Mining Analysis of the 19th Party Congress Report


1. Background

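On October 18, 2017, the 19th National Congress of the Communist Party of China opened in Beijing, and a wave of study of the spirit of the Congress spread across the country. As a student, I took part in my school's group study session and watched the opening of the Congress. Thinking back on how steadily our country has developed over recent years, I was struck by how strong China has become. Studying and implementing the spirit of the 19th Congress will be a central task for some time to come, and understanding what the report means matters to every one of us, so I wrote this post to share with others. Because the report consists almost entirely of text, this post applies text mining techniques to analyze it further.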

2. Text processing

Step 1: Load the required R packages

library(jiebaR)      # worker(), segment(), vector_keywords(): Chinese segmentation and keyword extraction
library(wordcloud2)  # wordcloud2(): word cloud rendering

Step 2: Import the data

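I downloaded the full text of the report in advance and saved it as report.txt in R's working directory (the folder returned by getwd()):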

article <- readLines("report.txt", encoding = "UTF-8")  # one element per line; assumes the file is saved as UTF-8
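A downloaded report often contains blank lines between paragraphs (an assumption about the file, not something the post states). The minimal base-R sketch below reads a UTF-8 text file and drops empty or whitespace-only lines; a temporary file with hypothetical content stands in for report.txt so the example runs on its own.

```r
# Write a tiny stand-in for report.txt (hypothetical content) to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("决胜全面建成小康社会", "", "   ", "夺取新时代中国特色社会主义伟大胜利"), path)

article <- readLines(path, encoding = "UTF-8")   # mark the strings as UTF-8
article <- article[nzchar(trimws(article))]       # drop empty and whitespace-only lines
print(length(article))
```

Filtering before segmentation is optional, since jiebaR simply returns nothing for an empty line, but it keeps the input vector tidy.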

Step 3: Word segmentation and stop-word removal

engine <- worker()         # jiebaR segmenter with the default dictionary
segment(article, engine)   # split the report into words and print them

The segmentation result is shown below (R truncates long output at getOption("max.print")):

[1] "决胜"         "全面"         "建成"         "小康社会"     "夺取"         "新"          
[7] "时代" "中国" "特色" "社会主义" "伟大胜利" "在"
[13] "中国共产党" "第十九次" "全国代表大会" "上" "的" "报告"
[19] "同志" "们" "现在" "我" "代表" "第十八届"
[25] "中央委员会" "向" "大会" "作" "报告" "中国共产党"
[31] "第十九次" "全国代表大会" "是" "在" "全面" "建成"
[37] "小康社会" "决胜" "阶段" "中国" "特色" "社会主义"
[43] "进入" "新" "时代" "的" "关键时期" "召开"
[49] "的" "一次" "十分" "重要" "的" "大会"
[55] "大会" "的" "主题" "是" "不" "忘"
[61] "初" "心" "牢记" "使命" "高举" "中国"
[67] "特色" "社会主义" "伟大旗帜" "决胜" "全面" "建成"
[73] "小康社会" "夺取" "新" "时代" "中国" "特色"
[79] "社会主义" "伟大胜利" "为" "实现" "中华民族" "伟大"
[85] "复兴" "的" "中国" "梦" "不懈" "奋斗"
[91] "不忘" "初心" "方得" "始终" "中国共产党" "人"
[97] "的" "初心" "和" "使命" "就是" "为"
[103] "中国" "人民" "谋" "幸福" "为" "中华民族"
[109] "谋" "复兴" "这个" "初心" "和" "使命"
[115] "是" "激励" "中国共产党" "人" "不断前进" "的"
[121] "根本" "动力" "全党同志" "一定" "要" "永远"
[127] "与" "人民" "同呼吸" "共命运" "心连心" "永远"
[133] "把" "人民" "对" "美好生活" "的" "向往"
[139] "作为" "奋斗目标" "以" "永不" "懈怠" "的"
[145] "精神状态" "和" "一往无前" "的" "奋斗" "姿态"
[151] "继续" "朝着" "实现" "中华民族" "伟大" "复兴"
[157] "的" "宏伟目标" "奋勇前进" "当前" "国内外" "形势"
[163] "正在" "发生" "深刻" "复杂" "变化" "我国"
[169] "发展" "仍" "处于" "重要" "战略" "机遇期"
[175] "前景" "十分" "光明" "挑战" "也" "十分"
[181] "严峻" "全党同志" "一定" "要" "登高望远" "居安思危"
[187] "勇于" "变革" "勇于创新" "永不" "僵化" "永不"
[193] "停滞" "团结" "带领" "全国" "各族人民" "决胜"
[199] "全面" "建成" "小康社会" "奋力" "夺取" "新"
[205] "时代" "中国" "特色" "社会主义" "伟大胜利" "一"
[211] "过去" "五年" "的" "工作" "和" "历史性"
[217] "变革" "十八" "大" "以来" "的" "五年"
[223] "是" "党和国家" "发展" "进程" "中极" "不"
[229] "平凡" "的" "五年" "面对" "世界" "经济"
[235] "复苏" "乏力" "局部" "冲突" "和" "动荡"
[241] "频发" "全球性" "问题" "加剧" "的" "外部环境"
[247] "面对" "我国" "经济" "发展" "进入" "新"
[253] "常态" "等" "一系列" "深刻" "变化" "我们"
[259] "坚持" "稳中求进" "工作" "总" "基调" "迎难而上"
[265] "开拓进取" "取得" "了" "改革开放" "和" "社会主义"
[271] "现代化" "建设" "的" "历史性" "成就" "为"
[277] "贯彻" "十八" "大" "精神" "党中央" "召开"
[283] "七次" "全会" "分别" "就" "政府" "机构"
[289] "改革" "和" "职能" "转变" "全面" "深化改革"
[295] "全面" "推进" "依法治国" "制定" "十三" "五"
[301] "规划" "全面" "从严治党" "等" "重大" "问题"
[307] "作出" "决定" "和" "部署" "五年" "来"
[313] "我们" "统筹" "推进" "五位一体" "总体布局" "协调"
[319] "推进" "四个" "全面" "战略" "布局" "十二五"
[325] "规划" "胜利" "完成" "十三" "五" "规划"
[331] "顺利" "实施" "党和国家" "事业" "全面" "开"
[337] "创新" "局面" "经济" "建设" "取得" "重大成就"
[343] "坚定不移" "贯彻" "新" "发展" "理念" "坚决"
[349] "端正" "发展" "观念" "转变" "发展" "方式"
[355] "发展" "质量" "和" "效益" "不断" "提升"
[361] "经济" "保持" "中" "高速" "增长" "在"
[367] "世界" "主要" "国家" "中" "名列前茅" "国内"
[373] "生产总值" "从" "五十四万" "亿元" "增长" "到"
[379] "八十万" "亿元" "稳居" "世界" "第二" "对"
[385] "世界" "经济" "增长" "贡献率" "超过" "百分之三十"
[391] "供给" "侧" "结构性" "改革" "深入" "推进"
[397] "经济" "结构" "不断" "优化" "数字" "经济"
[403] "等" "新兴产业" "蓬勃发展" "高铁" "公路" "桥梁"
[409] "港口" "机场" "等" "基础设施" "建设" "快速"
[415] "推进" "农业" "现代化" "稳步" "推进" "粮食"
[421] "生产能力" "达到" "一万二千" "亿斤" "城镇化率" "年均"
[427] "提高" "一点" "二个" "百分点" "八千" "多万"
[433] "农业" "转移" "人口" "成为" "城镇居民" "区域"
[439] "发展" "协调性" "增强" "一带" "一路" "建设"
[445] "京津冀" "协同" "发展" "长江" "经济带" "发展"
[451] "成效显著" "创新" "驱动" "发展" "战略" "大力"
[457] "实施" "创新型" "国家" "建设" "成果" "丰硕"
[463] "天宫" "蛟龙" "天眼" "悟空" "墨子" "大"
[469] "飞机" "等" "重大" "科技成果" "相继问世" "南海"
[475] "岛礁" "建设" "积极" "推进" "开放型" "经济"
[481] "新" "体制" "逐步" "健全" "对外贸易" "对外"
[487] "投资" "外汇储备" "稳居" "世界" "前列" "全面"
[493] "深化改革" "取得" "重大突破" "蹄" "疾步" "稳"
[499] "推进" "全面" "深化改革" "坚决" "破除" "各"
[505] "方面" "体制" "机制" "弊端" "改革" "全面"
[511] "发力" "多点" "突破" "纵深" "推进" "着力"
[517] "增强" "改革" "系统性" "整体性" "协同" "性"
[523] "压茬" "拓展" "改革" "广度" "和" "深度"
[529] "推出" "一千五百多" "项" "改革" "举措" "重要"
[535] "领域" "和" "关键环节" "改革" "取得" "突破性"
[541] "进展" "主要" "领域" "改革" "主体" "框架"
[547] "基本" "确立" "中国" "特色" "社会主义" "制度"
[553] "更加" "完善" "国家" "治理" "体系" "和"
[559] "治理" "能力" "现代化" "水平" "明显提高" "全"
[565] "社会" "发展" "活力" "和" "创新" "活力"
[571] "明显增强" "民主" "法治" "建设" "迈出" "重大"
[577] "步伐" "积极" "发展" "社会主义" "民主" "政治"
[583] "推进" "全面" "依法治国" "党的领导" "人民" "当家作主"
[589] "依法治国" "有机" "统一" "的" "制度" "建设"
[595] "全面" "加强" "党的领导" "体制" "机制" "不断完善"
[601] "社会主义" "民主" "不断" "发展" "党内" "民主"
[607] "更加" "广泛" "社会主义" "协商" "民主" "全面"
[613] "展开" "爱国统一战线" "巩固" "发展" "民族宗教" "工作"
[619] "创新" "推进" "科学" "立法" "严格执法" "公正司法"
[625] "全民" "守法" "深入" "推进" "法治" "国家"
[631] "法治" "政府" "法治" "社会" "建设" "相互促进"
[637] "中国" "特色" "社会主义" "法治" "体系" "日益完善"
[643] "全" "社会" "法治" "观念" "明显增强" "国家"
[649] "监察" "体制改革" "试点" "取得实效" "行政" "体制改革"
[655] "司法" "体制改革" "权力" "运行" "制约" "和"
[661] "监督" "体系" "建设" "有效" "实施" "思想"
[667] "文化" "建设" "取得" "重大进展" "加强" "党"
[673] "对" "意识形态" "工作" "的" "领导" "党"
[679] "的" "理论" "创新" "全面" "推进" "马克思主义"
[685] "在" "意识形态" "领域" "的" "指导" "地位"
[691] "更加" "鲜明" "中国" "特色" "社会主义" "和"
[697] "中国" "梦" "深入人心" "社会主义" "核心" "价值观"
[703] "和" "中华" "优秀" "传统" "文化" "广泛"
[709] "弘扬" "群众性" "精神文明" "创建活动" "扎实" "开展"
[715] "公共" "文化" "服务水平" "不断" "提高" "文艺创作"
[721] "持续" "繁荣" "文化" "事业" "和" "文化产业"
[727] "蓬勃发展" "互联网" "建设" "管理" "运用" "不断完善"
[733] "全民" "健身" "和" "竞技" "体育" "全面"
[739] "发展" "主旋律" "更加" "响亮" "正" "能量"
[745] "更加" "强劲" "文化" "自信" "得到" "彰显"
[751] "国家" "文化" "软" "实力" "和" "中华文化"
[757] "影响力" "大幅" "提升" "全党" "全" "社会"
[763] "思想" "上" "的" "团结" "统一" "更加"
[769] "巩固" "人民" "生活" "不断" "改善" "深入"
[775] "贯彻" "以" "人民" "为" "中心" "的"
[781] "发展" "思想" "一大批" "惠民" "举措" "落地"
[787] "实施" "人民" "获得" "感" "显著" "增强"
[793] "脱贫" "攻坚战" "取得" "决定性" "进展" "六千多万"
[799] "贫困人口" "稳定" "脱贫" "贫困" "发生率" "从"
[805] "百分之十点" "二" "下降" "到" "百分之四" "以下"
[811] "教育" "事业" "全面" "发展" "中西部" "和"
[817] "农村" "教育" "明显" "加强" "就业" "状况"
[823] "持续" "改善" "城镇" "新增" "就业" "年均"
[829] "一千三百" "万人" "以上" "城乡居民" "收入" "增速"
[835] "超过" "经济" "增速" "中等" "收入" "群体"
[841] "持续" "扩大" "覆盖" "城乡居民" "的" "社会保障"
[847] "体系" "基本" "建立" "人民" "健康" "和"
[853] "医疗卫生" "水平" "大幅提高" "保障性" "住房" "建设"
[859] "稳步" "推进" "社会" "治理" "体系" "更加"
[865] "完善" "社会" "大局" "保持稳定" "国家" "安全"
[871] "全面" "加强" "生态" "文明" "建设" "成效显著"
[877] "大" "力度" "推进" "生态" "文明" "建设"
[883] "全党全国" "贯彻" "绿色" "发展" "理念" "的"
[889] "自觉性" "和" "主动性" "显著" "增强" "忽视"
[895] "生态" "环境保护" "的" "状况" "明显" "改变"
[901] "生态" "文明" "制度" "体系" "加快" "形成"
[907] "主体" "功能区" "制度" "逐步" "健全" "国家"
[913] "公园" "体制" "试点" "积极" "推进" "全面"
[919] "节约资源" "有效" "推进" "能源" "资源" "消耗"
[925] "强度" "大幅" "下降" "重大" "生态" "保护"
[931] "和" "修复" "工程" "进展" "顺利" "森林"
[937] "覆盖率" "持续" "提高" "生态" "环境治理" "明显"
[943] "加强" "环境" "状况" "得到" "改善" "引导"
[949] "应对" "气候变化" "国际" "合作" "成为" "全球"
[955] "生态" "文明" "建设" "的" "重要" "参与者"
[961] "贡献者" "引领者" "强军" "兴军开" "创新" "局面"
[967] "着眼于" "实现" "中国" "梦" "强军" "梦"
[973] "制定" "新形势下" "军事" "战略方针" "全力" "推进"
[979] "国防" "和" "军队" "现代化" "召开" "古田"
[985] "全军" "政治" "工作" "会议" "恢复" "和"
[991] "发扬" "我党我军" "光荣传统" "和" "优良作风" "人民军队"
[997] "政治" "生态" "得到" "有效"
[ reached getOption("max.print") -- omitted 13122 entries ]
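Segmentation in jiebaR is dictionary-driven: jieba picks the most probable path through a graph of dictionary words and uses an HMM for words it does not know, so the same phrase can be split differently in different contexts (compare "不" "忘" "初" "心" with "不忘" "初心" in the output above). To make the role of the dictionary concrete, the base-R sketch below uses a simpler technique, forward maximum matching, on a tiny hypothetical dictionary. It is an illustration of the general idea, not jieba's actual algorithm.

```r
# Forward maximum matching: at each position take the longest dictionary word,
# falling back to a single character when nothing in the dictionary matches.
fmm_segment <- function(text, dict, max_len = 4) {
  chars <- strsplit(text, "")[[1]]           # split the sentence into characters
  n <- length(chars)
  out <- character(0)
  i <- 1
  while (i <= n) {
    for (len in min(max_len, n - i + 1):1) { # longest candidate first
      cand <- paste(chars[i:(i + len - 1)], collapse = "")
      if (len == 1 || cand %in% dict) {      # single characters are always accepted
        out <- c(out, cand)
        i <- i + len
        break
      }
    }
  }
  out
}

# Hypothetical mini dictionaries: with and without the word 初心
with_word    <- fmm_segment("不忘初心牢记使命", c("不忘", "初心", "牢记", "使命"))
without_word <- fmm_segment("不忘初心牢记使命", c("不忘", "牢记", "使命"))
print(with_word)
print(without_word)
```

Without 初心 in the dictionary, the phrase falls apart into single characters, which is the same kind of fragment seen in the real output.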
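Next, stop words (high-frequency function words such as 的 and 和) are removed with a small custom function. The stop-word list is read from a file chosen interactively: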
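# Keep only the tokens that do not appear in the stop-word list
removewords <- function(target_words, stop_words) {
  target_words[!(target_words %in% stop_words)]
}
stopwords <- readLines(file.choose(), encoding = "UTF-8")     # pick the stop-word file
article2 <- removewords(segment(article, engine), stopwords)  # segmented report without stop words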
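The filtering step is plain vector subsetting with %in%, so it can be checked without jiebaR. The sketch below runs the same function on a few tokens copied from the segmentation output above, with a hypothetical three-word stop list in place of the real stop-word file.

```r
# Same idea as removewords() above: keep tokens not found in the stop list
removewords <- function(target_words, stop_words) {
  target_words[!(target_words %in% stop_words)]
}

tokens <- c("全面", "建成", "小康社会", "的", "新", "时代", "的", "中国")
toy_stopwords <- c("的", "了", "和")   # hypothetical mini stop-word list

kept <- removewords(tokens, toy_stopwords)
print(kept)
```

Because %in% works on whole vectors, no loop is needed, and the original token order is preserved.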

Step 4: Extract the main themes with jiebaR's keyword function

vector_keywords(article2, worker("keywords"))   # top keywords (5 by default) with their scores
   936.814     817.87    757.914    727.047     523.91 
"社会主义" "人民" "发展" "建设" "特色"

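The result shows that the report consistently centers on five words: 社会主义 (socialism), 人民 (the people), 发展 (development), 建设 (construction), and 特色 (characteristics, as in "socialism with Chinese characteristics"). For more than 90 years, the Communist Party of China has been the leadership core of the people and has unswervingly upheld socialism with Chinese characteristics. In an era whose themes are peace and openness, it has focused single-mindedly on development and devoted itself wholeheartedly to construction, and as a result China's economy has grown rapidly over a long period. This fully demonstrates that the Communist Party of China is the helmsman that leads the people from one victory to the next, and that only the Communist Party of China can lead the Chinese people in building a prosperous and strong new China!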
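The scores printed by vector_keywords() come from TF-IDF weighting. As I understand jieba's keyword extractor, each word's score is its raw count in the text multiplied by an IDF weight from the IDF table bundled with jieba (an average IDF is used for words missing from that table), and single-character words are skipped. The base-R sketch below reimplements that idea on hypothetical tokens with made-up IDF weights, so its numbers are unrelated to the scores shown above.

```r
# Hypothetical segmented tokens standing in for article2
tokens <- c("发展", "人民", "发展", "建设", "社会主义", "发展",
            "人民", "社会主义", "的", "新", "复兴")
# Made-up IDF weights (jieba ships a much larger, real table)
idf <- c("发展" = 2.0, "人民" = 2.5, "建设" = 3.0, "社会主义" = 4.0)
default_idf <- 1.5  # hypothetical weight for words missing from the table

tfidf_keywords <- function(words, idf, default_idf, topn = 5) {
  words <- words[nchar(words) >= 2]        # skip single-character words
  tf <- table(words)                       # raw count of each distinct word
  w <- unname(idf[names(tf)])              # IDF per word, NA if absent
  w[is.na(w)] <- default_idf               # fall back to the default weight
  scores <- setNames(as.numeric(tf) * w, names(tf))
  head(sort(scores, decreasing = TRUE), topn)
}

top <- tfidf_keywords(tokens, idf, default_idf)
print(top)
```

Here 社会主义 ranks first (2 occurrences × 4.0) even though 发展 is more frequent (3 × 2.0), which is the point of IDF weighting: common words are discounted relative to more distinctive ones.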

Step 5: Count word frequencies and draw a word cloud

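article3 <- as.data.frame(table(article2), stringsAsFactors = FALSE)  # columns: article2 (word), Freq (count)
article3 <- article3[order(article3$Freq, decreasing = TRUE), ]       # most frequent words first
wordcloud2(article3, backgroundColor = "black")                       # first column = words, second = frequencies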

(Figure: word cloud of word frequencies in the report)
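wordcloud2() reads the first column of its data frame as the words and the second as their frequencies, which is why the table from Step 5 can be passed in directly. The base-R sketch below shows the same counting and sorting on hypothetical tokens; renaming the columns to word and freq (the names used by wordcloud2's demoFreq sample data) is my own optional choice.

```r
# Hypothetical tokens standing in for article2 (stop words already removed)
tokens <- c("发展", "人民", "发展", "建设", "社会主义", "发展", "人民")

# Count each distinct word; keep the words as character rather than factor
freq <- as.data.frame(table(tokens), stringsAsFactors = FALSE)
names(freq) <- c("word", "freq")
freq <- freq[order(freq$freq, decreasing = TRUE), ]  # most frequent first
rownames(freq) <- NULL
print(freq)
```

The order of tied words (here 建设 and 社会主义, each seen once) depends on how table() sorted them, so only the frequency column is guaranteed to be descending.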

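The word cloud shows at a glance that the core of the report is building the country's superstructure and raising people's living standards. A party that is good at learning from experience has a bright future, and so does a nation that is good at learning from experience. Experience offers insight, and history illuminates the future. In my view, ours is a great party that keeps making progress: from one National Congress to the next, it has continually deepened its understanding, explored China's path forward, corrected and reformed itself, and preserved its vanguard and exemplary role. Maintaining the advanced nature of Party members is crucial to all of this.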

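This article was written by 15027782681 and is licensed under the Creative Commons Attribution-ShareAlike 3.0 China Mainland License.
Please contact the author before reposting or quoting, and credit the author and the original source.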
