第一周--R语言快速入门学习笔记（1）

发表: 2018-03-19 浏览: 2339

R语言

一、前言

常用数据分析工具对比

Excel：办公室应用软件（收费）

spss：专业统计分析软件（收费）

maltab：商业软件（收费）

R：开源的数据分析软件

常用数据挖掘工具—商用

SAS：商业软件，模板固定不可修改，提供菜单操作和编程（SAS独特代码，羞涩难懂）

SAPP Clementine：商业软件，流操作的图形界面模式，模版固化

常用数据挖掘工具—开源

R语言：开源，丰富的算法包和强大的画图能力，可以通过修改源代码来设计适合自己业务的模型

R Data Miner：开源，图形化操作（R语言通过rattle包调出界面）

weka：开源，图形化操作（R语言通过Rweka包调出weka中模型）

二、R、RStudio and Rattle安装及介绍

（一）R

1、R语言优势

R语言是一款开源的软件

R语言可以运行在多种平台上

R语言可以轻松地从各种类型的数据源导入导出数据

R语言内置了多种统计及数学分析功能

R语言拥有顶尖的制图功能

2、R语言缺点

R语言速度略慢

R包比较混乱

3、R语言安装：https://www.r-project.org/

执行命令快捷键：Ctrl+R

清除控制台内容快捷键：Ctrl+L

（二）RStudio

RStudio 的安装：https://www.rstudio.com/

（三）Rattle

Rattle：可视化数据挖掘工具

Rattle 是使用 RGtk2 包提供的 Gnome 图形用户界面。

Rattle 的安装：

install.packages(“RGtk2”)

install.packages(“rattle”)

即可完成 Rattle 的安装

通过 library 函数载入这个包，并通过 rattle()命令调出 Rattle 界面。

三、新手上路

（一）R语言基本使用

R 语言的标准赋值符号是<-，也可以使用=。

a <- 1:10

a

[1] 1 2 3 4 5 6 7 8 9 10

错误: 找不到对象'A'

cor(iris[,1:4])

Clipboard Image.png

Cor(iris[,1:4])

错误: 没有"Cor"这个函数

（二）获得帮助、工作空间、包

1、获得帮助：

?median

??median

2、工作空间：就是当前 R 的工作环境。

# 创建数据对象 a,b

a <- 1:10

b <- 10:1

# 创建模型对象 fit

fit <- lm(Sepal.Length~Sepal.Width,data=iris)

# 创建图形对象 q、p

library(ggplot2)

q <- qplot(mpg,wt,data=mtcars)

library(rCharts)

names(iris) = gsub('\\.', '', names(iris))

p <-rPlot(SepalLength ~ SepalWidth | Species,

data = iris, type = 'point', color = 'Species')

可以通过 ls()命令查找当前工作空间的对象

ls()

[1] "a" "b" "fit" "p" "q"

通过 rm 函数移除一个或多个对象。

ls()

[1] "a" "b" "fit" "iris" "p" "q"

rm(list="fit")

ls()

[1] "a" "b" "iris" "p" "q"

rm(list=ls())

ls()

character(0)

3、包是 R 函数、数据、预编译代码以一种定义完善的格式组成的集合。

查看包位置：（真的有那个点！！！）

.libPaths()

[1] "C:/Program Files/R/R-3.2.2/library"

通过 install.packages(“package_name”,”dir”)命令安装包。

通过 library 或者 require 命令将包加载到 R 内存中

runExample()

Error: could not find function "runExample"

library(shiny)

Warning message:程辑包‘shiny’是用 R 版本 3.2.5 来建造的

runExample()

Valid examples are "01_hello", "02_text", "03_reactivity", "04_mpg", "05_sliders", "06_tabsets",

"07_widgets", "08_html", "09_upload", "10_download", "11_timer"

runExample("01_hello")

Listening on http://127.0.0.1:7835

Note: the specification for S3 class “AsIs” in package ‘jsonlite’ seems equivalent to one from

package ‘RJSONIO’: not turning on duplicate class definitions for this class.

（三）数据对象

1、向量的概念：向量是以一维数组的方法管理数据的一种对象类型。

在大多数情况下，使用长度大于 1 的向量。可以在 R 中使用 c( )函数和相应的参数来创建一个向量：

w<-c(1,3,4,5,6,7)      #数值型

w

[1] 1 3 4 5 6 7

length(w)

[1] 6

mode(w)

[1] "numeric"

w1<-c("张三","李四","王五") #字符型

w1

[1] "张三" "李四" "王五"

length(w1)

[1] 3

mode(w1)

[1] "character

w2<-c(T,F,T) #逻辑型

w2

[1] TRUE FALSE TRUE

length(w2)

[1] 3

mode(w2)

[1] "logical"

w2 <- c(True,Flase,True)

Error: object 'True' not found

w2 <- c(TRUE,FALSE,TRUE)

w2

[1] TRUE FALSE TRUE

一个向量的所有元素都必须属于相同的模式。如果不是，R 将强制执行类型转换。

w4<-c(w,w1) # 数值型+字符型=字符型

w4

[1] "1" "3" "4" "5" "6" "7" "张三" "李四" "王五"

mode(w4)

[1] "character"

w5<-c(w1,w2) # 字符型+逻辑型=字符型

w5

[1] "张三" "李四" "王五" "TRUE" "FALSE" "TRUE"

mode(w5)

[1] "character"

向量化

rm(list=ls())

(w<-seq(1:10))

[1] 1 2 3 4 5 6 7 8 9 10

(x<-sqrt(w))

[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427 3.000000

3.162278

也可以利用 R 的这个特性进行向量的算术运算

rm(list=ls())

(w1<-c(2,3,4))

[1] 2 3 4

(w2<-c(3.1,4.2,5.3))

[1] 3.1 4.2 5.3

(w<-w1+w2)

[1] 5.1 7.2 9.3

如果两个向量的长度不同，R 将利用循环规则，该规则重复较短的向量元素，直到得到的向量长度与较长的向量的长度相同。

rm(list=ls())

(w1<-c(2,4,6,8))

[1] 2 4 6 8

(w2<-c(10,12))

[1] 10 12

(w<-w1+w2)

[1] 12 16 16 20

rm(list=ls())

(w1<-c(2,4,6,8))

[1] 2 4 6 8

(w2<-c(10,12,14))

[1] 10 12 14

(w<-w1+w2)

[1] 12 16 20 18

Warning message:

In w1 + w2 :

longer object length is not a multiple of shorter object length

等差序列的创建

seq(1,-9);seq(1,9)    # 只给出首项和尾项数据，by 自动匹配为 1 或-1

[1] 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9

[1] 1 2 3 4 5 6 7 8 9

seq(1,-9,length.out=5)    #给出首项和尾项数据以及长度，自动计算等差

[1] 1.0 -1.5 -4.0 -6.5 -9.0

seq(1,-9,by=-2)    #给出首项和尾项数据以及等差，自动计算长度

[1] 1 -1 -3 -5 -7 -9

seq(1,by=2,length.out=10)    #给出首项和等差以及长度数据，自动计算尾项

[1] 1 3 5 7 9 11 13 15 17 19

2、矩阵

利用矩阵 matrix 可以描述二维数据，和向量相似，其内部元素可以是实数、复数、字符、逻辑型数据。

(w<-seq(1:10))

[1] 1 2 3 4 5 6 7 8 9 10

(a<-matrix(w,nrow=5,ncol=2))

[,1] [,2]

[1,] 1 6

[2,] 2 7

[3,] 3 8

[4,] 4 9

[5,] 5 10

(a<-matrix(w,nrow=5,ncol=2,byrow=T)) #按行填充

[,1] [,2]

[1,] 1 2

[2,] 3 4

[3,] 5 6

[4,] 7 8

[5,] 9 10

a[1,2]

[1] 2

(a<-matrix(w,nrow=5,ncol=2,byrow=T,dimnames=list(paste0("r",1:5),paste0("l",1:2)))) #给行列设置名称

矩阵的合并

(x1<-rbind(c(1,2),c(3,4)))

 (x2<-10+x1)

x <- rbind(c(10,10),c(10,10))

x

x+x1

x <- rbind(c(10,10),c(10,10))

x

x+x1

(x3<-cbind(x1,x2))

(x4<-rbind(x1,x2))

 cbind(1,x1)

rbind(1,x1)

3、数据框

在 R 语言中，很多数据分析算法函数的输入对象都是数据框对象。而且，在使用读取 excel/txt等格式数据集的函数时，也是以数据框对象输入的。

my.dataset<-data.frame(site=c("A","B","A","A","B"),

+ season=c("Winter","Summer","Summer","Spring","Fall"),

+ pH=c(7.4,6.3,8.6,7.2,8.9))

my.dataset

names(my.dataset) #读取数据框的列名

[1] "site" "season" "pH"

names(my.dataset)[1]<-"type" #对数据框的第一列列名更为 type

names(my.dataset)

[1] "type" "season" "pH"

（四）数据导入

setwd("E:\\培训课程\\天善智能\\R 语言基础免费课程\\data")

getwd()

[1] "E:/培训课程/天善智能/R 语言基础免费课程/data"

import.txt <- read.table("iris.txt",header = TRUE) # 读入 iris.txt 文件

head(import.txt)

import.csv <- read.table("iris.csv",header = TRUE,sep = ",") #读入 iris.csv 文件

head(import.csv)

import.csv1 <- read.csv("iris.csv") # 利用 read.csv 将 iris.csv 文件读入

head(import.csv1)

# 读取非结构化文本文件

unstructuredText <- readLines("unstructuredText.txt")

Warning message:

In readLines("unstructuredText.txt") :

incomplete final line found on 'unstructuredText.txt'

unstructuredText

[1] "R 语言是一套开源的数据分析解决方案，几乎可以独立完成数据处理、数据可视化、数

据建模及模型评估等工作，而且可以完美配合其他工具进行数据交互。具体来说，R 语言具

有以下优势："

[2] "1）高效的数据处理能力"

[3] "2）数据分析"

[4] "3）数据可视化"

[5] "4）通过庞大的 R 程序包库文件进行扩展"

# Excel 文件的导入

# 利用 RODBC 包读入

library(RODBC)

channel <- odbcConnectExcel2007("sample.xlsx") # 建立连接

odbcdf <- sqlFetch(channel,'data') # 读取工作表 data 的数据

odbcdf

odbcClose(channel) # 关闭连接

library(xlsx)

res <- read.xlsx('sample.xlsx',1) # 利用 read.xlsx 函数读取 Excel 文件

res

detach(package:xlsx)

library(XLConnect)

wb <- loadWorkbook("sample.xlsx") # 加工作薄加载到 R 中

xldf<-readWorksheet(wb,sheet=getSheets(wb)[1]) #读取第一个工作表的数据

xldf

# 访问网络数据

salary_data <- read.csv("http://www.justinmrao.com/salary_data.csv")

head(salary_data)

1 个评论

梁勇

写的挺好的，代码可以用代码的格式下

要回复文章请先登录或注册