pandas Notes 11: Data Wrangling

Published: 2019-08-06 Views: 1060

pandas python3

import numpy as np 

import pandas as pd

import matplotlib.pyplot as plt

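Hierarchical indexing

Data is often spread across several files or databases
Hierarchical indexing gives a single axis multiple (two or more) index levels
It lets you work with higher-dimensional data in a lower-dimensional form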

# Create a Series whose index is a list of arrays

data = pd.Series(np.random.randn(9),

                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],

                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])

data


data['b']



# Partial indexing selects a subset of the data

# Slice form

data['b':'c']



# List form

data.loc[['b', 'c']]


data.loc[['b', 'd']]

data.loc[:, 2]
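
As a small sketch of how a hierarchical index keeps higher-dimensional data in a lower-dimensional form, the Series can be selected on its inner level and then spread into a table with unstack. The values here are fixed with np.arange instead of random numbers, purely so the result is reproducible; the names inner and wide are illustrative.

```python
import numpy as np
import pandas as pd

# Same index as the Series above, but with fixed values (an assumption for reproducibility)
data = pd.Series(np.arange(9.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])

# Select every entry whose inner-level label is 2
inner = data.loc[:, 2]

# Move the inner level into the columns; combinations that do not exist become NaN
wide = data.unstack()
```

Here wide has one row per outer label and one column per inner label, so the two-level Series and the table hold the same information.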


# For a DataFrame, either axis can have a hierarchical index

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),

                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],

                     columns=[['Ohio', 'Ohio', 'Colorado'],

                              ['Green', 'Red', 'Green']])

frame



# Name the row index levels

frame.index.names = ['key1', 'key2']

# Name the column levels

frame.columns.names = ['state', 'color']

frame


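Reordering and sorting levels

Rearrange the order of the levels on an axis
Sort the data by the values in a specific level
swaplevel() takes two level numbers or names and returns a new object with those levels swapped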
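
A minimal sketch of swaplevel, rebuilding the frame from the previous section so the block runs on its own. The variable names swapped and swapped_sorted are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Swap the two row levels by name; the data itself is unchanged
swapped = frame.swaplevel('key1', 'key2')

# Swapping does not sort, so sort on the new outer level afterwards
swapped_sorted = swapped.sort_index(level=0)
```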


# level=0: sort by the first index level, key1

frame.sort_index(level=0)



# level=1: sort by the second index level, key2

frame.sort_index(level=1)


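Summary statistics by level

level selects the index level to aggregate on
axis selects whether that level is on the rows or the columns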
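
A sketch of summing by level, assuming a recent pandas release. Older versions accepted frame.sum(level='key2'), but that keyword was removed in pandas 2.0, so this example uses groupby(level=...) instead; the column-level case goes through a transpose so that it does not depend on the axis argument of groupby.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Sum rows that share the same key2 label
by_key2 = frame.groupby(level='key2').sum()

# Sum columns that share the same color label (transpose, group, transpose back)
by_color = frame.T.groupby(level='color').sum().T
```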


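Combining and merging datasets

pandas.merge: connects rows of different DataFrames based on one or more keys, similar to a database join
pandas.concat: stacks objects together along an axis
The combine_first method splices overlapping data together, filling missing values in one object with values from another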

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],

                    'data1': range(7)})

df2 = pd.DataFrame({'key': ['a', 'b', 'd'],'data2': range(3)})



# By default, merge joins on the overlapping column name (here, key)

pd.merge(df1,df2)

pd.merge(df1, df2, on='key')


merge

The default is an inner join
The keys in the result are the intersection: only a and b appear in both DataFrames
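
To make the difference between join types concrete, here is a short sketch using the same df1 and df2 as above with the how parameter of pd.merge. The result names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                    'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# Inner join (the default): only keys present in both frames survive
inner = pd.merge(df1, df2, on='key')

# Outer join: union of the keys, with NaN where one side has no match
outer = pd.merge(df1, df2, on='key', how='outer')

# Left join: every row of df1 is kept, matched or not
left = pd.merge(df1, df2, on='key', how='left')
```

The key c exists only in df1 and d only in df2, so both disappear from the inner result and both appear in the outer one.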


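Merging on index

Sometimes the merge key of a DataFrame is in its index
Pass left_index=True or right_index=True (or both) to use the index as the merge key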
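
A minimal sketch of merging a column against an index. The frames left1 and right1 and their column names are made up for illustration.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7.0]}, index=['a', 'b'])

# Match the key column of left1 against the index of right1 (inner join by default)
merged = pd.merge(left1, right1, left_on='key', right_index=True)

# An outer join keeps the unmatched key c as well, with NaN in group_val
merged_outer = pd.merge(left1, right1, left_on='key', right_index=True,
                        how='outer')
```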


join()

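Merges by index
Can combine several DataFrame objects, provided they have no overlapping column names
Performs a left join by default, keeping the row index of the left object
For simple index-on-index merges, a list of DataFrames can be passed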
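
The points above can be sketched as follows. The frames left2, right2, and another are illustrative data, not taken from the original post.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'],
                      columns=['Missouri', 'Alabama'])
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],
                       index=['a', 'c', 'e', 'f'],
                       columns=['New York', 'Oregon'])

# Default left join: the result keeps the index of left2
joined = left2.join(right2)

# Outer join: union of both indexes
joined_outer = left2.join(right2, how='outer')

# Several frames at once, all joined on their indexes
joined_many = left2.join([right2, another])
```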


Concatenating along an axis

Also called concatenation, binding, or stacking
NumPy provides it through the concatenate() function
pandas provides it through the concat() function
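
A brief sketch of both functions side by side. The arrays and Series here are illustrative examples.

```python
import numpy as np
import pandas as pd

# NumPy: join two arrays along the column axis
arr = np.arange(12).reshape((3, 4))
np_wide = np.concatenate([arr, arr], axis=1)

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])

# pandas: the default axis=0 glues values and indexes end to end
stacked = pd.concat([s1, s2, s3])

# axis=1 aligns on the index and produces a DataFrame, with NaN where labels are missing
side_by_side = pd.concat([s1, s2, s3], axis=1)

# keys adds an outer index level that records which piece each value came from
keyed = pd.concat([s1, s2, s3], keys=['one', 'two', 'three'])
```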


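Combining data with overlap

Two datasets whose indexes overlap in full or in part

NumPy's where function: works like a vectorized if-else
Series has a combine_first method that does the same thing with index alignment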

a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan],

              index=['f', 'e', 'd', 'c', 'b', 'a'])

b = pd.Series(np.arange(len(a), dtype=np.float64),

              index=['f', 'e', 'd', 'c', 'b', 'a'])



# Use iloc for positional access; b[-1] on a string index is deprecated in recent pandas
b.iloc[-1] = np.nan



np.where(pd.isnull(a), b, a)



# combine_first aligns on the index and fills gaps in the caller from the argument

b[:-2].combine_first(a[2:])


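Reshaping and pivoting

reshape
pivot: builds a hierarchical index with set_index and then reshapes with unstack; converts long format to wide format
pandas.melt: converts wide format to long format, merging several columns into one
stack: rotates columns into rows: DataFrame ------> Series
- drops missing data by default
- to change this, pass dropna=False so missing data is kept
unstack: rotates rows into columns: Series ------> DataFrame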
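
The operations listed above can be sketched with two small examples. The frames df and frame2 and all column names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['foo', 'bar', 'baz'],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Wide -> long: columns A and B are folded into variable/value pairs
long_df = pd.melt(df, id_vars='key')

# Long -> wide: pivot rebuilds one column per distinct variable
wide = long_df.pivot(index='key', columns='variable', values='value')

# The same reshape spelled out with set_index and unstack
wide2 = long_df.set_index(['key', 'variable'])['value'].unstack()

frame2 = pd.DataFrame(np.arange(6).reshape((2, 3)),
                      index=pd.Index(['Ohio', 'Colorado'], name='state'),
                      columns=pd.Index(['one', 'two', 'three'], name='number'))

# stack: columns become the inner row level, giving a Series
stacked = frame2.stack()

# unstack: the inner row level goes back to the columns, giving a DataFrame
unstacked = stacked.unstack()
```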

