[Python] concatenate, merge, concat, join, and Other Joining Functions Explained (with Python Code)


2023-09-30 14:28 | Source: compiled from the web

I. The concatenate() function in NumPy

    import pandas as pd
    import numpy as np

    a = np.array([[1, 2], [3, 4]])
    b = np.array([[5, 6]])
    c = np.concatenate((a, b), axis=0)
    print(c)

The output is:

    [[1 2]
     [3 4]
     [5 6]]
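The shape rule behind this result can be sketched as follows: every dimension except the one being concatenated along has to match. The snippet below is only an illustration; the variable names `rows`, `cols`, and `failed` are assumptions introduced here and are not part of the original example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# Row-wise: both arrays have 2 columns, so axis=0 works.
rows = np.concatenate((a, b), axis=0)
print(rows.shape)  # (3, 2)

# Column-wise: the row counts differ (2 vs 1), so NumPy raises ValueError.
try:
    np.concatenate((a, b), axis=1)
    failed = False
except ValueError as exc:
    failed = True
    print("axis=1 failed:", exc)

# Transposing b to shape (2, 1) makes the row counts agree.
cols = np.concatenate((a, b.T), axis=1)
print(cols)
```

In other words, axis=0 needs equal column counts and axis=1 needs equal row counts.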

So far the arrays were joined row-wise (axis=0). What about column-wise?

    a = np.array([[1, 2], [3, 4]])
    b = np.array([[5, 6], [7, 8]])
    c = np.concatenate((a, b), axis=1)
    print(c)

The output is:

    [[1 2 5 6]
     [3 4 7 8]]

II. merge in pandas

merge works like a join in a database:

(1) Inner join: pd.merge(a1, a2, on='key')

(2) Left join: pd.merge(a1, a2, on='key', how='left')

(3) Right join: pd.merge(a1, a2, on='key', how='right')

(4) Outer join: pd.merge(a1, a2, on='key', how='outer')
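As a quick orientation, the four join types can be compared on one tiny pair of tables. This is a minimal sketch: the frames `left` and `right` and their columns are made up for illustration. The `indicator=True` option of `pd.merge` adds a `_merge` column saying where each row came from.

```python
import pandas as pd

# Hypothetical toy tables: keys 2 and 3 appear in both.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Collect the keys that survive each join type.
keys_by_how = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    keys_by_how[how] = sorted(merged["key"].tolist())
    print(how, keys_by_how[how])

# indicator=True labels each row as left_only, right_only, or both.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(outer)
```

The inner join keeps only the shared keys, the left and right joins keep every key of one side, and the outer join keeps every key from both.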

Here is the first dataset used in the examples:

    data1 = pd.DataFrame(
        np.arange(0, 16).reshape(4, 4),
        columns=list('abcd')
    )
    print(data1)

        a   b   c   d
    0   0   1   2   3
    1   4   5   6   7
    2   8   9  10  11
    3  12  13  14  15

Now define the second dataset:

    data2 = [
        [4, 1, 5, 7],
        [6, 5, 7, 1],
        [9, 9, 123, 129],
        [16, 16, 32, 1]
    ]
    data2 = pd.DataFrame(data2, columns=['a', 'b', 'c', 'd'])
    print(data2)

        a   b    c    d
    0   4   1    5    7
    1   6   5    7    1
    2   9   9  123  129
    3  16  16   32    1

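1. The first method, an inner join, takes the intersection of the two datasets:

    first = pd.merge(data1, data2, on=['b'])
    print(first)

The output is:

       a_x  b  c_x  d_x  a_y  c_y  d_y
    0    0  1    2    3    4    5    7
    1    4  5    6    7    6    7    1
    2    8  9   10   11    9  123  129

The b values in the first dataset are 1, 5, 9, 13 and those in the second are 1, 5, 9, 16, so the inner join keeps only the b values common to both: 1, 5, 9.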
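The `_x` and `_y` endings in that output are the default suffixes pandas adds to non-key columns that exist in both tables. A minimal sketch of checking the shared keys and choosing more readable suffixes is given below; the suffix strings `_left` and `_right` and the name `common_b` are illustrative choices, not something the original article uses.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
data2 = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)

# The keys an inner join keeps are the values of b present in both frames.
common_b = sorted(set(data1["b"].tolist()) & set(data2["b"].tolist()))
print(common_b)  # [1, 5, 9]

# suffixes replaces the default ('_x', '_y') on overlapping non-key columns.
first = pd.merge(data1, data2, on="b", suffixes=("_left", "_right"))
print(first.columns.tolist())
print(first["b"].tolist())
```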

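2. The second method, a left join:

Every row of the left table is kept, whether or not it has a match!

    second = pd.merge(data1, data2, on='b', how='left')
    print(second)

The output is:

       a_x   b  c_x  d_x  a_y    c_y    d_y
    0    0   1    2    3  4.0    5.0    7.0
    1    4   5    6    7  6.0    7.0    1.0
    2    8   9   10   11  9.0  123.0  129.0
    3   12  13   14   15  NaN    NaN    NaN

Why are the right-hand columns empty in the fourth row? Because the left table's b value there is 13, and no row in the right table has b equal to 13 (its b values are 1, 5, 9, 16), so there is nothing to match.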
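One way to find the left rows that had no partner is to look for missing values in a column that came from the right table. The sketch below assumes the same two frames as the article; the variable name `unmatched` is introduced here for illustration. It also shows why the right-hand columns print as 4.0, 5.0 and so on: once NaN is present, pandas stores the integer column as float64.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
data2 = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)

second = pd.merge(data1, data2, on="b", how="left")

# Rows whose right-hand columns are NaN had no match in data2.
unmatched = second[second["a_y"].isna()]
print(unmatched["b"].tolist())  # [13]

# NaN forces the right-hand integer columns to float64.
print(second["a_x"].dtype, second["a_y"].dtype)
```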

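3. The third method, a right join:

    third = pd.merge(data1, data2, on='b', how='right')
    print(third)

The output is:

       a_x   b   c_x   d_x  a_y  c_y  d_y
    0  0.0   1   2.0   3.0    4    5    7
    1  4.0   5   6.0   7.0    6    7    1
    2  8.0   9  10.0  11.0    9  123  129
    3  NaN  16   NaN   NaN   16   32    1

This is the mirror image of the left join above, which should make the pattern clear.

These join types are also used constantly in database query languages, so they are well worth mastering!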

Note:

To join on two keys, pass a list: on=['a', 'b']

how can be 'left', 'right', or 'outer'; when it is omitted, the default is 'inner'.
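To illustrate the two-key case, here is a small sketch using the article's two datasets. With on=['a', 'b'], a row matches only when both a and b agree, and for this particular data no pair does. The variable names `inner_ab` and `outer_ab` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
data2 = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)

# Both a and b must match; no (a, b) pair is shared, so the result is empty.
inner_ab = pd.merge(data1, data2, on=["a", "b"])
print(len(inner_ab))  # 0

# An outer join keeps all 4 + 4 unmatched rows and fills the gaps with NaN.
outer_ab = pd.merge(data1, data2, on=["a", "b"], how="outer")
print(outer_ab.columns.tolist())
print(len(outer_ab))  # 8
```

Compared with the single-key join on b, which matched three rows, adding a second key makes the match condition stricter.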

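III. join in pandas

join is comparatively fiddly to use, so here is just a simple example:

    data2.columns = list('pown')
    result = data1.join(data2)
    print(result)

Pay special attention here: the column names must not overlap (unless you supply lsuffix/rsuffix)!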
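The overlap restriction can be seen in a short sketch. The frames `left` and `right` below are hypothetical stand-ins with identical column names, and the suffix strings `_l` and `_r` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
right = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)

# With identical column names and no suffixes, join raises ValueError.
try:
    left.join(right)
    overlap_error = False
except ValueError as exc:
    overlap_error = True
    print("join failed:", exc)

# Supplying suffixes lets join disambiguate the overlapping names.
joined = left.join(right, lsuffix="_l", rsuffix="_r")
print(joined.columns.tolist())
```

Renaming the columns beforehand, as the article does with p, o, w, n, is the other way around the restriction.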

The join above, with data2's columns renamed to p, o, w, n, produces:

        a   b   c   d   p   o    w    n
    0   0   1   2   3   4   1    5    7
    1   4   5   6   7   6   5    7    1
    2   8   9  10  11   9   9  123  129
    3  12  13  14  15  16  16   32    1

IV. The concat function in pandas

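concat can reproduce most of the results above, although it aligns on index or column labels rather than on key values the way merge does. It is a top-level pandas function that combines objects along a chosen axis.

The syntax is:

    pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False)

(The join_axes parameter listed in older tutorials was removed in pandas 1.0.)

Parameters:

objs: a sequence (such as a list) of Series or DataFrame objects (Panel no longer exists in current pandas)

axis: 0 to concatenate along rows, 1 to concatenate along columns

join: 'inner' or 'outer'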
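Two of the other parameters in the signature, ignore_index and verify_integrity, are easiest to understand by example. The following is a minimal sketch using the article's two datasets; the variable names `stacked`, `renumbered`, and `duplicate_error` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
data2 = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)

# By default the original index labels are kept, so they repeat.
stacked = pd.concat([data1, data2])
print(stacked.index.tolist())

# ignore_index=True discards them and renumbers from 0.
renumbered = pd.concat([data1, data2], ignore_index=True)
print(renumbered.index.tolist())

# verify_integrity=True raises ValueError when index labels would repeat.
try:
    pd.concat([data1, data2], verify_integrity=True)
    duplicate_error = False
except ValueError as exc:
    duplicate_error = True
    print("verify_integrity failed:", exc)
```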

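To make it easier to see where each row came from after concatenation, add keys:

    # restore data2's column names (they were changed for the join example)
    data2.columns = list('abcd')
    data3 = data2
    four = pd.concat([data1, data2, data3], keys=['data1', 'data2', 'data3'])
    print('four', four)

    four           a   b    c    d
    data1 0   0   1    2    3
          1   4   5    6    7
          2   8   9   10   11
          3  12  13   14   15
    data2 0   4   1    5    7
          1   6   5    7    1
          2   9   9  123  129
          3  16  16   32    1
    data3 0   4   1    5    7
          1   6   5    7    1
          2   9   9  123  129
          3  16  16   32    1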
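The keys become the outer level of a two-level index, which means each source table can be pulled back out by its key. The sketch below uses only two frames to keep it short; `back` and `row` are illustrative names.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
data2 = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)

four = pd.concat([data1, data2], keys=["data1", "data2"])
print(four.index.nlevels)  # 2: the key, then the original row label

# Selecting by key returns the rows that came from that frame.
back = four.loc["data2"]
print(back)

# A (key, row label) tuple selects a single row as a Series.
row = four.loc[("data1", 3)]
print(row.tolist())
```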

Column-wise concatenation (that is, aligning on rows) uses axis=1:

    five = pd.concat([data1, data2, data3], axis=1, keys=['data1', 'data2', 'data3'])
    print('five', five)

    five    data1              data2                data3
           a   b   c   d      a   b    c    d      a   b    c    d
    0      0   1   2   3      4   1    5    7      4   1    5    7
    1      4   5   6   7      6   5    7    1      6   5    7    1
    2      8   9  10  11      9   9  123  129      9   9  123  129
    3     12  13  14  15     16  16   32    1     16  16   32    1
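With axis=1 the rows are lined up by index label, not by position. In the example above all frames share the labels 0 to 3, so this is easy to miss. Here is a minimal sketch with deliberately different labels; the frames `left` and `right` are hypothetical.

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap.
left = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"y": [10, 20]}, index=[1, 2])

# Outer join (the default): all labels are kept, gaps become NaN.
wide = pd.concat([left, right], axis=1)
print(wide)

# Inner join: only labels present in every frame are kept.
wide_inner = pd.concat([left, right], axis=1, join="inner")
print(wide_inner)
```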

When some data is missing, NaN is filled in automatically:

    data4 = data3[['a', 'b', 'c']]
    six = pd.concat([data1, data4])
    print('six', six)

    six     a   b    c     d
    0   0   1    2   3.0
    1   4   5    6   7.0
    2   8   9   10  11.0
    3  12  13   14  15.0
    0   4   1    5   NaN
    1   6   5    7   NaN
    2   9   9  123   NaN
    3  16  16   32   NaN

join='inner' keeps the intersection of the columns;

join='outer' keeps the union.

    seven = pd.concat([data1, data4], join='inner')
    print('seven', seven)

The output is:

    seven     a   b    c
    0   0   1    2
    1   4   5    6
    2   8   9   10
    3  12  13   14
    0   4   1    5
    1   6   5    7
    2   9   9  123
    3  16  16   32
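The relation between the two settings can be checked directly on the column labels. This is a small sketch rebuilt from the article's data; `inner` and `outer` are illustrative names, and data4 is derived from data2 here rather than from data3.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
data2 = pd.DataFrame(
    [[4, 1, 5, 7], [6, 5, 7, 1], [9, 9, 123, 129], [16, 16, 32, 1]],
    columns=list("abcd"),
)
data4 = data2[["a", "b", "c"]]  # same rows, but without column d

inner = pd.concat([data1, data4], join="inner")
outer = pd.concat([data1, data4], join="outer")

# inner keeps only the shared columns; outer keeps every column.
print(inner.columns.tolist())  # ['a', 'b', 'c']
print(outer.columns.tolist())  # ['a', 'b', 'c', 'd']

# In the outer result, d is NaN for the four rows that came from data4.
print(int(outer["d"].isna().sum()))  # 4
```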

When the column names do not all match:

    eight = pd.concat([data1, data4])
    print(eight)

The output is:

        a   b    c     d
    0   0   1    2   3.0
    1   4   5    6   7.0
    2   8   9   10  11.0
    3  12  13   14  15.0
    0   4   1    5   NaN
    1   6   5    7   NaN
    2   9   9  123   NaN
    3  16  16   32   NaN

The full source code is as follows:

    import pandas as pd
    import numpy as np

    # In[]: combining data
    # 1. merge, similar to a database join
    #    (1) inner join: pd.merge(a1, a2, on='key')
    #    (2) left join:  pd.merge(a1, a2, on='key', how='left')
    #    (3) right join: pd.merge(a1, a2, on='key', how='right')
    #    (4) outer join: pd.merge(a1, a2, on='key', how='outer')
    data1 = pd.DataFrame(
        np.arange(0, 16).reshape(4, 4),
        columns=list('abcd')
    )
    print(data1)
    data2 = [
        [4, 1, 5, 7],
        [6, 5, 7, 1],
        [9, 9, 123, 129],
        [16, 16, 32, 1]
    ]
    data2 = pd.DataFrame(data2, columns=['a', 'b', 'c', 'd'])
    print(data2)

    # Inner join: the intersection
    first = pd.merge(data1, data2, on=['b'])
    print(first)

    # Left join. Note: to join on two keys, use on=['a', 'b'];
    # how can be 'left', 'right', or 'outer'
    second = pd.merge(data1, data2, on='b', how='left')
    print(second)
    third = pd.merge(data1, data2, on='b', how='right')
    print(third)

    # 2. append, the equivalent of rbind in R
    # ignore_index=True renumbers the index; the result is a new copy.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    pd.concat([data1, data2], ignore_index=True)

    # 3. join
    data2.columns = list('pown')
    # Column names must not overlap. The usage here resembles cbind in R
    # (columns are placed side by side), but join is comparatively fiddly.
    result = data1.join(data2)
    print(result)

    # 4. concat can reproduce most of the results above.
    # It is a top-level pandas function that combines objects along a chosen axis.
    #
    # a. Tables with the same fields, stacked end to end
    data1.columns = list('abcd')
    data2.columns = list('abcd')
    data3 = data2
    # Add keys to make the origin of each row easier to see
    four = pd.concat([data1, data2, data3], keys=['data1', 'data2', 'data3'])
    print('four', four)

    # b. Column-wise concatenation (aligning on rows): axis=1
    five = pd.concat([data1, data2, data3], axis=1, keys=['data1', 'data2', 'data3'])
    print('five', five)

    data4 = data3[['a', 'b', 'c']]
    # Missing data is filled with NaN automatically
    six = pd.concat([data1, data4])
    print('six', six)

    # c. join: 'inner' is the intersection, 'outer' is the union
    seven = pd.concat([data1, data4], join='inner')
    print('seven', seven)
    eight = pd.concat([data1, data4])
    print(eight)

    # The following raises an error: data4 has 4 rows but only 3 index labels are assigned
    # data4.index = list('mnp')
    # pd.concat([data1, data4])
