Basic Data Frame Operations in R: Creating, Accessing, Modifying

2023-10-29 09:59 | Source: compiled from the web | Views: 265

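The data frame is the most commonly used way to organize data in R. Like a matrix, it is a table. The difference is that a data frame is a collection of vectors that may each have a different storage type, whereas a matrix requires every element to share the same storage type.

Creating a data frame: data.frame()

data.frame(field1 = vector1, field2 = vector2, ...)

The names() function lists the field (column) names.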

> Name = c("Jack", "May", "Rose", "Tom")
> maths = c(78, 99, 89, 80)
> Chinese = c(88, 87, 69, 90)
> English = c(90, 85, 83, 93)
> df = data.frame(studentName = Name, ChineseScore = Chinese, MathScore = maths, EnglishScore = English)
> df
  studentName ChineseScore MathScore EnglishScore
1        Jack           88        78           90
2         May           87        99           85
3        Rose           69        89           83
4         Tom           90        80           93
> str(df)
'data.frame': 4 obs. of 4 variables:   # obs = rows, variables = columns
 $ studentName : Factor w/ 4 levels "Jack","May","Rose",..: 1 2 3 4
 $ ChineseScore: num 88 87 69 90
 $ MathScore   : num 78 99 89 80
 $ EnglishScore: num 90 85 83 93

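Note that studentName is stored as a factor (Factor). This is the default in R versions before 4.0.0, where data.frame() converts character vectors to factors. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the same call reports the column as chr. A factor is a special kind of vector and will be discussed later.

You can also create an empty data frame: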
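Because the factor conversion depends on the R version, it can help to state the choice explicitly. The following is a minimal sketch, reusing the names from the example above; df_chr and df_fac are hypothetical names introduced only for illustration.

```r
Name    = c("Jack", "May", "Rose", "Tom")
Chinese = c(88, 87, 69, 90)

# Setting stringsAsFactors explicitly gives the same result in every R version
df_chr = data.frame(studentName = Name, ChineseScore = Chinese,
                    stringsAsFactors = FALSE)
df_fac = data.frame(studentName = Name, ChineseScore = Chinese,
                    stringsAsFactors = TRUE)

class(df_chr$studentName)   # "character"
class(df_fac$studentName)   # "factor"
names(df_chr)               # "studentName" "ChineseScore"
```

Keeping names as character vectors is usually the simpler choice for identifiers such as student names; factors are more useful for categorical variables with a fixed set of levels.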

> a = data.frame(x1 = character(0), x2 = logical(0), x3 = numeric(0))
> a
[1] x1 x2 x3
<0 rows> (or 0-length row.names)
> str(a)
'data.frame': 0 obs. of 3 variables:
 $ x1: Factor w/ 0 levels:
 $ x2: logi
 $ x3: num

Accessing a data frame

There are three ways:

data_frame$field (the most common)
data_frame[["field"]]
data_frame[[field_index]]

> df
  studentName ChineseScore MathScore EnglishScore
1        Jack           88        78           90
2         May           87        99           85
3        Rose           69        89           83
4         Tom           90        80           93
> df$ChineseScore
[1] 88 87 69 90
> df[["ChineseScore"]]
[1] 88 87 69 90
> df[[2]]
[1] 88 87 69 90
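The three forms above all return the column as a plain vector. As a small sketch of how this differs from single-bracket indexing (which the text above does not cover and is added here only for contrast), consider a reduced two-column version of df:

```r
df = data.frame(studentName  = c("Jack", "May", "Rose", "Tom"),
                ChineseScore = c(88, 87, 69, 90))

v = df[["ChineseScore"]]   # double brackets: the column as a plain vector
d = df["ChineseScore"]     # single brackets: a one-column data frame

is.numeric(v)      # TRUE
is.data.frame(d)   # TRUE

# [row, column] indexing picks individual cells or whole rows
df[2, "ChineseScore"]            # 87
df[df$studentName == "Rose", ]   # the full row for Rose
```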

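You can also use the attach function to bind the data frame and access its vectors directly. The advantage is that the data frame name no longer has to be written out.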

> attach(df)
> ChineseScore
[1] 88 87 69 90
> detach(df)

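Note that attach and detach must always be used as a pair, so use them with care.

The with function serves a similar purpose to attach and detach. Its basic form is:

with(data_frame, {
  field-access expression 1
  field-access expression 2
  ...
})

Using the same example: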
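One reason for caution, sketched below under the assumption that a variable with the same name as a column already exists in the workspace: the workspace variable takes precedence over the attached column, so the code silently reads the wrong data. The variable seen is a hypothetical name used only to record what was read.

```r
df = data.frame(ChineseScore = c(88, 87, 69, 90))
ChineseScore = c(0, 0, 0, 0)   # workspace variable with the same name as the column

attach(df)            # R reports that ChineseScore is masked by the workspace
seen = ChineseScore   # finds the workspace variable, not the column in df
detach(df)            # always undo the attach

seen                  # 0 0 0 0
```

Writing df$ChineseScore, or using with as described next, avoids this ambiguity.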

> df
  studentName ChineseScore MathScore EnglishScore
1        Jack           88        78           90
2         May           87        99           85
3        Rose           69        89           83
4         Tom           90        80           93
> with(df, {
+   print(ChineseScore)
+   SumScore = ChineseScore + MathScore + EnglishScore   # creates a local vector
+   print(SumScore)
+ })
[1] 88 87 69 90
[1] 256 271 241 263
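As a side note on the example above, with returns the value of its last expression, so a result computed inside it can be kept by assigning the return value. A minimal sketch, where total is a hypothetical name and df is rebuilt with only the score columns:

```r
df = data.frame(ChineseScore = c(88, 87, 69, 90),
                MathScore    = c(78, 99, 89, 80),
                EnglishScore = c(90, 85, 83, 93))

# The value of the last (here: only) expression is returned and assigned
total = with(df, ChineseScore + MathScore + EnglishScore)
total   # 256 271 241 263
```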

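Note that the SumScore created inside with{} is a local vector and cannot be used outside it:

> SumScore
Error: object 'SumScore' not found

Modifying a data frame

1. Adding a column

How can the fields of a data frame be modified, for example to add the total score SumScore to it? The within function does this. Its form is:

data_frame = within(data_frame, {
  field-access expression
  ...
  field-modifying expression
  ...
})

To add SumScore to df: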

> df = within(df, {
+   SumScore = ChineseScore + MathScore + EnglishScore
+ })
> df
  studentName ChineseScore MathScore EnglishScore SumScore
1        Jack           88        78           90      256
2         May           87        99           85      271
3        Rose           69        89           83      241
4         Tom           90        80           93      263

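Any new vector created inside within{} is added to the data frame by default and becomes a new field. The advantage is that there is no need to create a separate variable first and then add it to the data frame.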
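For comparison, the same column can be added in other ways. The sketch below uses direct $ assignment and base R's transform(); MeanScore and df2 are hypothetical names added purely for illustration, and df is rebuilt with only the score columns.

```r
df = data.frame(ChineseScore = c(88, 87, 69, 90),
                MathScore    = c(78, 99, 89, 80),
                EnglishScore = c(90, 85, 83, 93))

# Direct assignment with $ creates the column if it does not exist yet
df$SumScore = df$ChineseScore + df$MathScore + df$EnglishScore

# transform() returns a modified copy, much like within()
df2 = transform(df, MeanScore = SumScore / 3)

df$SumScore   # 256 271 241 263
names(df2)    # the four columns of df followed by "MeanScore"
```

Direct assignment is the shortest form for a single column; within and transform read better when several columns are derived at once.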

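2. Changing the column order

To swap ChineseScore and MathScore, index the data frame with the column names in the desired order. Assign the result back to df to keep the new order:

df[, c('studentName', 'MathScore', 'ChineseScore', 'EnglishScore', 'SumScore')]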
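A minimal sketch of the reordering, on a reduced three-column version of df, showing that the indexing expression alone only produces a reordered copy and that column positions can be used in place of names:

```r
df = data.frame(studentName  = c("Jack", "May"),
                ChineseScore = c(88, 87),
                MathScore    = c(78, 99))

# Assign the reordered copy back to df to make the change stick
df = df[, c("studentName", "MathScore", "ChineseScore")]
names(df)   # "studentName" "MathScore" "ChineseScore"

# Column positions work as well (here: swap the last two columns back)
df = df[, c(1, 3, 2)]
names(df)   # "studentName" "ChineseScore" "MathScore"
```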

3. Filtering data with subset

Select the rows whose total score SumScore is greater than 260:

> x = subset(df, SumScore > 260)
> x
  studentName ChineseScore MathScore EnglishScore SumScore
2         May           87        99           85      271
4         Tom           90        80           93      263
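The same filter can also be written with logical indexing, and subset accepts a select argument to keep only some columns. The following is a sketch on a reduced two-column version of df; high and same_rows are hypothetical names.

```r
df = data.frame(studentName = c("Jack", "May", "Rose", "Tom"),
                SumScore    = c(256, 271, 241, 263))

# subset() with select: filter rows and choose columns in one call
high = subset(df, SumScore > 260, select = c(studentName, SumScore))

# Logical indexing: the condition goes in the row position of [ , ]
same_rows = df[df$SumScore > 260, c("studentName", "SumScore")]

high$SumScore        # 271 263
same_rows$SumScore   # 271 263
```

One difference worth knowing: when the condition evaluates to NA for some row, subset drops that row, while logical indexing returns a row filled with NA.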

Reference: 《R语言数据挖掘》 (R Language Data Mining), 2nd edition, by 薛薇 (Xue Wei). This article is updated on an ongoing basis.
