R

2023-04-23 08:06 | Source: compiled from the web

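Contents
0. Introducing the problem
1. Generating random example data
2. Plotting the data with default ordering (Figure 1)
3. Fixing the built-in ordering of letter + number labels
4. Plotting the result after fixing the letter + number ordering (Figure 2)
5. Generating example data for the string-ordering problem
6. Example of string ordering in R (Figure 3)
7. Customizing the x-axis order
8. Plotting the result after customizing the x-axis order (Figure 4)
9. Summary
10. Packages used in this article (install with install.packages if missing)
11. Acknowledgements

0. Introducing the problem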

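When plotting, the x-axis or y-axis is often not a continuous variable such as time or an indicator value, and R's built-in ordering becomes a problem. For labels that combine a letter with a number, such as S1, S2, S3 ... S10, S11, S12, trouble starts once the number has two or more digits. R sorts the labels as text, character by character, which gives the scrambled order S1, S10, S11, S12, S2, ..., S9 (see Figure 1). Sometimes we need to compare samples in their natural sequence, so how can this be fixed? This article gives a solution.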

Figure 1. Example of scrambled letter + number ordering

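1. Generating random example data

```r
x = paste0('S', 1:20)
y = runif(20, -5, 5)
pl_df = data.frame(x = x, y = y)
neg_index = which(pl_df$y < 0)
# The source text is cut off after `pl_df$y`; the comparison above and the
# three lines below are reconstructed from the head(pl_df) output that follows
pl_df$Type = 'Positive Zone'
pl_df$Type[neg_index] = 'Negative Zone'
names(pl_df) = c('Sample', 'Value', 'Type')
```

Overview of the data structure:
Sample is the sample ID; in geographic applications it can stand for multiple raster cells or multiple sampling points.
Value is the indicator value of each sample.
Type records whether each sample's value is below 0.

```r
head(pl_df)
  Sample     Value          Type
1     S1  3.596238 Positive Zone
2     S2  3.188980 Positive Zone
3     S3 -1.661844 Negative Zone
4     S4  4.682527 Positive Zone
5     S5 -3.507134 Negative Zone
6     S6  4.383677 Positive Zone
```

2. Plotting the data with default ordering (Figure 1)

```r
library(ggplot2)

fontsize = 12
p = ggplot() +
  # bar height is taken directly from Value; fill color marks the sign
  geom_bar(data = pl_df, aes(x = Sample, y = Value, fill = Type),
           position = 'dodge', stat = 'identity') +
  theme_bw() +
  theme(
    axis.text = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5),
    axis.title = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5),
    legend.position = 'bottom',
    legend.direction = 'horizontal',
    legend.text = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5),
    legend.title = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5)
  ) +
  xlab('Sample Index') +
  ylab('Testing Value')

# create the output folder if it does not exist yet
if (!dir.exists('plot')) dir.create('plot')
png('plot/plot1.png', height = 15, width = 25, units = 'cm', res = 800)
print(p)
dev.off()
```

3. Fixing the built-in ordering of letter + number labels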

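Yes, it takes just one line. Yet solving this problem often means searching through many posts online and spending at least one or two hours. To save that time, this article gives the solution directly.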

```r
pl_df$Sample = factor(pl_df$Sample, levels = x)
```

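With the solution given, here is the principle behind it. R compares letter + number strings character by character, from the first position to the last. Under the usual collation rules digits sort before letters, and the numeric part is never read as a whole number. So S10 lands before S2 simply because the character 1 precedes the character 2, and that is where the scrambled order comes from.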
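The character-by-character comparison can be reproduced in base R. The following is a minimal sketch with hypothetical labels (not the article's data); it uses `method = "radix"`, which always collates in the C locale, so the result does not depend on the machine's language settings.

```r
# Hypothetical labels, not taken from the article's data
labels <- c("S2", "S10", "S1", "S11")

# Text sort: compared character by character, so "S10" comes before "S2"
text_sorted <- sort(labels, method = "radix")
print(text_sorted)

# Numeric-aware order: strip the "S" prefix and sort by the remaining number
num <- as.integer(sub("^S", "", labels))
natural <- labels[order(num)]
print(natural)
```

The second result is the order one would pass as `levels` to `factor()`.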

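The solution proposed here is to set the levels of the Sample column to the order we need. Note: the output below shows the levels of pl_df$Sample before the update. (It assumes Sample was stored as a factor, which was the data.frame default before R 4.0.0.)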

```r
unique(pl_df$Sample)
 [1] S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20
Levels: S1 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S2 S20 S3 S4 S5 S6 S7 S8 S9
```

After the update, the levels of pl_df$Sample follow the order we want:

```r
pl_df$Sample
 [1] S1  S2  S3  S4  S5  S6  S7  S8  S9  S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20
Levels: S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20
```

4. Plotting the result after fixing the letter + number ordering (Figure 2)

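As the red box in Figure 2 shows, the incorrect built-in ordering of letter + number labels is resolved, and the bars now follow the order we specified.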

```r
# same plot as before; pl_df$Sample now carries the custom levels
p2 = ggplot() +
  geom_bar(data = pl_df, aes(x = Sample, y = Value, fill = Type),
           position = 'dodge', stat = 'identity') +
  theme_bw() +
  theme(
    axis.text = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5),
    axis.title = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5),
    legend.position = 'bottom',
    legend.direction = 'horizontal',
    legend.text = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5),
    legend.title = element_text(face = 'bold', color = 'black', size = fontsize, hjust = 0.5)
  ) +
  xlab('Sample Index') +
  ylab('Testing Value')

# create the output folder if it does not exist yet
if (!dir.exists('plot')) dir.create('plot')
png('plot/plot2.png', height = 15, width = 25, units = 'cm', res = 800)
print(p2)
dev.off()
```

Figure 2. Result after fixing the built-in ordering of letter + number labels
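The one-line fix relies on a vector that is already in the desired order (`x` in this article). When no such vector is at hand, for example when the labels arrive in arbitrary row order, the levels can be derived from the labels themselves. A minimal sketch, assuming every label contains exactly one run of digits; `sample_id` is a hypothetical stand-in for pl_df$Sample:

```r
# Hypothetical sample column in arbitrary row order (stand-in for pl_df$Sample)
sample_id <- c("S10", "S2", "S1", "S20", "S3")

# Keep only the digits of each label and convert them to integers
sample_num <- as.integer(gsub("[^0-9]", "", sample_id))

# Use the numerically sorted unique labels as the factor levels
ordered_levels <- unique(sample_id[order(sample_num)])
sample_fac <- factor(sample_id, levels = ordered_levels)

print(levels(sample_fac))
```

The row order of the data is unchanged; only the level order, which ggplot2 uses for a discrete axis, is set.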

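While writing this, however, the author thought of another problem. Sometimes the objects being compared are labeled purely with text, such as the city names "Beijing" and "Shanghai", or Patient A, Patient B and so on. R's plotting system orders them alphabetically by default. But is that really what we want?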

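For example, we may need to compare the GDP of several regions such as Beijing, Shanghai and AnHui. With alphabetical ordering, AnHui is placed before Beijing, and Shanghai ends up separated from Beijing. If the focus of the study is a side-by-side comparison of Beijing and Shanghai, how should this be handled? This article gives a solution for that as well.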
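One possible approach, sketched here in base R, is the same factor-level technique used for the letter + number labels. The data frame `gdp_df`, the vector `city_order` and the GDP numbers are placeholders invented for illustration, not real figures.

```r
# Placeholder data: the GDP numbers are made up for illustration only
gdp_df <- data.frame(
  City = c("AnHui", "Beijing", "Shanghai"),
  GDP  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Desired display order: Beijing and Shanghai side by side, AnHui last
city_order <- c("Beijing", "Shanghai", "AnHui")
gdp_df$City <- factor(gdp_df$City, levels = city_order)

# Ordering by the factor now follows city_order rather than the alphabet
sorted_df <- gdp_df[order(gdp_df$City), ]
print(sorted_df)
```

A plot that maps City to the x-axis would then draw the bars in `city_order`.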

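5. Generating example data for the string-ordering problem

```r
# two random indices in 1..26 pick an uppercase and a lowercase letter
index = round(runif(20, 1, 26))
index2 = round(runif(20, 1, 26))
x2 = paste0(LETTERS[index], letters[index2])
# reuse the random values y from section 1 as GDP
pl_df2 = data.frame(City_initial = x2, GDP = y)
pl_df2$Type = 'Positive Value'
# the source text breaks off after `pl_df2$GDP`; `< 0` is assumed from section 1
neg_index = which(pl_df2$GDP < 0)
```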
