A Detailed Introduction to re.sub()



2024-01-07 21:36 | Source: compiled from the web

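Contents
I. Introduction
II. Function Prototype
III. Usage Examples
1. Matching a Single Digit or Letter
2. Matching Multiple Digits or Letters
3. Matching Other Patterns
IV. Acknowledgements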

I. Introduction

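Regular expressions come up constantly when processing string data, and in Python they are provided by the re module. This article walks through re.sub() in detail using practical examples. The function's main job is to replace the matches of a pattern in a string.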

II. Function Prototype

First, here is the function prototype from the re module's source code, including each parameter and its meaning:

```python
# Excerpt from the source code of Python's re module (module-level sub function)
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
```

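As the code shows, re.sub() takes five parameters. The first three are required and the last two are optional:
(1) pattern: the regular-expression pattern string.
(2) repl: the replacement, so that each match of pattern is replaced with repl. It can be a string, in which backslash escapes such as \1 are processed, or a function that receives the match object and returns the replacement string.
(3) string: the original string to be searched and modified.
(4) count: optional. It sets the maximum number of replacements and must be a non-negative integer. The default of 0 means every match is replaced.
(5) flags: optional. It sets the matching mode used when compiling the pattern, such as case-insensitive or multi-line matching. It is given as an integer flag value such as re.IGNORECASE or re.MULTILINE, and defaults to 0.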
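Below is a minimal sketch of the two optional parameters and of a callable repl. The sample strings and the helper double_number are made up for illustration. count and flags are passed by keyword on purpose. Writing re.sub(p, r, s, re.IGNORECASE) would silently pass the flag's integer value as count. Python 3.13 also deprecates passing these two arguments positionally.

```python
import re

text = "Cat cat CAT"

# count=2: only the two leftmost matches are replaced
limited = re.sub(r"cat", "dog", text, count=2, flags=re.IGNORECASE)

# flags=re.IGNORECASE: the lowercase pattern also matches "Cat" and "CAT"
all_replaced = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)


# Hypothetical helper: repl as a function receives a Match object
# and must return the replacement string
def double_number(match):
    return str(int(match.group()) * 2)


doubled = re.sub(r"[0-9]+", double_number, "I'm 18, born 2002")

print(limited)       # dog dog CAT
print(all_replaced)  # dog dog dog
print(doubled)       # I'm 36, born 4004
```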

III. Usage Examples

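All of the examples below use the following string, which contains uppercase and lowercase English letters, digits, Chinese and English punctuation, and special symbols: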

```python
>>> s = "大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
```

1. Matching a Single Digit or Letter

(1) Matching single digits only

```python
>>> import re
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[0-9]', '*', s)
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m ** years old. Today is ****/**/**. It is a wonderful DAY! @HHHHello,,,#***ComeHere***...**？AA？zz？——http://welcome.cn"
```

The call re.sub(r'[0-9]', '*', s) matches one digit at a time and replaces each digit with an asterisk.
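The class [0-9] matches only the ten ASCII digits. The short comparison below uses a made-up sample containing full-width digits, which are common in Chinese text. It shows how [0-9] differs from \d. In str patterns \d matches any Unicode decimal digit unless the re.ASCII flag is set.

```python
import re

# Made-up sample: ASCII "18" plus full-width "２０２０"
sample = "18岁，２０２０年"

ascii_only = re.sub(r"[0-9]", "*", sample)               # ASCII 0-9 only
unicode_digits = re.sub(r"\d", "*", sample)              # any Unicode decimal digit
ascii_flag = re.sub(r"\d", "*", sample, flags=re.ASCII)  # \d restricted to [0-9]

print(ascii_only)      # **岁，２０２０年
print(unicode_digits)  # **岁，****年
print(ascii_flag)      # **岁，２０２０年
```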

(2) Matching single letters only

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[a-z]', '*', s)
"大家好，我是一个程序员小白。I '* ** **** ** ********* ******, *** I’* 18 ***** ***. T**** ** 2020/01/01. I* ** * ********* DAY! @HHHH****,,,#111C***H***222...66？AA？**？——****://*******.**"
>>> re.sub(r'[A-Z]', '*', s)
"大家好，我是一个程序员小白。* 'm so glad to introduce myself, and *’m 18 years old. *oday is 2020/01/01. *t is a wonderful ***! @****ello,,,#111*ome*ere222...66？**？zz？——http://welcome.cn"
>>> re.sub(r'[A-Za-z]', '*', s)
"大家好，我是一个程序员小白。* '* ** **** ** ********* ******, *** *’* 18 ***** ***. ***** ** 2020/01/01. ** ** * ********* ***! @********,,,#111********222...66？**？**？——****://*******.**"
```

re.sub(r'[a-z]', '*', s) matches single lowercase letters and replaces each one with an asterisk. re.sub(r'[A-Z]', '*', s) does the same for single uppercase letters. re.sub(r'[A-Za-z]', '*', s) does the same for any single letter.
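The same letter replacement can also be written with the flags parameter. Here is a sketch on a fragment of the sample string, assuming ASCII-only text:

```python
import re

s = "It is a wonderful DAY!"  # fragment of the article's sample string

# With re.IGNORECASE, [a-z] also matches uppercase ASCII letters
with_flag = re.sub(r"[a-z]", "*", s, flags=re.IGNORECASE)
explicit = re.sub(r"[A-Za-z]", "*", s)

print(with_flag)  # ** ** * ********* ***!
```

The Python documentation adds a caveat. With re.IGNORECASE, the Unicode patterns [a-z] and [A-Z] also match four non-ASCII letters: 'İ', 'ı', 'ſ', and the Kelvin sign (U+212A). The two forms are therefore identical only on ASCII text.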

(3) Matching single digits and letters

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[0-9A-Z]', '*', s)
"大家好，我是一个程序员小白。* 'm so glad to introduce myself, and *’m ** years old. *oday is ****/**/**. *t is a wonderful ***! @****ello,,,#****ome*ere***...**？**？zz？——http://welcome.cn"
>>> re.sub(r'[0-9a-z]', '*', s)
"大家好，我是一个程序员小白。I '* ** **** ** ********* ******, *** I’* ** ***** ***. T**** ** ****/**/**. I* ** * ********* DAY! @HHHH****,,,#***C***H******...**？AA？**？——****://*******.**"
>>> re.sub(r'[0-9A-Za-z]', '*', s)
"大家好，我是一个程序员小白。* '* ** **** ** ********* ******, *** *’* ** ***** ***. ***** ** ****/**/**. ** ** * ********* ***! @********,,,#**************...**？**？**？——****://*******.**"
```

re.sub(r'[0-9A-Z]', '*', s) matches a single digit or uppercase letter. re.sub(r'[0-9a-z]', '*', s) matches a single digit or lowercase letter. re.sub(r'[0-9A-Za-z]', '*', s) matches a single digit or a letter of either case. In every case, each matched character is replaced with one asterisk.
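A common shorthand for "digits and letters" is \w, but it is not the same as [0-9A-Za-z]. The sketch below uses a made-up sample to show the difference. In str patterns \w also matches the underscore and Unicode letters such as Chinese characters. With re.ASCII it is exactly [a-zA-Z0-9_].

```python
import re

sample = "user_01 程序员"  # made-up sample

# Explicit class: ASCII digits and letters only; "_" and Chinese are kept
explicit = re.sub(r"[0-9A-Za-z]", "*", sample)
# \w on str patterns: Unicode word characters, including "_" and Chinese
word_unicode = re.sub(r"\w", "*", sample)
# \w with re.ASCII: exactly [a-zA-Z0-9_]
word_ascii = re.sub(r"\w", "*", sample, flags=re.ASCII)

print(explicit)      # ****_** 程序员
print(word_unicode)  # ******* ***
print(word_ascii)    # ******* 程序员
```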

2. Matching Multiple Digits or Letters

Note: "multiple" here means one or more, which is what the + quantifier expresses.

(1) Matching runs of digits

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[0-9]+', '*', s)
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m * years old. Today is */*/*. It is a wonderful DAY! @HHHHello,,,#*ComeHere*...*？AA？zz？——http://welcome.cn"
```

re.sub(r'[0-9]+', '*', s) matches runs of one or more consecutive digits and replaces each whole run with a single asterisk.
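Since + is shorthand for {1,}, the brace quantifier can require longer runs. Here is a sketch on a made-up sample:

```python
import re

sample = "Room 7, floor 18, year 2020"  # made-up sample

one_or_more = re.sub(r"[0-9]+", "*", sample)        # + is the same as {1,}
at_least_two = re.sub(r"[0-9]{2,}", "*", sample)    # runs of 2 or more digits
exactly_four = re.sub(r"[0-9]{4}", "YYYY", sample)  # exactly 4 consecutive digits

print(one_or_more)   # Room *, floor *, year *
print(at_least_two)  # Room 7, floor *, year *
print(exactly_four)  # Room 7, floor 18, year YYYY
```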

(2) Matching runs of letters

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[a-z]+', '*', s)
"大家好，我是一个程序员小白。I '* * * * * *, * I’* 18 * *. T* * 2020/01/01. I* * * * DAY! @HHHH*,,,#111C*H*222...66？AA？*？——*://*.*"
>>> re.sub(r'[A-Z]+', '*', s)
"大家好，我是一个程序员小白。* 'm so glad to introduce myself, and *’m 18 years old. *oday is 2020/01/01. *t is a wonderful *! @*ello,,,#111*ome*ere222...66？*？zz？——http://welcome.cn"
>>> re.sub(r'[a-zA-Z]+', '*', s)
"大家好，我是一个程序员小白。* '* * * * * *, * *’* 18 * *. * * 2020/01/01. * * * * *! @*,,,#111*222...66？*？*？——*://*.*"
```

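re.sub(r'[a-z]+', '*', s) matches runs of consecutive lowercase letters and replaces each run with a single asterisk. re.sub(r'[A-Z]+', '*', s) does the same for runs of uppercase letters. re.sub(r'[a-zA-Z]+', '*', s) does the same for runs of letters of either case. The order of ranges inside a character class does not matter, so [A-Za-z]+ is equivalent.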
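When each run should be transformed rather than replaced with a fixed string, repl can be a function. The sketch below applies the same [A-Za-z]+ pattern to a fragment of the sample string. The lambdas are illustrative choices.

```python
import re

s = "It is a wonderful DAY!"  # fragment of the article's sample string

# Each run of letters is passed to the function as a Match object;
# the function's return value replaces that run
swapped = re.sub(r"[A-Za-z]+", lambda m: m.group(0).swapcase(), s)
lengths = re.sub(r"[A-Za-z]+", lambda m: str(len(m.group(0))), s)

print(swapped)  # iT IS A WONDERFUL day!
print(lengths)  # 2 2 1 9 3!
```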

(3) Matching runs of digits and letters

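re.sub(r'[0-9A-Za-z]+', '*', s) matches runs of consecutive digits and letters. Each run, whether it contains only digits, only letters, or a mix of both, is replaced with a single asterisk.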
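Here is that call run on the article's sample string s, as a self-contained sketch:

```python
import re

# The article's sample string
s = "大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

# Each run of ASCII digits and/or letters collapses into a single "*"
result = re.sub(r"[0-9A-Za-z]+", "*", s)
print(result)
# 大家好，我是一个程序员小白。* '* * * * * *, * *’* * * *. * * */*/*. * * * * *! @*,,,#*...*？*？*？——*://*.*
```

Because digits and letters share one class, a mixed run such as "111ComeHere222" becomes a single asterisk. With [0-9]+ or [A-Za-z]+ alone, that run would produce three separate asterisks.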

3. Matching Other Patterns

(1) Matching non-digits

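A ^ placed first inside a character class negates it. re.sub(r'[^0-9]', '*', s) therefore matches a single non-digit character and replaces each one with an asterisk. re.sub(r'[^0-9]+', '*', s) matches runs of consecutive non-digit characters and replaces each run with a single asterisk.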
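Here are both calls on the article's sample string, plus one practical variant. Replacing the runs with a space and splitting the result extracts the numbers.

```python
import re

# The article's sample string
s = "大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

# [^0-9]: every single non-digit character becomes its own "*"
single = re.sub(r"[^0-9]", "*", s)
# [^0-9]+: each run of consecutive non-digits becomes one "*"
runs = re.sub(r"[^0-9]+", "*", s)
# Replace runs with a space, then split to extract the numbers
numbers = re.sub(r"[^0-9]+", " ", s).split()

print(runs)     # *18*2020*01*01*111*222*66*
print(numbers)  # ['18', '2020', '01', '01', '111', '222', '66']
```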

(2) Matching non-letters

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[^a-z]', '*', s)
'*****************m*so*glad*to*introduce*myself**and***m****years*old***oday*is**************t*is*a*wonderful***********ello********ome*ere************zz***http***welcome*cn'
>>> re.sub(r'[^A-Z]', '*', s)
'**************I*************************************I*****************T********************I*****************DAY***HHHH***********C***H************AA***********************'
>>> re.sub(r'[^A-Za-z]', '*', s)
'**************I**m*so*glad*to*introduce*myself**and*I*m****years*old**Today*is*************It*is*a*wonderful*DAY***HHHHello*******ComeHere*********AA*zz***http***welcome*cn'
>>> re.sub(r'[^a-z]+', '*', s)
'*m*so*glad*to*introduce*myself*and*m*years*old*oday*is*t*is*a*wonderful*ello*ome*ere*zz*http*welcome*cn'
>>> re.sub(r'[^A-Z]+', '*', s)
'*I*I*T*I*DAY*HHHH*C*H*AA*'
>>> re.sub(r'[^A-Za-z]+', '*', s)
'*I*m*so*glad*to*introduce*myself*and*I*m*years*old*Today*is*It*is*a*wonderful*DAY*HHHHello*ComeHere*AA*zz*http*welcome*cn'
```

The single-character classes [^a-z], [^A-Z], and [^A-Za-z] match one character that is not a lowercase letter, not an uppercase letter, or not a letter, respectively. Each such character is replaced with an asterisk. Adding + ([^a-z]+, [^A-Z]+, [^A-Za-z]+) matches runs of such consecutive characters instead, and each run is replaced with a single asterisk.
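A detail that often trips people up is that ^ negates a character class only when it is the first character inside the brackets. Anywhere else, or when escaped, it is a literal caret. Here is a sketch on a made-up three-character string:

```python
import re

negated = re.sub(r"[^a]", "*", "a^b")   # every character except "a"
literal = re.sub(r"[a^]", "*", "a^b")   # "a" or a literal "^"
escaped = re.sub(r"[\^a]", "*", "a^b")  # escaped "^" at the start is also literal

print(negated)  # a**
print(literal)  # **b
print(escaped)  # **b
```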

(3) Matching characters that are neither digits nor letters

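re.sub(r'[^0-9A-Za-z]', '*', s) matches a single character that is neither a digit nor a letter and replaces it with an asterisk. re.sub(r'[^0-9A-Za-z]+', '*', s) matches runs of such consecutive characters and replaces each run with a single asterisk.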
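Here is the run-based call on the article's sample string, plus a variant that replaces with a space and splits. The variant yields the alphanumeric tokens.

```python
import re

# The article's sample string
s = "大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

# Each run of characters that are neither ASCII digits nor letters becomes one "*"
result = re.sub(r"[^0-9A-Za-z]+", "*", s)
# Using a space instead and splitting yields the alphanumeric tokens
tokens = re.sub(r"[^0-9A-Za-z]+", " ", s).split()

print(result)
# *I*m*so*glad*to*introduce*myself*and*I*m*18*years*old*Today*is*2020*01*01*It*is*a*wonderful*DAY*HHHHello*111ComeHere222*66*AA*zz*http*welcome*cn
```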

(4) Matching specific patterns

a. Keep only letters and spaces: match everything else and set repl to the empty string.

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[^a-z ]', '', s)
' m so glad to introduce myself and m  years old oday is  t is a wonderful  elloomeerezzhttpwelcomecn'
>>> re.sub(r'[^a-z ]+', '', s)
' m so glad to introduce myself and m  years old oday is  t is a wonderful  elloomeerezzhttpwelcomecn'
>>> re.sub(r'[^A-Za-z ]', '', s)
'I m so glad to introduce myself and Im  years old Today is  It is a wonderful DAY HHHHelloComeHereAAzzhttpwelcomecn'
>>> re.sub(r'[^A-Za-z ]+', '', s)
'I m so glad to introduce myself and Im  years old Today is  It is a wonderful DAY HHHHelloComeHereAAzzhttpwelcomecn'
```

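Deleting characters outright glues some words together (for example "Im" and "HHHHelloComeHere"). To keep the sentence structure more intact, first replace the other characters with spaces by setting repl to a space. Then collapse the resulting runs of spaces: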

```python
>>> s1 = re.sub(r'[^A-Za-z ]+', ' ', s)
>>> s1
' I  m so glad to introduce myself  and I m   years old  Today is   It is a wonderful DAY   HHHHello ComeHere AA zz http welcome cn'
>>> re.sub(r'[ ]+', ' ', s1)
' I m so glad to introduce myself and I m years old Today is It is a wonderful DAY HHHHello ComeHere AA zz http welcome cn'
```
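The same result can be reached in a single pass, offered here as an alternative and not as the article's method. If the space is left out of the excluded set, every gap between words forms exactly one run, and each run is replaced with one space. strip() then removes the leading space left by the Chinese prefix.

```python
import re

# The article's sample string
s = "大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

# Spaces are inside [^A-Za-z], so spaces and other non-letters in a gap
# form one run and become a single space; strip() trims the ends
cleaned = re.sub(r"[^A-Za-z]+", " ", s).strip()
print(cleaned)
# I m so glad to introduce myself and I m years old Today is It is a wonderful DAY HHHHello ComeHere AA zz http welcome cn
```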

b. Remove English words that start with @

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'@[A-Za-z]+', '', s)
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! ,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
```
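Instead of deleting the @ word, the word can be kept and rewrapped. This uses the backslash escapes that the docstring says are processed in a string repl. Parentheses capture the word, and \1 (or the equivalent \g<1>) inserts it back. Here is a sketch on a fragment of the sample string; the <...> wrapping is an arbitrary choice.

```python
import re

s = "It is a wonderful DAY! @HHHHello,,,#111ComeHere222"  # fragment of the sample

# Group 1 captures the letters after "@"; \1 in repl refers to that group
tagged = re.sub(r"@([A-Za-z]+)", r"<\1>", s)
# \g<1> is an equivalent, unambiguous form (useful when a digit follows)
tagged_g = re.sub(r"@([A-Za-z]+)", r"<\g<1>>", s)

print(tagged)  # It is a wonderful DAY! <HHHHello>,,,#111ComeHere222
```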

c. Remove English words and numbers that end with ？ (note that this is the full-width Chinese question mark, not the ASCII ?)

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'[A-Za-z]+？', '', s)
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？——http://welcome.cn"
>>> re.sub(r'[0-9A-Za-z]+？', '', s)
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...——http://welcome.cn"
```
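The patterns above delete the ？ together with the word. If the question marks should stay, a lookahead (?=？) requires a following ？ without consuming it. Here is a sketch on the tail of the sample string:

```python
import re

s = "@HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"  # tail of the sample

# (?=？) checks that a ？ follows but leaves it out of the match,
# so only the word or number itself is removed
kept = re.sub(r"[0-9A-Za-z]+(?=？)", "", s)

print(kept)  # @HHHHello,,,#111ComeHere222...？？？——http://welcome.cn
```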

d. Remove URLs from the original string
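The pattern http[:.]+\S+ in this example works for the sample string. However, it only matches "http" immediately followed by ":" or ".", so a URL beginning with https:// is left untouched. Below is a sketch of a slightly more general alternative. It assumes URLs start with http:// or https:// and end at the next whitespace character. The sample text is made up.

```python
import re

text = "old http://welcome.cn new https://welcome.cn"  # made-up sample

# The article's pattern: "https" fails because "s" is not ":" or "."
original_pattern = re.sub(r"http[:.]+\S+", "", text)
# Assumed alternative: optional "s", then "://", then non-whitespace
general_pattern = re.sub(r"https?://\S+", "", text)

print(original_pattern)  # old  new https://welcome.cn
print(general_pattern)   # old  new  (with a trailing space)
```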

```python
>>> s
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
>>> re.sub(r'http[:.]+\S+', '', s)
"大家好，我是一个程序员小白。I 'm so glad to introduce myself, and I’m 18 years old. Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——"
```

IV. Acknowledgements

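That concludes this example-driven introduction to re.sub(). Thank you for reading! If you found it helpful, please give it a like, and if you have any questions, feel free to leave a comment below.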
