Python 3 (3): Removing punctuation from a string

qq_1144521901 Python 2023-03-17

I started from the following article: https://blog.csdn.net/luckyliuwenyuan/article/details/82782517. Here is the code its author wrote:


import string
 
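def removePunctuation(text):
    '''Remove punctuation from a string.'''
    # Method 1: append each kept character to a list, then join the list
    # into a string; this takes roughly five or more lines of code.
    temp = []
    for c in text:
        if c not in string.punctuation:
            temp.append(c)
    newText = ''.join(temp)
    print(newText)

    # Method 2: pass join a generator that yields only the characters to keep.
    b = ''.join(c for c in text if c not in string.punctuation)
    print(b)
    return newText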

The problem is that string.punctuation contains only ASCII (English) punctuation, so this code has no effect on Chinese punctuation. (I have tested this.)
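A quick check makes the limitation concrete. This is a minimal sketch; the sample string is chosen only for illustration.

```python
import string

# string.punctuation is a fixed string of 32 ASCII punctuation characters.
print(string.punctuation)

# Every character in it is ASCII, so full-width Chinese marks are absent.
print(all(ord(c) < 128 for c in string.punctuation))  # True
print('，' in string.punctuation)  # False
print('。' in string.punctuation)  # False

# Only the ASCII comma and exclamation mark are removed here;
# the full-width ， and ！ survive.
sample = "你好，世界！Hello, world!"
ascii_only = ''.join(c for c in sample if c not in string.punctuation)
print(ascii_only)  # 你好，世界！Hello world
```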

So how do we remove Chinese punctuation, or more generally any characters we choose?

We need to define our own set containing the symbols or digits we want to remove. The code below also uses a regular expression.
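One way to sketch the custom-set idea with a regular expression is to build a character class from the set and delete every match. The names `CUSTOM_CHARS`, `PATTERN`, and `strip_chars` are hypothetical, and the characters in the set are only an example; this is not the code from either referenced article.

```python
import re
from string import punctuation

# Hypothetical custom set: a few Chinese punctuation marks plus ASCII digits.
CUSTOM_CHARS = '，。！？《》“”' + '0123456789'

# re.escape protects characters such as ] ^ - \ that are special inside [...].
PATTERN = re.compile('[' + re.escape(punctuation + CUSTOM_CHARS) + ']')

def strip_chars(text):
    """Return text with every character from the combined set removed."""
    return PATTERN.sub('', text)

print(strip_chars('《Python3》教程，第2章！'))  # Python教程第章
```

Compiling the pattern once keeps repeated calls cheap, and changing what gets removed only requires editing `CUSTOM_CHARS`.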

For this we can refer to the following article:

https://blog.csdn.net/weixin_37294079/article/details/60764352

Its code is as follows:

from string import punctuation 
import re
import jieba
add_punc='，。、【】“”：；（）《》‘’{}？！⑦()、%^>℃：.”“^-——=擅长于的&#@￥'
all_punc=punctuation+add_punc
def sentence_cut(x):  # cut into words and delete punctuation
    x = re.sub(r'[A-Za-z0-9]|\d+', '', x)  # delete letters and digits
    testline = jieba.cut(x, cut_all=False)
    testline = ' '.join(testline)
    testline = testline.split(' ')
    te2 = []
    for i in testline:
        # `in` on a string is a substring test, so empty tokens are dropped too
        if i not in all_punc:
            te2.append(i)
    return te2

In fact, we only need a couple of lines from it (the rest of that code relies on jieba for word segmentation):

add_punc='，。、【】“”：；（）《》‘’{}？！⑦()、%^>℃：.”“^-——=擅长于的&#@￥'

all_punc=punctuation+add_punc

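Explanation: the first line defines the custom (mostly Chinese) characters to remove; the second line concatenates that custom string with the built-in string.punctuation.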
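Because the combined value is just a string of characters, it can also be passed to `str.maketrans` to build a deletion table. The sketch below uses `str.translate` in place of an explicit loop; this is a swapped-in technique, not what either referenced article does. The helper name `remove_chars` and the shortened `add_punc` are hypothetical.

```python
from string import punctuation

# A shortened, hypothetical custom set; extend it with whatever should be removed.
add_punc = '，。、“”：；（）《》？！'
all_punc = punctuation + add_punc

# str.maketrans('', '', chars) maps every character in chars to None (deleted).
DELETE_TABLE = str.maketrans('', '', all_punc)

def remove_chars(text):
    """Delete every character of all_punc from text."""
    return text.translate(DELETE_TABLE)

print(remove_chars('你好，世界！(hello)'))  # 你好世界hello
```

The table is built once and reused, which tends to matter only when the same set is applied to a lot of text.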

Combining the code above, my version is:

from string import punctuation
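
# Named `text` rather than `str` to avoid shadowing the built-in type.
text = "《三国演义》中的“水镜先生”是司马徽"
# Custom characters to remove, mostly Chinese punctuation. The set also contains
# the ordinary characters 擅长于的, so 的 is stripped from the text as well.
add_punc = '，。、【】“”：；（）《》‘’{}？！⑦()、%^>℃：.”“^-——=擅长于的&#@￥'
all_punc = punctuation + add_punc
temp = []
for c in text:
    if c not in all_punc:
        temp.append(c)
newText = ''.join(temp)
print(newText)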

Output: