The Right Way to Remove Stopwords from a WordCloud Word Cloud

2024-01-26 14:16 | Source: compiled from the web

Introduction

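In earlier posts we covered how to make English and Chinese word clouds with wordcloud. This time we continue from there. In a real word cloud, many words carry no meaning worth displaying, such as the pronouns "I" and "he". Keeping such words out of the cloud is exactly what stopwords are for.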

wordcloud's built-in stopwords

wordcloud ships with its own stopword list, stored as a set.

from wordcloud import STOPWORDS

print(STOPWORDS)

To add more words of your own, simply use the set's add or update method (it is a set, after all).

from matplotlib import pyplot as plt
from wordcloud import WordCloud, STOPWORDS

text = 'my is luopan. he is zhangshan'
# Copy the built-in set so the module-level STOPWORDS is not modified
stopwords = set(STOPWORDS)
stopwords.add('luopan')
wc = WordCloud(stopwords=stopwords)
wc.generate(text)
plt.imshow(wc)
plt.axis('off')
plt.show()
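One detail worth checking here: writing `stopwords = STOPWORDS` does not copy the set. Both names refer to the same object, so calling `add` on it also changes wordcloud's module-level `STOPWORDS` for the rest of the session. The sketch below shows the effect with plain Python sets; `BUILTIN` is a hypothetical stand-in for `STOPWORDS` so that the snippet runs without wordcloud installed.

```python
# BUILTIN is a hypothetical stand-in for wordcloud.STOPWORDS (a plain set)
BUILTIN = {"he", "is", "my"}

# Aliasing: both names point to the same set object
alias = BUILTIN
alias.add("luopan")
print("luopan" in BUILTIN)  # True: the "built-in" set was modified too

# Copying: set(...) creates an independent set
BUILTIN.discard("luopan")  # undo the change made through the alias
custom = set(BUILTIN)
custom.update(["luopan", "zhangshan"])  # update() adds several words at once
print("luopan" in BUILTIN)  # False: the original is untouched
print(sorted(custom))
```

With wordcloud itself, the same idea is `stopwords = set(STOPWORDS)` followed by `stopwords.add('luopan')`.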

Using stopwords with Chinese text

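Making a Chinese word cloud with the wordcloud library requires segmenting the text into words first, so there are three ways to apply stopwords to Chinese text: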

1. Filter the stopwords out of the Chinese text before segmentation.
2. Filter the stopwords out during segmentation.
3. Set stopwords in wordcloud.
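The first two approaches both remove stopwords before wordcloud ever sees the text. A minimal sketch of that idea, assuming the text has already been split into tokens, filters the tokens against a stopword set and counts what remains. The token list below is segmented by hand for illustration (it is not actual jieba output), the stopword set is hypothetical, and `filter_and_count` is a hypothetical helper name.

```python
from collections import Counter


def filter_and_count(tokens, stopwords):
    """Drop stopwords and whitespace-only tokens, then count word frequencies."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    return Counter(kept)


# Hand-segmented tokens (an assumption; real segmentation would come from jieba.cut)
tokens = ["我", "叫", "罗攀", "，", "他", "叫", "关羽", "，",
          "我", "叫", "罗攀", "，", "他", "叫", "刘备"]
# A tiny hypothetical stopword set; in practice it would be loaded from a file
stopwords = {"我", "他", "叫", "，"}

freqs = filter_and_count(tokens, stopwords)
print(dict(freqs))  # {'罗攀': 2, '关羽': 1, '刘备': 1}
```

Because the result is already a word-to-count mapping, it could be handed to `WordCloud.generate_from_frequencies` instead of `generate`, in which case the `stopwords` parameter is no longer needed.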

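Here we only cover the third method, setting stopwords. For that we first need a Chinese stopword list, which can be downloaded online, and then we clean it up into a set.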

First, read the contents of the stopword list into a set.

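stopwords = set()
# One stopword per line; the with-block closes the file, and utf-8 avoids platform-default encodings
with open('hit_stopwords.txt', 'r', encoding='utf-8') as f:
    content = [line.strip() for line in f]
stopwords.update(content)
stopwords  # displays the set in a notebook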
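One thing the loop above does not handle: any blank line in the file ends up as an empty string in the set. A slightly more defensive loader, assuming a UTF-8 file with one stopword per line, might skip those. `load_stopwords` is a hypothetical helper, and the demo writes its own temporary file as a stand-in for `hit_stopwords.txt`.

```python
import os
import tempfile


def load_stopwords(path, encoding="utf-8"):
    """Read one stopword per line into a set, skipping blank lines."""
    with open(path, "r", encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


# Demo with a throwaway file standing in for hit_stopwords.txt
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("我\n他\n\n的\n")  # note the blank line
    demo_path = tmp.name

stopwords = load_stopwords(demo_path)
os.remove(demo_path)
print(sorted(stopwords))
```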

Next, segment the text and generate the word cloud.

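from matplotlib import pyplot as plt
from wordcloud import WordCloud
import jieba

text = '我叫罗攀，他叫关羽，我叫罗攀，他叫刘备'
# Segment the Chinese text with jieba and join the tokens with spaces
cut_word = " ".join(jieba.cut(text))

# Load the stopword list (one word per line) into a set
stopwords = set()
with open('hit_stopwords.txt', 'r', encoding='utf-8') as f:
    stopwords.update(line.strip() for line in f)

# font_path must point to a font with Chinese glyphs; this path is macOS-specific
wc = WordCloud(font_path=r'/System/Library/Fonts/Supplemental/Songti.ttc', stopwords=stopwords)
wc.generate(cut_word)
plt.imshow(wc)
plt.axis('off')
plt.show()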
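Why the `" ".join(jieba.cut(text))` step matters: wordcloud picks words out of the text with a regular expression over word characters, and Chinese has no spaces, so an unsegmented sentence comes out as long runs that never match a stopword. The sketch below imitates that tokenisation with `re.findall(r"\w[\w']*", ...)`, which approximates wordcloud's default word pattern rather than reproducing its exact internals. The segmented string is written by hand instead of calling jieba, and the stopword set is hypothetical.

```python
import re

# Approximates wordcloud's word-matching regex (an assumption, not its exact code)
WORD_PATTERN = r"\w[\w']*"
# Hypothetical stopword set for the demo
stopwords = {"我", "他", "叫"}

raw = "我叫罗攀，他叫关羽"
segmented = "我 叫 罗攀 ， 他 叫 关羽"  # hand-written stand-in for " ".join(jieba.cut(raw))


def remaining_words(text):
    """Tokenise with the regex and drop tokens found in the stopword set."""
    return [w for w in re.findall(WORD_PATTERN, text) if w not in stopwords]


print(remaining_words(raw))        # ['我叫罗攀', '他叫关羽']: no token matches a stopword
print(remaining_words(segmented))  # ['罗攀', '关羽']
```

The full-width comma "，" is punctuation rather than a word character, so the regex skips it in both strings; only the segmented version gives the stopword set whole words to match against.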

How to make the word cloud look nicer is a topic for the next post. See you then!
