Counting the Most Frequent Words in a Text with Python


2024-01-19 02:21 | Source: compiled from the web | Views: 265

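Contents: Problem: word frequency counting for an English text · Implementation steps · Example: word frequencies in a CET-6 essay

Given an article, we often want to find the words that occur most often in it, so that we can get a rough idea of what the article is about. A solution to this problem can be used to retrieve and archive information from the web automatically, and in an age of information explosion this kind of archiving and classification is very much needed. This is the "word frequency counting" problem.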

Note: throughout this article, txt is a string.

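Problem: word frequency counting for an English text

Method:

Step 1: split the article and extract its words
Step 2: count each word
Step 3: sort the counts from highest to lowest

Implementation steps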

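Step 1: split the article and extract its words

Calling txt.lower() converts every letter to lowercase, so that differences in capitalization in the original text do not distort the counts. To get a uniform separator, replace special characters and punctuation marks with spaces using txt.replace(), then extract the words with txt.split().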

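txt = txt.lower()
# Replace commas, periods, and newlines with spaces
for s in ',.\n ':
    txt = txt.replace(s, ' ')
# Split on whitespace to get the list of words
words = txt.split()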
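The replace loop above only handles commas, periods, and newlines, so other punctuation such as ? or ; would stay attached to the neighboring word. As a minimal sketch of one alternative, assuming a regular-expression approach that is not part of the original article, the words can be pulled out directly with re.findall(). The function name tokenize and the sample sentence are hypothetical, chosen only for illustration.

```python
import re

def tokenize(txt):
    """Lowercase txt and return its words, keeping internal hyphens and apostrophes."""
    # One or more letters, optionally followed by -letters or 'letters groups.
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", txt.lower())

sample = "Well-groomed? Yes; it's a well-groomed look!"
print(tokenize(sample))
# ['well-groomed', 'yes', "it's", 'a', 'well-groomed', 'look']
```

This pattern only matches the letters a to z, so it suits plain English text; digits and accented letters would need a wider character class.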

Step 2: count each word

# words is the word list produced in step 1
count = {}
for word in words:
    if word in count:
        count[word] = count[word] + 1
    else:
        count[word] = 1

Alternatively, the same logic can be written more concisely:

for word in words:
    # get() returns 0 for a word that has not been seen yet
    count[word] = count.get(word, 0) + 1
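The standard library also offers collections.Counter, which performs the same counting without an explicit loop. The sketch below is an alternative to the dictionary code above rather than the article's own method; the sample sentence is made up for illustration.

```python
from collections import Counter

words = "the cat and the dog and the bird".split()
count = Counter(words)        # counts every distinct word in one pass
print(count["the"])           # 3
print(count.most_common(2))   # [('the', 3), ('and', 2)]
```

A Counter is a dict subclass, so count.items() and count.get() work as before, and looking up a word that never occurred returns 0 instead of raising KeyError.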

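Step 3: sort the counts from highest to lowest

A dictionary is not ordered by its values (since Python 3.7 it only preserves insertion order), so convert its items into a list of (word, count) pairs and sort that list by count, using the sorted() function together with a lambda function.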

ranked = sorted(count.items(), key=lambda item: item[1], reverse=True)
print(ranked)

Example: word frequencies in a CET-6 essay

txt = '''To be successful in a job interview or in almost any interview situation, the applicants houlddemonstrate certain personal and professional qualities.
Most likely, the first and often a lasting impression of a person is determined by the clotheshe wears. The job applicant should take care to appear well-groomed and modestly dressed, avoiding the extremes of too pompous or too casual attire .
Besides care for personal appearance, he should pay close attention to his manner of speaking, which should be neither ostentatious nor familiar but rather straight forward, grammaticallyaccurate, and in a friendly way.
In addition, he should be prepared to talk knowledgeably about the requirements of theposition, for which he is applying in relation to his own professional experience and interests.
And finally, the really impressive applicant must convey a sense of self-confidence andenthusiasm for work, as these are factors all interviewers value highly.
If the job seeker displays the above-mentioned characteristics, he, with a little luck, willcertainly succeed in the typical personnel interview.'''
# Step 1: normalize separators and case, then split into words
for s in ',.\n ':
    txt = txt.replace(s, ' ')
txt = txt.lower()
words = txt.split()
print(words)
# Step 2: count each word
count = dict()
for i in words:
    count[i] = count.get(i, 0) + 1
print(count)
# Step 3: sort the (word, count) pairs by count, highest first
ranked = sorted(count.items(), key=lambda item: item[1], reverse=True)
print(ranked)

The sorted result is as follows:

[('the', 10), ('in', 6), ('a', 6), ('and', 6), ('to', 5), ('of', 5), ('should', 4), ('he', 4), ('be', 3), ('job', 3), ('interview', 3), ('for', 3), ('or', 2), ('personal', 2), ('professional', 2), ('is', 2), ('applicant', 2), ('care', 2), ('too', 2), ('his', 2), ('which', 2), ('successful', 1), ('almost', 1), ('any', 1), ('situation', 1), ('applicants', 1), ('houlddemonstrate', 1), ('certain', 1), ('qualities', 1), ('most', 1), ('likely', 1), ('first', 1), ('often', 1), ('lasting', 1), ('impression', 1), ('person', 1), ('determined', 1), ('by', 1), ('clotheshe', 1), ('wears', 1), ('take', 1), ('appear', 1), ('well-groomed', 1), ('modestly', 1), ('dressed', 1), ('avoiding', 1), ('extremes', 1), ('pompous', 1), ('casual', 1), ('attire', 1), ('besides', 1), ('appearance', 1), ('pay', 1), ('close', 1), ('attention', 1), ('manner', 1), ('speaking', 1), ('neither', 1), ('ostentatious', 1), ('nor', 1), ('familiar', 1), ('but', 1), ('rather', 1), ('straight', 1), ('forward', 1), ('grammaticallyaccurate', 1), ('friendly', 1), ('way', 1), ('addition', 1), ('prepared', 1), ('talk', 1), ('knowledgeably', 1), ('about', 1), ('requirements', 1), ('theposition', 1), ('applying', 1), ('relation', 1), ('own', 1), ('experience', 1), ('interests', 1), ('finally', 1), ('really', 1), ('impressive', 1), ('must', 1), ('convey', 1), ('sense', 1), ('self-confidence', 1), ('andenthusiasm', 1), ('work', 1), ('as', 1), ('these', 1), ('are', 1), ('factors', 1), ('all', 1), ('interviewers', 1), ('value', 1), ('highly', 1), ('if', 1), ('seeker', 1), ('displays', 1), ('above-mentioned', 1), ('characteristics', 1), ('with', 1), ('little', 1), ('luck', 1), ('willcertainly', 1), ('succeed', 1), ('typical', 1), ('personnel', 1)]
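The top of this result is dominated by function words such as the, in, and a, which say little about what the essay is about. One common remedy is to drop such "stop words" before ranking. The sketch below is an illustration under stated assumptions: the STOP_WORDS set is a small hand-picked list, the function name top_content_words is hypothetical, and neither comes from the original article. It also breaks ties alphabetically so that words with equal counts appear in a fixed order.

```python
# Hypothetical stop-word set chosen for illustration; real lists are much longer.
STOP_WORDS = {"the", "in", "a", "and", "to", "of", "or", "is", "be", "for"}

def top_content_words(count, n=5):
    """Return the n most frequent (word, count) pairs that are not stop words."""
    kept = [(w, c) for w, c in count.items() if w not in STOP_WORDS]
    # Sort by count descending, then alphabetically so ties have a fixed order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))[:n]

# A few counts copied from the result above.
count = {"the": 10, "in": 6, "should": 4, "he": 4, "job": 3, "interview": 3, "care": 2}
print(top_content_words(count, n=4))
# [('he', 4), ('should', 4), ('interview', 3), ('job', 3)]
```

With the stop words removed, words such as job and interview move toward the top, which is closer to the stated goal of summarizing the article's content.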

