Adding Chinese Support to the Rasa Chatbot


2023-04-09 18:08

A freshly installed Rasa project does not support Chinese conversations by default.

Strategy for learning and configuring

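The examples I found all use different pipeline configurations, and without trying them hands-on it is hard to tell which ones work better.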

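So I start from the simplest configuration that runs: the Chinese pipeline recommended in the book 《Rasa 实战:构建开源对话机器人》, which includes an NLU configuration example for a medical bot. That example covers only the NLU part, that is, intent and entity recognition, and has no response configuration.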

The simplest Chinese configuration

Open the config.yml file in the project root and change it as follows:

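    recipe: default.v1
    language: zh
    pipeline:
      - name: JiebaTokenizer
      - name: LanguageModelFeaturizer
        model_name: "bert"
        model_weight: "bert-base-chinese"
      - name: "DIETClassifier"

language has to be changed from en to zh, i.e. Chinese.

For the pipeline, see my list of Rasa NLU pipeline components.

Note that Rasa's documentation spells the weights option of LanguageModelFeaturizer as model_weights (plural). With model_weight as written above, the option appears to be ignored: the training log later in this post reports "Model weights not specified" and falls back to rasa/LaBSE. To load bert-base-chinese, the key would need to be model_weights.

What is NLU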

NLU stands for Natural Language Understanding.

What NLU does in Rasa:

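The Rasa NLU module parses user input and extracts key information such as entities and intents. Custom components, for example sentiment analysis, can also be added.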

Configuring nlu.yml

Edit data/nlu.yml and add some Chinese examples on top of the existing English ones.

    version: "3.1"

    nlu:
    - intent: greet
      examples: |
        - hey
        - hello
        - hi
        - hello there
        - good morning
        - good evening
        - moin
        - hey there
        - let's go
        - hey dude
        - goodmorning
        - goodevening
        - good afternoon
        - 你好!
        - 您好!
        - 在么!
        - 在吗!
        - 喂!

    - intent: goodbye
      examples: |
        - cu
        - good by
        - cee you later
        - good night
        - bye
        - goodbye
        - have a nice day
        - see you around
        - bye bye
        - see you later
        - 拜拜!
        - 再见!
        - 拜!
        - 退出。
        - 结束。
        - exit

    - intent: affirm
      examples: |
        - yes
        - y
        - indeed
        - of course
        - that sounds good
        - correct
        - 是的
        - 是

    - intent: deny
      examples: |
        - no
        - n
        - never
        - I don't think so
        - don't like that
        - no way
        - not really
        - 不
        - 不是的
        - 不是

Retraining the model

The yml files under the data directory, such as nlu.yml, hold the training data.
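Before training, it can help to check how many examples per intent actually contain Chinese text. The following is a minimal sketch, not part of Rasa: it works on a hand-copied subset of the examples (the NLU_EXAMPLES dict and the chinese_coverage helper are hypothetical names introduced here), and it only looks for characters in the basic CJK Unified Ideographs range.

```python
import re

# Hypothetical, hand-copied subset of the examples in data/nlu.yml.
NLU_EXAMPLES = {
    "greet": ["hey", "hello", "你好!", "您好!", "在吗!"],
    "goodbye": ["bye", "goodbye", "拜拜!", "再见!"],
    "affirm": ["yes", "of course", "是的", "是"],
    "deny": ["no", "never", "不", "不是的", "不是"],
}

# Matches any character in the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def chinese_coverage(examples_by_intent):
    """Return {intent: (examples_with_chinese, total_examples)}."""
    return {
        intent: (sum(1 for e in examples if CJK.search(e)), len(examples))
        for intent, examples in examples_by_intent.items()
    }


if __name__ == "__main__":
    for intent, (zh, total) in chinese_coverage(NLU_EXAMPLES).items():
        print(f"{intent}: {zh}/{total} examples contain Chinese characters")
```

An intent with very few Chinese examples is a natural first place to add more data if recognition looks weak.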

rasa train nlu

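During training a 1.88G tf_model.h5 was downloaded, which seemed surprisingly large. (This file comes with the pretrained language model used by the pipeline. BERT, Bidirectional Encoder Representations from Transformers, uses the Transformer architecture to learn text representations and can be applied to many natural language processing tasks, such as text classification, named entity recognition, and question answering. BERT itself is not tied to one framework; tf_model.h5 is a copy of the model weights saved for TensorFlow, a widely used machine learning framework for training and deploying deep learning models, and the .h5 suffix marks it as an HDF5 file.)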

The trained model archive, however, is only 20M.

    > ls -lah models/
    total 44M
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr 7 10:35 ./
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr 7 10:03 ../
    -rwxrwxrwx 1 zhongwei zhongwei  20M Apr 7 10:35 nlu-20230407-100759-obtuse-rack.tar.gz*
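The downloaded tf_model.h5 is described above as an HDF5 file. A quick way to confirm that for any file is to look at its first eight bytes, since HDF5 files normally begin with a fixed signature. This is a minimal sketch using only the standard library; looks_like_hdf5 and size_in_gib are hypothetical helper names, and the location of the downloaded file on disk is not given in this post, so no path is assumed.

```python
import os

# The 8-byte signature that normally starts an HDF5 file (at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def size_in_gib(path):
    """Return the file size in GiB, comparable to the sizes `ls -lah` prints."""
    return os.path.getsize(path) / 1024 ** 3
```

Calling looks_like_hdf5 on the cached tf_model.h5 should return True, and size_in_gib gives a number to compare against the download size reported during training.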

Test:

    rasa shell nlu

Test results

The greet intent, i.e. a greeting:

    Next message: 你好
    {
      "text": "你好",
      "intent": {
        "name": "greet",
        "confidence": 0.9999979734420776
      },

The goodbye intent, i.e. saying farewell:

    Next message: 再见
    {
      "text": "再见",
      "intent": {
        "name": "goodbye",
        "confidence": 0.9999972581863403
      },

Both results are as expected, and they at least show that Chinese is now supported. With the default en setting, Chinese input got no reply at all.

What surprised me more was this intent recognition:

    Next message: 我拒绝
    {
      "text": "我拒绝",
      "intent": {
        "name": "deny",
        "confidence": 0.9226003289222717
      },

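The examples I configured for the deny intent never contain the word "拒绝" (refuse), yet the message was still recognized accurately. This suggests that a pretrained language model covering Chinese is involved. I have not yet confirmed which pipeline entry brings it in; LanguageModelFeaturizer, the component that loads pretrained weights, is the likely source. I will look into it later.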
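One way to narrow down which pipeline entry pulls in pretrained weights, and whether its options are spelled the way Rasa expects, is to treat the pipeline as plain data and check each component's keys. The sketch below is not a Rasa API: unknown_options is a hypothetical helper, and the set of option names for LanguageModelFeaturizer (model_name, model_weights, cache_dir) is an assumption taken from Rasa's component documentation, so verify it against the version you run.

```python
# Assumed option names for LanguageModelFeaturizer, taken from Rasa's
# component documentation; adjust if your Rasa version differs.
KNOWN_OPTIONS = {
    "LanguageModelFeaturizer": {"name", "model_name", "model_weights", "cache_dir"},
}

# The pipeline from config.yml in this post, copied as Python data.
PIPELINE = [
    {"name": "JiebaTokenizer"},
    {
        "name": "LanguageModelFeaturizer",
        "model_name": "bert",
        "model_weight": "bert-base-chinese",
    },
    {"name": "DIETClassifier"},
]


def unknown_options(pipeline, known=KNOWN_OPTIONS):
    """Return (component, key) pairs whose key is not a known option."""
    problems = []
    for component in pipeline:
        allowed = known.get(component["name"])
        if allowed is None:
            continue  # no option list recorded for this component
        for key in component:
            if key not in allowed:
                problems.append((component["name"], key))
    return problems


if __name__ == "__main__":
    for name, key in unknown_options(PIPELINE):
        print(f"{name}: unrecognized option '{key}'")
```

For the pipeline in this post the check flags model_weight, which is consistent with the "Model weights not specified" line in the training log.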

There are also unsatisfying cases:

    Next message: 你好啊
    {
      "text": "好啊",
      "intent": {
        "name": "affirm",
        "confidence": 0.4897577464580536
      },
      "entities": [],
      "text_tokens": [
        [0, 1],
        [1, 2]
      ],
      "intent_ranking": [
        {
          "name": "affirm",
          "confidence": 0.4897577464580536
        },
        {
          "name": "greet",
          "confidence": 0.34744495153427124
        },

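The top candidate here should really be greet, but the message was recognized as affirm. The model is still not smart enough, although it basically meets my needs. (Note that the text echoed in the output is "好啊" rather than "你好啊"; for "好啊" on its own, affirm would be a reasonable reading.)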
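A low top confidence with a close runner-up, as in the output above, is the kind of case where a bot can ask the user to rephrase instead of guessing. Rasa ships a FallbackClassifier component for this purpose; the sketch below is plain Python and not its implementation. The function name classify_with_fallback and the default values 0.7 and 0.1 are hypothetical choices for illustration.

```python
def classify_with_fallback(intent_ranking, threshold=0.7, ambiguity_threshold=0.1):
    """Pick the top intent, or 'fallback' when the prediction is weak or ambiguous.

    intent_ranking: list of {"name": str, "confidence": float} dicts,
    shaped like the "intent_ranking" field in the NLU output.
    """
    ranked = sorted(intent_ranking, key=lambda i: i["confidence"], reverse=True)
    if not ranked:
        return "fallback"
    top = ranked[0]
    if top["confidence"] < threshold:
        return "fallback"  # top intent is not confident enough
    if len(ranked) > 1 and top["confidence"] - ranked[1]["confidence"] < ambiguity_threshold:
        return "fallback"  # top two intents are too close to call
    return top["name"]


if __name__ == "__main__":
    ranking = [
        {"name": "affirm", "confidence": 0.4897577464580536},
        {"name": "greet", "confidence": 0.34744495153427124},
    ]
    print(classify_with_fallback(ranking))
```

With the ranking from the output above, the top confidence of about 0.49 is below the assumed 0.7 threshold, so the sketch returns the fallback label rather than affirm.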

Supporting Chinese replies

Training the NLU model above only added parsing of Chinese input; it does not make the bot reply in Chinese.

Add Chinese responses in domain.yml:

    version: "3.1"

    intents:
      - greet
      - goodbye
      - affirm
      - deny
      - mood_great
      - mood_unhappy
      - bot_challenge

    responses:
      utter_greet:
      - text: "你好!吃了么?"

      utter_cheer_up:
      - text: "Here is something to cheer you up:"
        image: "https://i.imgur.com/nGF1K8f.jpg"

      utter_did_that_help:
      - text: "Did that help you?"

      utter_happy:
      - text: "Great, carry on!"

      utter_goodbye:
      - text: "再见"

      utter_iamabot:
      - text: "我是一个机器人,你可以叫我小远子"

    session_config:
      session_expiration_time: 60
      carry_over_slots_to_new_session: true

Retraining

The model trained earlier with rasa train nlu only parses input and contains no reply logic, so the model has to be retrained.

Note: do not pass the nlu argument this time:

    > rasa train
    The configuration for policies was chosen automatically. It was written into the config file at 'config.yml'.
    2023-04-08 09:43:08 INFO rasa.engine.training.hooks - Starting to train component 'JiebaTokenizer'.
    2023-04-08 09:43:08 INFO rasa.engine.training.hooks - Finished training component 'JiebaTokenizer'.
    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Loading model cost 0.493 seconds.
    Prefix dict has been built successfully.
    2023-04-08 09:43:10 INFO rasa.nlu.featurizers.dense_featurizer.lm_featurizer - Model weights not specified. Will choose default model weights: rasa/LaBSE
    All model checkpoint layers were used when initializing TFBertModel.
    All the layers of TFBertModel were initialized from the model checkpoint at rasa/LaBSE.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
    2023-04-08 09:43:39 INFO rasa.engine.training.hooks - Starting to train component 'DIETClassifier'.
    /home/zhongwei/.local/lib/python3.8/site-packages/rasa/utils/train_utils.py:528: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
      rasa.shared.utils.io.raise_warning(
    Epochs: 100% 300/300 [00:32

    ls -lah models/
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr 8 09:44 ./
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr 7 17:28 ../
    -rwxrwxrwx 1 zhongwei zhongwei  24M Apr 8 09:44 20230408-094308-burning-dessert.tar.gz*
    -rwxrwxrwx 1 zhongwei zhongwei  20M Apr 7 10:35 nlu-20230407-100759-obtuse-rack.tar.gz*

rasa shell

Start rasa shell again. It also starts a Rasa server and loads the newly trained model file.

    > rasa shell
    2023-04-08 09:46:57 INFO root - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
    2023-04-08 09:46:57 INFO root - Starting Rasa server on http://0.0.0.0:5005
    2023-04-08 09:46:57 INFO rasa.core.processor - Loading model models/20230408-094308-burning-dessert.tar.gz...
    2023-04-08 09:46:59 INFO rasa.nlu.featurizers.dense_featurizer.lm_featurizer - Model weights not specified. Will choose default model weights: rasa/LaBSE
    All model checkpoint layers were used when initializing TFBertModel.
    All the layers of TFBertModel were initialized from the model checkpoint at rasa/LaBSE.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
    /home/zhongwei/.local/lib/python3.8/site-packages/rasa/utils/train_utils.py:528: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
      rasa.shared.utils.io.raise_warning(
    2023-04-08 09:47:43 WARNING rasa.shared.utils.common - The UnexpecTED Intent Policy is currently experimental and might change or be removed in the future 🔬 Please share your feedback on it in the forum (https://forum.rasa.com) to help us make this feature ready for production.
    2023-04-08 09:47:50 INFO root - Rasa server is up and running.
    Bot loaded. Type a message and press enter (use '/stop' to exit):

Chinese conversation test

    Your input -> 你好
    你好!吃了么?
    Your input -> 你是机器人么
    我是一个机器人,你可以叫我小远子
    Your input -> 你是谁
    我是一个机器人,你可以叫我小远子

Chinese replies now work as expected.
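The same conversation can also be driven from a script rather than typed into the shell. The sketch below builds a request for Rasa's REST channel. It rests on assumptions that are not part of this post: the server is started with rasa run rather than rasa shell (the shell log above shows that only the cmdline channel is connected), the rest channel is enabled in credentials.yml, and the server listens on localhost:5005. The names build_rest_message and send are hypothetical.

```python
import json
import urllib.request


def build_rest_message(text, sender="test_user", base_url="http://localhost:5005"):
    """Build a POST request for the REST channel webhook (assumed to be enabled)."""
    payload = json.dumps({"sender": sender, "message": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON list of bot messages."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running under those assumptions, send(build_rest_message("你好")) should return a list of messages whose text fields hold the bot's replies.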

Errors from rasa train nlu

rasa.engine.exceptions.GraphSchemaValidationException: Component 'JiebaTokenizer' requires the following packages which are currently not installed: jieba.

Fix:

pip3 install jieba
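Both missing-package errors in this section (jieba here, transformers next) can be caught before training by checking whether the packages are importable. This is a minimal sketch using only the standard library; missing_packages is a hypothetical helper name, and the list of package names is taken from the two error messages in this post.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Packages named in the GraphSchemaValidationException messages in this post.
    for name in missing_packages(["jieba", "transformers"]):
        print(f"missing: {name} (install with: pip3 install {name})")
```

Running the script in the same Python environment as Rasa prints one line per package that still needs to be installed, and nothing if both are present.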

rasa.engine.exceptions.GraphSchemaValidationException: Component 'LanguageModelFeaturizer' requires the following packages which are currently not installed: transformers.

Fix:

pip3 install transformers

References

https://rasa.com/docs/rasa/language-support/

tags: rasa

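About the author

I am a developer from Yantai, Shandong. If a topic here interests you, or you have a software development need, feel free to add me on WeChat (zhongwei) for a chat.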
