Adding Chinese Support to the Rasa Chatbot
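Out of the box, Rasa does not support Chinese conversations. The pipeline configurations in the examples I found while learning all differ from one another, and without trying them it is hard to tell which is better. So I started from the simplest configuration that runs: the Chinese pipeline recommended in the book 《Rasa 实战:构建开源对话机器人》, which includes an NLU configuration for a medical bot. It covers only the NLU part, that is, recognizing intents and entities; it contains no response configuration.

The simplest Chinese configuration

Open config.yml in the project root and change it as follows:

    recipe: default.v1
    language: zh
    pipeline:
      - name: JiebaTokenizer
      - name: LanguageModelFeaturizer
        model_name: "bert"
        model_weights: "bert-base-chinese"
      - name: "DIETClassifier"

language must be changed from en to zh (Chinese). For the pipeline, you can refer to my list of Rasa NLU pipeline components.

Note that the option is spelled model_weights (plural). A misspelled key such as model_weight appears to be ignored, and Rasa then falls back to its default weights, rasa/LaBSE. The training logs later in this post show exactly that message ("Model weights not specified. Will choose default model weights: rasa/LaBSE"), so the results reported below were produced with LaBSE rather than bert-base-chinese.

What is NLU

NLU stands for Natural Language Understanding. In Rasa, the NLU module parses user input and extracts key information such as intents and entities. Custom modules, such as sentiment analysis, can also be added.

Configuring nlu.yml

Edit data/nlu.yml and add some Chinese examples to the existing English training data.

    version: "3.1"
    nlu:
    - intent: greet
      examples: |
        - hey
        - hello
        - hi
        - hello there
        - good morning
        - good evening
        - moin
        - hey there
        - let's go
        - hey dude
        - goodmorning
        - goodevening
        - good afternoon
        - 你好!
        - 您好!
        - 在么!
        - 在吗!
        - 喂!
    - intent: goodbye
      examples: |
        - cu
        - good by
        - cee you later
        - good night
        - bye
        - goodbye
        - have a nice day
        - see you around
        - bye bye
        - see you later
        - 拜拜!
        - 再见!
        - 拜!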
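        - 退出。
        - 结束。
        - exit
    - intent: affirm
      examples: |
        - yes
        - y
        - indeed
        - of course
        - that sounds good
        - correct
        - 是的
        - 是
    - intent: deny
      examples: |
        - no
        - n
        - never
        - I don't think so
        - don't like that
        - no way
        - not really
        - 不
        - 不是的
        - 不是

Retraining the model

The yml files under the data directory, such as nlu.yml, hold the training data.

    rasa train nlu

During training, Rasa downloaded a 1.88 GB tf_model.h5. Why is it so big? This file holds the pretrained weights loaded by LanguageModelFeaturizer. BERT (Bidirectional Encoder Representations from Transformers) is a language model built on the Transformer architecture. It learns text representations that can be used for tasks such as text classification, named entity recognition, and question answering. BERT itself is not tied to one framework; tf_model.h5 is the TensorFlow version of the weights, and the .h5 extension indicates the HDF5 file format. TensorFlow is a widely used machine learning framework for training and deploying deep learning models.

The trained model file, however, is only 20 MB.

    > ls -lah models/
    total 44M
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr  7 10:35 ./
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr  7 10:03 ../
    -rwxrwxrwx 1 zhongwei zhongwei  20M Apr  7 10:35 nlu-20230407-100759-obtuse-rack.tar.gz*

Test:

    rasa shell nlu

Test results

The greet intent:

    Next message: 你好
    {
      "text": "你好",
      "intent": {
        "name": "greet",
        "confidence": 0.9999979734420776
      },

The goodbye intent:

    Next message: 再见
    {
      "text": "再见",
      "intent": {
        "name": "goodbye",
        "confidence": 0.9999972581863403
      },

These two were expected. They at least show that Chinese is now supported, unlike with the default en setting, where Chinese input produced no response at all.

What surprised me more was this one:

    Next message: 我拒绝
    {
      "text": "我拒绝",
      "intent": {
        "name": "deny",
        "confidence": 0.9226003289222717
      },

I never put the word 拒绝 ("refuse") in the examples for the deny intent, yet it was still recognized accurately. This shows that a pretrained language model that covers Chinese is in play; in this pipeline it is the one loaded by LanguageModelFeaturizer.

There are unsatisfying cases too:

    Next message: 你好啊
    {
      "text": "好啊",
      "intent": {
        "name": "affirm",
        "confidence": 0.4897577464580536
      },
      "entities": [],
      "text_tokens": [
        [0, 1],
        [1, 2]
      ],
      "intent_ranking": [
        {
          "name": "affirm",
          "confidence": 0.4897577464580536
        },
        {
          "name": "greet",
          "confidence": 0.34744495153427124
        },

The top candidate should have been greet, but the message was recognized as affirm. Note that the parsed text is 好啊 rather than 你好啊, so the first character seems to have been dropped before parsing, which may partly explain the result. It is still not smart enough, but it basically meets my needs.

Supporting Chinese responses

Training the NLU model above only enables parsing of Chinese input; the bot still cannot reply in Chinese.

Add Chinese responses in domain.yml:

    version: "3.1"
    intents:
      - greet
      - goodbye
      - affirm
      - deny
      - mood_great
      - mood_unhappy
      - bot_challenge
    responses:
      utter_greet:
      - text: "你好!吃了么?"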
      utter_cheer_up:
      - text: "Here is something to cheer you up:"
        image: "https://i.imgur.com/nGF1K8f.jpg"
      utter_did_that_help:
      - text: "Did that help you?"
      utter_happy:
      - text: "Great, carry on!"
      utter_goodbye:
      - text: "再见"
      utter_iamabot:
      - text: "我是一个机器人,你可以叫我小远子"
    session_config:
      session_expiration_time: 60
      carry_over_slots_to_new_session: true

Retraining

The model produced earlier by rasa train nlu only does parsing and contains no response logic, so the model has to be retrained. Note: do not pass the nlu argument this time:

    > rasa train
    The configuration for policies was chosen automatically. It was written into the config file at 'config.yml'.
    2023-04-08 09:43:08 INFO rasa.engine.training.hooks - Starting to train component 'JiebaTokenizer'.
    2023-04-08 09:43:08 INFO rasa.engine.training.hooks - Finished training component 'JiebaTokenizer'.
    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Loading model cost 0.493 seconds.
    Prefix dict has been built successfully.
    2023-04-08 09:43:10 INFO rasa.nlu.featurizers.dense_featurizer.lm_featurizer - Model weights not specified. Will choose default model weights: rasa/LaBSE
    All model checkpoint layers were used when initializing TFBertModel.
    All the layers of TFBertModel were initialized from the model checkpoint at rasa/LaBSE.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
    2023-04-08 09:43:39 INFO rasa.engine.training.hooks - Starting to train component 'DIETClassifier'.
    /home/zhongwei/.local/lib/python3.8/site-packages/rasa/utils/train_utils.py:528: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
      rasa.shared.utils.io.raise_warning(
    Epochs: 100% 300/300 [00:32

After training, the models directory contains both model files:

    > ls -lah models/
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr  8 09:44 ./
    drwxrwxrwx 1 zhongwei zhongwei 4.0K Apr  7 17:28 ../
    -rwxrwxrwx 1 zhongwei zhongwei  24M Apr  8 09:44 20230408-094308-burning-dessert.tar.gz*
    -rwxrwxrwx 1 zhongwei zhongwei  20M Apr  7 10:35 nlu-20230407-100759-obtuse-rack.tar.gz*

rasa shell

Start rasa shell again. It also starts the Rasa server and loads the newly trained model file.

    > rasa shell
    2023-04-08 09:46:57 INFO root - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
    2023-04-08 09:46:57 INFO root - Starting Rasa server on http://0.0.0.0:5005
    2023-04-08 09:46:57 INFO rasa.core.processor - Loading model models/20230408-094308-burning-dessert.tar.gz...
    2023-04-08 09:46:59 INFO rasa.nlu.featurizers.dense_featurizer.lm_featurizer - Model weights not specified. Will choose default model weights: rasa/LaBSE
    All model checkpoint layers were used when initializing TFBertModel.
    All the layers of TFBertModel were initialized from the model checkpoint at rasa/LaBSE.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
    /home/zhongwei/.local/lib/python3.8/site-packages/rasa/utils/train_utils.py:528: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
      rasa.shared.utils.io.raise_warning(
    2023-04-08 09:47:43 WARNING rasa.shared.utils.common - The UnexpecTED Intent Policy is currently experimental and might change or be removed in the future 🔬 Please share your feedback on it in the forum (https://forum.rasa.com) to help us make this feature ready for production.
    2023-04-08 09:47:50 INFO root - Rasa server is up and running.
    Bot loaded. Type a message and press enter (use '/stop' to exit):

Chinese conversation test

    Your input -> 你好
    你好!吃了么?
    Your input -> 你是机器人么
    我是一个机器人,你可以叫我小远子
    Your input -> 你是谁
    我是一个机器人,你可以叫我小远子

Chinese responses now work as expected.

rasa train nlu errors

    rasa.engine.exceptions.GraphSchemaValidationException: Component 'JiebaTokenizer' requires the following packages which are currently not installed: jieba.

Fix:

    pip3 install jieba

    rasa.engine.exceptions.GraphSchemaValidationException: Component 'LanguageModelFeaturizer' requires the following packages which are currently not installed: transformers.

Fix:

    pip3 install transformers

References

https://rasa.com/docs/rasa/language-support/

tags: rasa

About the author
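I am a developer from Yantai, Shandong. If you have a topic you are interested in, or a software development need, feel free to add me on WeChat (zhongwei) to chat, or see my other contact details.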
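Appendix: checking pipeline dependencies before training

The two GraphSchemaValidationException errors in the troubleshooting section both come from a Python package that a pipeline component needs but that is not installed. A minimal sketch of a pre-flight check, assuming only the two packages named in those error messages (jieba and transformers), might look like the following. The names REQUIRED_PACKAGES and find_missing_packages are made up for this illustration and are not part of Rasa.

```python
import importlib.util

# Component -> package mapping taken from the two error messages above.
# This table is an assumption for illustration; extend it for other components.
REQUIRED_PACKAGES = {
    "JiebaTokenizer": "jieba",
    "LanguageModelFeaturizer": "transformers",
}


def find_missing_packages(required=None):
    """Return {component: package} for every package that cannot be found."""
    if required is None:
        required = REQUIRED_PACKAGES
    missing = {}
    for component, package in required.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(package) is None:
            missing[component] = package
    return missing


if __name__ == "__main__":
    missing = find_missing_packages()
    if missing:
        for component, package in missing.items():
            print(f"{component} needs: pip3 install {package}")
    else:
        print("All pipeline dependencies are installed.")
```

Running this script before rasa train nlu prints the same pip3 install commands as the fixes above for whichever packages are absent, without waiting for Rasa to start up and fail.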