#Step-by-Step Guide# StanfordCoreNLP


2023-03-30 22:57 | Source: compiled from the web

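Syntactic analysis (constituency parsing) is an important and fairly fundamental parsing task in NLP.

Its job is to identify the phrases in a sentence and the hierarchical syntactic relationships between those phrases.

With the CoreNLP toolkit provided by Stanford, constituency parsing is easy to implement. Take the following sentence as an example: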

which province is jack and john electorated in?

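First, we can parse this sentence with the online demo that Stanford provides at http://corenlp.run/.

After clicking Submit, the page displays the constituency parse result as a tree diagram.

So how do we do the same thing in code? The Java code for getting the constituency parse tree from StanfordCoreNLP is as follows: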

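import java.util.Properties;

// StringUtils is assumed here to come from Apache Commons Lang 3
import org.apache.commons.lang3.StringUtils;

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;

class NLPUtils {

    private static Properties properties = new Properties();
    private static StanfordCoreNLP pipeline;

    static {
        init();
    }

    /**
     * Constituency (phrase-structure) analysis.
     *
     * @param singleNL a single natural-language sentence
     */
    public static void constituencyAnalysis(String singleNL) {
        if (StringUtils.isBlank(singleNL)) {
            return;
        }
        CoreDocument document = getCoreDocument(singleNL);
        // Only the first sentence of the input is parsed
        CoreSentence sentence = document.sentences().get(0);
        Tree constituencyParse = sentence.constituencyParse();
        System.out.println(constituencyParse);
    }

    /**
     * Builds the annotated CoreDocument for a piece of natural-language text.
     *
     * @param NL the text to annotate
     * @return the annotated document
     */
    public static CoreDocument getCoreDocument(String NL) {
        return pipeline.processToCoreDocument(NL);
    }

    /** Initializes the pipeline; "parse" is the annotator that produces the constituency tree. */
    private static void init() {
        properties.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,parse");
        properties.setProperty("ner.additional.regexner.ignorecase", "true");
        properties.setProperty("ner.applyNumericClassifiers", "true");
        properties.setProperty("ner.buildEntityMentions", "true");
        properties.setProperty("ner.useNERSpecificTokenization", "true");
        pipeline = new StanfordCoreNLP(properties);
    }

    public static void main(String[] args) {
        String singleNL = "which province is jack and john electorated in ?";
        // The original passed an undefined variable NL here; it must be singleNL
        constituencyAnalysis(singleNL);
    }
}

Running the code above prints the following: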

(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))
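To read this result at the word level, it can help to list each word with its part-of-speech tag. The sketch below does that with a regular expression over the printed string. `TaggedWordExtractor` and `taggedWords` are hypothetical names invented for this illustration, not CoreNLP APIs, and the approach assumes that tags and words contain no spaces or parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: lists word/tag pairs from a bracketed parse string.
class TaggedWordExtractor {

    // A preterminal looks like "(TAG word)": exactly two tokens and no nested parentheses.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^\\s()]+) ([^\\s()]+)\\)");

    /** Returns the words in sentence order, each formatted as word/TAG. */
    static List<String> taggedWords(String bracketed) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PRETERMINAL.matcher(bracketed);
        while (matcher.find()) {
            // group(1) is the POS tag, group(2) is the word
            result.add(matcher.group(2) + "/" + matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String parse = "(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) "
                + "(NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))";
        System.out.println(taggedWords(parse));
        // prints [which/WDT, province/NN, is/VBZ, jack/NNP, and/CC, john/NNP, electorated/NNP, in/IN, ?/.]
    }
}
```

Note that "electorated" comes out tagged NNP (proper noun) rather than as a verb. It is not a standard English word, so the parser is presumably guessing, and the PP it builds around that word should be read with the same caution.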

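Interested readers can write code that converts the bracketed result above into a multi-way (n-ary) tree structure.

【Hint】

After obtaining the parse tree from CoreSentence.constituencyParse(), get an iterator over its nodes: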
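One possible way to do this conversion, working only from the printed string, is a small recursive-descent parser. This is a minimal sketch under stated assumptions, not the article's own solution: `BracketTreeParser`, `Node`, and `leaves` are hypothetical names, the input is assumed to be well formed, and labels and words are assumed to contain no whitespace or parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses a bracketed parse string into a simple n-ary tree.
class BracketTreeParser {

    /** A node holds a label (phrase tag, POS tag, or word) and any number of children. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    private final String text;
    private int pos = 0;

    private BracketTreeParser(String text) {
        this.text = text;
    }

    /** Entry point: parses one bracketed tree and returns its root node. */
    static Node parse(String bracketed) {
        return new BracketTreeParser(bracketed.trim()).parseNode();
    }

    private Node parseNode() {
        skipSpaces();
        if (text.charAt(pos) != '(') {
            return new Node(readToken()); // a bare token is a word (leaf)
        }
        pos++; // consume '('
        Node node = new Node(readToken());
        skipSpaces();
        while (text.charAt(pos) != ')') { // each iteration reads one child subtree
            node.children.add(parseNode());
            skipSpaces();
        }
        pos++; // consume ')'
        return node;
    }

    private String readToken() {
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        return text.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    /** Collects the leaf labels (the words) from left to right. */
    static List<String> leaves(Node node) {
        List<String> words = new ArrayList<>();
        if (node.children.isEmpty()) {
            words.add(node.label);
        }
        for (Node child : node.children) {
            words.addAll(leaves(child));
        }
        return words;
    }

    public static void main(String[] args) {
        Node root = parse("(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) "
                + "(NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))");
        System.out.println(root.label + " has " + root.children.size() + " child");
        System.out.println(String.join(" ", leaves(root)));
    }
}
```

Each opening parenthesis starts a new node and each closing parenthesis finishes it, so the nesting of the recursion mirrors the nesting of the phrases. Malformed input (for example, unbalanced parentheses) is not handled and would raise an exception.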

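// Requires java.util.Iterator; constituencyParseTree is the Tree returned by constituencyParse()
Iterator<Tree> iterator = constituencyParseTree.stream().iterator();
while (iterator.hasNext()) {
    Tree next = iterator.next();
    System.out.println(next);               // the subtree rooted at this node
    System.out.println(next.numChildren()); // the number of direct children of this node
}

The output is shown below. Each subtree is followed by its number of children, and the nodes appear in pre-order (a parent before its children, left to right). Comparing this output with the tree diagram from corenlp.run at the beginning of the article should give you some ideas: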

(ROOT (SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?)))
1
(SBARQ (WHNP (WDT which) (NN province)) (SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))) (. ?))
3
(WHNP (WDT which) (NN province))
2
(WDT which)
1
which
0
(NN province)
1
province
0
(SQ (VBZ is) (NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in))))
2
(VBZ is)
1
is
0
(NP (NP (NNP jack) (CC and) (NNP john)) (PP (NP (NNP electorated)) (IN in)))
2
(NP (NNP jack) (CC and) (NNP john))
3
(NNP jack)
1
jack
0
(CC and)
1
and
0
(NNP john)
1
john
0
(PP (NP (NNP electorated)) (IN in))
2
(NP (NNP electorated))
1
(NNP electorated)
1
electorated
0
(IN in)
1
in
0
(. ?)
1
?
0
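One idea this output suggests: a pre-order list of nodes together with each node's child count is enough to rebuild the multi-way tree with a stack. The sketch below shows this on plain arrays so that it runs without CoreNLP. `PreorderTreeBuilder`, `Node`, `build`, and `toBracketed` are hypothetical names; with the real iterator, the label of each node could be taken from `next.value()` and the count from `next.numChildren()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: rebuilds an n-ary tree from pre-order labels and child counts.
class PreorderTreeBuilder {

    static class Node {
        final String label;
        final int expectedChildren;
        final List<Node> children = new ArrayList<>();

        Node(String label, int expectedChildren) {
            this.label = label;
            this.expectedChildren = expectedChildren;
        }
    }

    static Node build(String[] labels, int[] childCounts) {
        if (labels.length == 0 || labels.length != childCounts.length) {
            throw new IllegalArgumentException("labels and childCounts must be non-empty and equal in length");
        }
        Deque<Node> open = new ArrayDeque<>(); // nodes still waiting for children
        Node root = null;
        for (int i = 0; i < labels.length; i++) {
            Node node = new Node(labels[i], childCounts[i]);
            if (root == null) {
                root = node;
            } else {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("node has no parent: " + labels[i]);
                }
                // In pre-order, the next node always belongs to the innermost open parent
                Node parent = open.peek();
                parent.children.add(node);
                if (parent.children.size() == parent.expectedChildren) {
                    open.pop(); // parent is complete
                }
            }
            if (node.expectedChildren > 0) {
                open.push(node);
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("input ended before all children were seen");
        }
        return root;
    }

    /** Renders the tree back in bracketed form, to check the reconstruction. */
    static String toBracketed(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (Node child : node.children) {
            sb.append(' ').append(toBracketed(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // The WHNP subtree from the output above: label and child count per node, in pre-order
        String[] labels = {"WHNP", "WDT", "which", "NN", "province"};
        int[] counts = {2, 1, 0, 1, 0};
        System.out.println(toBracketed(build(labels, counts)));
        // prints (WHNP (WDT which) (NN province))
    }
}
```

If the CoreNLP `Tree` object is still available, walking it recursively is simpler than this reconstruction. The stack approach is mainly useful when only the flattened listing has been kept.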


