Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node

Source: Journal of Computer Science and Technology (English edition)
Semi-Markov conditional random fields (Semi-CRFs) have been used successfully in many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of whole segments rather than of individual sequence elements. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable to CRF-based models at similar computational complexity. Specifically, we first apply a bi-directional long short-term memory (BiLSTM) network at the character level to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments within linear time complexity, we propose a new model named Semi-CRF-Relay. Because segments are modeled directly, word features are easy to incorporate, and CWS performance can be improved merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings for this paper are available at https://github.com/fastnlp/fastNLP/.
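
To make the complexity argument concrete, the following is a minimal sketch of generic semi-Markov (segment-level) Viterbi decoding. It is not the paper's Semi-CRF-Relay model or its BiLSTM fusion layer. The function `semi_markov_decode`, the `max_len` cap, and the toy scorer `toy_score` with its lexicon are hypothetical names introduced only for illustration. With no cap, every (start, end) pair is scored, which is quadratic in the sentence length; capping the segment length makes decoding linear but rules out longer segments, which is the trade-off the abstract says Semi-CRF-Relay is meant to avoid.

```python
from typing import Callable, List, Optional, Tuple


def semi_markov_decode(
    chars: str,
    score: Callable[[str], float],
    max_len: Optional[int] = None,
) -> Tuple[float, List[str]]:
    """Return the best-scoring segmentation of `chars`.

    `score` assigns a score to a whole candidate segment (hypothetical
    stand-in for a learned segment scorer). `max_len=None` allows
    arbitrarily long segments, so O(n^2) segments are scored; an integer
    cap reduces this to O(n * max_len).
    """
    n = len(chars)
    limit = n if max_len is None else max_len
    best = [float("-inf")] * (n + 1)  # best[i]: best score for chars[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - limit), end):
            cand = best[start] + score(chars[start:end])
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    # Follow the back-pointers from the end of the sentence.
    segments: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(chars[start:end])
        end = start
    segments.reverse()
    return best[n], segments


# Toy segment scorer (an assumption, not learned): lexicon words get a
# positive score, anything else is penalised by its length.
TOY_LEXICON = {"研究": 2.0, "生命": 2.0, "研究生": 2.5, "起源": 2.0, "命": 0.5}


def toy_score(segment: str) -> float:
    return TOY_LEXICON.get(segment, -1.0 * len(segment))


if __name__ == "__main__":
    print(semi_markov_decode("研究生命起源", toy_score))
    print(semi_markov_decode("研究生命起源", toy_score, max_len=2))
```

Because the scorer sees each candidate segment as a unit, word-level information such as a lexicon entry or a word embedding can enter the model directly, which a character-tagging CRF cannot do as simply.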