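Tobacco Question Intent Recognition Method Based on SBERT-Attention-LDA and ML-LSTM Feature Fusion
Cite this article: ZHU Bo, LI Kui, QIU Lan, LI Bo. Tobacco Question Intent Recognition Method Based on SBERT-Attention-LDA and ML-LSTM Feature Fusion[J]. 农业机械学报, 2024, 55(5): 273-281.
Authors: ZHU Bo  LI Kui  QIU Lan  LI Bo
Affiliations: Kunming University of Science and Technology; Wuhan Institute of Technology
Funding: Key Project of the Yunnan Provincial Tobacco Company, China National Tobacco Corporation (2021530000241012)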
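Abstract: To address feature sparsity, abundant domain terminology, and the difficulty of capturing semantic associations within text in tobacco-domain question intent recognition, a question intent recognition method is proposed based on feature fusion of SBERT-Attention-LDA (Sentence-bidirectional encoder representations from transformers-Attention mechanism-Latent Dirichlet allocation) and ML-LSTM (Multi layers-Long short term memory). The method first uses the SBERT pre-trained model and an attention mechanism to dynamically encode tobacco questions into semantically rich feature vectors, while an LDA model builds a topic vector for each question to capture its topic information. A modified model-level feature fusion method, ML-LSTM, then produces a joint feature representation that carries more complete and accurate question semantics. Next, a 3-channel convolutional neural network (CNN) extracts hidden features from this mixed semantic representation and feeds them into a fully connected layer and a Softmax function to classify question intent. Experiments were conducted on a dataset collected from authoritative tobacco industry websites. The results show that the proposed method significantly improves precision, recall, and F1 over several other methods that combine deep learning with attention mechanisms, and clearly outperforms the BERT and ERNIE (Enhanced representation through knowledge integration and embedding)-CNN models, with F1 gains of 2.07 and 2.88 percentage points, respectively.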

Keywords: tobacco question classification  natural language processing  feature fusion  self-attention mechanism
Received: 2023/12/26
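
The precision, recall, and F1 figures cited in the abstract presumably follow the standard per-class definitions below. The symbols TP, FP, and FN are introduced here for illustration and are not taken from the paper; they count true positives, false positives, and false negatives for a single intent class. How the paper averages these values across classes is not stated on this page.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```

A gain reported in percentage points is an absolute difference between two F1 values expressed in percent, not a relative increase.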

Tobacco Interrogative Intent Recognition Based on SBERT-Attention-LDA and ML-LSTM Feature Fusion
ZHU Bo, LI Kui, QIU Lan, LI Bo. Tobacco Interrogative Intent Recognition Based on SBERT-Attention-LDA and ML-LSTM Feature Fusion[J]. Transactions of the Chinese Society of Agricultural Machinery, 2024, 55(5): 273-281.
Authors: ZHU Bo  LI Kui  QIU Lan  LI Bo
Institution: Kunming University of Science and Technology; Wuhan Institute of Technology
Abstract: To address feature sparsity, abundant terminology, and the difficulty of capturing semantic associations within the text in question intent recognition in the tobacco domain, a question intent recognition method was proposed based on feature fusion of sentence-bidirectional encoder representations from transformers-Attention mechanism-latent Dirichlet allocation (SBERT-Attention-LDA) and multi layers-long short term memory (ML-LSTM). The method first dynamically encoded the tobacco question using the SBERT pre-trained model combined with the Attention mechanism, converting it into semantically rich feature vectors; at the same time, the LDA model was used to model the topic vector of the question and capture its topic information. A joint feature representation with more complete and accurate question semantics was then obtained with the modified model-level ML-LSTM feature fusion method. The three-layer LSTM and the ML-LSTM feature fusion method were then used to identify the intent of the question. Next, a 3-channel convolutional neural network (CNN) extracted the hidden features in the hybrid semantic representation of the question and fed them into the fully connected layer and Softmax function to classify the question intent. Compared with the BERT and ERNIE (enhanced representation through knowledge integration and embedding)-CNN models, the proposed method improved the F1 value by 2.07 and 2.88 percentage points, respectively, which supports the construction of a Q&A system for tobacco websites.
Keywords: classification of tobacco questions  natural language processing  feature fusion  self-attention mechanism
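
The data flow described in the abstract (attention-weighted semantic vector, topic vector, fusion, Softmax classification) can be illustrated with a toy sketch. This is not the authors' implementation: random numbers stand in for SBERT token embeddings and LDA topic weights, plain concatenation is swapped in for the ML-LSTM fusion, the 3-channel CNN is omitted, and every function name and dimension below is hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vecs, query):
    """Weight each token vector by softmax(token . query) and sum them."""
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vecs]
    weights = softmax(scores)
    dim = len(token_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, token_vecs))
            for i in range(dim)]


def fuse(semantic_vec, topic_vec):
    """Placeholder fusion: concatenation (the paper uses ML-LSTM instead)."""
    return list(semantic_vec) + list(topic_vec)


def classify(fused, weight_rows, biases):
    """Single fully connected layer followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weight_rows, biases)]
    return softmax(logits)


# Toy stand-ins (assumptions): 5 tokens with 8-dim embeddings, 4 topics,
# 3 intent classes, untrained random weights.
rng = random.Random(0)
token_vecs = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
query = [rng.uniform(-1, 1) for _ in range(8)]
topic_vec = softmax([rng.uniform(-1, 1) for _ in range(4)])

semantic_vec = attention_pool(token_vecs, query)
fused = fuse(semantic_vec, topic_vec)

num_intents = 3
weights = [[rng.uniform(-1, 1) for _ in range(len(fused))]
           for _ in range(num_intents)]
biases = [0.0] * num_intents

probs = classify(fused, weights, biases)
predicted = max(range(num_intents), key=probs.__getitem__)
print(predicted, probs)
```

Because the weights are untrained, the predicted class is meaningless; the sketch only shows how a semantic vector and a topic vector might be combined into one input for a classifier.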