大工至善|大学至真 (blog) http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2018.09] Text Classification Based on Active Learning

166 views | 2020-9-29 20:46 | Category: Research Notes | Source: Repost

This is a master's thesis from Eindhoven University of Technology, the Netherlands (author: Šostak, T.), 51 pages in total.

 

A lack of sufficient training data has always been a problem in machine learning. Even when enough data is available, it still has to be annotated manually by a domain expert before a model can be built. Active learning speeds up the annotation process by reducing the amount of labeled data needed to build an adequate model, which saves the human annotators cost and time.

This thesis benchmarks well-established and new active learning approaches on different datasets and presents the implementation of an active learning system. The datasets contain natural language in the form of text and are subject to text classification tasks. Sentiment analysis (classification of polarity) is a special case of text classification that usually requires different techniques, such as deep learning, to deal with complex syntax and semantics. The thesis therefore also investigates the impact of active learning on sentiment analysis. The active learning system was incorporated into an existing data management system during the project at Philips Research. The system provides human annotators with an intuitive user interface and communicates with an active learning microservice. The results presented in the thesis are very promising: techniques such as uncertainty sampling, combined with support vector machines or deep learning algorithms, guarantee a reduction of the needed labels by half. The key to these results is the proper selection of instances, using active learning sampling techniques tailored specifically to the text classification task.
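To make the uncertainty sampling idea from the abstract more concrete, here is a minimal sketch in pure Python. It is not the thesis's implementation. It assumes a toy bag-of-words featurizer, a simplified hinge-loss linear classifier standing in for a real support vector machine solver, and an `oracle` callable that plays the role of the human annotator. All function names (`featurize`, `train_linear_svm`, `most_uncertain`, `active_learning_loop`) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def featurize(text):
    """Toy bag-of-words features: lowercase, whitespace-split token counts."""
    return Counter(text.lower().split())


def score(weights, bias, feats):
    """Linear decision value w.x + b for a sparse feature dict."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items()) + bias


def train_linear_svm(examples, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Hinge-loss linear classifier fitted by stochastic subgradient descent.

    examples: list of (feature dict, label) pairs with label in {-1, +1}.
    Assumption: this is a simplified stand-in for an SVM solver.
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            margin = label * score(weights, bias, feats)
            for tok in weights:  # L2 regularization shrinks every weight
                weights[tok] *= 1.0 - lr * reg
            if margin < 1.0:  # inside the margin or misclassified
                for tok, cnt in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * cnt
                bias += lr * label
    return weights, bias


def most_uncertain(weights, bias, pool):
    """Uncertainty (margin) sampling: index of the text nearest the boundary."""
    return min(range(len(pool)),
               key=lambda i: abs(score(weights, bias, featurize(pool[i]))))


def active_learning_loop(labeled, pool, oracle, budget):
    """Retrain, query the most uncertain pool item, and repeat up to `budget` times."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(budget, len(pool))):
        weights, bias = train_linear_svm([(featurize(t), y) for t, y in labeled])
        text = pool.pop(most_uncertain(weights, bias, pool))
        labeled.append((text, oracle(text)))  # the oracle stands in for the annotator
    weights, bias = train_linear_svm([(featurize(t), y) for t, y in labeled])
    return weights, bias, labeled


if __name__ == "__main__":
    # Hypothetical toy data; labels are +1 (positive) and -1 (negative).
    truth = {"great acting": 1, "awful plot": -1,
             "not great not awful": -1, "boring": -1}
    seed_set = [("great movie", 1), ("awful movie", -1)]
    _, _, labeled = active_learning_loop(seed_set, list(truth),
                                         truth.__getitem__, budget=2)
    print([text for text, _ in labeled[2:]])  # the texts the loop chose to query
```

The only active learning specific step is `most_uncertain`: instead of labeling pool items at random, the loop asks for the label of the item whose decision value is closest to zero, which is where the current model is least sure. In practice the featurizer and classifier would be replaced by a proper text vectorizer and SVM or deep learning model.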

 

 

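1. Introduction

2. Active Learning Techniques

3. Text Classification

4. Experiments

5. Active Learning System

6. Conclusion

Appendix A System Screenshots

Appendix B t-SNE Representation: Sentence Cloud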






http://blog.sciencenet.cn/blog-69686-1252628.html
