This is a master's thesis from Eindhoven University of Technology, the Netherlands (author: Šostak, T.), 51 pages.






The lack of sufficient training data has always been an issue in machine learning. Even when enough data is available, it still has to be manually annotated by a domain expert before a model can be built. Active learning speeds up the annotation process by reducing the amount of labeled data needed to build an adequate model, thus saving human annotators time and cost. This thesis benchmarks well-established and new active learning approaches on different datasets and presents the implementation of an active learning system. The datasets consist of natural-language text and are used for text classification tasks. Sentiment analysis (polarity classification) is a special case of text classification that usually requires different techniques, such as deep learning, to deal with complex syntax and semantics. The thesis therefore also investigates the impact of active learning on sentiment analysis. The active learning system was incorporated into an existing data management system during the project at Philips Research. It provides human annotators with an intuitive user interface and communicates with an active learning microservice. The results presented in this thesis are very promising: techniques such as uncertainty sampling combined with support vector machines or deep learning algorithms guarantee that the number of required labels is reduced by half. The key to such results is the proper selection of instances using active learning sampling techniques tailored specifically to the text classification task.
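The abstract credits uncertainty sampling, paired with an SVM or a deep learning model, for halving the labeling effort. The selection step of uncertainty sampling can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the classifier has already scored every unlabeled pool instance, either as a signed SVM decision value w·x + b or as a vector of class probabilities (for example, softmax outputs of a neural network). The function names are hypothetical.

```python
import math
from typing import List, Sequence


def svm_margin_query(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k pool instances closest to a linear SVM's decision boundary.

    scores[i] is assumed to be the signed decision value w.x_i + b;
    a small |score| means the SVM is least certain about instance i.
    """
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k]


def least_confidence_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick the k instances whose most likely class has the lowest probability."""
    return sorted(range(len(probs)), key=lambda i: max(probs[i]))[:k]


def entropy_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick the k instances whose predicted class distribution has the highest entropy."""
    def entropy(p: Sequence[float]) -> float:
        # Skip zero probabilities, since 0 * log(0) is taken as 0.
        return -sum(q * math.log(q) for q in p if q > 0)

    return sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical decision values of a linear SVM on five unlabeled texts.
    svm_scores = [2.1, -0.3, 0.05, -1.7, 0.9]
    print(svm_margin_query(svm_scores, 2))  # [2, 1]

    # Hypothetical positive/negative probabilities from a sentiment classifier.
    class_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
    print(least_confidence_query(class_probs, 2))  # [3, 1]
    print(entropy_query(class_probs, 2))           # [3, 1]
```

In a pool-based loop, the selected indices would be sent to the human annotators, the newly labeled texts added to the training set, and the model retrained before the next query. For binary tasks such as polarity classification, least confidence and entropy produce the same ranking. They can diverge when there are more than two classes.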



1. Introduction

2. Active Learning Techniques

3. Text Classification

4. Experiments

5. Active Learning System

6. Conclusion


Appendix B: t-SNE Representation: Sentence Clouds



