Optical character recognition (OCR) and the simpler digit recognition task have been the focus of much ML research.
We have two datasets on this topic.
The first tackles the more general OCR task on a small vocabulary of words. (Note that the first letter of each word was removed, since these were capital letters that would make the task harder.)
Project suggestion:
Use an HMM to exploit correlations between neighboring letters in the general OCR case to improve accuracy. (Since ZIP codes don't have such constraints between neighboring digits, HMMs will probably not help in the digit case.)
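To make the HMM suggestion concrete, here is a minimal sketch of Viterbi decoding over per-letter classifier scores. It is an illustration under stated assumptions, not the method used with this dataset: the function name `viterbi`, the three-letter alphabet, and all probabilities are made up for the example. In practice the emission scores would come from a letter classifier and the transition probabilities would be estimated from letter-pair counts in the training words.

```python
import math

def viterbi(emission_logp, transition_logp, initial_logp):
    """Return the most likely label index sequence under a first-order HMM.

    emission_logp[t][k]   : log-score of label k at position t
    transition_logp[j][k] : log P(label k | previous label j)
    initial_logp[k]       : log P(first label is k)
    """
    n_labels = len(initial_logp)
    # Best log-score of any path ending in label k at the first position.
    score = [initial_logp[k] + emission_logp[0][k] for k in range(n_labels)]
    backptr = []
    for t in range(1, len(emission_logp)):
        new_score, pointers = [], []
        for k in range(n_labels):
            # Best previous label for a path that ends in label k at position t.
            best_j = max(range(n_labels),
                         key=lambda j: score[j] + transition_logp[j][k])
            new_score.append(score[best_j] + transition_logp[best_j][k]
                             + emission_logp[t][k])
            pointers.append(best_j)
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy example (all numbers are hypothetical).
labels = ["q", "u", "v"]
log = math.log
# The classifier is confident about "q" but slightly prefers "v" over "u" next.
emissions = [[log(0.90), log(0.05), log(0.05)],
             [log(0.05), log(0.45), log(0.50)]]
# "q" is almost always followed by "u"; other rows are uniform.
transitions = [[log(0.05), log(0.90), log(0.05)],
               [log(1 / 3)] * 3,
               [log(1 / 3)] * 3]
initial = [log(1 / 3)] * 3

# Per-letter argmax ignores context; Viterbi uses the transition model.
independent = "".join(labels[max(range(3), key=lambda k: row[k])]
                      for row in emissions)
decoded = "".join(labels[i] for i in viterbi(emissions, transitions, initial))
print(independent, decoded)  # qv qu
```

In this toy case the per-letter prediction is "qv", while the HMM corrects it to "qu" because the transition model strongly favors "u" after "q". For digits in ZIP codes the transition matrix would be close to uniform, so the two decodings would generally agree.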
Dataset download:
http://ai.stanford.edu/~btaskar/ocr/