Views: 2013 · 2018-8-1 19:39 | Personal category: Paper writing | System category: Research notes

  1. Abstract

1. In this paper, we present a method that combines the ideas of the two types of methods while avoiding their shortcomings.

2. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem.

3. In order to measure characterness, we develop three novel cues that are tailored for character detection, and a Bayesian method for their integration.
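
Sentence 3 describes fusing several detection cues with Bayes' rule. As a minimal, hypothetical sketch (the function name and the likelihood-ratio values below are invented for illustration, not taken from the paper), conditionally independent cues can be combined into a posterior "characterness" score via a naive-Bayes odds product:

```python
def characterness_posterior(cue_likelihood_ratios, prior=0.5):
    """Combine independent cue likelihood ratios P(cue | character) /
    P(cue | background) into a posterior probability of 'characterness'
    via Bayes' rule, assuming conditional independence of the cues."""
    # Posterior odds = prior odds * product of the per-cue likelihood ratios.
    odds = prior / (1.0 - prior)
    for lr in cue_likelihood_ratios:
        odds *= lr
    # Convert odds back to a probability.
    return odds / (1.0 + odds)

# Three hypothetical cues, each favouring 'character' (ratio > 1):
score = characterness_posterior([2.0, 1.5, 3.0])  # -> 0.9
```

The appeal of this formulation is that each cue only needs to supply a likelihood ratio; adding a fourth cue does not change the integration rule.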


In particular, strokelets possess four distinctive advantages over conventional representations, which are called the URGE properties:

To deliver reproducible research, we will make the source code publicly available, and hope it will be useful to other researchers.

There has been a rich body of work concerning text recognition in natural images in recent years [30, 29, 23, 21, 37, 34].

Those scene text detectors can be split into two branches.

Our motivations mainly come from two observations:

This problem includes two sub-tasks, namely text detection and text-line/word recognition.

This is the first attempt to show the effectiveness of exploiting convolutional sequence features with a sequence labeling model for this challenging task.
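
Pairing convolutional sequence features with a sequence labeling model typically means a CTC-style layer on top of per-frame CNN features. As an illustrative sketch (best-path CTC decoding is the standard rule for such models in general, not this particular paper's code), collapsing a per-frame best-label sequence into an output label string works like this:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """CTC best-path decoding: merge consecutive repeated labels,
    then drop the blank label, turning a per-frame label sequence
    into the final output sequence."""
    out, prev = [], None
    for lab in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Nine frames collapse to three output labels:
decoded = ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 0, 2])  # -> [1, 2, 2]
```

Note that a blank between two identical labels (as in the trailing `0, 2` above) is what allows repeated characters such as the "oo" in "book" to survive the merge step.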

To ease the computational burden of texture-based methods, combinations of edge- and gradient-feature-based methods have been proposed.

To remedy this issue, we propose an algorithm that can detect and recognize text with different orientations.

In this line of works, many choose to filter a redundant set of character proposals and recognize them with a classifier.

Moreover, the practicability and applicability of current algorithms are very limited.

This is a severe shortcoming, since a considerable portion of the text in real-world scenes is non-horizontal.

The explosive growth of smartphones and online social media has led to the accumulation of large amounts of visual data, in particular, the massive and increasing collections of video on the Internet and social networks.

Despite this, numerous advanced techniques for video text extraction have proliferated impressively over the past decade.

Some recent work strives to detect and localize objects with pixel-wise precision, which somewhat blurs the boundaries between object detection and semantic segmentation [23, 21].

To widen the scope of document-analysis-based approaches, methods have been proposed for text detection in natural scene images [13-18]. These approaches directly or indirectly rely on the features of connected components and the shapes of characters to achieve good accuracy.

In summary, developing a method that works well for text detection in both video and natural scene images is challenging, and hence it is a research issue worth exploring.


The rest of the paper is organized into four parts.

In summary, contributions of our work comprise the following.

Tracking-based-Detection and Tracking-based-Recognition techniques are then surveyed and highlighted in Sections IV-B and IV-C, respectively.

Section 5 outlines the estimation procedure for the prior parameters and the parameters of the observation model.

Section 6 illustrates the pre- and post-processing steps of the method, and Section 7 presents the experiments we performed on scanned document images in order to evaluate the performance of the proposed method.

Section 8 finally concludes.


These approaches achieve good recall but poor precision, because the proposed features are sensitive to background complexity, leading to more false positives.

However, the performance of the approach depends on the edge image of the input image.

In light of the above discussion, we can see that connected-component-based approaches are good for studying the geometrical features of text components, edge- and gradient-based approaches are good for finding inter- and intra-character symmetry, while texture-based approaches are good for text detection in complex backgrounds.

Therefore, the contributions of the proposed approach are threefold:

To tackle the problems of multi-size and multi-contrast texts, we combine Laplacian with wavelet sub-bands at different levels through fusion. This is illustrated in Fig. 1, where (a) is a sample text line image, (b) is the profile given by the Laplacian alone, (c) is the profile given by the Laplacian with Fourier, and (d) is the profile given by the Laplacian with wavelet.
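
The fusion idea above can be illustrated with a one-dimensional toy version (this is a sketch under assumptions: a 1-D intensity signal instead of an image, a second-difference Laplacian, a single-level Haar detail band, and fusion by pixel-wise maximum, which is one plausible fusion rule rather than the paper's exact one):

```python
def laplacian_1d(sig):
    """Second-difference response |x[i-1] - 2*x[i] + x[i+1]|,
    with edge replication at the borders."""
    n = len(sig)
    return [abs(sig[max(i - 1, 0)] - 2 * sig[i] + sig[min(i + 1, n - 1)])
            for i in range(n)]

def haar_detail_1d(sig):
    """One-level Haar detail coefficients |x[2k] - x[2k+1]| / 2,
    upsampled back to the original length."""
    out = []
    for k in range(len(sig) // 2):
        d = abs(sig[2 * k] - sig[2 * k + 1]) / 2.0
        out += [d, d]
    if len(sig) % 2:          # odd length: pad the trailing sample
        out.append(0.0)
    return out

def fused_response(sig):
    """Fuse the two responses by pixel-wise maximum, so that both
    high-contrast (Laplacian) and multi-scale (wavelet) text pixels
    contribute to the final profile."""
    return [max(a, b) for a, b in zip(laplacian_1d(sig), haar_detail_1d(sig))]
```

A single high-contrast stroke in the signal produces a strong peak in the fused response, which is the behaviour the text-line profiles in Fig. 1 are meant to convey.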


These approaches mainly follow the pipeline of conventional OCR techniques: first a character-level segmentation, followed by an isolated character classifier and post-processing for recognition.

The main inspiration for approaching this issue comes from the recent success of recurrent neural networks (RNN) for handwriting recognition (Graves, Liwicki, and Fernandez 2009; Graves and Schmidhuber 2008), speech recognition (Graves and Jaitly 2014), and language translation (Sutskever, Vinyals, and Le 2014).

As we will further specify later, all modules in the rectification network are differentiable.

To the best of our knowledge, we are the first to present a saliency detection model which measures the characterness of image regions.

Two phases of experiments are conducted separately in order to evaluate the characterness model and scene text detection approach as a whole.

This dataset contains 229 images harvested from natural scenes.

To overcome this unfavorable fact, the ICDAR 2011 competition adopted the DetEval software [66], which supports one-to-one, one-to-many and many-to-one matches.
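
DetEval's actual protocol scores matches with area-recall and area-precision thresholds; as a simpler, hypothetical stand-in for the core idea, a one-to-one match between a ground-truth box and a detection is usually decided by an overlap measure such as intersection-over-union:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2). Returns 0.0 for disjoint boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)
```

One-to-many and many-to-one matching generalize this by letting a ground-truth box be covered by the union of several detections (and vice versa), which is exactly the flexibility plain one-to-one IoU matching lacks.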

To ease the computational burden, other methods prefer to use a publicly available OCR engine for recognition rather than an expensive classifier.

In view of this, MSER detects multiple overlapping sub-parts of the same components instead of a single component.


The highest accuracy was achieved with T = 500. In all the following experiments except the last one, the strokelet count T is fixed at 500.

The proposed algorithm achieves recognition accuracies of 80.33%, 88.48% and 75.89% on ICDAR 2003 (FULL), ICDAR 2003 (50) and SVT, respectively, outperforming the competing methods of [29, 25, 21, 20, 37, 27], but still behind those in [31, 11].

The performance gains achieved by the proposed method are mainly due to two reasons: 

Their combination leads to higher performance (80.2%).


The flow diagram of the proposed method can be seen in Fig. 2.

The optimization procedure is sketched in Algorithm 1.

