Synonyms
attain: emphasizes gaining a certain result or reaching a certain level
attain a degree
achieve: generally refers to accomplishing a feat or succeeding in an undertaking
achieve the goal
implement: refers to carrying out an obligation or putting a plan into effect
implement the proposal
accomplish: refers to completing a (e.g., military) task;
its meaning lies between achieve and complete, denoting both that something has been finished and a connotation of accomplishment
accomplish the task
complete: generally refers to finishing a project, an assignment, etc.
complete the homework
satisfying: giving satisfaction; describes things
satisfied: feeling satisfied; describes people
To express "with someone's help", use with one's help.
To express "with the aid of someone/something", use with the help of sb./sth.
In with one's help, one's is a possessive adjective or a noun in the possessive case.
With the help of is followed by a nominal possessive pronoun or a noun.
with the help of sb. = with one's help (with someone's help)
reach: to arrive at, to attain
exceed: to go beyond (transitive, takes no preposition)
surpass: to exceed in terms of quality or superiority
exceed: to exceed in terms of quantity
Connectives
first(ly), second(ly), third(ly), ...
More elegant alternatives:
Above all / First of all
Furthermore / What's more / In addition / Moreover / Meanwhile / Thus
Last but not least / Finally
To the best of our knowledge
Without loss of generality
To bridge the semantic gap
Specifically / More specifically / Therefore / Along this line / Most recently / For example / In this work / With the above definition / As with LIOP and other LBP-like methods / Obviously / As expected / In order to alleviate this problem / In order to deal with the problem of / Due to this observation / To investigate / For this purpose / Among the proposed ones / Generally speaking / Similarly / Overall / To approach this problem / More recently / To overcome this problem / In order to / In contrast / In addition to that / To overcome this effect / To simplify the description / More importantly / To handle this issue / In essence / To be specific / Conversely
Sentence patterns
Abstract
To permit a fast search, we introduce a two-level cascade.
It has a wide spectrum of promising homeland security applications.
Recently sparse representation has received a lot of attention in computer vision.
Driven by the wide range of applications, scene text detection and recognition have become active research topics in computer vision.
Introduction
1. Our work is fundamentally different from the previous one.
2. Our two proposed descriptors have the following characteristics, making them both distinctive and robust to many image transformations:
3. A large body of literature can be found on this topic.
4. A fall event usually lasts for a short period of time.
5. This line of methods can be further grouped into two categories: two-view methods and multi-view methods.
6. It remains an open problem, despite intensive research during the past decade.
7. To take advantage of the temporal information, we represent actions as time series and compare them using a global alignment kernel.
8. Motivated by the success of higher-order encoding methods, we propose the second-order LASC vector based on the Fisher information metric to further improve the performance of the proposed LASC method.
9. Cross-view action recognition from 3D videos remains an underexplored area.
10. These feature representations can be divided into two groups: global and local representations.
11. Actions are categorized into two types: shared actions observed in both views and orphan actions that are only observed in the source view.
12. A vast quantity of work has continued in this vein, using high degree-of-freedom 3D models of people, rendering them in the image plane, and comparing them with image data.
13. Object and image recognition has undergone rapid progress in the last decade due to advances in both feature design and kernel methods [15] in machine learning.
14. Human action recognition is of crucial importance for a wide range of applications such as intelligent surveillance, automatic video content analysis, and human-computer interaction, to name just a few.
15. Recently, several researchers have tried to employ multi-task learning (MTL) for human action recognition and leveraged the knowledge shared among tasks to improve the generalization ability of model learning [1], [2], as shown in Fig. 1b.
16. Feature learning algorithms have enjoyed a string of successes in other fields (for instance, achieving high performance in visual recognition [6] and audio recognition [7]).
17. Armed with this tool, we will produce results showing the effect on recognition performance as we increase the number of learned features.
Paper organization
The paper is structured in the following manner.
The paper is structured as follows:
The rest of the paper is organized as follows.
The remaining sections of the paper are organized as follows.
The outline of this overview is as follows:
The rest of this paper is organized as follows.
The remainder of this paper is organized as follows.
Section II reviews the work related to motion detection, including modeling of environments, segmentation of motion, and classification of moving objects.
Section III discusses tracking of objects, and Section IV details understanding and description of behaviors.
Section 3 describes the essential qualities of a shape representation;
the effectiveness of the ordinal measure is ascertained in Section 3;
Then, Section III details the covariance descriptor of the cuboids and the proposed DPCM.
In Section III, we elaborate on the details of the proposed model.
Section 3 covers the experimental setup, results, and analysis of the proposed approach for the MIVIA action, NATOPS gesture, SBU Kinect interaction, and Weizmann datasets.
Section 4 gives an overview of 3D shape searching techniques.
In Section 4, we present the results and compare with other state-of-the-art MKL methods for object recognition.
In Section 5, we outline the procedure used for feature encoding and classification.
Section 5 delves in detail into the state of the art.
Sections 6 and 7 compare the various techniques with respect to efficiency and effectiveness and describe future trends.
The details, evaluations, and discussion of the four recognition scenarios are shown in Sections IV–VII.
Sections V and VI cover, respectively, personal identification at a distance and fusion of data from multiple cameras.
In Section 5 we present the evaluation results and conclude in Section 6.
Section VII analyzes some possible directions for future research.
The last section summarizes the paper.
Finally, Section 6 deals with conclusions and future work.
Lastly, conclusions are drawn in Section VI.
Finally, we draw conclusions in Section 7.
Last, Section VIII gives a conclusion of the paper.
Our system proceeds in several stages:
Algorithms
1. Recently, to alleviate this problem, some researchers have proposed to use ...
2. These methods work well on monotonic illumination changes, but they do not make significant improvements on overall performance.
3. To further explore the effectiveness of ordinal information, two intensity order patterns are proposed in this paper, namely LIOP and OIOP.
4. It is worth noting that, different from other methods such as ..., we ...
5. To overcome these problems, we propose LIOP for feature extraction.
6. Before the formal definition of LIOP, some basic mappings are first introduced.
7. From the intensity order based patch division, we can observe that it actually quantizes the intensity order of pixels.
8. To give a better understanding of our method, we visualize the OIOP computation in Figure 3.
9. To further study the performance of intensity order based descriptors for dealing with complex illumination changes, we captured two additional image sequences ...
10. In this section, we evaluate all the tested descriptors on the Patch dataset.
11. Local descriptors in the literature can be roughly divided into two categories concerning rotation invariance:
12. In order to give more insight into the influence of the estimated orientation on the matching performance of local descriptors, we conducted some image matching experiments.
13. To deal with multi-view cases, the pairwise strategy is usually exploited, resulting in multiple two-view models.
14. For this purpose, MCCA was proposed to obtain one common space for v views.
15. Some other methods attempted to decompose the variations of each view.
16. Moreover, inspired by the observation that different views share similar structures, a constraint enforcing the consistency of the multiple linear transforms is introduced to achieve a more robust common space.
17. Building upon the assumption that data are drawn from a union of linear subspaces, subspace segmentation or clustering methods study how to estimate the number, dimensions, and bases of the subspaces.
18. In light of the various consequences detailed in Section III, it seems important to deal with label noise.
19. The motivation for introducing these MR filter sets is twofold.
20. The efficiency, effectiveness, and flexibility make this model a suitable technique for gesture pattern identification.
21. The essential information conveyed by a video can usually be captured by analyzing the boundary of each object as it changes with time.
22. To make the representation more discriminative, we discard the atoms in the shared dictionary D0v that are useful only for feature representation but would dilute the differences between different actions. Then PCA is adopted to generate a low-dimensional representation of each view. We retain the components representing 95% of the variance.
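Item 22 above ends with a concrete, reproducible step (PCA keeping the components that explain 95% of the variance). Below is a minimal sketch of that step, assuming scikit-learn; `view_features` is a hypothetical per-view feature matrix, not from any cited paper:

```python
# Sketch: low-dimensional representation of one view via PCA,
# retaining the components that explain 95% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
view_features = rng.normal(size=(200, 64))  # placeholder (n_samples, n_features)

# A float in (0, 1) makes scikit-learn keep the smallest number of
# components whose cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95)
low_dim = pca.fit_transform(view_features)

print(low_dim.shape)                         # (200, k) with k <= 64
print(pca.explained_variance_ratio_.sum())   # >= 0.95
```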
23. Laplacian and Hessian regularization terms were taken into account to derive sparse representations that vary smoothly along the geodesics of the data manifold.
24. In order to leverage the benefits of multiple descriptors, a dictionary is learned for each view, and the corresponding sparse representations of those descriptors are fused in a low-dimensional feature space together with temporal information.
25. The introduction of low-cost devices such as Kinect sensors has triggered many research activities aimed at concise descriptions in recognition tasks, owing to the availability of depth sequences alongside the RGB data.
26. It can be proved by considering the dual of AKM, Eq. (3), which can be rewritten as follows:
27. This highlights the importance of learning in fusion methods.
28. We tie together the action prediction loss and the relation prediction losses in a single objective, as shown below:
29. In the past decades, the machine learning problem has evolved from the conventional single-view learning problem to cross-view learning, cross-domain learning, and multi-task learning, for which a large number of algorithms have been proposed in the literature.
30. The higher the score difference, the better the ability of the classifier to distinguish between targets.
31. We leverage two key insights to build our model, along with the known fact that semantic relations help training visual models:
32. However, we could compensate for this sparsity in supervision by leveraging the rich semantic relationships between different actions.
33. Concerning this last scenario (shopping), we observed strong discontinuities and gaps (pedestrian occlusions) in several trajectories.
34. However, the performance of RGB-SV with the Fisher vector is slightly worse than with BoW, since both the discriminative viewpoint and the modality augment the challenge for cross-domain learning.
35. With these local salient points, the bag-of-words (BoW) approach [4], [17] can be leveraged to construct a histogram-based representation, and much work has demonstrated that the BoW representation can work successfully with an SVM classifier.
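Item 35 names a standard pipeline (quantize local descriptors into visual words, build a histogram per sample, classify with an SVM). A minimal sketch of that pipeline under stated assumptions: scikit-learn, synthetic data, and a hypothetical vocabulary size:

```python
# Sketch of the BoW-plus-SVM pipeline: K-means vocabulary,
# per-sample word histograms, linear SVM on the histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_words = 50  # hypothetical vocabulary size

# Placeholder local descriptors pooled from all training samples.
train_descriptors = rng.normal(size=(5000, 128))
vocabulary = KMeans(n_clusters=n_words, n_init=10).fit(train_descriptors)

def bow_histogram(descriptors):
    """Quantize one sample's descriptors and return a normalized histogram."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

# One histogram per sample; each sample has a varying number of descriptors.
X = np.array([bow_histogram(rng.normal(size=(int(rng.integers(50, 100)), 128)))
              for _ in range(40)])
y = rng.integers(0, 2, size=40)  # placeholder action labels

clf = LinearSVC().fit(X, y)
print(clf.score(X, y))
```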
36. By this mechanism, we label each pixel with a score according to whether that pixel is part of a block of text.
37. Functionally, the framework consists of four components: a feature pyramid network (FPN) [46] as the backbone, a region proposal network (RPN) [31] for generating text proposals, a Fast R-CNN [31] for bounding box regression, and a mask branch for text instance segmentation and character segmentation.
Experiments
1. All the parameters are listed in Table 1.
2. To investigate the effect of these parameters, we performed image matching experiments on 90 pairs of images provided by MM, with the different parameter settings listed in the third column of Table 1.
3. It can be seen in Figure (d) that the performance of MIOP slightly degrades with the decrease of D.
4. As a compromise between discriminability and dimensionality, we select D = 128 so that MIOP has the same dimension as SIFT.
5. The results are shown in Figure 7.
6. For all parameter settings, OIOP consistently outperforms *OIOP.
7. This sufficiently demonstrates the effectiveness of the proposed learning-based quantization.
8. The detailed results on Harris-Affine regions are shown in Figure 9.
9. OIOP is better than, or at least comparable to, LIOP in most tested cases.
10. For a comprehensive study, we also conduct experiments on four other popular affine covariant regions: ...
11. Besides the Oxford dataset and the Patch dataset, we have also evaluated our descriptors on the 3D objects dataset proposed by M and P.
12. As our experiments show, a single support region is in general not enough to distinguish incorrect matches from correct ones.
13. As shown in the figure, we choose the support regions as the N nested regions centered at the interest point with an equal increment of radius.
14. It can be seen from Fig. 9 that the performances of MROGH and MRRID improve as the number of support regions increases.
15. To evaluate the performance of the proposed descriptors, we conducted extensive experiments on image matching.
16. We further conducted experiments on object recognition to show the effectiveness of the proposed descriptors.
17. To make the comparison as fair as possible, for LASC with M subspaces of dimension d, we use a dictionary of Md visual words for LLC so that both have coding vectors of the same size.
18. In sharp contrast, the values of dr are densely populated, indicating that the nearest subspaces cannot be determined reliably.
19. To analyze why the various proximity measures perform differently, we randomly select 50 features from the set of training features and compute the values of the proximity measures from these features to all the affine subspaces.
20. The results in Table I show that all four selection schemes outperform the RS-based method, which indicates that discriminative detectors are indeed discovered.
21. Also, the deep method VLAD3 [76] shows a prominent improvement and achieves an accuracy of 96.6% in combination with iDTs [3].
22. Naturally, with only one model per class (K = 1) in the restricted case, since the motion field in Fig. 2(b) also explains the straight trajectories quite well, the classification accuracy is essentially the same as that of random guessing.
Experiments: method evaluation
Obviously, SMP+FS outperforms the other two methods on the four datasets.
Notably, the performance of VMP is slightly reduced when combined with FS on the Olympic Sports and HMDB51 datasets.
This may be because the positions of the objects in the videos vary considerably, owing to the drastic camera motion in these two datasets.
The performance on KTH and UCF50 shows a slight gain, which may be because most actions are centered in the videos.
Meanwhile, SMP+FS outperforms SMP on these four datasets, which indicates that, when combined with FS, our SMP scheme is more space-time robust than the other two pooling strategies.
The results demonstrate once again that space-time context is discriminative for action classification. On the other hand, feature alignment and selection can further enhance space-time robustness.
Our method achieves the best F-measure among all methods.
Again, we see that accuracy climbs as a function of the number of features.
This is comparable to or better than other (purpose-built) systems tested on the same problem.
Experiments: parameter descriptions
1. When investigating the parameter L, we fix λ = 0.1, β = 0.9 for KTH and λ = 0.5, β = 0.8 for HMDB51. We report the classification accuracy in Fig. 8.
2. We can see that the accuracy increases up to L = 2 but drops quickly afterwards. The results show that the robustness of our representation degrades for L > 2.
3. The classification accuracy first increases with increasing neighborhood size, reaches a maximum for a 7×7 neighborhood using 2440 textons, and then goes down slightly for 9×9 and 11×11. This indicates that the optimal neighborhood size for the CUReT dataset is around 7×7.
4. When classifying the original textures, both the MRF and MR8 classifiers achieve 100% accuracy.
5. When the zoomed-in images are added to just the test set, the accuracy rates drop to 93.48% and 81.25%, respectively.
6. When the zoomed-in images are added to both the test and training sets, the accuracy rates go back to 100% and 99.46%, respectively.
7. However, the classification rate of this method is 87%, far inferior to the 95.6% achieved by the multiple-image method.
8. The classification rate with nine K-medoid-selected models per texture is almost as good as that with all 46 models.
9. The classification rate decreased only slightly, to 98.19%, signifying that our textons are sufficiently versatile.
10. Note that the number of samples in Figure 1(c) is equal to m, which is the same as the number of samples in K1 and K2.
11. For the sake of clarity, we present the optimization steps for one group of data X and its corresponding coefficients C.
12. λ is a tuning parameter of our model; it controls the degree of sparseness of the reconstruction coefficient matrices.
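The claim in item 12, that λ controls how sparse the coefficients are, is easy to demonstrate. A small sketch, using scikit-learn's Lasso as a stand-in (its `alpha` plays the role of λ; the data is synthetic):

```python
# Sketch: a larger regularization weight drives more coefficients
# to exactly zero, i.e. it controls the degree of sparseness.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))       # dictionary / design matrix
y = X[:, :5] @ rng.normal(size=5)    # signal built from only 5 atoms

for lam in (0.01, 0.1, 1.0):
    coef = Lasso(alpha=lam).fit(X, y).coef_
    print(f"lambda={lam}: {np.sum(coef != 0)} nonzero coefficients")
```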
13. It is important to note that AKM on its own gives 61.0%; however, when used together with the grouping stage, it performs 1.8% better.
14. This is because the linear combination within the grouping stage gives a good representative kernel with less noisy or redundant data.
15. However, since only the significant examples are used, we use 3 to 4 times fewer samples per base kernel.
16. They outperform the MKL baseline by more than 2.5% and multiclass MKL by 0.6%.
17. All parameters were set experimentally, but most remained identical for all sequences.
18. We achieve state-of-the-art results on this dataset by combining the stacking kernel with the 4 base kernels using CLF.
19. This combination performs 10% better than multiclass MKL, and 6.6%, 7%, and 2.3% better than the MKL baseline, the CLF baseline, and stacking, respectively.
20. Specifically, on ICDAR2015, evaluated at a single scale, our method achieves an F-measure of 0.86 on the detection task and outperforms the previous top performers by 13.2%-25.3% on the end-to-end recognition task.
21. Fig. 6 shows how the number of correct matches over all images is influenced by these two parameters. It is clear that LBD and MSLD follow similar rules: the performance increases quickly at first as m or w grows, reaches its best when m = 9 and w is about 7 or 9, and thereafter decreases steadily.
Discussion
1. To further clarify the significance of our method, this section discusses in detail the differences between our MvDA and previous closely related methods.
Figures
Fig. 13 shows the comparative results.
Fig. 2 shows the curve of the mAP of LASC vs. the number of nearest subspaces.
Fig. 6 presents the mAP of the respective methods vs. dictionary size.
The results are displayed in Fig. 5.
The process is repeated to produce a binary tree of height 4 detailed in Fig. 7.
Fig. 8 shows the history of one particular day.
The curve in Fig. 7 demonstrates that our proposed ASPL model requires relatively more annotations when the number of training iterations is small.
The complementary pyramids in Fig. 1 depict our view of video surveillance.
Each object is identified and tracked to describe its activity and produce video annotations, as outlined in the flow diagram of Fig. 2.
Fig. 2 (b, c) displays examples of motion field estimates (blue arrows) obtained by the proposed EM algorithm, together with a set of observed trajectories (yellow).
The plots in Fig. 14 show that the best accuracies are achieved for an “intermediate” value.
Fig. 7 compares the performances of STL and MTL methods including L21, CMTL, rMTFL, RMTL, and SparseTrace.
Fig. 7 shows that rMTFL usually performs better than RMTL, which illustrates that the joint feature learning and group discovery of rMTFL can improve multi-task performance compared with RMTL and CMTL.
We can observe from Fig. 6 that, under the same conditions, the overlapping area between the inter-class and intra-class distance distributions for ?2 is always smaller than that for ?6.
Tables
Table 1 summarizes the comparison results with other methods besides FV.
Results are given in Table 3.
Table 4 summarizes our results.
Table 4 compares our algorithms with existing methods.
Table 6 shows the average accuracy of the proposed method in the three settings.
As shown in Tables 1 and 2, our approach yields a much better performance for all 20 combinations in the first two modes.
The results in Table I show that all four selection schemes outperform the RS-based method, which indicates that discriminative detectors are indeed discovered.
From the results in Table VI on HMDB51 [33], we obtain a recognition accuracy of 64.1% for the multilevel representation, which outperforms the best mid-level method [58] by 1%. On the other hand, TDD [47] integrates the advantages of hand-crafted and deep-learned features and achieves a prominent performance of 65.6%, which we still surpass by 0.6%.
Moreover, as illustrated in Table 3, with the increase of user annotations over time, ASPL can automatically assign more reliable pseudo-labels to the unlabeled samples selected in the self-paced way.
The maximal recognition rate of each classifier and the corresponding dimension are listed in Table 1.
Table 2 shows that NC-CL1C achieves results comparable to SRC, but the former is much faster than the latter.
Table VII reviews a list of exemplary activity path modeling systems and their associated activity analyses.
The actual identification rates are indicated in Table 2.
The values in Table II (top) show that perfect accuracy is obtained for K = 2 and that good generalization to the test set occurs.
Table VI summarizes the results, where each cell contains the mean accuracy.
The experimental results are presented in Table IV for different combinations of detectors and descriptors on the frontal/side views in the RGB/depth modalities, respectively.
Table III reports the performance of five representative cross-domain learning methods.
Conclusion
1. This paper intensively explores ordinal information for feature description.
2. The key contributions include the following aspects:
3. Our main contributions include:
4. It has the following important properties:
5. This paper has presented a novel probabilistic method for background subtraction.
Source: [Repost] Typical sentence patterns in SCI papers, http://blog.sciencenet.cn/home.php?mod=space&uid=645848&do=blog&id=813630