Image Retrieval

Read 4900 times | 2011-8-18 10:23 | Personal category: Image Retrieval | System category: Research Notes

Image Retrieval: Ideas, Influences, and Trends of the New Age

Abstract:

1) We survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and discuss the spawning of related sub-fields in the process.

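The survey covers almost 300 important theoretical and empirical contributions on image retrieval and automatic annotation, and discusses how related sub-fields developed along the way.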

2) We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real-world.

It discusses the challenges faced today in building practical image retrieval systems.

3) We also conjecture what the future may hold for image retrieval research.

A conjecture about where the field of image retrieval may go in the future.

1. Introduction

1) Background:

a> Never express yourself more clearly than you are able to think. Sometimes it is hard to express what you want in precise words, and it is better conveyed through visual interpretation.

People sometimes find it hard to describe the visual thing they are looking for in words alone, without an image.

b> The interpretation of what we see is hard to characterize, and even harder to teach to a machine.

Human visual understanding is hard to describe, and training a machine to do it is harder still.

2) Scope of content-based image retrieval

Any technology that in principle helps organize digital picture archives by their visual content.

That is, any technique that archives digital images according to their visual content counts as content-based image retrieval (CBIR).

3) Shortcomings of CBIR as a real-world technology

a> Semantic gap

A shortcoming of all current approaches is their reliance on visual similarity for judging semantic similarity, which may be problematic because of the semantic gap between low-level content and higher-level concepts.

b> Sensory gap

The gap caused by the limitations of recording a real-world scene as an image.

4) Image retrieval as a research field is growing very fast

5) Achievement in early years

a> Different classification of image search

(1) According to the domain, we can classify image retrieval as narrow and broad.

Narrow image domains usually have limited variability and better-defined visual characteristics.

Broad image domains, typically web images, have high variability and unpredictability.

(2) According to the goal, we can classify broad image retrieval into search by association, aimed search and category search.

b> Progress in each aspect of image retrieval

(1) Extraction of visual content from images is split into two parts, namely image processing and feature construction. The question asked here is what features to extract that will help perform meaningful retrieval.

Extraction of visual content has two parts, image processing and feature construction; the open question is which features make retrieval meaningful.

(2) Once image features were extracted, the question remained as to how they could be indexed and matched against each other for retrieval.

After feature extraction, the next step is to build an index over the features and match them against each other to perform retrieval.

Similarity measures were grouped as:

Feature-based matching;

Object silhouette based matching;

Structural feature matching;

Salient feature matching;

Matching at semantic level

c> Prominent systems in this area

IBM QBIC

VIRAGE

NEC AMORE

MIT Photobook

Columbia VisualSEEK and WebSeek

UCSB NeTra

Stanford WBIIS

2. Image Retrieval in the Real World

We devote this section to understanding image retrieval in the real world and discuss user expectations, system constraints and requirements, and the research effort needed to make image retrieval a reality in the not-too-distant future.

This section focuses on practical image retrieval systems: what users expect, what constraints and requirements the system has, and which research directions still need work.

A real-world image retrieval system needs to make the following things clear:

1) From user’s perspective

a> How clear is the user about what she wants?

b> Where does the user want to search?

c> In what form does the user have her query?

2) From system’s perspective

a> How does the user wish the results to be presented?

b> Where does the user desire to search?

c> What is the nature of the user’s input/interaction?

2.1 User Intent

Based on user intent, users (and the search types they need) can be classified as follows:

1) Browser

A user who does not have a clear end-goal.

2) Surfer

A user surfing with moderate clarity of an end-goal.

3) Searcher

A user who is very clear about what she is searching for in the system.

2.2 Data Scope

Along this dimension, we classify search data into the following categories:

1) Personal collection

2) Domain-specific collection

3) Enterprise collection

4) Archives

5) Web

2.3 Query Modalities and Processing

From the user’s perspective

1) Keywords

2) Free-text

3) Image: User wishes to search for an image similar to a query image

4) Graphics: A hand-drawn or computer-generated picture or graphics could be presented as query

5) Composite: These are methods that involve using one or more of the above modalities for querying a system. This also covers interactive querying such as in relevance feedback system.

From the system’s perspective

1) Text-based: Text based query processing usually boils down to performing one or more simple keyword based searches and retrieving matching pictures.

2) Content-based: This lies at the heart of all CBIR systems. Processing a query involves extraction of visual features and/or segmentation, followed by a search in the visual feature space for similar images. An appropriate feature representation and a similarity measure to rank pictures, given a query, are essential here.

3) Composite: Composite processing may involve both content and text-based processing in varying proportions.

4) Interactive-simple

5) Interactive-composite
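
As a rough illustration of composite processing, the sketch below blends a text score with a content score using a single weight. The function names, the tag-overlap text score, and the `text_weight` parameter are all assumptions made for this example; the survey does not prescribe a particular fusion rule, and the visual scores are taken as given.

```python
def keyword_score(query_terms, tags):
    """Fraction of query terms found among an image's text tags."""
    if not query_terms:
        return 0.0
    tag_set = {t.lower() for t in tags}
    hits = sum(1 for term in query_terms if term.lower() in tag_set)
    return hits / len(query_terms)


def composite_score(text_score, visual_score, text_weight=0.5):
    """Blend text and content scores; text_weight sets the proportion."""
    return text_weight * text_score + (1.0 - text_weight) * visual_score


def rank(query_terms, visual_scores, tags_by_image, text_weight=0.5):
    """Return image ids sorted by descending composite score.

    visual_scores maps image id -> content-based similarity in [0, 1],
    assumed to be computed elsewhere (hypothetical input).
    """
    scored = []
    for image_id, tags in tags_by_image.items():
        t = keyword_score(query_terms, tags)
        v = visual_scores.get(image_id, 0.0)
        scored.append((composite_score(t, v, text_weight), image_id))
    # Highest score first; ties broken by image id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored]
```

Setting `text_weight=1.0` reduces this to purely text-based processing and `text_weight=0.0` to purely content-based processing, which is one simple way to read "varying proportions".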

2.4 Visualization

Presentation of search results is perhaps one of the most important factors in the acceptance and popularity of an image retrieval system.

1) Relevance-ordered

Results are ordered by some numeric measure of relevance to the query.

2) Time-ordered

Pictures are shown in a chronological ordering rather than by relevance

3) Clustered

4) Hierarchical

Images are arranged in a tree order.

5) Composite
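
The presentation modes above differ mainly in how the same result set is ordered or grouped. A minimal sketch follows; the record fields `score`, `taken`, and `cluster` are hypothetical, and the cluster labels are assumed to have been assigned by some earlier step.

```python
from collections import defaultdict
from datetime import date


def relevance_ordered(results):
    """Most relevant first, by a numeric relevance score."""
    return sorted(results, key=lambda r: r["score"], reverse=True)


def time_ordered(results):
    """Chronological order, ignoring relevance."""
    return sorted(results, key=lambda r: r["taken"])


def clustered(results):
    """Group image ids by cluster label, most relevant first within a group."""
    groups = defaultdict(list)
    for r in relevance_ordered(results):
        groups[r["cluster"]].append(r["id"])
    return dict(groups)


sample_results = [
    {"id": "img1", "score": 0.4, "taken": date(2010, 5, 1), "cluster": "beach"},
    {"id": "img2", "score": 0.9, "taken": date(2011, 1, 9), "cluster": "city"},
    {"id": "img3", "score": 0.7, "taken": date(2009, 8, 3), "cluster": "beach"},
]
```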

2.5 Real-world Image Retrieval Systems

1) Google Image Search and Yahoo! are already in real-world use.

2) A domain-specific search engine, Riya.

3) CBIR technology is used in family album management.

4) Automatic Linguistic Indexing of Pictures - Real Time (ALIPR), an automatic image annotation system.

3. Image Retrieval Techniques: Addressing the core problem

We do not yet have a universally acceptable algorithmic means of characterizing human vision, more specifically in the context of interpreting images.

By the nature of its task, the CBIR technology boils down to two intrinsic problems:

1) How to mathematically describe an image

The original representation of an image, an array of pixel values, corresponds poorly to our visual response, let alone to a semantic understanding of the image, so a more suitable description is needed. We refer to the mathematical description of an image for retrieval purposes as its signature.

2) How to assess the similarity between a pair of images based on their abstracted descriptions.

3.1 Extraction of visual signature

Feature extraction is used as the pre-processing step for subsequent image analysis tasks such as similarity estimation, concept detection, or annotation.

1) Image segmentation

To acquire a region-based signature, a key step is to segment images. Reliable segmentation is especially critical for characterizing shapes within images, without which the shape estimates are largely meaningless.

a> Main methods

(1) k-means clustering

(2) Normalized cuts criterion

(3) Methods used in medical image collection
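
To make the first method concrete, here is a minimal sketch of k-means segmentation on a tiny grayscale image. It clusters raw intensities only, which is an assumption made to keep the example short; practical systems usually cluster richer per-pixel or per-block features. The evenly spaced initial centers are also a choice made here for determinism.

```python
def kmeans_1d(values, k, max_iterations=50):
    """Cluster scalar values into k groups; returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the value range.
    if k == 1:
        centers = [(lo + hi) / 2.0]
    else:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign(current):
        # Each value goes to the nearest center.
        return [min(range(k), key=lambda c: (v - current[c]) ** 2) for v in values]

    for _ in range(max_iterations):
        labels = assign(centers)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous center.
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return assign(centers), centers


def segment(image, k):
    """Label every pixel of a grayscale image (list of rows) with a segment id."""
    flat = [p for row in image for p in row]
    labels, centers = kmeans_1d(flat, k)
    width = len(image[0])
    label_map = [labels[i:i + width] for i in range(0, len(labels), width)]
    return label_map, centers
```

Note that `k` must be chosen in advance, and nothing here forces pixels with the same label to be spatially connected.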

b> Shortcoming

The main issues that plague current techniques are:

(1) Computational complexity

(2) Reliability of good segmentation

(3) Lack of accepted methods for assessing segmentation quality.

c> Ways to reduce sensitivity to segmentation

(1) Involve every generated segment of an image in the matching process to obtain soft similarity measures.

(2) Characterize spatial arrangement of color and texture using block-based 2-D multi-resolution hidden Markov models.

(3) Use perceptual grouping principles to hierarchically extract image structures.

2) Major types of features

A feature is defined to capture a certain visual property of an image, either globally for the entire image, or locally for a small group of pixels.

Most commonly used features include those reflecting color, texture, shape, and salient points in an image.

In global extraction, features are computed to capture the overall characteristics of an image. The advantage of global extraction is its high speed, both for extracting features and for computing similarity. Its disadvantage is that global features are too rigid to represent an image; they can be oversensitive to location and hence fail to identify important visual characteristics.
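
One common global feature is a color histogram. The sketch below is an illustration under assumptions (a joint RGB histogram with a hypothetical `bins_per_channel` parameter, compared by histogram intersection), not the specific feature discussed in the survey. It shows why global extraction is fast: one pass over the pixels to extract, one pass over the bins to compare.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    """
    bins = bins_per_channel
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in 0..bins-1.
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

This particular feature ignores where colors occur in the image, so it is cheap but coarse.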

In local feature extraction, a set of features is computed for every pixel using its neighborhood.

a> Color features

b> Texture features

Texture features are intended to capture the granularity and repetitive patterns of surfaces within a picture.

A popular way to form texture features is to use the coefficients of a certain transform of the original pixel values or, in more sophisticated versions, statistics computed from those coefficients.
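
As a sketch of "statistics computed from transform coefficients", the code below applies one level of a 2-D Haar transform to each 2x2 block of a grayscale image and uses the mean squared detail coefficient of each band as a three-number texture feature. The choice of the Haar transform, the 1/2 scaling, and the function name are assumptions for illustration; other transforms and statistics are equally plausible.

```python
def haar_detail_energies(image):
    """Texture feature from one-level Haar detail coefficients.

    image: list of rows of gray values, with even height and width.
    Returns (col_energy, row_energy, diag_energy): the mean squared
    coefficient of each detail band over all 2x2 blocks.
    """
    col_sum = row_sum = diag_sum = 0.0
    blocks = 0
    for y in range(0, len(image) - 1, 2):
        for x in range(0, len(image[0]) - 1, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            col_diff = (a - b + c - d) / 2.0   # left column vs right column
            row_diff = (a + b - c - d) / 2.0   # top row vs bottom row
            diag_diff = (a - b - c + d) / 2.0  # one diagonal vs the other
            col_sum += col_diff ** 2
            row_sum += row_diff ** 2
            diag_sum += diag_diff ** 2
            blocks += 1
    if blocks == 0:
        raise ValueError("image must be at least 2x2")
    return (col_sum / blocks, row_sum / blocks, diag_sum / blocks)
```

Vertical stripes put all their energy in the first band, horizontal stripes in the second, and a checkerboard in the third, so the three numbers separate these simple repetitive patterns.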

c> Shape features

Shape is a key attribute of segmented image regions, and its efficient and robust representation plays an important role in retrieval.

In general, there is a shift from global shape representations to local descriptors.

d> Features based on local invariants

Examples include corner points and interest points.

3) Construction of signature from features

According to mathematical formulations, we summarize the types of signatures roughly into vectors and distributions.

Our discussion will focus on the region-based signature and its mathematical connection with histograms, because it is the most exploited type of image signature.

We note, however, that distributions extracted from a collection of local feature vectors can take other forms, for instance a continuous density function or even a spatial stochastic model.

4) Adaptive Image signature

We categorize image signatures according to their adaptivity into static, image-wise adaptive, and user-wise adaptive.

Image-wise adaptive signatures vary according to the classification of images.

3.2 Image similarity using visual signature

1) According to different key motivating factors, measures can be summarized as follows:

a> Agreement with semantics

b> Robustness to noise (invariant to perturbations)

c> Computational efficiency (ability to work real-time and in large-scale)

d> Invariance to background (allowing region-based querying)

e> Local linearity

2) Various techniques can be grouped according to their design philosophies, as follows:

a> Treating features as vectors, non-vector representations, or ensembles

b> Using region-based similarity, global similarity, or a combination of both

c> Computing similarities over a linear space or a non-linear manifold

d> Role played by image segments in similarity computation

e> Stochastic, fuzzy, or deterministic similarity measures

f> Use of supervised, semi-supervised, or unsupervised learning

We start the discussion with the region-based signature, since its widespread use dates from the current decade. The technical emphasis for region-based signatures is the definition of distance between sets of vectors, which is less obvious than defining distance between single vectors.

Many metrics, motivated by different perspectives, have been developed.
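
A region-based signature is a set of region feature vectors, so comparing two images means comparing two sets. One well-known measure from the literature is integrated region matching (IRM); the sketch below follows its greedy "most similar pair first" idea under simplifying assumptions: each signature is a list of `(weight, feature_vector)` pairs whose weights sum to 1, and region features are compared by Euclidean distance. It is an illustration rather than a faithful reimplementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def irm_style_distance(sig_a, sig_b):
    """Greedy soft matching between two sets of weighted region vectors.

    Region pairs are visited from most to least similar; each pair is
    credited with as much weight as both regions still have available.
    """
    remaining_a = [w for w, _ in sig_a]
    remaining_b = [w for w, _ in sig_b]
    pairs = sorted(
        (euclidean(fa, fb), i, j)
        for i, (_, fa) in enumerate(sig_a)
        for j, (_, fb) in enumerate(sig_b)
    )
    total = 0.0
    for dist, i, j in pairs:
        credit = min(remaining_a[i], remaining_b[j])
        if credit > 0:
            total += credit * dist
            remaining_a[i] -= credit
            remaining_b[j] -= credit
    return total
```

Because every region takes part in the match, one badly segmented region changes the result gradually instead of breaking the comparison.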

3.3 Clustering and classification

Clustering methods fall roughly into three types:

1> Pair-wise distance based

2> Optimization of an overall clustering quality measure

3> Statistical modeling
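
As an example of the first type, the sketch below clusters items using only pair-wise distances: any two items closer than a threshold end up in the same cluster, which is equivalent to cutting a single-linkage hierarchy at that threshold. The function name and the `threshold` parameter are assumptions for illustration.

```python
def single_linkage(points, distance, threshold):
    """Group points whose pair-wise distance chains stay within threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if distance(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return sorted(clusters.values())
```

Here the number of clusters is not given in advance; it falls out of the threshold, which simply moves the difficulty to choosing that threshold.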

Image categorization is advantageous when the image database is well-specified. Classification methods can be divided into two major branches:

1> Discriminative modeling

2> Generative modeling

Discriminative modeling approaches optimize classification boundaries more directly. Generative modeling approaches make it easier to incorporate prior knowledge and are more convenient to use when there are many classes.
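
To illustrate how a generative model takes prior knowledge, the sketch below fits one 1-D Gaussian per class and classifies by prior times likelihood. The single scalar feature, the class names used in the usage note, and the optional `priors` argument are hypothetical; real image classifiers use far richer features and models.

```python
import math


def fit_gaussian(values):
    """Maximum-likelihood mean and variance (variance floored to stay positive)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def train(samples_by_class, priors=None):
    """Return {label: (prior, mean, var)}; priors default to class frequencies."""
    total = sum(len(v) for v in samples_by_class.values())
    model = {}
    for label, values in samples_by_class.items():
        prior = priors[label] if priors else len(values) / total
        model[label] = (prior,) + fit_gaussian(values)
    return model


def classify(model, x):
    """Pick the class with the highest log prior plus log likelihood."""
    def score(label):
        prior, mean, var = model[label]
        return math.log(prior) + log_gaussian(x, mean, var)
    return max(model, key=score)
```

For example, with training values `{"indoor": [0.2, 0.3, 0.25], "outdoor": [0.7, 0.8, 0.75]}`, a borderline input can be pushed toward one class just by passing different `priors`, without touching the fitted class models. Adding a class only requires fitting one more Gaussian.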

Clustering is a hard problem with two unknowns: the number of clusters and the clusters themselves. Classification is more systematic, but comprehensive training data is often scarce.

3.4 Relevance feedback based search paradigms

Relevance feedback is a query modification technique which attempts to capture the user’s precise needs through iterative feedback and query refinement. We group them here based on the nature of the advancements made, resulting in sets of techniques that have pushed the frontier in a common domain, which include:

1> Learning-based advancements

2> Feedback specification novelties

3> User-driven methods

4> Probabilistic methods

5> Region-based methods

6> Other advancements
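
The basic loop of query modification can be sketched with the classical Rocchio update, which moves the query vector toward images the user marked relevant and away from those marked non-relevant. This formula comes from text retrieval and is used here only as an illustration; the weights `alpha`, `beta`, and `gamma` are conventional defaults assumed for the example, not values from the survey.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query vector from one round of user feedback.

    query: feature vector of the current query.
    relevant / non_relevant: lists of feature vectors the user marked.
    """
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
```

Each feedback round would rerun the search with the refined vector, show new results, and collect new marks, which is the iterative refinement described above.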

3.5 Multimodal fusion and retrieval

Media relevant to the broad area of multimedia retrieval and annotation include, but are not limited to, images, text, free-text, graphics, video, and any conceivable combination of them.

4. CBIR offshoots: Problems and applications of the new age

1) Words and pictures

1> Joint word-picture modeling approach

2> Supervised categorization approach

2) Stories and pictures

3) Aesthetics and Pictures

4) Art, Culture and Pictures

5) Web and Pictures

6) Security and Pictures

7) Epilogue

5. Evaluation Strategies

6. Discussion and Conclusions

To reprint this article, please contact the original author for permission and credit it to 王方圆's ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-613779-476656.html

Alooker's personal blog: http://blog.sciencenet.cn/u/Alooker
