博文

postgresql中plantree内存结构

已有 7426 次阅读 2010-5-2 03:01 |个人分类:postgresql|系统分类:科研笔记| postgresql, plantree

上次写完了 postgresql中querytree内存结构，接下来就是又让你激动，又让你觉得神秘的plantree了。

先看看src/backend/optimizer的readme：
Optimizer
=========
These directories take the Query structure returned by the parser, and generate a plan used by the executor.

The /plan directory generates the actual output plan,
the /path code generates all possible ways to join the tables,
and /prep handles various preprocessing steps for special cases.
/util is utility stuff.
/geqo is the separate "genetic optimization" planner--- it does a semi-random search through the join tree space, rather than exhaustively considering all possible join trees.

哈，所以可以先看看相应的目录下的文件，有个大概印象。
在看看readme下面的：
Optimizer Data Structures
-------------------------

PlannerGlobal - global information for a single planner invocation
PlannerInfo - information for planning a particular Query (we make a separate PlannerInfo node for each sub-Query)
RelOptInfo - a relation or joined relations
RestrictInfo - WHERE clauses, like "x = 3" or "y = z"
(note the same structure is used for restriction and
join clauses)
Path - every way to generate a RelOptInfo(sequential,index,joins)
SeqScan - a plain Path node with pathtype = T_SeqScan
IndexPath - index scans
BitmapHeapPath - top of a bitmapped index scan
TidPath - scan by CTID
AppendPath - append multiple subpaths together
ResultPath - a Result plan node (used for FROM-less SELECT)
MaterialPath - a Material plan node
UniquePath - remove duplicate rows
NestPath - nested-loop joins
MergePath - merge joins
HashPath - hash joins

EquivalenceClass - a data structure representing a set of values known equal
PathKey - a data structure representing the sort ordering of a path

上面把optimizer用到的数据结构都说明了用途。

我们再看看8.4 的document：
在43.5.1. Generating Possible Plans有：
The planner/optimizer starts by generating plans for scanning each individual relation (table) used in the query. The possible plans are determined by the available indexes on each relation. There is always the possibility of performing a sequential scan on a relation, so a sequential scan plan is always created.
看样子 sequential scan plan是必须的了，由于我们pois表没建索引，所以我们可以想象，我们的plan是没有index相关的path的。等有时间再写一篇涉及index和整个优化器有关的文章。

好，有了这些基本概念，我们开工。
再次写一下我们的前提条件：
假定我们数据库中已经有如下表，并填充了数据：
CREATE TABLE pois
(
uid integer not null,
catcode VARCHAR(32) not null,
);
现在我们用psql发送请求：select catcode from pois;
程序已经执行完了querytree_list = pg_analyze_and_rewrite(parsetree, query_string, NULL, 0);
接下来就是执行我们神秘的 stmt = (Node *) pg_plan_query(query, cursorOptions, boundParams);了。

首先看调用堆栈：
postgresql Thread [C/C++ Attach to Application]
gdb Debugger (5/1/10 10:16 AM) (Suspended)
Thread [1] (Suspended)
17 set_plain_rel_pathlist() /home/postgres/develop/.../src/backend/optimizer/path/allpaths.c:264
16 set_rel_pathlist() /home/postgres/develop/.../src/backend/optimizer/path/allpaths.c:201
15 set_base_rel_pathlists() /home/postgres/develop/.../src/backend/optimizer/path/allpaths.c:157
14 make_one_rel() /home/postgres/develop/.../src/backend/optimizer/path/allpaths.c:93
13 query_planner() /home/postgres/develop/.../src/backend/optimizer/plan/planmain.c:252
12 grouping_planner() /home/postgres/develop/.../src/backend/optimizer/plan/planner.c:1006
11 subquery_planner() /home/postgres/develop/.../src/backend/optimizer/plan/planner.c:481
10 standard_planner() /home/postgres/develop/.../src/backend/optimizer/plan/planner.c:190
9 pg_plan_query() /home/postgres/develop/.../src/backend/tcop/postgres.c:697
8 pg_plan_queries() /home/postgres/develop/.../src/backend/tcop/postgres.c:756
7 exec_simple_query() /home/postgres/develop/.../src/backend/tcop/postgres.c:921
6 PostgresMain() /home/postgres/develop/.../src/backend/tcop/postgres.c:3617
5 BackendRun() /home/postgres/develop/.../src/backend/postmaster/postmaster.c:3449
4 BackendStartup() /home/postgres/develop/.../src/backend/postmaster/postmaster.c:3063
3 ServerLoop() /home/postgres/develop/.../src/backend/postmaster/postmaster.c:1387
2 PostmasterMain() /home/postgres/develop/.../src/backend/postmaster/postmaster.c:1040
1 main() /home/postgres/develop/.../src/backend/main/main.c:188

我之所以把堆栈写到 set_plain_rel_pathlist()来，是因为这是建立我们最终顺序扫描path的调用函数。具体调用代码为：
static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte){
if (rel->reloptkind == RELOPT_BASEREL && relation_excluded_by_constraints(root, rel, rte)) {
set_dummy_rel_pathlist(rel);
return;
}
check_partial_indexes(root, rel);
set_baserel_size_estimates(root, rel);
if (create_or_index_quals(root, rel)) {
check_partial_indexes(root, rel);
set_baserel_size_estimates(root, rel);
}
add_path(rel, create_seqscan_path(root, rel));
create_index_paths(root, rel);
create_tidscan_paths(root, rel);
set_cheapest(rel);
}
add_path(rel, create_seqscan_path(root, rel));即为实际增加我们的path的调用，本身该调用非常简单，就是估算从硬盘读page和时间+处理所有tuple的时间，然后填充path。

我把程序执行完下面这条语句之前的数据结构画张图，如下：
line1064： result_plan = optimize_minmax_aggregates(root,
tlist,
best_path);

下图是最终的内存结构图，为完整起见，我把PlannerInfo及相应的数据结构都保留下来：

图太大，picasa不让传超过1M的图片，只好上传到filefront.com/，看大图的可以从这里下载。
下图是只包含plan tree的图：

至此，全部结束。

加我私人微信，交流技术。

转载本文请联系原作者获取授权，同时请注明本文来自孙鹏科学网博客。
链接地址：https://blog.sciencenet.cn/blog-419883-317833.html

上一篇：NBA 2010 APR 21 PLAYOFF MAV VS SPUR 2ND
下一篇：Postgresql Executor 阶段内存结构

收藏 IP: 223.72.72.*| 热度|

当前推荐数：0

该博文允许注册用户评论请点击登录评论 (0 个评论)

数据加载中...

返回顶部

博文发布时间已经超过87600小时，评论已关闭。

孙鹏

扫一扫，分享此博文

hillpig的个人博客分享 http://blog.sciencenet.cn/u/hillpig 畅想ing,思考ing,前行ing Email:bluevaley@gmail.com

博文

postgresql中plantree内存结构

当前推荐数：0

该博文允许注册用户评论请点击登录评论 (0 个评论)

孙鹏

全部作者的其他最新博文

全部精选博文导读

相关博文

hillpig的个人博客分享 http://blog.sciencenet.cn/u/hillpig 畅想ing,思考ing,前行ing Email:bluevaley@gmail.com

博文

postgresql中plantree内存结构

当前推荐数：0

该博文允许注册用户评论 请点击登录 评论 (0 个评论)

孙鹏

全部作者的其他最新博文

全部精选博文导读

相关博文

该博文允许注册用户评论请点击登录评论 (0 个评论)