
PG9.1+pgpool-II3.1 -- Parallel Query

Posted 2012-3-2 10:54 | Personal category: postgresql | System category: Research notes | Keywords: query

This tutorial is part of the PostgreSQL Cluster tutorial series, which includes:

PostgreSQL9.1 PITR example (how a DBA can back up and restore a database from WAL logs)
PostgreSQL9.1 Warm-Standby --- file-based log shipping (copying WAL files)
PostgreSQL9.1 Warm-Standby --- streaming replication
PostgreSQL9.1 Warm-Standby --- synchronous replication
PostgreSQL9.1 Hot-Standby --- copying WAL files
PostgreSQL9.1 Hot-Standby --- streaming replication
PostgreSQL9.1 Hot-Standby --- synchronous replication
PG9.1+pgpool-II3.1 -- HA (Hot-Standby+Streaming Replication)
PG9.1+pgpool-II3.1 -- Load Balancing (when meeting large amounts of requests)
PG9.1+pgpool-II3.1 -- Parallel Query (when meeting large amounts of data)
PostgreSQL9.1 HA --- Slony

To avoid confusion later on, let's first clarify a few concepts.
First question: when we configure pgpool-II's parallel mode, can we also turn on replication, load balancing, or master/slave mode? Before answering, let's look at the pgpool-II manual and tutorial. The "Parallel Mode" section says:
This mode activates parallel execution of queries. Tables can be split, and data distributed to each node. Moreover, the replication and the load balancing features can be used at the same time. In parallel mode, replication_mode and load_balance_mode are set to true in pgpool.conf, master_slave is set to false, and parallel_mode is set to true.
So parallel_mode, replication_mode and load_balance_mode can all be set to true at the same time, while master_slave must be false.
Next, replication_mode. In this mode pgpool sends a write statement, for example CREATE TABLE ..., to every backend database node. But can a single table be both partitioned and replicated? Section "3. Your First Parallel Query" says:
Data within the different range is stored in two or more data base nodes in a parallel Query. This is called a partitioning. Moreover you could replicate some of tables among database nodes even in the parallel query mode.
So in parallel mode some tables can be replicated (note that it does not say partitioned tables). Section "3.1. Configuring Parallel Query" adds:
Attention: The replication is not done for the table that does the partitioning though a parallel Query and the replication can be made effective at the same time. Attention: You can have both partitioned tables and replicated tables. However a table cannot be a partioned and replicated one at the same time. Because the data structure of partioned tables and replicated tables are different, "bench_replication" database created in section "2.
So a table cannot be both partitioned and replicated, because the two use different data structures. That leads to the second question: when we run
createdb mydb --port=9999;
psql -p 9999 mydb;
mydb=# create table foo();
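to create a table that will later be partitioned, is the DDL statement executed on every database node? (We would rather not create the table on each backend by hand.) Let's run a simple experiment. Assume parallel_mode, replication_mode and load_balance_mode are set to true and master_slave to false, plus a few other settings, and then run: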
psql -p 9999 mydb;
mydb=# create table foo(a int);
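What happens on each node (we have two backend database nodes)? It turns out that pgpool creates the table, with the same name, on every database node (output omitted here).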

Based on this analysis, we can design the tables for this experiment as follows:

create table branches(bid bigint PRIMARY KEY); -- bank branches
create table customers(cid bigint PRIMARY KEY,bid bigint REFERENCES branches(bid)); -- bank customers
insert into branches select * from generate_series(0,10);
insert into customers select a.*,(random()*(10^1))::integer from generate_series(1,1000000) as a ;
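One detail of the customers insert is worth spelling out: random() returns a value in [0, 1), so random()*(10^1) lies in [0, 10), and the ::integer cast rounds rather than truncates. bid therefore ranges over 0..10, exactly the bid values inserted into branches, with the end values 0 and 10 each receiving roughly half as many customers as the others. A minimal Python sketch of that generation, assuming Python's round() is an adequate stand-in for PostgreSQL's float-to-integer cast (make_customers is a hypothetical name):

```python
import random
from collections import Counter


def make_customers(n: int, seed: int = 0) -> list:
    """Mimic: insert into customers
         select a.*, (random()*(10^1))::integer from generate_series(1, n) as a;

    random() is in [0, 1); the cast to integer rounds to the nearest integer
    (Python's round() stands in for it here), so bid is always in 0..10.
    """
    rng = random.Random(seed)
    return [(cid, round(rng.random() * 10)) for cid in range(1, n + 1)]


rows = make_customers(100_000)
counts = Counter(bid for _, bid in rows)
print(sorted(counts))  # every branch id 0..10 appears
```

With 100,000 rows, counts[0] and counts[10] come out near 5,000 each, versus about 10,000 for each bid from 1 to 9.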

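The distribution/replication rules are:

Table branches is replicated, so a replication rule must be defined in pgpool_catalog.replicate_def. The table is small, so keeping a copy on every database node costs little in storage and computation.
Table customers is distributed (to avoid confusion we say "distributed" here rather than "partitioned"), so a distribution rule must be defined in pgpool_catalog.dist_def in the SystemDB. For example, rows with an even cid can be stored on database node 0 and rows with an odd cid on database node 1.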
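The difference between the two rules can be pictured with a toy in-memory model: a replicated table gets every row on every node, a distributed table gets each row on exactly one node chosen by a distribution function, and a table may have only one of the two rules. This is an illustrative sketch, not pgpool-II code; ToyCluster and its methods are hypothetical, and rejecting tables that have no rule is a simplification of the toy.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, ...]


class ToyCluster:
    """Toy model (not pgpool-II code) of where parallel mode puts rows."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, List[Row]]] = [{} for _ in range(num_nodes)]
        self.replicated: set = set()
        self.distributed: Dict[str, Callable[[Row], int]] = {}

    def replicate(self, table: str) -> None:
        # A table may be replicated or distributed, never both.
        if table in self.distributed:
            raise ValueError(f"{table} is already distributed")
        self.replicated.add(table)

    def distribute(self, table: str, func: Callable[[Row], int]) -> None:
        if table in self.replicated:
            raise ValueError(f"{table} is already replicated")
        self.distributed[table] = func

    def insert(self, table: str, row: Row) -> None:
        if table in self.replicated:
            targets = list(range(len(self.nodes)))  # copy to every node
        elif table in self.distributed:
            targets = [self.distributed[table](row)]  # exactly one node
        else:
            # Toy simplification: tables without a rule are rejected.
            raise ValueError(f"no rule defined for {table}")
        for node_id in targets:
            self.nodes[node_id].setdefault(table, []).append(row)


cluster = ToyCluster(2)
cluster.replicate("branches")
cluster.distribute("customers", lambda row: row[0] % 2)  # even cid -> node 0, odd -> node 1
for bid in range(11):
    cluster.insert("branches", (bid,))
cluster.insert("customers", (1, 1))
cluster.insert("customers", (2, 1))
```

Because branches exists in full on both nodes, the foreign key from customers.bid can be satisfied locally on whichever node a customer row lands.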

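Anyone who has used partitioning in Greenplum, or partitioned tables in PostgreSQL through inheritance, will probably want to go one step further, for example by partitioning customers through inheritance: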
create table customers(cid bigint PRIMARY KEY,create_ts timestamp,bid bigint REFERENCES branches(bid)); -- assume an extra create_ts column
create table customers_20120301() INHERITS(customers);
create table customers_20120302() INHERITS(customers);
CREATE OR REPLACE FUNCTION customers_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.create_ts >= DATE '2012-03-01' AND
         NEW.create_ts < DATE '2012-03-02' ) THEN
        INSERT INTO customers_20120301 VALUES (NEW.*);
    ELSIF ( NEW.create_ts >= DATE '2012-03-02' AND
            NEW.create_ts < DATE '2012-03-03' ) THEN
        INSERT INTO customers_20120302 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range. Fix the customers_insert_trigger() function!';
    END IF;
    -- the row was redirected to a child table; returning NULL skips the insert into the parent
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_customers_trigger
BEFORE INSERT ON customers
FOR EACH ROW EXECUTE PROCEDURE customers_insert_trigger();
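The trigger's routing can be checked outside the database with a small Python mirror of its IF/ELSIF branches: half-open day ranges, where comparing a timestamp with DATE '2012-03-01' means comparing with midnight. This is only a sketch of the branch logic; PARTITIONS and route_customer are hypothetical names, and it says nothing about how pgpool would place the child tables, which is the open question that follows.

```python
from datetime import datetime

# Day boundaries used by customers_insert_trigger(); each range is [lower, upper).
PARTITIONS = [
    (datetime(2012, 3, 1), datetime(2012, 3, 2), "customers_20120301"),
    (datetime(2012, 3, 2), datetime(2012, 3, 3), "customers_20120302"),
]


def route_customer(create_ts: datetime) -> str:
    """Return the child table the trigger would insert the row into."""
    for lower, upper, table in PARTITIONS:
        if lower <= create_ts < upper:
            return table
    # Same behaviour as the ELSE branch: refuse rows outside every partition.
    raise ValueError("Date out of range. Fix the customers_insert_trigger() function!")
```

For example, route_customer(datetime(2012, 3, 1, 23, 59, 59)) picks customers_20120301, while midnight of 2012-03-02 already belongs to customers_20120302.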
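So how is customers_20120301 distributed? Like branches and customers, does it also need a rule in either pgpool_catalog.replicate_def or pgpool_catalog.dist_def? And if we then create an index on customers_20120301, where does the index live? Once you have read this article and become familiar with pgpool-II, try it out yourself, and then let's compare notes.
OK, let's set customers_20120301 and customers_20120302 aside and concentrate on the simpler branches and customers tables. How to define their rules in pgpool_catalog.replicate_def and pgpool_catalog.dist_def is covered below. First, our system architecture: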

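Our environment is as follows; we suggest creating these directories in advance:

/home/postgres/db/master/pgsql is the directory of the master database, port 6432
/home/postgres/db/standby/pgsql is the directory of a standby database, port 7432
/usr/local/pgsql is the directory of the SystemDB database used by pgpool-II, port 5432
/home/postgres/var/run/pgpool is where pgpool-II keeps its pid file at runtime
pgpool-II-3.1.2.tar.gz and pgpoolAdmin-3.1.1.tar.gz are assumed to be in /home/postgres/develop
All tutorials in this series use RHEL 6.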

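Before configuring anything, let's go over the various user roles.

The SystemDB on the PostgreSQL instance at port 5432 needs a pgpool role and a pgpool database, which must be created first.
Because we are installing pgpoolAdmin 3.1.1, it has to communicate with pgpool. The user name and password for this communication are stored in pgpool's configuration file /usr/local/etc/pcp.conf, with the password stored as an MD5 hash (see "1.3. Configuring PCP Commands" for how to generate it). Here both the user name and the password are postgres: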
# USERID:MD5PASSWD
postgres:e8a48653851e28c69d0506508fb27fc5
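If no pgpool tools are at hand, a pcp.conf entry can also be produced with Python's hashlib. This is a sketch under one assumption: that pcp.conf expects the plain, unsalted hex MD5 digest of the password, which is what the pg_md5 utility described in "1.3. Configuring PCP Commands" prints. pcp_conf_line is a hypothetical helper; if the assumption holds, postgres/postgres should reproduce the line shown above.

```python
import hashlib


def pcp_conf_line(user: str, password: str) -> str:
    """Build a 'USERID:MD5PASSWD' line for pcp.conf.

    Assumption: the stored value is the plain hex MD5 digest of the
    password, with no salt and no user name mixed in.
    """
    digest = hashlib.md5(password.encode("utf-8")).hexdigest()
    return f"{user}:{digest}"


print(pcp_conf_line("postgres", "postgres"))
```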
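pgpoolAdmin actually runs under Tomcat, whose runtime user is apache, and pgpoolAdmin also runs commands such as /usr/local/bin/pcp_* as the apache user, so the apache user must be given permission to execute these commands.
pgpool also uses the apache user to check the status of each backend database, so that user must also be created in every backend database.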

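Now let's install and configure everything step by step.
1. Install pgpool-II 3.1.2 and pgpoolAdmin 3.1.1. (This step is the same as in PG9.1+pgpool-II3.1 -- HA (Hot-Standby+Streaming Replication). Back then we already built pgpool with parallel mode support, which saves a lot of work here. Compared with a setup that does not use parallel mode, there is not much extra anyway.)
2. Having finally installed pgpool and pgpoolAdmin, we next install the Master and Standby databases, which is straightforward: Master (this step is the same as in PG9.1+pgpool-II3.1 -- HA (Hot-Standby+Streaming Replication))
3. Now for the exciting part: configuring parallel mode.
First, edit postgresql.conf for the PostgreSQL instances on ports 7432, 6432 and 5432:
log_statement = 'all' # log every statement, so we can see what each server is executing
Then start, in order, the PostgreSQL servers on 7432 and 6432, the SystemDB PostgreSQL on 5432, pgpool-II, and Tomcat (only needed for pgpoolAdmin, so optional):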
/home/postgres/db/master/pgsql/bin/postmaster -D /home/postgres/db/master/pgsql/data
/home/postgres/db/standby/pgsql/bin/postmaster -D /home/postgres/db/standby/pgsql/data
/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
pgpool -n &
sh /home/postgres/website/apache-tomcat-7.0.16/bin/startup.sh
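If an authentication error occurs, edit pg_hba.conf on 7432 and 6432. For example, if pgpool cannot authenticate when connecting to the backend databases over IP, add this line to pg_hba.conf: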
# IPv4 local connections:
host all all all trust
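Then connect through the pgpool entry point (port 9999) and create the tables:
psql -p 9999 mydb;
create table branches(bid bigint PRIMARY KEY); -- bank branches
create table customers(cid bigint PRIMARY KEY,bid bigint REFERENCES branches(bid)); -- bank customers
Then create the corresponding rules in the SystemDB: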
psql -p 5432 -U pgpool pgpool;
INSERT INTO pgpool_catalog.dist_def VALUES (
'mydb',
'public',
'customers',
'cid',
ARRAY['cid', 'bid'],
ARRAY['bigint', 'bigint'],
'pgpool_catalog.dist_def_customers'
);
CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_customers(anyelement)
RETURNS integer AS $$
SELECT CASE WHEN $1%2 = 0 THEN 0
WHEN $1%2 = 1 THEN 1
ELSE 0
END;
$$ LANGUAGE sql;
INSERT INTO pgpool_catalog.replicate_def VALUES (
'mydb',
'public',
'branches',
ARRAY['bid'],
ARRAY['bigint']
);
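The distribution function returns the backend number (0 or 1, matching backend_*0 and backend_*1 in pgpool.conf). Its ELSE branch is not dead code: PostgreSQL's % gives the remainder the sign of the dividend, so a negative odd cid yields -1 and falls through to node 0. A Python mirror, assuming that sign rule (Python's own % differs, hence the pg_mod helper); pg_mod, dist_def_customers and BACKEND_PORTS here are hypothetical illustrations, not pgpool objects:

```python
def pg_mod(a: int, b: int) -> int:
    """Integer remainder with PostgreSQL semantics: the result takes the sign
    of the dividend (-3 % 2 = -1), unlike Python's % operator."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def dist_def_customers(cid: int) -> int:
    """Mirror of pgpool_catalog.dist_def_customers: returns the backend node id."""
    m = pg_mod(cid, 2)
    if m == 0:
        return 0
    if m == 1:
        return 1
    return 0  # ELSE branch: reached for a negative odd cid, where cid % 2 = -1


# backend_port0 / backend_port1 from the pgpool.conf used in this setup
BACKEND_PORTS = {0: 6432, 1: 7432}
```

For example, BACKEND_PORTS[dist_def_customers(1)] is 7432 and BACKEND_PORTS[dist_def_customers(2)] is 6432.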
Configure pgpool for parallel mode (pgpool.conf):
listen_addresses = '*'

backend_hostname0 = 'localhost'
backend_port0 = 6432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/db/master/pgsql/data'
backend_flag0= 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'localhost'
backend_port1 = 7432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/db/standby/pgsql/data'
backend_flag1= 'ALLOW_TO_FAILOVER'

pid_file_name = '/home/postgres/pgpool/pgpool.pid'
logdir = '/home/postgres/pgpool'

sr_check_user = 'apache'
sr_check_password = ''

replication_mode = on

load_balance_mode = on
ignore_leading_white_space = on
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'

parallel_mode = on
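To catch typos before restarting pgpool, the mode and backend settings above can be sanity-checked with a small script. This is a sketch under simplifying assumptions: it handles only the simple key = value lines used here, strips # comments naively (a # inside a quoted value would break it), and treats an absent master_slave_mode as off, which is pgpool's default. parse_pgpool_conf and check_parallel_setup are hypothetical helpers, not pgpool tools, and they do not check the other settings parallel mode needs.

```python
BOOLS = {"on": True, "true": True, "off": False, "false": False}


def parse_pgpool_conf(text: str) -> dict:
    """Parse simple 'key = value' lines; single quotes around values are dropped."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        conf[key] = value.strip("'")
    return conf


def check_parallel_setup(conf: dict) -> list:
    """Return a list of problems found in the mode flags and backend list."""

    def flag(name: str) -> bool:
        return BOOLS.get(conf.get(name, "off").lower(), False)

    problems = []
    if not flag("parallel_mode"):
        problems.append("parallel_mode is not on")
    if flag("master_slave_mode"):
        problems.append("master_slave_mode must be off with parallel_mode")
    hosts = [k for k in conf if k.startswith("backend_hostname")]
    for key in hosts:
        n = key[len("backend_hostname"):]
        if f"backend_port{n}" not in conf:
            problems.append(f"backend_port{n} is missing")
    if len(hosts) < 2:
        problems.append("fewer than two backends: nothing to distribute")
    return problems
```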

4. Verification
Insert data:
psql -p 9999 mydb;
insert into branches select * from generate_series(0,10);
insert into customers values(1,1); -- check: is the row inserted only on the 7432 server?
insert into customers values(2,1); -- check: is the row inserted only on the 6432 server?
What if we instead use:
insert into customers select a.*,(random()*(10^1))::integer from generate_series(1,1000000) as a ;
ERROR: pgpool2 sql restriction
DETAIL: cannot use SelectStmt in InsertStmt
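Oops, an error: this is one of pgpool-II's restrictions. So before wrapping up, I especially want to remind everyone that pgpool-II has a number of restrictions (see Restrictions in the manual). If you really want to use pgpool-II in production, besides knowing its configuration well, you must also be thoroughly familiar with these restrictions.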
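Since pgpool-II rejects INSERT ... SELECT here, one workaround is to generate the rows on the client and send plain single-row INSERTs, the form that worked in the two test inserts above. A rough sketch, assuming a one-statement-per-row script is acceptable (it will be slow for a million rows); customer_inserts is a hypothetical helper, and the bid value mimics (random()*(10^1))::integer:

```python
import random
from typing import Iterator


def customer_inserts(n: int, seed: int = 0) -> Iterator[str]:
    """Yield one single-row INSERT per customer instead of INSERT ... SELECT."""
    rng = random.Random(seed)
    for cid in range(1, n + 1):
        bid = round(rng.random() * 10)  # same 0..10 range as the original query
        yield f"insert into customers values({cid},{bid});"
```

Writing the statements from customer_inserts(1000000) to a file, one per line, and running it with psql -p 9999 mydb -f on that file sends each row through pgpool, which then routes it with dist_def_customers.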
That's all.


To reprint this article, please contact the original author for permission and credit Sun Peng's ScienceNet blog as the source.
Link: https://blog.sciencenet.cn/blog-419883-543170.html


孙鹏


hillpig's personal blog: http://blog.sciencenet.cn/u/hillpig  Imagining, thinking, moving forward. Email: bluevaley@gmail.com
