PostgreSQL fdw详解

您所在的位置:网站首页 foreigner是什么 PostgreSQL fdw详解

PostgreSQL fdw详解

#PostgreSQL fdw详解| 来源: 网络整理| 查看: 265

postgres fdw是一种外部访问接口,它可以被用来访问存储在外部的数据,这些数据可以是外部的pg数据库,也可以oracle、mysql等数据库,甚至可以是文件。

目前支持的fdw外部数据源: https://wiki.postgresql.org/wiki/Foreign_data_wrappers

FDW一般用于哪些场景呢?例如: sharding,例如pg_sharedman插件,就是使用postgres_fdw和pg_pathman的插件来实现数据的分片。 同步数据、etl、数据迁移等等。

postgres_fdw使用举例: 这里创建2个数据库db01,db02,2个用户user01,user02分别用来作为本地和远端的数据库和用户。

bill=# create user user01 superuser password 'bill'; CREATE ROLE bill=# create database db01 owner=user01 TEMPLATE=template0 LC_CTYPE='zh_CN.UTF-8'; CREATE DATABASE bill=# create user user02 superuser password 'bill'; CREATE ROLE bill=# create database db02 with owner=user02 TEMPLATE=template0 LC_CTYPE='zh_CN.UTF-8'; CREATE DATABASE

接下来在远端的db02下面创建表:

bill=# \c db02 user02 You are now connected to database "db02" as user "user02". db02=# create table table1 (id int, crt_Time timestamp, info text, c1 int); CREATE TABLE db02=# create table table2 (id int, crt_Time timestamp, info text, c1 int); CREATE TABLE db02=# insert into table1 select generate_series(1,1000000), clock_timestamp(), md5(random()::text), random()*1000; INSERT 0 1000000 db02=# insert into table2 select generate_series(1,1000000), clock_timestamp(), md5(random()::text), random()*1000; INSERT 0 1000000

需要注意,想要使用fdw访问数据需要先确保:网络通,数据库防火墙(pg_hba,conf)正常,远端数据库的用户必须有表的相关权限。

然后在本地db01创建server:

db01=# CREATE SERVER db02 db01-# FOREIGN DATA WRAPPER postgres_fdw db01-# OPTIONS (host '192.168.7.xxx', port '1921', dbname 'db02'); CREATE SERVER db01=# select * from pg_foreign_server ; oid | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions -------+---------+----------+--------+---------+------------+--------+-------------------------------------------- 50361 | db02 | 50345 | 50350 | | | | {host=192.168.7.xxx,port=1921,dbname=db02} (1 row)

配置user mapping:

db01=# CREATE USER MAPPING FOR user01 db01-# SERVER db02 db01-# OPTIONS (user 'user02', password 'bill'); CREATE USER MAPPING

然后就可以创建forein table了: –方法一:批量导入,这种比较常见,可以一次导入一个模式下的所有表

db01=# import foreign schema public from server db02 into sch1; IMPORT FOREIGN SCHEMA db01=# \det sch1.* List of foreign tables Schema | Table | Server --------+--------+-------- sch1 | table1 | db02 sch1 | table2 | db02 (2 rows)

–方法二:单个创建

db01=# CREATE FOREIGN TABLE sch1.table1 ( db01(# id int, crt_Time timestamp, info text, c1 int) db01-# SERVER db02 db01-# OPTIONS (schema_name 'public', table_name 'table1'); CREATE FOREIGN TABLE

查询:

db01=# select count(*) from sch1.table1; count --------- 1000000 (1 row) db01=# select count(*) from sch1.table2; count --------- 1000000 (1 row)

我们可以explain verbose来查看sql在远端怎么执行的:

db01=# explain verbose select count(*) from sch1.table1; QUERY PLAN ---------------------------------------------------- Foreign Scan (cost=108.53..152.69 rows=1 width=8) Output: (count(*)) Relations: Aggregate on (sch1.table1) Remote SQL: SELECT count(*) FROM public.table1 (4 rows)

pushdown: 我们在本地执行的语句并不是所有的都能pushdown到远端执行,目前只支持: 内置数据类型、immutable操作符、immutable函数。 这也比较好理解,因为pg允许用户自己定义数据类型、操作符这些,如果我们在本地定义了一种新的操作符但是在远端却没有,自然无法push到远端。另外如果本地和远端都创建了同样的extension,那么这个extension自带的操作符和函数是可以pushdown的。

–projection

db01=# explain verbose select id from sch1.table2; QUERY PLAN ---------------------------------------------------------------------- Foreign Scan on sch1.table2 (cost=100.00..197.75 rows=2925 width=4) Output: id Remote SQL: SELECT id FROM public.table2 (3 rows)

–where

db01=# explain verbose select * from sch1.table1 where id=1; QUERY PLAN --------------------------------------------------------------------------------- Foreign Scan on sch1.table1 (cost=100.00..124.33 rows=6 width=48) Output: id, crt_time, info, c1 Remote SQL: SELECT id, crt_time, info, c1 FROM public.table1 WHERE ((id = 1)) (3 rows)

–agg

db01=# explain verbose select count(*) from sch1.table1; QUERY PLAN ---------------------------------------------------- Foreign Scan (cost=108.53..152.69 rows=1 width=8) Output: (count(*)) Relations: Aggregate on (sch1.table1) Remote SQL: SELECT count(*) FROM public.table1 (4 rows)

–join

db01=# explain verbose select t1.* from sch1.table1 t1 inner join sch1.table2 using (id) limit 2; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Foreign Scan (cost=100.00..100.99 rows=2 width=48) Output: t1.id, t1.crt_time, t1.info, t1.c1 Relations: (sch1.table1 t1) INNER JOIN (sch1.table2) Remote SQL: SELECT r1.id, r1.crt_time, r1.info, r1.c1 FROM (public.table1 r1 INNER JOIN public.table2 r2 ON (((r1.id = r2.id)))) LIMIT 2::bigint (4 rows)

–limit pg12之前limit操作不在远端执行。

db01=# explain verbose select * from sch1.table2 limit 10; QUERY PLAN --------------------------------------------------------------------------------- Foreign Scan on sch1.table2 (cost=100.00..100.37 rows=10 width=48) Output: id, crt_time, info, c1 Remote SQL: SELECT id, crt_time, info, c1 FROM public.table2 LIMIT 10::bigint (3 rows)

–sort

db01=# explain verbose select * from sch1.table2 order by id desc limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Foreign Scan on sch1.table2 (cost=100.00..100.40 rows=10 width=48) Output: id, crt_time, info, c1 Remote SQL: SELECT id, crt_time, info, c1 FROM public.table2 ORDER BY id DESC NULLS FIRST LIMIT 10::bigint (3 rows)

控制参数: 例如前面提到的可以在本地和远端都创建了同样的extension,那么这个extension自带的操作符和函数是可以pushdown的,我们需要通过控制参数extensions声明。

db01=# alter server db02 options (add extensions 'dblink'); ALTER SERVER db01=# alter server db02 options (set extensions 'dblink'); ALTER SERVER db01=# alter server db02 options (drop extensions ); ALTER SERVER

pull 有些时候需要将远端的数据pull到本地来进行操作。

db01=# create table t as select * from sch1.table1; SELECT 1000000 db01=# explain verbose select count(*) from t join sch1.table1 t1 on (t.id=t1.id and t1.c1=1); QUERY PLAN ----------------------------------------------------------------------------------------- Aggregate (cost=28038.83..28038.83 rows=1 width=8) Output: count(*) -> Hash Join (cost=147.05..27831.98 rows=82738 width=0) Hash Cond: (t.id = t1.id) -> Seq Scan on public.t (cost=0.00..21341.70 rows=1103170 width=4) Output: t.id, t.crt_time, t.info, t.c1 -> Hash (cost=146.86..146.86 rows=15 width=4) Output: t1.id -> Foreign Scan on sch1.table1 t1 (cost=100.00..146.86 rows=15 width=4) Output: t1.id Remote SQL: SELECT id FROM public.table1 WHERE ((c1 = 1)) (11 rows)

DML

db01=# explain verbose update sch1.table1 set crt_time=now() where id=1; QUERY PLAN ---------------------------------------------------------------------------------------------- Update on sch1.table1 (cost=100.00..124.78 rows=6 width=54) Remote SQL: UPDATE public.table1 SET crt_time = $2 WHERE ctid = $1 -> Foreign Scan on sch1.table1 (cost=100.00..124.78 rows=6 width=54) Output: id, now(), info, c1, ctid Remote SQL: SELECT id, info, c1, ctid FROM public.table1 WHERE ((id = 1)) FOR UPDATE (5 rows)

连接池 因为pg是多进程的结构,对于每一个连接都要fork一个新的进程,那么通过fdw访问远端数据是不是会建立很多个进程呢?使用postgres_fdw在一个session中如果使用的是同样的foreign server那么是可以重用同一个进程的。

并行 postgres_fdw是允许并行操作的,比如对大表进行操作是可以用到并行的。 https://www.postgresql.org/docs/12/fdwhandler.html

事务 使用fdw访问数据是怎么保证数据的一致性呢?在这一方面并没有使用两阶段提交的方式,而是规定: 1、如果本地事务的隔离级别是SERIALIZABLE,那么远端事务的隔离级别也是SERIALIZABLE; 2、如果本地事务的隔离级别是其它的,那么远端事务的隔离级别都是REPEATABLE READ。



【本文地址】


今日新闻


推荐新闻


CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3