实际并行workers数量不等于postgresql.conf中设置的max_parallel_workers_per_gather数量

news/2024/7/9 19:58:31 标签: PostgreSQL

1 前言

  • 本文件的源码来自PostgreSQL 14.5,其它版本略有不同

PostgreSQL的并行workers是由compute_parallel_worker函数决定的,compute_parallel_worker是估算扫描所需的并行工作线程数,并不是您在postgresql.conf中设置的max_parallel_workers_per_gather数量,compute_parallel_worker会根据heap_pages、index_pages、max_workers(max_parallel_workers_per_gather)来决定并行工作线程数量。

2 源码和调用位置

compute_parallel_worker共有4个地方调用

src\backend\optimizer\path\allpaths.c(801,21)
src\backend\optimizer\path\allpaths.c(3724,21)
src\backend\optimizer\path\costsize.c(707,33)
src\backend\optimizer\plan\planner.c(5953,21)

compute_parallel_worker的声明

src\include\optimizer\paths.h(59,12)

compute_parallel_worker的实现

src\backend\optimizer\path\allpaths.c(3750,1)

compute_parallel_worker的源码

/*
 * Compute the number of parallel workers that should be used to scan a
 * relation.  We compute the parallel workers based on the size of the heap to
 * be scanned and the size of the index to be scanned, then choose a minimum
 * of those.
 *
 * "heap_pages" is the number of pages from the table that we expect to scan, or
 * -1 if we don't expect to scan any.
 *
 * "index_pages" is the number of pages from the index that we expect to scan, or
 * -1 if we don't expect to scan any.
 *
 * "max_workers" is caller's limit on the number of workers.  This typically
 * comes from a GUC.
 * "max_workers"就是postgresql.conf中max_parallel_workers_per_gather的值
 */
int
compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
						int max_workers)
{
	int			parallel_workers = 0;

	/*
	 * If the user has set the parallel_workers reloption, use that; otherwise
	 * select a default number of workers.
     * 不需要优化,直接来自表级存储参数parallel_workers
     * 详见第3节直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量
	 */
	if (rel->rel_parallel_workers != -1)
		parallel_workers = rel->rel_parallel_workers;
	else
	{
		/*
		 * If the number of pages being scanned is insufficient to justify a
		 * parallel scan, just return zero ... unless it's an inheritance
		 * child. In that case, we want to generate a parallel path here
		 * anyway.  It might not be worthwhile just for this relation, but
		 * when combined with all of its inheritance siblings it may well pay
		 * off.
		 */
		if (rel->reloptkind == RELOPT_BASEREL &&
			((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
			 (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
			return 0;

		if (heap_pages >= 0)
		{
			int			heap_parallel_threshold;
			int			heap_parallel_workers = 1;

			/*
			 * Select the number of workers based on the log of the size of
			 * the relation.  This probably needs to be a good deal more
			 * sophisticated, but we need something here for now.  Note that
			 * the upper limit of the min_parallel_table_scan_size GUC is
			 * chosen to prevent overflow here.
			 */
			heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
			while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
			{
				heap_parallel_workers++;
				heap_parallel_threshold *= 3;
				if (heap_parallel_threshold > INT_MAX / 3)
					break;		/* avoid overflow */
			}

			parallel_workers = heap_parallel_workers;
		}

		if (index_pages >= 0)
		{
			int			index_parallel_workers = 1;
			int			index_parallel_threshold;

			/* same calculation as for heap_pages above */
			index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
			while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
			{
				index_parallel_workers++;
				index_parallel_threshold *= 3;
				if (index_parallel_threshold > INT_MAX / 3)
					break;		/* avoid overflow */
			}

			if (parallel_workers > 0)
				parallel_workers = Min(parallel_workers, index_parallel_workers);
			else
				parallel_workers = index_parallel_workers;
		}
	}

	/* In no case use more than caller supplied maximum number of workers */
	parallel_workers = Min(parallel_workers, max_workers);

	return parallel_workers;
}

3 直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量

如果要使用指定数量的并行工作线程数,必须使用表级存储参数parallel_workers。

alter table tab set (parallel_workers=8);

注意:如果不设置表级存储参数parallel_workers,实际的并行工作线程数由compute_parallel_worker根据会根据heap_pages、index_pages、max_workers来决定并行工作线程数量。
因此会出现实际并行工作数量不等于postgresql.conf中设置的max_parallel_workers_per_gather数量的情况。


http://www.niftyadmin.cn/n/4986219.html

相关文章

六、DataGrip的基础使用

创建新数据库 1、点击MySQL图标&#xff0c;右键点击新建&#xff0c;然后选择框架(数据库) 2、输入数据库名称&#xff1a; 此处schema代表框架&#xff0c;和database(数据库)是同一性质的东西。 创建新的表 1、右键点击数据库&#xff0c;点击新建&#xff0c;再点击表 2…

抖音视频删了怎么在电脑上找回来

【昨天整理电脑文件时&#xff0c;不小心将剪辑好的抖音作品误删了&#xff0c;但是回收站中找不回来了&#xff0c;这些视频是我花了很多心血制作的&#xff0c;如果没了真的十分可惜&#xff01;希望大家能帮帮我&#xff0c;告诉我应该如何恢复这些文件。】 现在人们都喜欢…

Ansible自动化运维工具(二)

目录 &#xff08;6&#xff09;copy模块 &#xff08;7&#xff09;file模块 ​编辑​编辑&#xff08;8&#xff09;hostname模块 &#xff08;9&#xff09;ping模块 &#xff08;10&#xff09;yum 模块 &#xff08;11&#xff09;service/system模块 ​编辑 ​…

YII项目在Docker中运行缓慢

缓慢问题分析 请求YII的api时间请求原生查询时间win10 上运行docker上的php api异常慢ubuntu 中拉代码git报错 请求YII的api时间 请求原生查询时间 win10 上运行docker上的php api异常慢 链接阿里数据的 入口直接返回的 网上有说是docker的dns解析慢&#xff1b; 也有说是…

Mac 卸载 PyCharm 方法

Mac 系统下 PyCharm 没有一键卸载程序&#xff0c;也没有完全卸载的插件&#xff0c;若要彻底删除&#xff0c;除了在应用&#xff08;Application&#xff09;里删除 PyCharm 到垃圾桶外&#xff0c;还需要在终端&#xff08;Terminal&#xff09;执行删除相应的文件及文件夹。…

线程和之间的通讯方式、进程之间的通讯方式、线程之间如何同步

通信是指线程之间以何种机制来交换信息&#xff0c;同步是指程序中用于控制不同线程间操作发生相对顺序的机制 进程由线程组成&#xff0c;所以进程中有的通讯机制线程中全都有 线程的通讯方式&#xff1a; 1. 锁机制&#xff1a;包括互斥锁、条件变量、读写锁 互斥锁提供了以…

小游戏分发平台如何以技术拓流?

2023年&#xff0c;小游戏的发展将受到多方面的影响&#xff0c;例如新技术的引入、参与小游戏的新玩家以及游戏市场的激烈竞争等。首先&#xff0c;新技术如虚拟现实&#xff08;VR&#xff09;、增强现实&#xff08;AR&#xff09;和机器人技术都可以带来新颖的游戏体验。其…

2022年下半年系统架构设计师真题(下午带答案)

试题一 (25分) 某电子商务公司拟升级其会员与促销管理系统&#xff0c;向用户提供个性化服务&#xff0c;提高用户的粘性。在项目立项之初&#xff0c;公司领导层一致认为本次升级的主要目标是提升会员管理方式的灵活性&#xff0c;由于当前用户规模不大&#xff0c;业务也相对…