TSBS: A Performance Benchmarking Tool for Time-Series Databases

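TSBS (Time Series Benchmark Suite) is a performance benchmarking platform for time-series data processing (database) systems. It is open-sourced and maintained by Timescale and ships two typical application scenarios, IoT and DevOps. As a benchmarking platform, TSBS is convenient, easy to use, and easy to extend. It covers time-series data generation, writing (loading), and several categories of typical queries, and it summarizes the final results automatically. Because it is open source, many database vendors support it, and several of them use it as their standard product benchmark.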

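01 How TSBS Performance Testing Works

First, let's look at the principle behind TSBS performance testing.

When PostgreSQL executes a query, it generates a query plan for it. Choosing a plan that matches the query structure and the properties of the data is critical for performance, so PostgreSQL uses a sophisticated planner that tries to pick the best query plan.

In PostgreSQL, the EXPLAIN command shows the details of the plan the planner generated for a query.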

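In TimescaleDB query performance tests, the time taken by a single query can be read directly from EXPLAIN ANALYZE. With the TIMING option, it reports the actual startup time and total time of every plan node, and the summary at the end reports the Planning time and the Execution time.

TSBS performance testing for TimescaleDB is closely related to this capability, so let's take a look at the EXPLAIN command.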

1.1 Using the EXPLAIN Command

In PostgreSQL, the EXPLAIN command prints the query plan of a SQL statement. The syntax is as follows:

EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

    ANALYZE [ boolean ]
    VERBOSE [ boolean ]
    COSTS [ boolean ]
    BUFFERS [ boolean ]
    TIMING [ boolean ]
    SUMMARY [ boolean ]
    FORMAT { TEXT | XML | JSON | YAML }

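ANALYZE is the most important EXPLAIN option. When it is TRUE, EXPLAIN actually executes the query and then shows the real row counts and the real run time accumulated in each plan node.

The following options are used together with ANALYZE. TIMING in particular can only be used when ANALYZE is enabled:

TIMING shows the actual startup time and total time of each plan node; defaults to TRUE
SUMMARY prints summary information (such as total timing) after the query plan; it is on by default when ANALYZE is used
BUFFERS shows buffer usage information; defaults to FALSE

Two other commonly used options are VERBOSE and FORMAT:

VERBOSE, when TRUE, shows additional information about the plan, including the output column list of each node in the plan tree and schema-qualified table and function names; defaults to FALSE
FORMAT sets the output format; the default is TEXT, while XML, JSON, and YAML are easier to parse programmatically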
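As a small illustration of the parenthesized option syntax, the following Python sketch assembles an EXPLAIN statement from keyword arguments. The helper name build_explain and its parameters are made up for this example; it only builds a string and does not talk to a database.

```python
# Hypothetical helper: compose an EXPLAIN statement using the
# "EXPLAIN ( option [, ...] ) statement" form shown above.
VALID_FORMATS = {"TEXT", "XML", "JSON", "YAML"}


def build_explain(statement, analyze=False, verbose=False, buffers=False, fmt="TEXT"):
    fmt = fmt.upper()
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported FORMAT: {fmt}")
    options = []
    if analyze:
        options.append("ANALYZE")
    if verbose:
        options.append("VERBOSE")
    if buffers:
        options.append("BUFFERS")
    if fmt != "TEXT":  # TEXT is the default, so it can be omitted
        options.append(f"FORMAT {fmt}")
    if options:
        return f"EXPLAIN ({', '.join(options)}) {statement}"
    return f"EXPLAIN {statement}"


sql = build_explain("SELECT * FROM tenk1", analyze=True, buffers=True, fmt="json")
print(sql)
```

The resulting string can be passed to any PostgreSQL client. Keep in mind that with ANALYZE the statement is really executed.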

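1.2 Output of the EXPLAIN Command

Starting with the simplest example, a query over a whole table, the plan shows that a Seq Scan is used to scan the full table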

EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

Joining two tables and adding a WHERE condition produces a more complex query plan tree

EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=4.65..118.62 rows=10 width=488)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=4.36..39.47 rows=10 width=244)
         Recheck Cond: (unique1 < 10)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.36 rows=10 width=0)
               Index Cond: (unique1 < 10)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.91 rows=1 width=244)
         Index Cond: (unique2 = t1.unique2)

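EXPLAIN ANALYZE shows the execution time and row count of each plan node, plus additional execution statistics such as the sort method and hash memory usage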

EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)
   Sort Key: t1.fivethous
   Sort Method: quicksort  Memory: 77kB
   ->  Hash Join  (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)
         Hash Cond: (t2.unique2 = t1.unique2)
         ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)
         ->  Hash  (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 28kB
               ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)
                     Recheck Cond: (unique1 < 100)
                     ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)
                           Index Cond: (unique1 < 100)
 Planning time: 0.194 ms
 Execution time: 8.008 ms

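The output of EXPLAIN can be viewed as a tree, which we call the query plan tree. Each node carries its node type, the object it operates on, and other attributes such as cost, rows, and width. Showing only the node types, the example above reduces to the following structure: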

Sort
└── Hash Join
    ├── Seq Scan
    └── Hash
        └── Bitmap Heap Scan
            └── Bitmap Index Scan
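To make the tree view concrete, here is a minimal Python sketch that extracts node types, nesting depth, and the summary timings from text-format EXPLAIN ANALYZE output. The function name parse_plan and the regular expressions are illustrative assumptions, not part of PostgreSQL or TSBS. The sample plan is an abridged copy of the output shown above.

```python
import re

# A node line is either the root or starts with "->", and always carries "(cost=".
NODE_RE = re.compile(r"^(?P<indent>\s*)(?:->\s+)?(?P<label>.+?)\s+\(cost=")
# Matches both "Planning time" (older releases) and "Planning Time" (newer ones).
TIME_RE = re.compile(r"^\s*(Planning|Execution) [Tt]ime:\s*([\d.]+) ms")


def parse_plan(text):
    """Return ([(depth, node_type), ...], {"planning": ms, "execution": ms})."""
    nodes, timings, stack = [], {}, []
    for line in text.splitlines():
        m = TIME_RE.match(line)
        if m:
            timings[m.group(1).lower()] = float(m.group(2))
            continue
        m = NODE_RE.match(line)
        if not m:
            continue  # detail lines such as "Sort Key: ..." are skipped
        indent = len(m.group("indent"))
        # Pop siblings and deeper nodes so the stack holds the path from the root.
        while stack and stack[-1] >= indent:
            stack.pop()
        stack.append(indent)
        # Drop the target ("on tenk1 t1", "using idx on ...") to keep only the type.
        node_type = re.split(r"\s+(?:on|using)\s+", m.group("label"))[0]
        nodes.append((len(stack) - 1, node_type))
    return nodes, timings


PLAN = """\
 Sort  (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)
   Sort Key: t1.fivethous
   ->  Hash Join  (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)
         Hash Cond: (t2.unique2 = t1.unique2)
         ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)
         ->  Hash  (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)
               ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)
                     ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)
 Planning time: 0.194 ms
 Execution time: 8.008 ms
"""

nodes, timings = parse_plan(PLAN)
for depth, node_type in nodes:
    print("    " * depth + node_type)
print(timings)
```

Text output is convenient for reading, but for anything beyond a quick script FORMAT JSON is usually the more robust input for a parser.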

02 Installing TSBS

TSBS is written in Go and can be fetched quickly with the go get command

2.1 Installing Go

First, install Go. The latest installation package is available from the official site: https://golang.google.cn/doc/install

# download the archive that matches your CPU architecture (use linux-arm64 on ARM hosts)
wget https://golang.google.cn/dl/go1.20.7.linux-amd64.tar.gz
rm -rf /usr/local/go
tar -C /usr/local -zxvf go1.20.7.linux-amd64.tar.gz

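After extracting, configure the environment variables: run vim ~/.bashrc and add the following

export GOROOT=/usr/local/go 	# Go installation path; some packages set a default GOROOT automatically
export GOPATH=$HOME/go-work  	# default workspace for Go projects; separate multiple paths with colons on Linux
export PATH=$PATH:$GOROOT/bin # path to the Go executables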

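Once configured, check that the installation works

go env			# show the Go configuration
go version	# show the Go version

Using the go mod package manager (reference: https://zhuanlan.zhihu.com/p/482014524): to use go mod, adjust the Go configuration as follows

go env -w GOBIN=/usr/local/go/bin					# directory for installed Go executables
go env -w GO111MODULE=auto							# off: modules disabled, dependencies are looked up under GOPATH; on: modules enabled; auto: modules enabled only when a go.mod file is present
go env -w GOPROXY=https://goproxy.cn,direct	# use a proxy located in China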

Basic go mod usage

# initialize a project
mkdir Proj && cd Proj
go mod init Proj				# initialize a module

# add missing modules and remove unused ones
go mod tidy

# edit the go.mod file
go mod edit

2.2 Installing TSBS

Fetch tsbs with go get and build it

go get github.com/timescale/tsbs
cd $GOPATH/src/github.com/timescale/tsbs
go mod tidy
make

Run any tsbs tool to verify the installation. For example, the data generator tsbs_generate_data prints the following on success:

$ tsbs_generate_data --help
Usage of tsbs_generate_data:
      --debug int                              Control level of debug output
      --file string                            Write the output to this path
      --format string                          Format to generate. (choices: cassandra, clickhouse, influx, mongo, siridb, timescaledb, akumuli, cratedb, prometheus, victoriametrics, timestream, questdb)
      --initial-scale uint                     Initial scaling variable specific to the use case (e.g., devices in 'devops'). 0 means to use -scale value
      --interleaved-generation-group-id uint   Group (0-indexed) to perform round-robin serialization within. Use this to scale up data generation to multiple processes.
      --interleaved-generation-groups uint     The number of round-robin serialization groups. Use this to scale up data generation to multiple processes. (default 1)
      --log-interval duration                  Duration between data points (default 10s)
      --max-data-points uint                   Limit the number of data points to generate, 0 = no limit
      --max-metric-count uint                  Max number of metric fields to generate per host. Used only in devops-generic use-case (default 100)
      --profile-file string                    File to which to write go profiling data
      --scale uint                             Scaling value specific to use case (e.g., devices in 'devops'). (default 1)
      --seed int                               PRNG seed (default: 0, which uses the current timestamp)
      --timestamp-end string                   Ending timestamp (RFC3339). (default "2016-01-02T00:00:00Z")
      --timestamp-start string                 Beginning timestamp (RFC3339). (default "2016-01-01T00:00:00Z")
      --use-case string                        Use case to generate.
pflag: help requested

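03 Basic TSBS Usage

3.1 Generating Test Data

Use tsbs_generate_data to generate test data. The example below generates CPU time-series data for the day from 2023-08-01T00:00:00Z to 2023-08-02T00:00:00Z, with one reading every 10 seconds per host. The output is piped through gzip into the specified file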

$ tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=4000 \
    --timestamp-start="2023-08-01T00:00:00Z" \
    --timestamp-end="2023-08-02T00:00:00Z" \
    --log-interval="10s" --format="timescaledb" \
    | gzip > /tmp/timescaledb-data.gz

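Generating test data requires a number of parameters. Some of them are:

--use-case: the use case (type of data), such as iot, cpu-only, or devops
--seed: seed for the pseudo-random number generator (PRNG), for example 123
--scale: number of devices in the generated data, for example 4000
--timestamp-start: start time of the generated data, for example 2023-08-01T00:00:00Z
--timestamp-end: end time of the generated data, for example 2023-08-02T00:00:00Z
--log-interval: interval between data points, given as a duration, for example 10s
--format: output format matching the target database, for example timescaledb (choices include cassandra, clickhouse, cratedb, influx, mongo, siridb, timescaledb)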
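Before generating a large data set it helps to estimate its size. The Python sketch below computes the expected number of rows and metric values from the parameters above. It rests on two assumptions that are not stated by the tool's help text: the end timestamp is treated as exclusive, and each cpu-only row carries 10 metrics. The function name expected_counts is made up for this example.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the RFC3339 form used in the commands above


def expected_counts(scale, start, end, interval_seconds, metrics_per_row):
    """Estimate (rows, metrics) for `scale` hosts reporting every `interval_seconds`."""
    span = datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
    # Assumption: readings cover [start, end), so the end point itself is excluded.
    points_per_host = int(span.total_seconds()) // interval_seconds
    rows = scale * points_per_host
    return rows, rows * metrics_per_row


rows, metrics = expected_counts(
    scale=4000,
    start="2023-08-01T00:00:00Z",
    end="2023-08-02T00:00:00Z",
    interval_seconds=10,
    metrics_per_row=10,  # assumption for the cpu-only use case
)
print(f"rows={rows:,} metrics={metrics:,}")
```

One day at a 10 second interval gives 8640 readings per host, so 4000 hosts yield 34,560,000 rows under these assumptions.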

3.2 Generating Query Scripts

Use tsbs_generate_queries to generate queries of a single type, or use the scripts/generate_queries.sh script to generate several query types in one batch

# generate queries of one type
$ tsbs_generate_queries --use-case="cpu-only" --seed=123 --scale=4000 \
    --timestamp-start="2023-08-01T00:00:00Z" \
    --timestamp-end="2023-08-02T00:00:00Z" \
    --queries=1000 --query-type="single-groupby-1-1-1" \
    --format="timescaledb" \
    | gzip > /tmp/timescaledb-queries.gz

# generate several query types in one batch
$ cd $GOPATH/src/github.com/timescale/tsbs
$ FORMATS="timescaledb" SCALE=4000 SEED=123 \
    TS_START="2023-08-01T00:00:00Z" \
    TS_END="2023-08-02T00:00:01Z" \
    QUERIES=1000 QUERY_TYPES="single-groupby-1-1-1 single-groupby-1-1-12 single-groupby-1-8-1 single-groupby-5-1-1 single-groupby-5-1-12 single-groupby-5-8-1 cpu-max-all-1 cpu-max-all-8 double-groupby-1 double-groupby-5 double-groupby-all high-cpu-all high-cpu-1 lastpoint groupby-orderby-limit" \
    BULK_DATA_DIR="/tmp/bulk_queries" scripts/generate_queries.sh

Most parameters for generating queries are the same as those for generating test data. The main differences are:

--queries: number of queries to generate, for example 1000
--query-type: type of query to generate, for example single-groupby-1-1-1 or last-loc

The query types available for cpu-only time-series data are listed below:

Query type	Description
single-groupby-1-1-1	Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour
single-groupby-1-1-12	Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours
single-groupby-1-8-1	Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour
single-groupby-5-1-1	Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour
single-groupby-5-1-12	Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours
single-groupby-5-8-1	Simple aggregate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour
cpu-max-all-1	Aggregate across all CPU metrics per hour over 1 hour for a single host
cpu-max-all-8	Aggregate across all CPU metrics per hour over 1 hour for eight hosts
double-groupby-1	Aggregate across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours
double-groupby-5	Aggregate across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours
double-groupby-all	Aggregate across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours
high-cpu-all	All the readings where one metric is above a threshold across all hosts
high-cpu-1	All the readings where one metric is above a threshold for a particular host
lastpoint	The last reading for each host
groupby-orderby-limit	The last 5 aggregate readings (across time) before a randomly chosen endpoint
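The single-groupby names follow a pattern that can be read off the table: the three numbers are the metric count, the host count, and the time range in hours. The Python sketch below decodes such a name. The helper describe_single_groupby is hypothetical and reflects only the naming convention visible in the table, not TSBS source code.

```python
import re

SINGLE_GROUPBY_RE = re.compile(r"single-groupby-(\d+)-(\d+)-(\d+)")


def describe_single_groupby(query_type):
    """Split single-groupby-<metrics>-<hosts>-<hours> into its three numbers."""
    m = SINGLE_GROUPBY_RE.fullmatch(query_type)
    if m is None:
        raise ValueError(f"not a single-groupby query type: {query_type}")
    metrics, hosts, hours = (int(g) for g in m.groups())
    return {"metrics": metrics, "hosts": hosts, "hours": hours}


for name in ("single-groupby-1-1-1", "single-groupby-5-8-1", "single-groupby-5-1-12"):
    info = describe_single_groupby(name)
    print(f"{name}: MAX on {info['metrics']} metric(s), "
          f"{info['hosts']} host(s), every 5 mins for {info['hours']} hour(s)")
```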

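3.3 Write Performance Testing

TSBS supports two write scenarios, dynamic data and static data:

Use tsbs_load to simulate a production environment, generating data on the fly and loading it
For pre-generated static data, load the data file with tsbs_load, or use the tsbs_load_* family of tools, which need no configuration file and are specific to each database type

tsbs_load can load both dynamically simulated data and pre-generated data files, but it depends on a configuration file. For usage see tsbs_load.md, and for sample configuration files see sample-configs

The rest of this section covers two ways to write pre-generated data into a database instance:

Use the tsbs_load_<database> tools to load data from a file into a given database instance; this also works for writing to a remote instance
Use the scripts/load/load_<database>.sh scripts to write data from a file into a local database instance; these scripts are wrappers around the tsbs_load_<database> tools

# write data to a remote database instance with tsbs_load_timescaledb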
$ cat /tmp/timescaledb-data.gz | gunzip | tsbs_load_timescaledb \
--postgres="sslmode=require" --host="my.tsdb.host" --port=5432 --pass="password" \
--user="benchmarkuser" --admin-db-name=defaultdb --workers=8  \
--in-table-partition-tag=true --chunk-time=8h --write-profile= \
--field-index-count=1 --do-create-db=true --force-text-format=false \
--do-abort-on-exist=false

# write data to a local timescaledb instance with the load_timescaledb.sh script
$ cd $GOPATH/src/github.com/timescale/tsbs
$ NUM_WORKERS=2 BATCH_SIZE=10000 BULK_DATA_DIR=/tmp DATABASE_PORT=port \
DATABASE_USER=user DATABASE_NAME=dbname DATABASE_PWD=passwd \
    scripts/load/load_timescaledb.sh

The parameters accepted by the tsbs_load_<database> tools differ from one database type to another. Run a tool with --help to see what its parameters mean

$ tsbs_load_influx --help
Usage of tsbs_load_influx:
      --backoff duration            Time to sleep between requests when server indicates backpressure is needed. (default 1s)
      --batch-size uint             Number of items to batch together in a single insert (default 10000)
      --consistency string          Write consistency. Must be one of: any, one, quorum, all. (default "all")
      --db-name string              Name of database (default "benchmark")
      --do-abort-on-exist           Whether to abort if a database with the given name already exists.
      --do-create-db                Whether to create the database. Disable on all but one client if running on a multi client setup. (default true)
      --do-load                     Whether to write data. Set this flag to false to check input read speed. (default true)
      --file string                 File name to read data from
      --gzip                        Whether to gzip encode requests (default true). (default true)
      --hash-workers                Whether to consistently hash insert data to the same workers (i.e., the data for a particular host always goes to the same worker)
      --insert-intervals string     Time to wait between each insert, default '' => all workers insert ASAP. '1,2' = worker 1 waits 1s between inserts, worker 2 and others wait 2s
      --limit uint                  Number of items to insert (0 = all of them).
      --replication-factor int      Cluster replication factor (only applies to clustered databases). (default 1)
      --reporting-period duration   Period to report write stats (default 10s)
      --results-file string         Write the test results summary json to this file
      --seed int                    PRNG seed (default: 0, which uses the current timestamp)
      --urls string                 InfluxDB URLs, comma-separated. Will be used in a round-robin fashion. (default "http://localhost:8086")
      --workers uint                Number of parallel clients inserting (default 1)
pflag: help requested

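While loading, write statistics are printed every 10 seconds (the default --reporting-period). Each line contains:

timestamp: time of the report, as a Unix timestamp
metrics per second in the period: metrics written per second since the previous report
total metrics inserted: number of metrics inserted so far
overall metrics per second: average metrics written per second since the start
rows per second in the period: rows written per second since the previous report
total number of rows: number of rows inserted so far
overall rows per second: average rows written per second since the start

At the end, an overall summary is printed with the mean insert rate for metrics and for rows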

time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
# ...
1518741528,914996.143291,9.652000E+08,1096817.886674,91499.614329,9.652000E+07,109681.788667
1518741548,1345006.018902,9.921000E+08,1102333.152918,134500.601890,9.921000E+07,110233.315292
1518741568,1149999.844750,1.015100E+09,1103369.385320,114999.984475,1.015100E+08,110336.938532

Summary:
loaded 1036800000 metrics in 936.525765sec with 8 workers (mean rate 1107070.449780/sec)
loaded 103680000 rows in 936.525765sec with 8 workers (mean rate 110707.044978/sec)
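When comparing several runs it is handy to turn the two summary lines into numbers. The following Python sketch parses them with a regular expression written against the sample output above. The function name parse_load_summary is made up for this example, and the pattern is an assumption that may need adjusting if the output format differs between TSBS versions.

```python
import re

SUMMARY_RE = re.compile(
    r"loaded (\d+) (metrics|rows) in ([\d.]+)sec "
    r"with (\d+) workers \(mean rate ([\d.]+)/sec\)"
)


def parse_load_summary(text):
    """Return {"metrics": {...}, "rows": {...}} parsed from the Summary section."""
    result = {}
    for count, unit, seconds, workers, rate in SUMMARY_RE.findall(text):
        result[unit] = {
            "count": int(count),
            "seconds": float(seconds),
            "workers": int(workers),
            "mean_rate": float(rate),
        }
    return result


SAMPLE = """\
Summary:
loaded 1036800000 metrics in 936.525765sec with 8 workers (mean rate 1107070.449780/sec)
loaded 103680000 rows in 936.525765sec with 8 workers (mean rate 110707.044978/sec)
"""

summary = parse_load_summary(SAMPLE)
# Each row carries several metric values, so the two counts differ by a constant factor.
metrics_per_row = summary["metrics"]["count"] // summary["rows"]["count"]
print(summary["rows"]["mean_rate"], metrics_per_row)
```

The reported mean rate is simply the count divided by the elapsed seconds, which makes for a quick sanity check on parsed values.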

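3.4 Query Performance Testing

Once the data has been written to the database instance and the query scripts are ready, use the tsbs_run_queries_<database> tools to run the queries and produce the performance results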

$ cat /tmp/bulk_queries/timescaledb-cpu-max-all-8-queries.gz | \
    gunzip | tsbs_run_queries_timescaledb --workers=8 \
        --postgres="sslmode=disable" --hosts="my.tsdb.host" --port=5432 \
        --user="benchmarkuser" --pass="passwd"

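When the query test finishes, results like the following are printed. The statistics min, med, mean, max, and stddev are the minimum, median, mean, maximum, and standard deviation of query latency; sum is the total time and count is the number of executions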

run complete after 1000 queries with 8 workers:
TimescaleDB max cpu all fields, rand    8 hosts, rand 12hr by 1h:
min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, stddev:  2843.35ms, sum: 5056.0sec, count: 2000
all queries                                                     :
min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, stddev:  2843.35ms, sum: 5056.0sec, count: 2000
wall clock time: 633.936415sec
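To collect these figures across many query types, a statistics line can be parsed into a dictionary. The Python sketch below does this for the sample line above. The function name parse_query_stats is hypothetical and the pattern is an assumption based only on the output shown here. Latencies are in milliseconds and sum is in seconds.

```python
import re

# name: value with an optional unit; in the sample, "med" is printed without a unit.
FIELD_RE = re.compile(r"(\w+):\s*([\d.]+)(ms|sec)?")


def parse_query_stats(line):
    """Return the fields of a statistics line as {name: float}, units dropped."""
    return {name: float(value) for name, value, _unit in FIELD_RE.findall(line)}


LINE = ("min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, "
        "stddev:  2843.35ms, sum: 5056.0sec, count: 2000")

stats = parse_query_stats(LINE)
# Cross-check: mean latency (ms) times count should be close to sum (sec).
recomputed_sum = stats["mean"] * stats["count"] / 1000.0
print(stats["med"], stats["sum"], round(recomputed_sum, 2))
```

Note that sum is the latency added up over all workers, so it can be much larger than the wall clock time of a run that uses several workers.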

