[MogDB/openGauss: How to Recover After Accidentally Deleting Unarchived xlog Files]

news/2024/7/9 21:45:22 Tags: database, postgresql, gaussdb

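When running a MogDB/openGauss database, heavy workloads or bulk data loads can make the number of log files under pg_xlog grow continuously. If xlog files are generated faster than they are automatically cleaned up, the pg_xlog directory may well fill up. Someone unfamiliar with the database's xlog may then accidentally delete xlog files that have not yet been archived or, worse, delete xlog files whose changes have not yet been written to the data files.

This article describes the states that xlog files under pg_xlog are normally in. It then summarizes several ways to resolve the recurring missing-xlog errors and archive failures that appear in the log after an xlog file is deleted by mistake when its data has already been flushed to disk but the file itself has not yet been archived.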

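I. States of xlog files under pg_xlog

In general, manually deleting files under pg_xlog is not recommended: pg_xlog has an automatic cleanup mechanism, and its cleanup speed can be tuned through configuration parameters as needed.

Under normal conditions, with archiving enabled, the xlog files under pg_xlog fall into the following three states:

State 1: the corresponding data has been flushed to disk and the file has been archived. Its status file in pg_xlog/archive_status ends in .done.

Such files can be deleted manually without affecting the database, but this is still not recommended, because the automatic cleanup mechanism already takes care of them.

State 2: the corresponding data has been flushed to disk, but the file has not been archived. Its status file in pg_xlog/archive_status ends in .ready.

Because the data is already on disk, deleting such an xlog file from pg_xlog does not affect the data currently in the database. However, point-in-time recovery (PITR) based on a full backup plus a continuous sequence of archived logs will be missing that log. Archiving also fails on the deleted file, and every later archive attempt fails as well, which blocks the normal automatic cleanup of pg_xlog. The database prints errors like the following: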

DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000071 /data/om3/data/archivedir/000000010000000200000071 "
cp: cannot stat 'pg_xlog/000000010000000200000071': No such file or directory
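To see whether any segment is in this situation, the .ready status files can be compared with the segment files actually present in pg_xlog. The following is a minimal sketch, not a MogDB tool: the function name find_orphan_ready is hypothetical, and it assumes the layout shown in this article, where status files live in pg_xlog/archive_status and are named after their segment.

```shell
# Hypothetical helper (not shipped with MogDB/openGauss).
# Prints every segment that is still marked .ready in archive_status
# but whose segment file no longer exists in pg_xlog.
find_orphan_ready() {
    local xlog_dir="$1"                      # path to pg_xlog, supplied by the caller
    local status_dir="$xlog_dir/archive_status"
    local f seg
    for f in "$status_dir"/*.ready; do
        [ -e "$f" ] || continue              # the glob matched nothing
        seg="$(basename "$f" .ready)"        # strip the .ready suffix
        if [ ! -e "$xlog_dir/$seg" ]; then
            printf '%s\n' "$seg"
        fi
    done
}
```

For example, `find_orphan_ready /data/om3/data/pg_xlog` would be the call for the data directory used in this article. The function only reads the directories and changes nothing.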

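State 3: the corresponding data has not been flushed to disk, and the file has not been archived.

The xlog has just been written, but the data has not yet reached disk. Deleting the xlog at this point may lose data and may leave the database unable to start. Getting it running again may require the pg_resetxlog tool, which clears the xlog and resets some of the control information in the pg_control file. pg_resetxlog should be used only as a last resort for database repair. Even after the database has been repaired and started, partially committed transactions may leave the data inconsistent with its previous state.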


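II. Resolving the errors after an unarchived xlog file in state 2 is deleted by mistake

The tests in this article mainly rely on the archive parameters archive_mode and archive_command. The database version is MogDB-3.0.5.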

MogDB=# show archive_mode ;
 archive_mode 
--------------
 on
(1 row)

MogDB=# show archive_command ;
           archive_command           
-------------------------------------
 cp %p /data/om3/data/archivedir/%f 
(1 row)

MogDB=# show archive_dest ;
 archive_dest 
--------------
 
(1 row)

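1. Temporarily change archive_command

When archiving is driven by the archive_command parameter, one option is to change the archive command itself so that the database is tricked into believing the archive succeeded.

The environment below already reproduces the accidental deletion of an unarchived xlog file:

[om3@lmt0003 archive_status]$ rm ../000000010000000200000074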

[om3@lmt0003 archive_status]$ cat  /data/om3/log/pg_log/dn_6001/postgresql-2023-11-01_115121.log | grep 000000010000000200000074|more
cp: cannot create regular file '/data/om3/data/archivedir/1/000000010000000200000074': No such file or directory2023-11-01 14:55:21.521 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 14:55:21.521 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000074 /data/om3/data/archivedir/1/000000010000000200000074 " 
cp: cannot create regular file '/data/om3/data/archivedir/1/000000010000000200000074': No such file or directory2023-11-01 14:55:22.527 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 14:55:22.527 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000074 /data/om3/data/archivedir/1/000000010000000200000074 " 
cp: cannot create regular file '/data/om3/data/archivedir/1/000000010000000200000074': No such file or directory2023-11-01 14:55:23.532 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 14:55:23.532 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000074 /data/om3/data/archivedir/1/000000010000000200000074 " 
2023-11-01 14:55:23.532 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] WARNING:  xlog file "000000010000000200000074" could not be archived: too many failures

[om3@lmt0003 pg_xlog]$ cd archive_status/
[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.ready
[om3@lmt0003 archive_status]$ gsql -r
gsql ((MogDB 3.0.5 build 76182eb6) compiled at 2023-07-20 16:53:13 commit 0 last mr 1801 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

MogDB=# select pg_switch_xlog();
 pg_switch_xlog 
----------------
 2/750019D8
(1 row)

MogDB=# \q
[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done   000000010000000200000075.ready
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.ready

Step 1: Edit postgresql.conf


[om3@lmt0003 archive_status]$ vi ../../postgresql.conf

archive_mode = on                                           
#archive_command = 'cp %p /data/om3/data/archivedir/%f '               
archive_command = 'ls -l /data/om3/data/ '           # any other command works too, as long as it runs without error, so that the database believes the archive succeeded

Step 2: Reload the configuration

[om3@lmt0003 archive_status]$ gs_ctl reload

Step 3: No more errors are logged, and after a short wait the status files in archive_status change to .done


[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done   000000010000000200000075.ready
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.ready
[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done   000000010000000200000075.ready
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.ready
[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.done
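Rather than running ls repeatedly, the number of status files in each state can be counted and polled. This is only a sketch: count_status_files is a hypothetical helper name, and the directory and suffix are supplied by the caller.

```shell
# Hypothetical helper: count the status files with a given suffix
# (for example "ready" or "done") in an archive_status directory.
count_status_files() {
    local status_dir="$1" suffix="$2"
    local n=0 f
    for f in "$status_dir"/*."$suffix"; do
        # When nothing matches, the unexpanded glob is not an existing file.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

Calling `count_status_files . ready` from inside archive_status every few seconds until it prints 0 gives the same information as the three ls calls above. Files such as xxx.ready_bak are not counted, because they do not end in .ready.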

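----- Before the change, the archive errors were printed roughly once a minute, several lines each time. In the output below, the last error is stamped 15:00:18 while the current time is 15:13:43, so no new errors have appeared since the reload.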

[om3@lmt0003 archive_status]$ cat  /data/om3/log/pg_log/dn_6001/postgresql-2023-11-01_115121.log | grep 000000010000000200000074|tail -n 5
cp: cannot stat 'pg_xlog/000000010000000200000074': No such file or directory2023-11-01 15:00:17.297 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 15:00:17.297 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000074 /data/om3/data/archivedir/000000010000000200000074 " 
cp: cannot stat 'pg_xlog/000000010000000200000074': No such file or directory2023-11-01 15:00:18.302 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 15:00:18.302 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000074 /data/om3/data/archivedir/000000010000000200000074 " 
2023-11-01 15:00:18.302 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] WARNING:  xlog file "000000010000000200000074" could not be archived: too many failures
[om3@lmt0003 archive_status]$ date
Wed Nov  1 15:13:43 CST 2023
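The comparison between the last error and the current time can also be scripted. The sketch below is an illustration under stated assumptions: last_archive_failure is a hypothetical name, it relies on the message text "The failed archive command was" exactly as it appears in the log above, and it assumes the timestamp occupies the first two whitespace-separated fields of that line.

```shell
# Hypothetical helper: print the timestamp of the most recent
# "failed archive command" DETAIL line that mentions the given segment.
# Prints nothing if the log contains no such line.
last_archive_failure() {
    local logfile="$1" segment="$2"
    grep -F 'The failed archive command was' "$logfile" \
        | grep -F "$segment" \
        | tail -n 1 \
        | awk '{ print $1, $2 }'     # date and time fields of the log line
}
```

Running it against the log file and segment used in this section, followed by `date`, shows at a glance whether the archiver is still failing on that segment.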

Step 4: Restore postgresql.conf to its normal settings

archive_mode = on                                           
archive_command = 'cp %p /data/om3/data/archivedir/%f '               
#archive_command = 'ls -l /data/om3/data/ '  

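Then reload the configuration again, and everything is back to normal. The only loss is the deleted xlog file plus the xlog files produced while the archive command was faked: none of these reach the archive, so PITR from an earlier full backup cannot replay past that gap. Once the parameter is restored, subsequent logs are archived again, and the continuous error messages stop.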

[om3@lmt0003 archive_status]$ gs_ctl reload

2. Rename the xxx.ready status file of the deleted xlog in archive_status

The environment below already reproduces the accidental deletion of an unarchived xlog file:

[om3@lmt0003 pg_xlog]$ rm 000000010000000200000071

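The log reports the corresponding errors, and archiving of the subsequent xlog files is blocked as well.

Checking the log shows that the errors are printed once a minute, several lines each time.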

In archive_status, manually rename the status file of the missing xlog named in the log from xxx.ready to xxx.done:

[om3@lmt0003 archive_status]$ cp 000000010000000200000071.ready 000000010000000200000071.ready_bak
[om3@lmt0003 archive_status]$ mv 000000010000000200000071.ready 000000010000000200000071.done

The log no longer reports errors, and apart from the lost xlog file, subsequent logs are archived normally.
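Because renaming the status file of a segment that still exists would make the database skip archiving it, the manual rename can be wrapped in a guard. The following is a minimal sketch under the assumptions of this article: mark_missing_segment_done is a hypothetical name, and the .ready_bak backup simply mirrors the cp in the transcript above.

```shell
# Hypothetical helper: mark a segment as archived (.ready -> .done)
# only when the segment file is really gone from pg_xlog.
mark_missing_segment_done() {
    local xlog_dir="$1" seg="$2"
    local ready="$xlog_dir/archive_status/$seg.ready"
    if [ -e "$xlog_dir/$seg" ]; then
        echo "segment $seg still exists in pg_xlog; leave it to the archiver" >&2
        return 1
    fi
    if [ ! -e "$ready" ]; then
        echo "no .ready status file found for $seg" >&2
        return 1
    fi
    cp -p "$ready" "${ready}_bak"                          # backup, as in the transcript
    mv "$ready" "$xlog_dir/archive_status/$seg.done"
}
```

With the paths used in this article, the call would be `mark_missing_segment_done /data/om3/data/pg_xlog 000000010000000200000071`. The function refuses to act if the segment is still present, which avoids silently creating a second gap in the archive.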


3. Delete the xxx.ready status file of the deleted xlog in archive_status

Reproduce the accidental deletion of an unarchived xlog file:

[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.done

[om3@lmt0003 archive_status]$ gsql -r
gsql ((MogDB 3.0.5 build 76182eb6) compiled at 2023-07-20 16:53:13 commit 0 last mr 1801 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

MogDB=# select pg_switch_xlog();
 pg_switch_xlog 
----------------
 2/7600BAC0
(1 row)

MogDB=# \q
[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000071.done  000000010000000200000073.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000072.done  000000010000000200000074.done  000000010000000200000076.ready
[om3@lmt0003 archive_status]$ rm ../000000010000000200000076
[om3@lmt0003 archive_status]$ cat  /data/om3/log/pg_log/dn_6001/postgresql-2023-11-01_115121.log | grep 000000010000000200000076|tail -n 5
cp: cannot create regular file '/data/om3/data/archivedir/1/000000010000000200000076': No such file or directory2023-11-01 15:25:37.642 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 15:25:37.642 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000076 /data/om3/data/archivedir/1/000000010000000200000076 " 
cp: cannot create regular file '/data/om3/data/archivedir/1/000000010000000200000076': No such file or directory2023-11-01 15:25:38.647 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] LOG:  archive command failed with exit code 1
2023-11-01 15:25:38.647 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] DETAIL:  The failed archive command was: "cp pg_xlog/000000010000000200000076 /data/om3/data/archivedir/1/000000010000000200000076 " 
2023-11-01 15:25:38.647 [unknown] [unknown] localhost 70393223549024 0[0:0#0] 0 [BACKEND] WARNING:  xlog file "000000010000000200000076" could not be archived: too many failures

MogDB=# select pg_switch_xlog();
 pg_switch_xlog 
----------------
 2/77001AC8
(1 row)

MogDB=# \q
[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000072.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000073.done  000000010000000200000076.ready
000000010000000200000071.done  000000010000000200000074.done  000000010000000200000077.ready

Under pg_xlog/archive_status, delete the xxx.ready status file of the missing xlog. In the transcript below the file is first renamed to .ready_bak as a backup, which already removes the .ready name, so the rm that follows has nothing left to delete.

[om3@lmt0003 archive_status]$ mv 000000010000000200000076.ready 000000010000000200000076.ready_bak
[om3@lmt0003 archive_status]$ rm -rf 000000010000000200000076.ready

[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000072.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000073.done  000000010000000200000076.ready_bak
000000010000000200000071.done  000000010000000200000074.done  000000010000000200000077.ready

The log no longer reports the missing-xlog and archive-failure errors, and subsequent xlog files under pg_xlog are archived normally.

[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000072.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000073.done  000000010000000200000076.ready_bak
000000010000000200000071.done  000000010000000200000074.done  000000010000000200000077.ready
[om3@lmt0003 archive_status]$ tail -f  /data/om3/log/pg_log/dn_6001/postgresql-2023-11-01_115121.log | grep 000000010000000200000076^C

[om3@lmt0003 archive_status]$ ls
00000001000000020000006F.done  000000010000000200000072.done  000000010000000200000075.done
000000010000000200000070.done  000000010000000200000073.done  000000010000000200000076.ready_bak
000000010000000200000071.done  000000010000000200000074.done  000000010000000200000077.done

[om3@lmt0003 archive_status]$ cd ../../archivedir/
[om3@lmt0003 archivedir]$ ls 000000010000000200000077
000000010000000200000077
