GeXiangDong

Deploy Maven artifacts to remote SSH servers with the wagon-ssh-external component

Posted on 2025-07-12

Maven’s wagon-ssh-external component enables deploying artifacts to remote SSH servers by leveraging an external SSH client. It requires an already installed SSH program on the system and allows Maven to interact with the remote server for deployments.

Key Concepts and Usage:

Wagon Provider:

wagon-ssh-external is a Wagon provider, meaning it implements the Wagon API for interacting with remote repositories.

External SSH Client:

This component relies on an existing SSH client (like OpenSSH) being installed on the machine where Maven is running.

Deployment:

It’s primarily used for deploying artifacts (JARs, WARs, etc.) and website content to SSH servers.

Configuration:

pom.xml

...
<build>
  <extensions>
    <extension>
      <groupId>org.apache.maven.wagon</groupId>
      <artifactId>wagon-ssh-external</artifactId>
      <version>3.5.3</version>
    </extension>
  </extensions>
</build>

<distributionManagement>
  <repository>
    <id>ssh-repository</id>
    <url>scpexe://your_username@ssh-host-name-or-ip/path/to/your/repository</url>
  </repository>
</distributionManagement>

...

In the URL, use scpexe:// instead of scp://.
If key-based SSH login (for example with RSA keys) to the server is already configured, the configuration above works without the settings.xml changes below.
If you need a password, a specific private key, or a non-default ssh/scp executable, configure them in settings.xml as shown below.

settings.xml

...

<settings>
  <servers>
    <server>
      <id>ssh-repository</id>
      <username>your_username</username>
      <privateKey>/path/to/your/private/key</privateKey>
      <password>or-password-here</password>
      <!-- Not needed if using pageant -->
      <configuration>
        <sshExecutable>plink</sshExecutable>  <!-- Or your SSH client -->
        <scpExecutable>pscp</scpExecutable>   <!-- Or your SCP client -->
        <sshArgs>-o StrictHostKeyChecking=no</sshArgs>  <!-- Example: no host key checking -->
      </configuration>
    </server>
  </servers>
</settings>

...

VS Code: Unable to resolve your shell environment: Unexpected exit code from spawned shell

Posted on 2024-12-23

Cause

With some Node.js versions, the following line has to be added to .zshrc:

export NODE_OPTIONS=--openssl-legacy-provider

Otherwise the npm command that starts the Vue 2 dev server fails with:

opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
library: 'digital envelope routines',
reason: 'unsupported',
code: 'ERR_OSSL_EVP_UNSUPPORTED'

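Vue 3 with Vite does not seem to have this problem.

However, Visual Studio Code does not cope well with this line: every time it starts, it reports the following error.

Unable to resolve your shell environment: Unexpected exit code from spawned shell (code 9, signal null)

The error dialog can only be dismissed by clicking the close (X) button. Its default button is Restart, so one careless click restarts VS Code and the error shows up again.

Solution

When VS Code starts, it adds several VSCODE_ environment variables, for example: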

VSCODE_CODE_CACHE_PATH
VSCODE_CWD
VSCODE_IPC_HOOK
VSCODE_NLS_CONFIG
VSCODE_PID

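So .zshrc can check for these variables: if any of them is present, skip setting the Node.js variable; otherwise set it. This keeps VS Code from failing while it resolves the shell environment.

if ! printenv | grep -q '^VSCODE_'; then
  export NODE_OPTIONS=--openssl-legacy-provider
fi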

PostgreSQL: possible reasons a query does not use an Index Only Scan

Posted on 2023-03-04

Symptom

The server logged a simple SQL statement that took 5 seconds:

select count(*) as cnt from site_user m where m.company_id = 'efbf0bbb-f02c-44cc-a04e-de6b36921435' and m.phone is not null;

The site_user table has an index covering both company_id and phone. The query plan was checked with:

explain(analyze, costs) select count(*) as cnt from site_user m where m.company_id = 'efbf0bbb-f02c-44cc-a04e-de6b36921435' and m.phone is not null;

The resulting plan:

                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39193.81..39193.82 rows=1 width=8) (actual time=5649.018..5649.019 rows=1 loops=1)
   ->  Bitmap Heap Scan on site_user m  (cost=927.98..39137.18 rows=22652 width=0) (actual time=14.930..5645.707 rows=25829 loops=1)
         Recheck Cond: ((company_id = 'efbf0bbb-f02c-44cc-a04e-de6b36921435'::bpchar) AND (phone IS NOT NULL))
         Heap Blocks: exact=17346
         ->  Bitmap Index Scan on member_query  (cost=0.00..922.31 rows=22652 width=0) (actual time=11.634..11.634 rows=25829 loops=1)
               Index Cond: (company_id = 'efbf0bbb-f02c-44cc-a04e-de6b36921435'::bpchar)
 Planning Time: 10.359 ms
 Execution Time: 5649.175 ms
(8 rows)

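This plan is not what one would expect: the member_query index already contains company_id and phone, so the query could use an Index Only Scan, yet it does not.

Changing the company_id value showed that for an id that does not exist or has very few rows the plan is as expected and uses an Index Only Scan, while other company_id values with many rows get the same Bitmap Heap Scan plan.

Cause

The plan shows that after the Bitmap Index Scan the heap itself was read: Heap Blocks: exact=17346 means rows were fetched from 17346 blocks, and that is why the query is slow.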
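
To get a feel for what Heap Blocks: exact=17346 means in volume, here is a minimal Java sketch. It assumes the default PostgreSQL block size of 8 kB (8192 bytes; the actual value on a server can be checked with SHOW block_size). The class and method names are made up for illustration.

```java
/** Rough estimate of how much heap data a Bitmap Heap Scan had to touch. */
final class HeapBlockEstimate {
    // Assumption: default PostgreSQL block size; verify with "SHOW block_size".
    static final long DEFAULT_BLOCK_SIZE_BYTES = 8192L;

    /** Bytes covered by the given number of heap blocks. */
    static long bytesRead(long heapBlocks, long blockSizeBytes) {
        return heapBlocks * blockSizeBytes;
    }

    /** The same value expressed in MiB. */
    static double mebibytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // 17346 is the "Heap Blocks: exact" value from the plan above.
        long bytes = bytesRead(17346, DEFAULT_BLOCK_SIZE_BYTES);
        System.out.println(bytes + " bytes, about " + Math.round(mebibytes(bytes)) + " MiB");
    }
}
```

Under that assumption the scan touched roughly 136 MiB of heap pages just to count about 26 thousand rows, which fits the multi-second execution time when those pages are not cached.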

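The Visibility Map determines whether an Index Only Scan pays off. When the Visibility Map does not accurately reflect the table (many pages are not marked all-visible), an Index Only Scan would still have to visit the heap, so the planner picks another plan. Running VACUUM on the table brings its Visibility Map up to date.

More about the Visibility Map: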

https://www.postgresql.org/docs/current/storage-vm.html
https://stackoverflow.com/questions/62834678/why-does-postgres-still-do-a-bitmap-heap-scan-when-a-covering-index-is-used
https://www.modb.pro/db/447177

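Solutions

There are two ways to speed this up:

Use an Index Only Scan (best)
When an Index Only Scan is not possible, reduce the number of blocks read (second best)

Update the Visibility Map so the Index Only Scan takes effect

Whether an Index Only Scan is used depends on the state of the Visibility Map, which PostgreSQL maintains automatically through autovacuum. The likely reason the Index Only Scan was not used is that too many pages in the Visibility Map were not marked all-visible, while the table had not yet reached the threshold that triggers autovacuum. In that case VACUUM can be run manually.

For the autovacuum threshold settings, see here

For how autovacuum works, see here

VACUUM (VERBOSE, ANALYZE) site_user;

Then run explain again to check the query plan: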

                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1241.14..1241.15 rows=1 width=8) (actual time=5.454..5.454 rows=1 loops=1)
   ->  Index Only Scan using member_query on site_user m  (cost=0.42..1183.06 rows=23236 width=0) (actual time=0.018..4.265 rows=25829 loops=1)
         Index Cond: (company_id = 'efbf0bbb-f02c-44cc-a04e-de6b36921435'::bpchar)
         Heap Fetches: 0
 Planning Time: 0.105 ms
 Execution Time: 5.473 ms
(6 rows)

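It is now an Index Only Scan.

Heap Fetches: 0 means every relevant page is marked all-visible in the Visibility Map, so nothing has to be fetched from the heap. After some rows are updated and before the next vacuum, this value will be greater than 0. A value of 0 is fastest because no extra heap access is needed.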
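
On the autovacuum threshold mentioned above: the PostgreSQL documentation gives the trigger as vacuum threshold = base threshold + scale factor * number of tuples, with defaults of 50 (autovacuum_vacuum_threshold) and 0.2 (autovacuum_vacuum_scale_factor). The Java sketch below only does that arithmetic. The table size and dead-tuple counts are hypothetical, since the post does not state them, and the class name is made up.

```java
/** Arithmetic behind the autovacuum trigger, using the documented default settings. */
final class AutovacuumThreshold {
    static final long DEFAULT_BASE_THRESHOLD = 50;   // autovacuum_vacuum_threshold
    static final double DEFAULT_SCALE_FACTOR = 0.2;  // autovacuum_vacuum_scale_factor

    /** vacuum threshold = base threshold + scale factor * number of tuples */
    static double vacuumThreshold(long baseThreshold, double scaleFactor, long relTuples) {
        return baseThreshold + scaleFactor * relTuples;
    }

    /** Autovacuum vacuums the table once the dead tuples exceed the threshold. */
    static boolean wouldTrigger(long deadTuples, long baseThreshold, double scaleFactor, long relTuples) {
        return deadTuples > vacuumThreshold(baseThreshold, scaleFactor, relTuples);
    }

    public static void main(String[] args) {
        long relTuples = 1_000_000; // hypothetical table size, not taken from the post
        System.out.println("threshold = "
            + vacuumThreshold(DEFAULT_BASE_THRESHOLD, DEFAULT_SCALE_FACTOR, relTuples));
        System.out.println("150000 dead tuples trigger autovacuum: "
            + wouldTrigger(150_000, DEFAULT_BASE_THRESHOLD, DEFAULT_SCALE_FACTOR, relTuples));
    }
}
```

With the defaults, a table of one million rows can accumulate about 200 thousand dead tuples before autovacuum steps in, which is one way a large table ends up with a stale Visibility Map for a long time.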

VACUUM

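Reorder the physical row storage of the table to read fewer blocks

If the rows a query needs are stored contiguously (similar to a clustered index in other databases), fewer blocks have to be read. PostgreSQL can rearrange the storage order with the CLUSTER command.

CLUSTER [VERBOSE] table_name [ USING index_name ]

Note that rows inserted or updated after the command has reorganized the table are not kept in this order automatically; the command has to be run again. This differs from clustered indexes in other databases.

CLUSTER

Note that CLUSTER (like VACUUM FULL) takes an exclusive lock that blocks reads and writes on the table. A plain VACUUM does not block normal reads and writes, but on a large table it can still run for a long time and add I/O load. Use both with care on production servers.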

PostgreSQL: select max(x) vs select x ... order by x desc limit 1

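Posted on 2023-02-19

Under ordinary circumstances (no usable index), max is the best way to get the largest value of a column, and the order by approach is clearly a far worse choice by comparison.

With a suitable index, however, order by ... limit 1 in PostgreSQL gives a different result.

Comparison I (one column in the where clause)

Environment

table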

create table user_login_record(
  user_id         char(36) not null,
  login_date    timestamp  not null
);

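This table (ignore that it does not follow real business or coding conventions) has two columns. We look up the latest login_date for a given user_id.

We skip the case without an index (as stated above, max performs best there) and create the index right away:

create index user_login_date on user_login_record (user_id, login_date desc);

Results

Let's compare the performance of the two SQL statements.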

max

explain (analyze, buffers, costs) 
select max(login_date) 
from user_login_record 
where user_id='952bd155-06b3-4792-82ec-4b86d06c86a7'

                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.45..0.46 rows=1 width=8) (actual time=0.074..0.075 rows=1 loops=1)
   Buffers: shared hit=4
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.41..0.45 rows=1 width=8) (actual time=0.071..0.071 rows=1 loops=1)
           Buffers: shared hit=4
           ->  Index Only Scan using user_login_date on user_login_record  (cost=0.41..32.76 rows=1077 width=8) (actual time=0.070..0.070 rows=1 loops=1)
                 Index Cond: ((user_id = '952bd155-06b3-4792-82ec-4b86d06c86a7'::bpchar) AND (login_date IS NOT NULL))
                 Heap Fetches: 0
                 Buffers: shared hit=4
 Planning Time: 0.124 ms
 Execution Time: 0.097 ms
(11 rows)

order by … desc limit 1

explain (analyze, buffers, costs) 
select login_date 
from user_login_record 
where user_id='952bd155-06b3-4792-82ec-4b86d06c86a7'
order by login_date desc 
limit 1

                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.41..0.44 rows=1 width=8) (actual time=0.036..0.036 rows=1 loops=1)
   Buffers: shared hit=4
   ->  Index Only Scan using user_login_date on user_login_record  (cost=0.41..30.06 rows=1077 width=8) (actual time=0.034..0.035 rows=1 loops=1)
         Index Cond: (user_id = '952bd155-06b3-4792-82ec-4b86d06c86a7'::bpchar)
         Heap Fetches: 0
         Buffers: shared hit=4
 Planning Time: 0.067 ms
 Execution Time: 0.049 ms
(8 rows)

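Conclusion

Both queries use the login_date column of the index and read only one row, but the max query has one extra step (the Result node with its InitPlan), so it is slightly slower.

Comparison II (three columns in the where clause)

I also tested the two-column case; it behaves like the one-column case: the max query uses the index, reads only one row, and again has the extra step. The three-column case is shown separately because its result differs from the one- and two-column cases.

Environment

table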

create table user_login_record(
  group_id       char(36) not null,
  user_id         char(36) not null,
  machine_id   char(32) not null,
  login_date    timestamp  not null
);

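This table has four columns; compared with the previous one it adds group_id and machine_id, so it can be read as a record of a user logging in on a particular machine within a group. We look up the latest login_date for a given group_id, user_id, and machine_id.

Again we create the index right away:

create index user_login_date on user_login_record (group_id, user_id, machine_id, login_date desc);

Results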

max

explain (analyze, buffers, costs) 
select max(login_date) 
from user_login_record 
where group_id='312bb069-fd27-4822-885d-c3ac67bfd8a1' 
  and user_id='952bd155-06b3-4792-82ec-4b86d06c86a7' 
  and machine_id='C635F8F44F1960778CE58869DF10150D';

                                                                                                   QUERY PLAN                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2.84..2.85 rows=1 width=8) (actual time=0.143..0.143 rows=1 loops=1)
   Buffers: shared hit=72
   ->  Index Only Scan using user_login_date on user_login_record  (cost=0.41..2.84 rows=1 width=8) (actual time=0.050..0.126 rows=98 loops=1)
         Index Cond: ((group_id = '312bb069-fd27-4822-885d-c3ac67bfd8a1'::bpchar) AND (user_id = '952bd155-06b3-4792-82ec-4b86d06c86a7'::bpchar) AND (machine_id = 'C635F8F44F1960778CE58869DF10150D'::bpchar))
         Heap Fetches: 98
         Buffers: shared hit=72
 Planning Time: 0.139 ms
 Execution Time: 0.163 ms
(8 rows)

order by … desc limit 1

explain (analyze, buffers, costs) 
select login_date
from user_login_record 
where group_id='312bb069-fd27-4822-885d-c3ac67bfd8a1' 
  and user_id='952bd155-06b3-4792-82ec-4b86d06c86a7' 
  and machine_id='C635F8F44F1960778CE58869DF10150D'
order by login_date desc
limit 1

                                                                                                 QUERY PLAN                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.41..2.84 rows=1 width=8) (actual time=0.042..0.042 rows=1 loops=1)
   Buffers: shared hit=4
   ->  Index Only Scan using user_login_date on user_login_record  (cost=0.41..2.84 rows=1 width=8) (actual time=0.041..0.041 rows=1 loops=1)
         Index Cond: ((group_id = '312bb069-fd27-4822-885d-c3ac67bfd8a1'::bpchar) AND (user_id = '952bd155-06b3-4792-82ec-4b86d06c86a7'::bpchar) AND (machine_id = 'C635F8F44F1960778CE58869DF10150D'::bpchar))
         Heap Fetches: 1
         Buffers: shared hit=4
 Planning Time: 0.124 ms
 Execution Time: 0.057 ms
(8 rows)

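Conclusion

Both queries use the index, but in different ways. The max query does not take advantage of the login_date ordering and aggregates over all 98 matching rows (rows=98 loops=1), whereas the order by ... limit 1 query still uses the login_date column of the index and reads only one row (rows=1 loops=1).
Although max has no extra step this time, it processes many more rows, so it is much slower than the other query.

Finally

Different PostgreSQL versions (my tests ran on 12 and 14) and different database environments optimize SQL differently, so the results in this post may not hold everywhere. Test in your own environment to decide which SQL gives the best performance.

The PostgreSQL now() function

Posted on 2022-01-05, updated on 2022-01-09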

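now() is a PostgreSQL function that returns the current time. Note that PostgreSQL provides several functions for getting the current time; now() is only one of them.

Function	Description
now()	Start time of the current transaction; equivalent to transaction_timestamp()
transaction_timestamp()	Start time of the current transaction; equivalent to now(). Multiple calls within one transaction return the same value
statement_timestamp()	Start time of the current statement. Multiple calls within one statement return the same value
clock_timestamp()	Actual clock time. Multiple calls within one SQL statement (for example in subqueries) may return different values

Given the above, if rows are ordered by a timestamp column and the order really matters (balance changes, for example), and one transaction may contain several inserts, using now() leads to ambiguous ordering because all those rows get the same timestamp.

These functions can be tried out with the following SQL: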

begin;
select now(), pg_sleep(0.1), transaction_timestamp() , pg_sleep(0.1), statement_timestamp(),  pg_sleep(0.1), clock_timestamp(), pg_sleep(0.1), statement_timestamp(), clock_timestamp();
select now(), pg_sleep(0.1), transaction_timestamp() , pg_sleep(0.1), statement_timestamp(),  pg_sleep(0.1), clock_timestamp(), pg_sleep(0.1), statement_timestamp(), clock_timestamp();
commit;

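Running the SQL above in psql gives the result below. It shows that within one transaction, every call to now() and transaction_timestamp() across multiple statements returns exactly the same value; statement_timestamp() differs between statements of the same transaction but is identical within one statement; and each clock_timestamp() call may return a different value. (pg_sleep(0.1) makes PostgreSQL sleep for 0.1 seconds before continuing.)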

tempdb =# begin;
BEGIN
tempdb =*# select now(), pg_sleep(0.1), transaction_timestamp(), pg_sleep(0.1), statement_timestamp(),  pg_sleep(0.1), clock_timestamp(), pg_sleep(0.1), statement_timestamp(), clock_timestamp();
              now              | pg_sleep |     transaction_timestamp     | pg_sleep |     statement_timestamp      | pg_sleep |        clock_timestamp        | pg_sleep |     statement_timestamp      |        clock_timestamp        
-------------------------------+----------+-------------------------------+----------+------------------------------+----------+-------------------------------+----------+------------------------------+-------------------------------
 2022-01-09 15:27:28.976989+08 |          | 2022-01-09 15:27:28.976989+08 |          | 2022-01-09 15:27:34.43345+08 |          | 2022-01-09 15:27:34.737544+08 |          | 2022-01-09 15:27:34.43345+08 | 2022-01-09 15:27:34.838665+08
(1 row)

tempdb=*# select now(), pg_sleep(0.1), transaction_timestamp(), pg_sleep(0.1), statement_timestamp(),  pg_sleep(0.1), clock_timestamp(), pg_sleep(0.1), statement_timestamp(), clock_timestamp();
              now              | pg_sleep |     transaction_timestamp     | pg_sleep |      statement_timestamp      | pg_sleep |       clock_timestamp        | pg_sleep |      statement_timestamp      |        clock_timestamp        
-------------------------------+----------+-------------------------------+----------+-------------------------------+----------+------------------------------+----------+-------------------------------+-------------------------------
 2022-01-09 15:27:28.976989+08 |          | 2022-01-09 15:27:28.976989+08 |          | 2022-01-09 15:27:43.321314+08 |          | 2022-01-09 15:27:43.62673+08 |          | 2022-01-09 15:27:43.321314+08 | 2022-01-09 15:27:43.727897+08
(1 row)

tempdb=*# commit;
COMMIT

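Troubleshooting a java.lang.OutOfMemoryError

Posted on 2021-12-31

The server threw a java.lang.OutOfMemoryError. This kind of error is hard to track down, because the code that consumes most of the memory is not necessarily where the exception is thrown; the throwing code may be using memory normally and simply be the first to find none left.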

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:798)
        at me.chanjar.weixin.common.api.WxMessageInMemoryDuplicateChecker.checkBackgroundProcessStarted(WxMessageInMemoryDuplicateChecker.java:81)
        at me.chanjar.weixin.common.api.WxMessageInMemoryDuplicateChecker.isDuplicate(WxMessageInMemoryDuplicateChecker.java:89)
        at me.chanjar.weixin.mp.api.WxMpMessageRouter.isMsgDuplicated(WxMpMessageRouter.java:257)
        at me.chanjar.weixin.mp.api.WxMpMessageRouter.route(WxMpMessageRouter.java:172)
        at me.chanjar.weixin.open.api.impl.WxOpenMessageRouter.route(WxOpenMessageRouter.java:24)
        at me.chanjar.weixin.open.api.impl.WxOpenMessageRouter.route(WxOpenMessageRouter.java:20)
        at cn.devmgr.mall.wechatopen.WechatNotifyController.callback(WechatNotifyController.java:269)
        at jdk.internal.reflect.GeneratedMethodAccessor433.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          ... ...

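After the exception was thrown, I first checked the server's memory (through an endpoint written earlier that reports the instance's memory). It returned: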

"memory": {
  "total": 161480704,
  "max": 536870912,
  "free": 50741880
}

This shows that memory was not exhausted. In fact, the exception for insufficient heap memory is usually java.lang.OutOfMemoryError: Java heap space, whereas this one is java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached.
So I concluded that it was caused by process/resource limits being reached.
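
The reading of those three numbers can be spelled out with a small Java sketch. The formulas follow the meaning of Runtime.totalMemory(), freeMemory(), and maxMemory(); the class and method names are made up for illustration.

```java
/** Interprets the total/free/max numbers reported by java.lang.Runtime. */
final class HeapUsage {
    /** Heap currently in use: what the JVM has reserved minus what is still free in it. */
    static long usedBytes(long total, long free) {
        return total - free;
    }

    /** How much more heap could still be used before the maximum is reached. */
    static long headroomBytes(long max, long total, long free) {
        return max - usedBytes(total, free);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = usedBytes(rt.totalMemory(), rt.freeMemory());
        long headroom = headroomBytes(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.println("used=" + used + " headroom=" + headroom);
    }
}
```

With the values from the response above (total 161480704, free 50741880, max 536870912), about 106 MiB of a 512 MiB maximum was in use, leaving roughly 406 MiB of headroom, which is why heap exhaustion could be ruled out.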

So I wrote a bit of code to check the number of threads in the running instance:

for (Thread t : Thread.getAllStackTraces().keySet()) {
  // collect statistics for each thread
}

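When the exception occurred, there were more than 6000 threads in the waiting state. Some threads were clearly never terminating. Searching my own code found no new Thread(), so the cause had to be incorrect use of some library, and finding it required a monitoring routine.

So I started collecting the stack traces of all threads in the instance. (The exception itself is hard to reproduce, but the growth in thread count is easy to observe, which made the search easier.)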

for (Thread t : Thread.getAllStackTraces().keySet()) {
    StackTraceElement[] elements = t.getStackTrace();
    // count threads grouped by stack trace
}

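To group by stack trace, I concatenated the className, methodName, and lineNumber of every StackTraceElement into one string and used that string as the key that decides whether threads belong to the same group.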
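
The grouping described above can be sketched in a few lines of plain Java. This is a simplified illustration rather than the exact monitoring code: the class name is made up, and the key uses separators between class, method, and line number for readability.

```java
import java.util.HashMap;
import java.util.Map;

/** Counts live threads per distinct stack trace. */
final class ThreadStackGrouping {
    /** Builds the grouping key from class name, method name, and line number of each frame. */
    static String stackKey(StackTraceElement[] elements) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement e : elements) {
            sb.append(e.getClassName()).append('.').append(e.getMethodName())
              .append(':').append(e.getLineNumber()).append('\n');
        }
        return sb.toString();
    }

    /** Maps each stack key to the number of threads currently showing that stack. */
    static Map<String, Integer> countByStack() {
        Map<String, Integer> counts = new HashMap<>();
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            counts.merge(stackKey(stack), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print the five most common stacks, largest group first.
        countByStack().entrySet().stream()
            .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
            .limit(5)
            .forEach(e -> System.out.println(e.getValue() + " thread(s):\n" + e.getKey()));
    }
}
```

Calling countByStack() periodically and comparing the counts over time is enough to spot a stack whose thread count only ever grows.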

The following stack trace turned out to belong to a large number of threads, and the count kept growing over time without dropping:

jdk.internal.misc.Unsafepark(Unsafe.java:-2)
java.util.concurrent.locks.LockSupportpark(LockSupport.java:194)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObjectawait(AbstractQueuedSynchronizer.java:2081)
java.util.concurrent.LinkedBlockingQueuetake(LinkedBlockingQueue.java:433)
java.util.concurrent.ThreadPoolExecutorgetTask(ThreadPoolExecutor.java:1054)
java.util.concurrent.ThreadPoolExecutorrunWorker(ThreadPoolExecutor.java:1114)
java.util.concurrent.ThreadPoolExecutor$Workerrun(ThreadPoolExecutor.java:628)
java.lang.Threadrun(Thread.java:829)

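Unfortunately, this stack trace only reveals that a ThreadPoolExecutor was created. Every class in it belongs to the JDK and there is no third-party code at all, so it still does not say which code is responsible. I therefore wanted to find out who created the threads, and a search turned up an interesting question on Stack Overflow: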

https://stackoverflow.com/questions/18999444/how-to-find-out-who-create-a-thread-in-java

In that thread, Aaron Digulla lists seven ways to find out who created a thread, quoted here:

Here is the list of approaches, sorted from quickest / most reliable to slowest / hardest:

1. If you have the source of the class, create an exception in the constructor (without actually throwing it). You can simply examine or print it when you need to know when the thread was created.
2. If you don't have the sources, the thread name can be a good hint who created it.
3. If the name hints to a generic service (like java.util.Timer), then you can create a conditional breakpoint in your IDE in the constructor. The condition should be the thread name; the debugger will then stop when someone creates a thread with this name.
4. If you don't have too many threads, set a breakpoint in the constructors of Thread.
5. If you have many threads, attach a debugger to the app and freeze it. Then examine the stack traces.
6. If everything else fails, get the source code for the Java runtime and add logging code in the classes you want to observe, compile a new rt.jar and replace the original one with your version. Don't try this in production, please.
7. If money isn't an issue, you can use dynamic tracing tools like Compuware APM or, if you're on Linux or Solaris, you can try SystemTap and dtrace, respectively.

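I used approach 2: I changed the code that collects all thread stack traces to also record the thread names and kept monitoring. The thread name turned out to be the class name of a third-party library. Its source showed that the class creates an ExecutorService internally, and my code was creating new instances of that class over and over. I changed the code to stop creating it repeatedly; after deployment the monitoring showed the thread count no longer growing. Problem solved.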
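
Two habits follow from this for code that owns a thread pool: give the worker threads a recognizable name, and create the pool once and share it. The Java sketch below illustrates both with JDK classes only. The class name, the "demo-pool" prefix, and the pool size are assumptions for illustration, not the third-party library's actual code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** One shared, clearly named thread pool instead of a new pool per object. */
final class SharedExecutorExample {
    /** Names each worker so that thread dumps show which component owns it. */
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger sequence = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + sequence.incrementAndGet());
            t.setDaemon(true); // assumption for this demo: workers should not keep the JVM alive
            return t;
        };
    }

    // Created once; every caller reuses it, so the number of workers stays bounded.
    private static final ExecutorService SHARED =
        Executors.newFixedThreadPool(2, namedFactory("demo-pool"));

    static ExecutorService shared() {
        return SHARED;
    }

    public static void main(String[] args) throws Exception {
        Callable<String> task = () -> Thread.currentThread().getName();
        System.out.println(shared().submit(task).get()); // name of the worker that ran the task
        shared().shutdown();
    }
}
```

A pool that is created per request and never shut down keeps its idle workers parked in LinkedBlockingQueue.take, which matches the growing stack trace shown earlier.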

Appendix: the complete monitoring code used in this investigation


import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/service-status")
public class ServiceStatusController {
  private static final Logger logger = LoggerFactory.getLogger(ServiceStatusController.class);

  @Autowired(required = false)
  private DataSource dataSource;

  @Value("${spring.application.name:no_name}")
  private String appName;

  @GetMapping
  public Map<String, Object> getStatus() {
    Map<String, Object> result = new HashMap<>();
    result.put("applicationName", appName);
    Map<String, Object> memory = new HashMap<>();
    Runtime rt = Runtime.getRuntime();
    memory.put("total", rt.totalMemory());
    memory.put("free", rt.freeMemory());
    memory.put("max", rt.maxMemory());
    result.put("memory", memory);
    result.put("availableProcessors", rt.availableProcessors());

    if (dataSource != null) {
      if (dataSource instanceof HikariDataSource) {
        HikariDataSource hds = (HikariDataSource) dataSource;
        if (hds.getHikariPoolMXBean() == null) {
          // A connection must be obtained once before poolMXBean is available; not needed if another request has already used the database
          try {
            Connection conn = hds.getConnection();
            conn.close();
          } catch (SQLException sqlException) {
            logger.error("cannot get conn", sqlException);
          }
        }
        HikariPoolMXBean pool = hds.getHikariPoolMXBean();
        if (pool != null) {
          Map<String, Object> poolStatus = new HashMap<>();
          poolStatus.put("active", pool.getActiveConnections());
          poolStatus.put("idle", pool.getIdleConnections());
          poolStatus.put("total", pool.getTotalConnections());
          poolStatus.put("awaiting", pool.getThreadsAwaitingConnection());
          poolStatus.put("maximumPoolSize", hds.getMaximumPoolSize());
          poolStatus.put("minimumIdle", hds.getMinimumIdle());
          poolStatus.put("idleTimeout", hds.getIdleTimeout());
          result.put("dataSourcePool", poolStatus);
        }
      } else {
        logger.info("datasource is not hikari");
      }
    } else {
      logger.info("no datasource");
    }

    {
      Map<String, Integer> threadsStatus = new HashMap<>();
      for (Thread t : Thread.getAllStackTraces().keySet()) {
        String key = t.getState().toString().toLowerCase();
        Integer cnt = threadsStatus.get(key);
        if (cnt == null) {
          threadsStatus.put(key, 1);
        } else {
          threadsStatus.put(key, cnt + 1);
        }
      }
      threadsStatus.put("total", Thread.getAllStackTraces().size());
      result.put("threads", threadsStatus);
    }

    return result;
  }

  @GetMapping("/threads")
  public Map<String, ?> getThreadStackTraces() {
    logger.trace("getThreadStackTraces");
    Map<String, List<Map<String, Object>>> result = new HashMap<>();
    Map<String, Map<String, Object>> threadInfoByKey = new HashMap<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      StackTraceElement[] elements = t.getStackTrace();
      List<Map<String, Object>> stackList = new ArrayList<>();
      StringBuilder builder = new StringBuilder();
      int i = 0;
      for (StackTraceElement ele : elements) {
        i++;
        Map<String, Object> map = new HashMap<>();
        map.put("className", ele.getClassName());
        map.put("methodName", ele.getMethodName());
        map.put("lineNum", ele.getLineNumber());
        map.put("fileName", ele.getFileName());
        map.put("module", ele.getModuleName());
        stackList.add(map);

        if (i < 30) {
          builder.append(ele.getClassName());
          builder.append(ele.getMethodName());
        }
      }
      String status = t.getState().toString().toLowerCase();
      String key = builder.toString();
      Map<String, Object> threadInfo = threadInfoByKey.get(key);
      if (threadInfo == null) {
        threadInfo = new HashMap<>();
        threadInfo.put("stack", stackList);
        threadInfoByKey.put(key, threadInfo);
        List<Map<String, Object>> list = result.get(status);
        if (list == null) {
          list = new ArrayList<>();
          result.put(status, list);
        }
        list.add(threadInfo);
      }
      if (threadInfo.containsKey("counter")) {
        threadInfo.put("counter", (Integer) threadInfo.get("counter") + 1);
      } else {
        threadInfo.put("counter", 1);
      }
      if (threadInfo.containsKey("names")) {
        List<String> names = (List<String>) threadInfo.get("names");
        names.add(t.getName());
      } else {
        List<String> names = new ArrayList<>();
        names.add(t.getName());
        threadInfo.put("names", names);
      }
    }
    return result;
  }

}

Getting connection pool usage (active connections, etc.) in Spring Boot

Posted on 2021-12-07

Spring Boot (2.2.x) uses HikariCP as its default connection pool. It is configured as follows:

application.yml

spring:
  datasource:
    driverClassName: org.postgresql.Driver
    url: jdbc:postgresql://db-server:5433/mydb
    username: pgdbo
    password: sql
    hikari:
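      maximum-pool-size: 20 # at most 20 connections
      minimum-idle: 5   # minimum number of idle connections to keep
      idle-timeout: 10000  # how long an idle connection may live, in milliseconds
      connection-timeout: 8000 # connection timeout, in milliseconds
      connection-test-query: select  1  # test SQL

If these are not configured, the default idle timeout is 10 minutes and minimum-idle is 10.

To monitor how many active connections there currently are, the following approach works: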

@Autowired  private DataSource dataSource;
if (dataSource instanceof HikariDataSource) {
   HikariDataSource hds = (HikariDataSource) dataSource;
   if (hds.getHikariPoolMXBean() == null) {
     // A connection must be obtained once before poolMXBean is available; not needed if another request has already used the database
     try {
       Connection conn = hds.getConnection();
       conn.close();
     } catch (SQLException sqlException) {
       logger.error("cannot get conn", sqlException);
     }
   }
   HikariPoolMXBean pool = hds.getHikariPoolMXBean();
   if (pool != null) {
     Map<String, Object> poolStatus = new HashMap<>();
     poolStatus.put("active", pool.getActiveConnections());
     poolStatus.put("idle", pool.getIdleConnections());
     poolStatus.put("total", pool.getTotalConnections());
     poolStatus.put("awaiting", pool.getThreadsAwaitingConnection());
     poolStatus.put("maximumPoolSize", hds.getMaximumPoolSize());
     poolStatus.put("minimumIdle", hds.getMinimumIdle());
     poolStatus.put("idleTimeout", hds.getIdleTimeout());
     result.put("dataSourcePool", poolStatus);
   }
 }

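Note the if (hds.getHikariPoolMXBean() == null) block: if the pool has never been used (no connection has ever been obtained), getHikariPoolMXBean() returns null and no information is available, so getConnection has to be called once by hand.

After redeploying with POJO properties added or removed, deserialization fails because the Redis cache holding the POJO was not cleared

Posted on 2021-12-07

Symptom

Caching is one of the commonly used services with Spring Boot, and what goes into the cache is often a POJO. By default, Java objects are stored in the cache through Java serialization. Code changes during an upgrade sometimes add or remove properties; if the corresponding cache entries are not cleared before deployment, deserialization fails with an exception that usually looks like this: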

org.springframework.data.redis.serializer.SerializationException: Cannot deserialize; nested exception is org.springframework.core.serializer.support.SerializationFailedException: Failed to deserialize payload. Is the byte array a result of corresponding serialization for DefaultDeserializer?; nested exception is java.io.InvalidClassException: com.package-of-pojo.Xxxx; local class incompatible: stream classdesc serialVersionUID = -2364286648166609117, local class serialVersionUID = -8974455668551700477
    at org.springframework.data.redis.serializer.JdkSerializationRedisSerializer.deserialize(JdkSerializationRedisSerializer.java:84)
    at org.springframework.data.redis.serializer.DefaultRedisElementReader.read(DefaultRedisElementReader.java:48)
    at org.springframework.data.redis.serializer.RedisSerializationContext$SerializationPair.read(RedisSerializationContext.java:272)
    at org.springframework.data.redis.cache.RedisCache.deserializeCacheValue(RedisCache.java:260)
    at org.springframework.data.redis.cache.RedisCache.lookup(RedisCache.java:94)
    at org.springframework.cache.support.AbstractValueAdaptingCache.get(AbstractValueAdaptingCache.java:58)
    at org.springframework.cache.interceptor.AbstractCacheInvoker.doGet(AbstractCacheInvoker.java:73)
    at org.springframework.cache.interceptor.CacheAspectSupport.findInCaches(CacheAspectSupport.java:554)
    at org.springframework.cache.interceptor.CacheAspectSupport.findCachedItem(CacheAspectSupport.java:519)
    at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:401)
    at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:345)
    at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:93)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)
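
The two serialVersionUID values in the message differ because the POJO does not declare one, so the JVM derives it from the class structure and it changes when properties are added or removed. The Java sketch below shows, with JDK serialization only, where that number comes from and what a pinned value looks like. CachedUser is a hypothetical class, and whether pinning the UID is appropriate for a given POJO is a judgment call: it keeps added or removed fields compatible, but not incompatible type changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionDemo {
    /** Hypothetical cached POJO with an explicitly declared serialVersionUID. */
    static class CachedUser implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        CachedUser(String name) {
            this.name = name;
        }
    }

    /** The UID that is written into the stream and compared again on deserialization. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    /** Serializes and deserializes the value with plain JDK serialization. */
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("serialVersionUID = " + uidOf(CachedUser.class));
        CachedUser copy = (CachedUser) roundTrip(new CachedUser("alice"));
        System.out.println("restored name = " + copy.name);
    }
}
```

Entries already written to Redis under the old, computed UID still fail to deserialize after such a change, so the cleanup described below remains necessary for existing data.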

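Solution

This exception is usually easy to fix: the command redis-cli keys 'xxx*' | xargs -n 1 redis-cli del looks up all entries of that cache and deletes them.

But the problem comes up easily, and preventing it by hand every time is tedious. Could the program handle it instead: when this exception occurs, clear the corresponding content in Redis and run the underlying method rather than reading from Redis (for a method annotated with Cacheable, for example, execute the method, return its result, and put the result back into the cache)? This approach works, and it is implemented by configuring a custom CacheErrorHandler.

The custom CacheErrorHandler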


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.core.serializer.support.SerializationFailedException;

/** Exceptions handled by this class are not rethrown by Spring, unless the code here throws again. */
public class CustomCacheErrorHandler implements CacheErrorHandler {
  private static final Logger logger = LoggerFactory.getLogger(CustomCacheErrorHandler.class);

  @Override
  public void handleCacheGetError(RuntimeException e, Cache cache, Object key) {
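    logger.error("Error while reading from cache, cache-name: {}, cache-key:{}", cache.getName(), key, e);
    // With JdkSerializationRedisSerializer the SerializationFailedException arrives wrapped
    // in a SerializationException (see the stack trace above), so the cause is checked as well.
    if (e instanceof SerializationFailedException
        || e.getCause() instanceof SerializationFailedException) {
      logger.warn("Deserialization failed, clearing this cache");
      cache.clear();
    }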
  }

  @Override
  public void handleCachePutError(RuntimeException e, Cache cache, Object o, Object key) {
    logger.error("handleCachePutError cache-name: {}, cache-key:{}", cache.getName(), key, e);
  }

  @Override
  public void handleCacheEvictError(RuntimeException e, Cache cache, Object key) {
    logger.error("handleCacheEvictError cache-name: {}, cache-key:{}", cache.getName(), key, e);
  }

  @Override
  public void handleCacheClearError(RuntimeException e, Cache cache) {
    logger.error("handleCacheClearError cache-name: {}", cache.getName(), e);
  }
}

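Spring stores keys in Redis in the form User:1111, User:2222, where User is the cache name and 1111 / 2222 are the keys. When deserialization of the POJO fails, every entry with the same cache name is invalid, which is why handleCacheGetError calls cache.clear() to clear the whole cache. Without that call the handler would run once per key, and the end result would be the same.

Registering the custom CacheErrorHandler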


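import org.springframework.cache.annotation.CachingConfigurerSupport;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CacheErrorHandlerConfig extends CachingConfigurerSupport {

  /** Tell Spring's cache abstraction to use the custom handler. */
  @Override
  public CacheErrorHandler errorHandler() {
    return new CustomCacheErrorHandler();
  }
}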

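PostgreSQL primary/standby replication + load balancing

Posted on 2021-12-01 Updated on 2021-12-24

PostgreSQL read/write splitting is implemented in two parts:

A primary/standby cluster (one primary, one or more standbys)
A Pgpool-II proxy that routes reads and writes to different servers

Pgpool-II is powerful and can also distribute a single table or database across servers, among other things; that is not covered here.

Environment

Three servers (all three services below can be installed on one machine; three are used here only for clarity)

Pgpool-II, IP: 192.168.1.10
PostgreSQL primary server, IP: 192.168.1.11
PostgreSQL standby server, IP: 192.168.1.12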

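Setting up the cluster

Installing PostgreSQL

Install PostgreSQL on both the primary and the standby (192.168.1.11, 192.168.1.12): apt install postgresql

Configuration

The following steps are required on both the primary and the standby.

Allowing network access

Edit postgresql.conf (for an apt install of version 12, it is in /etc/postgresql/12/main/).

Set listen_addresses to * or to the server's own IP address.

Allowing access for the replication user

Edit pg_hba.conf and append this line:

host replication all 192.168.1.0/24 trust

This line trusts all replication connections from the local network.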

On the standby

Configure replication on the standby (192.168.1.12): edit postgresql.conf, find primary_conninfo, hot_standby, and wal_level, and set them as follows

primary_conninfo = 'host=192.168.1.11 port=5432 user=postgres password='
hot_standby = on 
wal_level = replica

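These settings are optional: the pg_basebackup command below writes the corresponding configuration into the main directory.

192.168.1.11 is the primary's IP address, postgres is the connecting user, and password is the password. It is empty here because I trust all users on the local network; adjust this line for your own environment.

Stop the postgresql service on the standby

systemctl stop postgresql

Go to the database directory (on Ubuntu with an apt install of version 12, /var/lib/postgresql/12) and delete the main directory. Then run pg_basebackup to copy the database files from the primary, and finally remember to change the owner of the new main directory to the postgres user.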

cd /var/lib/postgresql/12
rm -r main
pg_basebackup -h 192.168.1.11 -p 5432 -U postgres  -Fp -Xs -Pv -R -D ./main
chown -R postgres:postgres main

If pg_basebackup succeeds, it prints output similar to this

pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_8071"
32514/32514 kB (100%), 1/1 tablespace                                         
pg_basebackup: write-ahead log end point: 0/4000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

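and the main directory contains a standby.signal file.

Then start postgresql

systemctl start postgresql

At this point you can connect to the standby with psql and test it. The expected result is that queries work, while updates are rejected with a message that the transaction is read-only.
You can also run select pg_is_in_recovery(); it should return t, meaning the server is in recovery mode.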

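Other settings (unrelated to replication)

Skip this part if you only need replication.

postgresql.conf

max_connections defaults to 100; raise it as needed. The standby's max_connections must not be lower than the primary's, otherwise the standby fails to start.
shared_buffers is usually set to 1/4 of physical memory on a dedicated database server.
work_mem: increasing it noticeably helps operations such as sorting within a single SQL statement.

pg_hba.conf

Configure which users may connect over the network and how they authenticate. For example, allowing all local-network connections with trust saves typing a password every time.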

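Load balancing: setting up Pgpool-II

All of the following steps are performed on the Pgpool-II server (192.168.1.10).

Installation

apt install pgpool2

Configuration

Edit /etc/pgpool2/pgpool.conf

Database node configuration

The default proxy port is 5433; change it if needed.

Add two nodes: find the backend_hostname0 section and edit it as follows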

backend_hostname0 = '192.168.1.11'
backend_port0 = 5432
backend_weight0 = 1

backend_hostname1 = '192.168.1.12'
backend_port1 = 5432
backend_weight1 = 1

replication_mode = off

load_balance_mode = on

master_slave_mode = on

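replication_mode must be off. When it is on, Pgpool-II performs the replication itself: it sends every modifying SQL statement to each node, and each node executes its own copy. If a node goes offline, the statements are not automatically resent when it comes back, and a lot of extra setup is needed. It is simpler to use PostgreSQL's built-in replication described above.

load_balance_mode enables load balancing of queries. When off, all SQL runs on backend0; when on, queries are distributed according to the backend_weight ratios.

master_slave_mode tells Pgpool-II that the servers run in primary/standby mode, with backend0 as the primary.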
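
To make the backend_weight ratios concrete, here is a small Java sketch that normalizes a list of weights into the share of queries each node would receive. It only illustrates the arithmetic; the class and method names are made up for this example and it is not Pgpool-II's own code.

```java
import java.util.Arrays;

/** Illustration only: turns backend_weight values into query shares. */
final class BackendWeights {

    /** Normalizes raw weights into fractions that sum to 1. */
    static double[] shares(double... weights) {
        double total = Arrays.stream(weights).sum();
        if (total <= 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return Arrays.stream(weights).map(w -> w / total).toArray();
    }

    public static void main(String[] args) {
        // Two nodes with weight 1 each, as in the configuration above.
        System.out.println(Arrays.toString(shares(1, 1)));
        // Hypothetical alternative: send three times as many queries to node 1.
        System.out.println(Arrays.toString(shares(1, 3)));
    }
}
```

With equal weights each node gets half of the load-balanced queries; raising one node's weight shifts the ratio toward it.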

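Pgpool-II management configuration

Pgpool-II maintains and manages the backend nodes through a set of pcp commands. The default management port is 9898; to change it, set pcp_port in pgpool.conf. You can also set pcp_listen_addresses to allow remote management.

First create a user for pcp management by editing pcp.conf, for example by adding this line:

pcp:ac5c74b64b4b8352ef2f181affb5ac2a

This adds a user named pcp with password sql (the password is stored as an MD5 hash).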
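
If you would rather generate the hash yourself, the following Java sketch builds a pcp.conf line in the username:md5 format shown above. The class and method names are invented for this example; Pgpool-II also ships a pg_md5 utility for the same purpose.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustration only: builds a "user:md5hex" line for pcp.conf. */
final class PcpPassword {

    /** Returns the lowercase 32-character hex MD5 digest of the password. */
    static String md5Hex(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            // %032x keeps leading zeros that BigInteger would otherwise drop.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    /** Formats one pcp.conf entry. */
    static String pcpConfLine(String user, String password) {
        return user + ":" + md5Hex(password);
    }

    public static void main(String[] args) {
        // Same user name and password as in the example above.
        System.out.println(pcpConfLine("pcp", "sql"));
    }
}
```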

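Backend node status

Node status is not maintained automatically by default; it has to be managed with pcp commands.

Apart from backend0, newly added nodes start in the unused state and must be switched with pcp_attach_node

pcp_attach_node -U pcp -p 9898 -h 127.0.0.1 1

pcp_attach_node makes a node available. The trailing 1 is the node index, i.e. the number used when defining the backend in pgpool.conf; it can also be looked up with the SQL command show pool_nodes.

After pcp_attach_node runs successfully, the node's status becomes up if it can be reached, otherwise down.

A node in the down or unused state never becomes up on its own.

Nor does up become down on its own, which causes problems when distributing SQL: sending statements to a node that has stopped serving is not what we want. To avoid this, configure a health check.

Add the following to pgpool.conf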

health_check_period = 10    # in seconds
health_check_timeout = 20
health_check_user = 'postgres'
health_check_pass = ''
health_check_database = 'tempdb'

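Adjust the user name, password, and database to your environment, then restart. Pgpool-II can now check whether each node is still online; a node that can no longer be reached is automatically marked down.

(down, unused) => up: via pcp_attach_node

Two Pgpool-II parameters

num_init_children

num_init_children is the number of child processes started to accept connections from Pgpool-II clients (usually the applications we write). When the number of connections exceeds this value, further clients wait until a connection is released or they time out.

Compared with connecting to PostgreSQL directly without Pgpool-II, it plays a role similar to PostgreSQL's max_connections: up to this number, connections work normally, beyond it they do not. The difference is that PostgreSQL rejects the excess connections outright, whereas Pgpool-II puts them in a wait queue.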

Default value: 32

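If your application uses a connection pool, its maximum size should not exceed this value.

max_pool

This one is a bit special: PostgreSQL has no corresponding setting. It is the number of cached connections to the backend nodes that each child process (each of the num_init_children processes) can hold. If you use several databases, it is usually set to the number of frequently used databases, so that every process keeps a pooled connection to each database, which is fastest. Note, however, that num_init_children * max_pool should be less than or equal to the backend nodes' max_connections.

Default value: 4

If only one database on the server is used in production, setting it to 1 is a good choice.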
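
The num_init_children * max_pool constraint is easy to check before editing pgpool.conf. The Java sketch below does just that arithmetic; the class name and the sample values are illustrative assumptions, so substitute the numbers from your own pgpool.conf and postgresql.conf.

```java
/** Illustration only: checks the num_init_children * max_pool rule of thumb. */
final class PoolBudget {

    /** Worst-case number of backend connections Pgpool-II may open per node. */
    static int backendConnections(int numInitChildren, int maxPool) {
        return Math.multiplyExact(numInitChildren, maxPool);
    }

    /** True when the worst case still fits within the backend's max_connections. */
    static boolean fits(int numInitChildren, int maxPool, int maxConnections) {
        return backendConnections(numInitChildren, maxPool) <= maxConnections;
    }

    public static void main(String[] args) {
        int maxConnections = 100; // PostgreSQL max_connections (example value)
        // 32 children with 4 pooled connections each need 128 backend connections.
        System.out.println(fits(32, 4, maxConnections));
        // With max_pool = 1 only 32 are needed.
        System.out.println(fits(32, 1, maxConnections));
    }
}
```

When the check fails, either lower max_pool or num_init_children, or raise max_connections on every backend node.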

Common problems

ERROR: canceling statement due to conflict with recovery

ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed.

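Cause

This error occurs when a SQL statement runs on the standby for a long time and, while it is running, the standby replays changes from the primary that modify the data the statement is reading.

Solutions

Any of the following works:

1. Optimize the SQL to shorten its execution time; this is the best approach
2. Set hot_standby_feedback = on
3. Increase max_standby_archive_delay and max_standby_streaming_delay
4. Run the query on the primary

Options 2 and 3 both have drawbacks: 2 adds load on the primary, and 3 increases the data lag between standby and primary.
Option 4 gives up the benefit of load balancing.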

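Drawing a transparent circle with Java Graphics2D: cutting a hole in an existing image

Posted on 2021-11-10 Updated on 2022-10-25

The title is not quite accurate. The desired result looks like the figure below; note that the circular area in the middle is transparent.

Cutting out a circle

Graphics2D provides clearRect to clear a rectangular area, but there is no clearCircle, clearArc, or clearEllipse, so a clearing approach is not really feasible.

Instead, approach it from the drawing side: use Graphics2D's setClip method to exclude the circle to be cut out before drawing. See the code below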

Area a = new Area(new Rectangle(0, 0, 1200, 1200));
Ellipse2D circle = new Ellipse2D.Float(100, 100, 1000, 1000);
a.subtract(new Area(circle));
g.setClip(a);
g.setColor(new Color(255, 255, 192));
g.fillRect(0, 0, 1200, 1200);
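
For a version that can be run on its own, the sketch below wraps the same clip technique in a method that paints onto a fresh ARGB BufferedImage and then inspects two pixels. The class name, the method name, and the choice of a new image (rather than an existing picture) are assumptions made for this example.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.geom.Area;
import java.awt.geom.Ellipse2D;
import java.awt.image.BufferedImage;

/** Illustration only: fills a square but leaves a transparent circle in the middle. */
final class CircleCutout {

    /** Paints a size x size image, skipping a centered circle inset by margin pixels. */
    static BufferedImage render(int size, int margin) {
        // TYPE_INT_ARGB starts fully transparent, so unpainted pixels stay see-through.
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            Area clip = new Area(new Rectangle(0, 0, size, size));
            int diameter = size - 2 * margin;
            clip.subtract(new Area(new Ellipse2D.Float(margin, margin, diameter, diameter)));
            g.setClip(clip); // everything outside the clip (the circle) is left untouched
            g.setColor(new Color(255, 255, 192));
            g.fillRect(0, 0, size, size);
        } finally {
            g.dispose();
        }
        return img;
    }

    /** Extracts the alpha channel (0 = transparent, 255 = opaque) of one pixel. */
    static int alphaAt(BufferedImage img, int x, int y) {
        return img.getRGB(x, y) >>> 24;
    }

    public static void main(String[] args) {
        BufferedImage img = render(1200, 100);
        System.out.println("corner alpha = " + alphaAt(img, 0, 0));
        System.out.println("center alpha = " + alphaAt(img, 600, 600));
    }
}
```

The corner pixel is painted and opaque, while the center pixel inside the circle keeps its initial alpha of 0. Saving the result as PNG (for example with ImageIO.write) preserves the transparency.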

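Keeping only the text and cutting out the rest

Remove everything from an image except the text, producing the effect shown in the figure below.

This also uses Graphics2D's setClip method

// g is the Graphics2D of the new (target) image, shown on the right in the figure above; bgImg is the source image, shown on the left
// arial is assumed to be a java.awt.Font loaded elsewhere (not shown)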
FontRenderContext frc = g.getFontRenderContext();
Font font = arial.deriveFont(Font.BOLD, 260);
GlyphVector gv = font.createGlyphVector(frc, "10");
Rectangle2D box = gv.getVisualBounds();

// Compute where to place the text outline (centered)
int xOff = (int) (bgImg.getWidth() - box.getWidth()) / 2;
int yOff = (int) (bgImg.getHeight() + box.getHeight()) / 2;
// xOff, yOff are the coordinates of the bottom-left corner of the text outline
Shape shape = gv.getOutline(xOff, yOff);
g.setClip(shape);
g.drawImage(bgImg, 0, 0, null);
g.setClip(null);

// The sample image above has an outline around the text; if needed, it can be drawn like this
g.setStroke(new BasicStroke(1f));
g.setColor(Color.RED);
g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
g.draw(shape);