Problem description

An application uses StackExchange.Redis as its Redis client SDK to connect to Azure Cache for Redis. After running for a long time, it hits sporadic timeout errors every day.

The error messages are as follows:

  • StackExchange.Redis.RedisTimeoutException: Timeout performing HGETALL (15000ms), next: HGETALL new_town, inst: 0, qu: 0, qs: 17, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: xxxxxxxx.redis.cache.chinacloudapi.cn:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxxxxxxxxxx, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=17,Free=8174,Min=2,Max=8191), v: 2.1.30.38891 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

  •  ERROR log - Timeout performing HSET (15000ms), next: HGET token, inst: 1, qu: 0, qs: 35, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: xxxxxxxx.redis.cache.chinacloudapi.cn:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxxxxxxxxxx, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=29,Free=8162,Min=100,Max=8191), v: 2.1.30.38891 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

  •  ERROR log - Timeout performing EXPIRE (15000ms), next: HGET token, inst: 0, qu: 0, qs: 35, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: xxxxxxxx.redis.cache.chinacloudapi.cn:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxxxxxxxxxx, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=29,Free=8162,Min=100,Max=8191), v: 2.1.30.38891 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Troubleshooting approach

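In the first error, the WORKER Busy count is well above the Min count: WORKER: (Busy=17,Free=8174,Min=2,Max=8191). Once Busy exceeds Min, the .NET thread pool only adds new threads at a throttled rate, so work can wait for a thread long enough to time out. This can be addressed by raising the minimum Worker/IOCP thread counts. For details, see: https://docs.azure.cn/zh-cn/azure-cache-for-redis/cache-management-faq#recommendation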
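
To check this across many log lines, the thread-pool counters can be pulled out of each exception message and compared. The Python sketch below is written against the message format shown in the logs above (the regular expression is an assumption based on those lines, not a documented format), and the helper names are hypothetical. The fix itself is on the .NET side, where the minimums are raised with ThreadPool.SetMinThreads at application startup, as the linked FAQ describes.

```python
import re
from typing import Dict, List

# Matches fragments such as "WORKER: (Busy=17,Free=8174,Min=2,Max=8191)"
_POOL_RE = re.compile(
    r"(IOCP|WORKER): \(Busy=(\d+),Free=(\d+),Min=(\d+),Max=(\d+)\)"
)


def parse_thread_pools(message: str) -> Dict[str, Dict[str, int]]:
    """Extract IOCP/WORKER counters from a RedisTimeoutException message."""
    pools: Dict[str, Dict[str, int]] = {}
    for name, busy, free, min_, max_ in _POOL_RE.findall(message):
        pools[name] = {
            "Busy": int(busy),
            "Free": int(free),
            "Min": int(min_),
            "Max": int(max_),
        }
    return pools


def pools_over_min(message: str) -> List[str]:
    """Names of pools whose Busy count exceeds Min (new threads are throttled)."""
    return [
        name
        for name, counters in parse_thread_pools(message).items()
        if counters["Busy"] > counters["Min"]
    ]
```

Run against the first log line it reports `['WORKER']`; against the second and third it reports nothing, which matches the raised minimums in those messages.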

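Timeouts continued to occur after that. In the second and third errors the minimums had already been raised (IOCP Min=50, WORKER Min=100) and Busy stays below Min, so the thread pool is less likely to be the cause. The following areas need to be optimized: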

1) Check slow commands (slowlogs):

The slowlog for this cache contains commands such as HGET, HGETALL and HSCAN. Some commands are more expensive to execute than others, depending on their complexity; HGETALL, for example, is O(N) in the number of fields, so it is costly on a large hash. Because Redis executes commands on a single thread, a time-consuming command keeps the server busy and delays every command queued behind it, which can show up as latency or timeouts on the client side.
See: Troubleshoot Azure Cache for Redis latency and timeouts | Microsoft Learn
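
To see which commands dominate the slowlog, the entries can be grouped by command name. This is a minimal Python sketch. It assumes each entry is a dict with a "command" field (the full command line, as str or bytes) and a "duration" field in microseconds, which is the unit Redis SLOWLOG reports and roughly how redis-py's `slowlog_get()` presents entries. The sample data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_slowlog(entries: Iterable[dict]) -> List[Tuple[str, int, int]]:
    """Group slowlog entries by command name.

    Returns (command, count, total_duration_us) tuples, slowest total first.
    Assumes "command" holds the full command line (str or bytes) and
    "duration" is in microseconds.
    """
    count: Dict[str, int] = defaultdict(int)
    total_us: Dict[str, int] = defaultdict(int)
    for entry in entries:
        cmd = entry["command"]
        if isinstance(cmd, bytes):
            cmd = cmd.decode("utf-8", errors="replace")
        name = cmd.split(" ", 1)[0].upper()  # e.g. "HGETALL new_town" -> "HGETALL"
        count[name] += 1
        total_us[name] += int(entry["duration"])
    return sorted(
        ((name, count[name], total_us[name]) for name in count),
        key=lambda row: row[2],
        reverse=True,
    )


sample = [  # hypothetical entries
    {"command": "HGETALL new_town", "duration": 12000},
    {"command": b"HSCAN token 0", "duration": 3000},
    {"command": "HGETALL new_town", "duration": 8000},
]
print(summarize_slowlog(sample))
```

With redis-py the input could come from something like `redis.Redis(host=..., port=6380, ssl=True, password=...).slowlog_get(128)`, assuming SLOWLOG is permitted on your cache. If the top entries are HGETALL/HSCAN against a few keys, that also points at the large key issue in item 3.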

2) Check client CPU and network bandwidth

Check CPU usage and network bandwidth on the client host. See: https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-troubleshoot-timeouts#high-cpu-on-client-hosts
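
A quick back-of-envelope check for the bandwidth side: the payload alone needs roughly ops/sec × bytes × 8 bits per second. The Python sketch below uses hypothetical helper names and ignores TLS and protocol overhead, so real usage is higher than what it reports.

```python
def required_mbps(ops_per_sec: float, avg_payload_bytes: float) -> float:
    """Approximate throughput (megabits per second) for the payload alone."""
    return ops_per_sec * avg_payload_bytes * 8 / 1_000_000


def link_utilization(ops_per_sec: float, avg_payload_bytes: float, link_mbps: float) -> float:
    """Fraction of the client's link consumed; values >= 1.0 mean it is saturated."""
    return required_mbps(ops_per_sec, avg_payload_bytes) / link_mbps


# Illustrative numbers only: 2,000 ops/s with 100 KB values on a 1 Gbit/s link.
print(required_mbps(2000, 100_000), link_utilization(2000, 100_000, 1000))
```

In this illustration the payload needs about 1,600 Mbit/s, more than the 1 Gbit/s link can carry, so responses queue on the client and can surface as timeouts even when the server itself is healthy.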

3) Check for large keys (bigkeys)

  • Optimize your application for a large number of small values, rather than a few large values.
  • The preferred solution is to break up your data into related smaller values.

See: https://docs.azure.cn/zh-cn/azure-cache-for-redis/cache-troubleshoot-timeouts#large-key-value
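
One common way to break up a large hash, such as the token key seen in the logs, is to spread its fields over several smaller hashes chosen by a stable hash of the field name. The Python sketch below shows that technique and is not necessarily what the linked article prescribes. The helper names, the bucket count and the "base:index" key naming are all assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict


def bucket_key(base_key: str, field: str, buckets: int = 16) -> str:
    """Map a hash field to one of `buckets` smaller hashes, e.g. "token:7".

    zlib.crc32 is used because it is stable across processes; Python's
    built-in hash() is randomized per process for str.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"


def split_hash(base_key: str, mapping: Dict[str, str], buckets: int = 16) -> Dict[str, Dict[str, str]]:
    """Group the fields of one big hash into per-bucket hashes to write separately."""
    parts: Dict[str, Dict[str, str]] = defaultdict(dict)
    for field, value in mapping.items():
        parts[bucket_key(base_key, field, buckets)][field] = value
    return dict(parts)
```

Writes and single-field reads then go to `bucket_key(...)` instead of the original key, so each HSET/HGET touches a small hash. A full read becomes one HGETALL per bucket: the total work is similar, but no single command holds the server for long.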

4) Upgrade Azure Cache for Redis to a higher pricing tier

5) Additional suggestion:
The memory reservations are not configured properly: maxmemory-reserved and maxfragmentationmemory-reserved are each set to only 50 MB. We recommend raising both to at least 10% of the cache size.
For more details, see: Best practices for memory management - Azure Cache for Redis | Microsoft Learn
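
The 10% guideline works out as follows. This Python sketch assumes the cache size is given in MB with 1 GB = 1024 MB; the helper name is hypothetical, and the result is a lower bound for each of the two settings, rounded up to whole MB.

```python
def recommended_reserved_mb(cache_size_mb: int, percent: int = 10) -> int:
    """Smallest whole-MB value that is at least `percent`% of the cache size."""
    # Ceiling division on integers: -(-a // b) rounds up instead of down.
    return -(-cache_size_mb * percent // 100)


# Hypothetical 6 GB cache: value to use for maxmemory-reserved and for
# maxfragmentationmemory-reserved (each).
print(recommended_reserved_mb(6 * 1024))
```

For a hypothetical 6 GB cache this gives 615 MB for each setting, versus the 50 MB currently configured. Memory reserved this way is set aside for non-cache operations and fragmentation, so less is left for data.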