[Spring Cloud Alibaba Revisited] (5) Spring Cloud Sleuth + Zipkin: Distributed Log Tracing | Eddie'Blog

eddie 417 2021-11-03

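8.1.1 What Is Spring Cloud Sleuth

  • Spring Cloud Sleuth essentials
    • What Spring Cloud Sleuth does: it automatically sets up tracing for each communication channel of the current application
      1. Requests passed through messaging middleware such as RabbitMQ or Kafka (or any other middleware that has a Spring Cloud Stream binder implementation)
      2. Requests proxied through Zuul or Gateway
      3. Requests made with RestTemplate
  • How Spring Cloud Sleuth tracing works
    • To trace a request: when the request reaches the entry endpoint of the distributed system, the tracing framework only needs to create a unique identifier for it, the Trace ID, which then travels with the request through every service
    • To measure the latency of each processing unit: when the request reaches a service component, or the processing logic reaches a certain state, another unique identifier, the Span ID, marks its start, its intermediate steps, and its end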
graph LR
    A[Service A - traceId, spanIdA] --> B(Service B - traceId, spanIdA1)
    A --> C(Service C - traceId, spanIdA12)
    B --> D(Service D - traceId, spanIdA11)

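Span IDs are identifiers only. To compute latency, compare the timestamps recorded by the spans, for example those of spanIdA11 against those of spanIdA1.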
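The relationship between a Trace ID, Span IDs, and timing can be sketched in plain Java. This is a simplified, hypothetical model (the class `TraceSketch` and all timings are made up for illustration) and not the classes Sleuth or Brave actually use. It mirrors the A, B, C, D call tree from the diagram and derives latency from recorded timestamps.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a trace; not Sleuth's or Brave's real classes. */
public class TraceSketch {

    /** One unit of work: shares the trace ID, has its own span ID and timing. */
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for the root span
        final String service;
        final long startMicros;
        final long endMicros;

        Span(String traceId, String spanId, String parentSpanId,
             String service, long startMicros, long endMicros) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.service = service;
            this.startMicros = startMicros;
            this.endMicros = endMicros;
        }

        /** Latency comes from the recorded timestamps, never from the IDs. */
        long durationMicros() {
            return endMicros - startMicros;
        }
    }

    /** Builds the A -> B -> D and A -> C call tree with made-up timings. */
    static List<Span> sampleTrace() {
        String traceId = "353ea734cc43d6ee";
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(traceId, "spanIdA", null, "Service A", 0, 900));
        spans.add(new Span(traceId, "spanIdA1", "spanIdA", "Service B", 100, 600));
        spans.add(new Span(traceId, "spanIdA11", "spanIdA1", "Service D", 200, 500));
        spans.add(new Span(traceId, "spanIdA12", "spanIdA", "Service C", 650, 850));
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : sampleTrace()) {
            System.out.printf("%s [trace=%s, span=%s, parent=%s] took %d us%n",
                    s.service, s.traceId, s.spanId, s.parentSpanId, s.durationMicros());
        }
    }
}
```

Every span carries the same trace ID, while the parent span ID is what lets a UI such as Zipkin rebuild the tree.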

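8.1.2 What Is Zipkin

  • Zipkin basics
    • Zipkin addresses latency problems in microservice architectures, covering the collection, storage, lookup, and display of trace data
    • Zipkin consists of four core components
      1. Collector: the collector component
      2. Storage: the storage component
      3. API: a RESTful API that provides the external access interface
      4. UI: a web UI that provides a visual query page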
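To make the Collector role concrete, here is a minimal sketch of what reporting one span over HTTP could look like. It assumes Zipkin listens on `http://localhost:9411` and uses the v2 endpoint `/api/v2/spans`; the class `ZipkinReportSketch`, the span name, and the timing values are hypothetical. In a real application Sleuth does this reporting for you. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: builds the HTTP request a reporter could send to Zipkin's Collector. */
public class ZipkinReportSketch {

    /** Assumed Zipkin address; 9411 is the port used throughout this article. */
    static final String ZIPKIN_BASE_URL = "http://localhost:9411";

    /** Builds a minimal Zipkin v2 span list (a JSON array with one span). */
    static String spanJson(String traceId, String spanId, String name,
                           String serviceName, long timestampMicros, long durationMicros) {
        return "[{"
                + "\"traceId\":\"" + traceId + "\","
                + "\"id\":\"" + spanId + "\","
                + "\"name\":\"" + name + "\","
                + "\"timestamp\":" + timestampMicros + ","
                + "\"duration\":" + durationMicros + ","
                + "\"localEndpoint\":{\"serviceName\":\"" + serviceName + "\"}"
                + "}]";
    }

    /** Prepares, but does not send, the POST request for the Collector. */
    static HttpRequest reportRequest(String json) {
        return HttpRequest.newBuilder(URI.create(ZIPKIN_BASE_URL + "/api/v2/spans"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical span values, for illustration only.
        String json = spanJson("353ea734cc43d6ee", "c85be2c1bb127558",
                "get /sleuth/trace-info", "sca-commerce-nacos-client",
                1635858295543000L, 1200L);
        HttpRequest request = reportRequest(json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
        // To send it for real, pass the request to java.net.http.HttpClient#send.
    }
}
```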

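8.2.1 Integrating Spring Cloud Sleuth to Trace Microservice Communication

8.2.1.1 Integration steps

  • Make sure the services communicate with each other across processes
  • Add the Maven dependency

8.2.1.2 Writing the test code

Add the Maven dependency to sca-commerce-gateway and sca-commerce-alibaba-nacos-client

<!-- Tracing via Sleuth -->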
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

In sca-commerce-alibaba-nacos-client, write the test service code and the controller code

SleuthTraceInfoService

package com.edcode.commerce.service;

import brave.Tracer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

/**
 * @author eddie.lee
 * @blog blog.eddilee.cn
 * @description Uses code to show the trace information generated by Sleuth more directly
 */
@Slf4j
@Service
@RequiredArgsConstructor
public class SleuthTraceInfoService {

    /** brave.Tracer tracing object */
    private final Tracer tracer;

    /**
     * Write the current trace information to the log
     */
    public void logCurrentTraceInfo() {

        log.info("Sleuth trace id: [{}]", tracer.currentSpan().context().traceId());
        log.info("Sleuth span id: [{}]", tracer.currentSpan().context().spanId());
    }
}

SleuthTraceInfoController

package com.edcode.commerce.controller;

import com.edcode.commerce.service.SleuthTraceInfoService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * @author eddie.lee
 * @blog blog.eddilee.cn
 * @description Logs trace information
 */
@Slf4j
@RestController
@RequestMapping("/sleuth")
@RequiredArgsConstructor
public class SleuthTraceInfoController {

    private final SleuthTraceInfoService traceInfoService;

    /**
     * Log the trace information
     */
    @GetMapping("/trace-info")
    public void logCurrentTraceInfo() {
        traceInfoService.logCurrentTraceInfo();
    }
}

8.2.1.3 Sending a test request and checking the console logs

Send the request

### View Sleuth trace information
GET http://127.0.0.1:9001/edcode/scacommerce-nacos-client/sleuth/trace-info
Accept: application/json
sca-commerce-user: eyJhbGciOiJSUzI1NiJ9.eyJzY2EtY29tbWVyY2UtdXNlciI6IntcImlkXCI6MTEsXCJ1c2VybmFtZVwiOlwiZWRkaWVAcXEuY29tXCJ9IiwianRpIjoiZjQ3M2NhZjctY2RjMi00ZmE4LWExNzQtZjZhYmQ5ZDFjMzAzIiwiZXhwIjoxNjM1ODY4ODAwfQ.iTtQE2gHzjPxVP5SEFHrDBkvrzI-yt6oy-w1x--Q3ahhTvYLTiYnvndtIx7IIyYipr_ayZnAQyluPt3oiLaS80r9qByaN3zQF-6gBW_wu_fd0yd89hIjPnQeP1mY2NcchV2FaMUW7Jlq8CUDPurEhW4GUDXOqBXgmxai5UTu4yoXBUfyXUXznKTx697cGo5aoVKTAKvMReJg-77n5sQuafZNDu6pz2D1KMvEucNyZtbXw0JRIl1CsK777Jt3IG1bnOnwRBt8o1tkodZ3zJbfgTGVCHJmfEuUnXwdf4DLAq568pNVvylPLh4_r-UUGGxE6Az9XwOtl1w4vzK1M2ATzw
token: edcode

Response

GET http://127.0.0.1:9001/edcode/scacommerce-nacos-client/sleuth/trace-info

HTTP/1.1 200 OK
transfer-encoding: chunked
Content-Type: application/json
Date: Tue, 02 Nov 2021 13:04:55 GMT

{
  "code": 0,
  "message": "",
  "data": null
}

Response code: 200 (OK); Time: 1160ms; Content length: 35 bytes

Check the logs

sca-commerce-gateway

2021-11-02 21:04:55.332  INFO [sca-commerce-gateway,353ea734cc43d6ee,353ea734cc43d6ee,true] 1060 --- [ctor-http-nio-2] c.netflix.config.ChainedDynamicProperty  : Flipping property: sca-commerce-nacos-client.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2021-11-02 21:04:55.347  INFO [sca-commerce-gateway,353ea734cc43d6ee,353ea734cc43d6ee,true] 1060 --- [ctor-http-nio-2] c.netflix.loadbalancer.BaseLoadBalancer  : Client: sca-commerce-nacos-client instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=sca-commerce-nacos-client,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2021-11-02 21:04:55.353  INFO [sca-commerce-gateway,353ea734cc43d6ee,353ea734cc43d6ee,true] 1060 --- [ctor-http-nio-2] c.n.l.DynamicServerListLoadBalancer      : Using serverListUpdater PollingServerListUpdater
2021-11-02 21:04:55.372  INFO [sca-commerce-gateway,353ea734cc43d6ee,353ea734cc43d6ee,true] 1060 --- [ctor-http-nio-2] c.netflix.config.ChainedDynamicProperty  : Flipping property: sca-commerce-nacos-client.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2021-11-02 21:04:55.374  INFO [sca-commerce-gateway,353ea734cc43d6ee,353ea734cc43d6ee,true] 1060 --- [ctor-http-nio-2] c.n.l.DynamicServerListLoadBalancer      : DynamicServerListLoadBalancer for client sca-commerce-nacos-client initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=sca-commerce-nacos-client,current list of Servers=[192.168.3.192:8000],Load balancer stats=Zone stats: {unknown=[Zone:unknown;	Instance count:1;	Active connections count: 0;	Circuit breaker tripped count: 0;	Active connections per server: 0.0;]
},Server stats: [[Server:192.168.3.192:8000;	Zone:UNKNOWN;	Total Requests:0;	Successive connection failure:0;	Total blackout seconds:0;	Last connection made:Thu Jan 01 08:00:00 CST 1970;	First connection made: Thu Jan 01 08:00:00 CST 1970;	Active Connections:0;	total failure count in last (1000) msecs:0;	average resp time:0.0;	90 percentile resp time:0.0;	95 percentile resp time:0.0;	min resp time:0.0;	max resp time:0.0;	stddev resp time:0.0]
]}ServerList:com.alibaba.cloud.nacos.ribbon.NacosServerList@72186c8f
2021-11-02 21:04:55.592  INFO [sca-commerce-gateway,353ea734cc43d6ee,353ea734cc43d6ee,true] 1060 --- [ctor-http-nio-2] c.e.c.filter.GlobalElapsedLogFilter      : [/edcode/scacommerce-nacos-client/sleuth/trace-info] elapsed: [1034ms]
2021-11-02 21:04:56.358  INFO [sca-commerce-gateway,,,] 1060 --- [erListUpdater-0] c.netflix.config.ChainedDynamicProperty  : Flipping property: sca-commerce-nacos-client.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647

sca-commerce-alibaba-nacos-client

2021-11-02 21:04:55.543  INFO [sca-commerce-nacos-client,353ea734cc43d6ee,c85be2c1bb127558,true] 33740 --- [nio-8000-exec-1] c.e.c.service.SleuthTraceInfoService     : Sleuth trace id: [3836687777773377262]
2021-11-02 21:04:55.543  INFO [sca-commerce-nacos-client,353ea734cc43d6ee,c85be2c1bb127558,true] 33740 --- [nio-8000-exec-1] c.e.c.service.SleuthTraceInfoService     : Sleuth span id: [-4009361721548180136]

Breakdown of [sca-commerce-nacos-client,353ea734cc43d6ee,c85be2c1bb127558,true]
Field 1: service name
Field 2: trace ID
Field 3: span ID
Field 4: whether the span is exported to a collector such as Zipkin
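The log prefix shows the IDs in hexadecimal, while the service code logs `traceId()` and `spanId()` as signed `long` values, so the two look different even though they are the same IDs. The sketch below (the class `SleuthLogPrefix` is hypothetical, written for this article) splits the bracketed prefix into its fields and converts the logged `long` values back to the 16-character hex form.

```java
/** Sketch: parses a Sleuth log prefix and converts long IDs to hex. */
public class SleuthLogPrefix {

    final String service;
    final String traceId;
    final String spanId;
    final boolean exported;

    SleuthLogPrefix(String service, String traceId, String spanId, boolean exported) {
        this.service = service;
        this.traceId = traceId;
        this.spanId = spanId;
        this.exported = exported;
    }

    /** Parses text such as "[service,traceId,spanId,true]"; empty fields are allowed. */
    static SleuthLogPrefix parse(String prefix) {
        String inner = prefix.substring(1, prefix.length() - 1);
        String[] parts = inner.split(",", -1); // -1 keeps trailing empty fields
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + parts.length);
        }
        return new SleuthLogPrefix(parts[0], parts[1], parts[2], "true".equals(parts[3]));
    }

    /** Formats a 64-bit ID as 16 lowercase hex digits (two's complement for negatives). */
    static String toHex(long id) {
        return String.format("%016x", id);
    }

    public static void main(String[] args) {
        SleuthLogPrefix p = parse(
                "[sca-commerce-nacos-client,353ea734cc43d6ee,c85be2c1bb127558,true]");
        // The two long values are the ones logged by SleuthTraceInfoService.
        System.out.println(p.traceId.equals(toHex(3836687777773377262L)));  // prints "true"
        System.out.println(p.spanId.equals(toHex(-4009361721548180136L)));  // prints "true"
    }
}
```

If the hex form is wanted directly in the service, Brave's `TraceContext` also offers `traceIdString()` and `spanIdString()`.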

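8.3.1 Setting Up Zipkin Server to Collect Trace Information

8.3.1.1 Zipkin Server setup steps

  • Tip: since Spring Cloud Finchley (inclusive), building your own Zipkin Server is officially discouraged; a prebuilt jar (a Spring Boot application) is provided that you can simply download and run
  • Download location
    1. Pick the version you need
    2. Choose the jar whose name ends in -exec.jar

8.3.1.2 Linux terminal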

[root@localhost opt]# curl -sSL https://zipkin.io/quickstart.sh | bash -s 
Thank you for trying Zipkin!
This installer is provided as a quick-start helper, so you can try Zipkin out
without a lengthy installation process.

Fetching version number of latest io.zipkin:zipkin-server release...
Latest release of io.zipkin:zipkin-server seems to be 2.23.4

Downloading io.zipkin:zipkin-server:2.23.4:exec to zipkin.jar...
> curl -fL -o 'zipkin.jar' 'https://repo1.maven.org/maven2/io/zipkin/zipkin-server/2.23.4/zipkin-server-2.23.4-exec.jar'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 59.0M  100 59.0M    0     0  34146      0  0:30:14  0:30:14 --:--:-- 33309

Verifying checksum...
> curl -fL -o 'zipkin.jar.md5' 'https://repo1.maven.org/maven2/io/zipkin/zipkin-server/2.23.4/zipkin-server-2.23.4-exec.jar.md5'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    32  100    32    0     0     17      0  0:00:01  0:00:01 --:--:--    17
> md5sum -c <<< "$(cat zipkin.jar.md5)  zipkin.jar"
zipkin.jar: OK
Checksum for zipkin.jar passes verification

Verifying GPG signature of zipkin.jar...
> curl -fL -o 'zipkin.jar.asc' 'https://repo1.maven.org/maven2/io/zipkin/zipkin-server/2.23.4/zipkin-server-2.23.4-exec.jar.asc'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   833  100   833    0     0     91      0  0:00:09  0:00:09 --:--:--   180

GPG signing key is not known, skipping signature verification.
Use the following commands to manually verify the signature of zipkin.jar:

    gpg --keyserver keyserver.ubuntu.com --recv FF31B515
    # Optionally trust the key via 'gpg --edit-key FF31B515', then typing 'trust',
    # choosing a trust level, and exiting the interactive GPG session by 'quit'
    gpg --verify zipkin.jar.asc zipkin.jar


You can now run the downloaded executable jar:

    java -jar zipkin.jar

[root@localhost opt]# nohup java -jar zipkin.jar &
[1] 30238
[root@localhost opt]# nohup: ignoring input and appending output to ‘nohup.out’

8.3.1.3 Zipkin Web UI

Visit: IP:9411

8.3.2 Configuring Zipkin Server to Collect Trace Information

[Figure]

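8.4.1 Integrating Spring Cloud Sleuth with Zipkin for Distributed Trace Tracking and Collection

  • Steps to integrate Spring Cloud Sleuth with Zipkin

    • Two simple steps (Zipkin Server persists trace data in MySQL)
      • Maven dependencies
      • Add the Zipkin configuration to bootstrap.yml
  • Download and install Kafka

8.4.1.1 Downloading and extracting Kafka

Downloads

https://kafka.apache.org/downloads

Steps on Linux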

[root@localhost opt]# wget https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
[root@localhost opt]# tar -zxf kafka_2.13-3.0.0.tgz 
[root@localhost opt]# ls -la | grep kafka
drwxr-xr-x.  8 root root       134 Nov  3 01:48 kafka_2.13-3.0.0
-rw-r--r--.  1 root root  86396520 Sep 20 04:46 kafka_2.13-3.0.0.tgz

8.4.1.2 Starting ZooKeeper and Kafka

To allow access from other hosts, edit Kafka's server.properties

Find advertised.listeners, or add it yourself, and set it to the host's IP address

[root@localhost kafka_2.13-3.0.0]# vim /opt/kafka_2.13-3.0.0/config/server.properties

advertised.listeners=PLAINTEXT://192.168.3.250:9092
Start ZooKeeper in the background
[root@localhost kafka_2.13-3.0.0]# nohup /opt/kafka_2.13-3.0.0/bin/zookeeper-server-start.sh config/zookeeper.properties &
[1] 31998
[root@localhost kafka_2.13-3.0.0]# nohup: ignoring input and appending output to ‘nohup.out’
Start Kafka in the background
[root@localhost kafka_2.13-3.0.0]# nohup /opt/kafka_2.13-3.0.0/bin/kafka-server-start.sh config/server.properties &
[1] 32574
[root@localhost kafka_2.13-3.0.0]# nohup: ignoring input and appending output to ‘nohup.out’

8.4.1.3 Running Zipkin with Kafka and MySQL

[root@localhost opt]# nohup java -DKAFKA_BOOTSTRAP_SERVERS=127.0.0.1:9092 -jar zipkin.jar --STORAGE_TYPE=mysql --MYSQL_USER=root --MYSQL_PASS=123456 --MYSQL_HOST=127.0.0.1 --MYSQL_TCP_PORT=3306 --MYSQL_DB=zipkin &
[1] 601
[root@localhost opt]# nohup: ignoring input and appending output to ‘nohup.out’

Kafka connection: -DKAFKA_BOOTSTRAP_SERVERS=127.0.0.1:9092

MySQL connection: --STORAGE_TYPE=mysql --MYSQL_USER=root --MYSQL_PASS=123456 --MYSQL_HOST=127.0.0.1 --MYSQL_TCP_PORT=3306 --MYSQL_DB=zipkin

8.4.1.4 Checking that the Linux services are running

[root@localhost opt]# ps -aux | grep -E 'nacos|zipkin|kafka|zookeeper'  
[root@localhost opt]# netstat -ltnp | grep -E '8848|9092|9411|2181' 
tcp6       0      0 :::8848                 :::*                    LISTEN      1932/java           
tcp6       0      0 :::9411                 :::*                    LISTEN      601/java            
tcp6       0      0 :::9092                 :::*                    LISTEN      32574/java          
tcp6       0      0 :::2181                 :::*                    LISTEN      31998/java 

8.4.1.5 Starting the services in IDEA and sending a test request

Maven dependencies (Zipkin and Kafka)

<!-- Tracing via Sleuth -->
<!--        <dependency>-->
<!--            <groupId>org.springframework.cloud</groupId>-->
<!--            <artifactId>spring-cloud-starter-sleuth</artifactId>-->
<!--        </dependency>-->
<!-- zipkin = spring-cloud-starter-sleuth + spring-cloud-sleuth-zipkin-->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>

Update the configuration files of the Gateway and Nacos-Client services

spring:
  kafka:
    bootstrap-servers: ${KAFKA_SERVER:127.0.0.1}:${KAFKA_PORT:9092}
    producer:
      retries: 3
    consumer:
      auto-offset-reset: latest
  zipkin:
    sender:
      type: ${ZIPKIN_KAFKA_SENDER:web} # defaults to web (HTTP); set to kafka to send spans through Kafka
    base-url: http://${ZIPKIN_URL:localhost}:${ZIPKIN_PORT:9411}/

Both services need the Kafka and Zipkin connection settings

Start the services

  • Start the following services
    • AuthorityCenterApplication :7000/ # if the token has expired, issue a new one
    • NacosClientApplication :8000/
    • GatewayApplication :9001/

Test request

sca-commerce-gateway/src/main/resources/http/nacos-client.http

### Query service instances
GET http://127.0.0.1:9001/edcode/scacommerce-nacos-client/nacos-client/service-instance?serviceId=sca-commerce-gateway
Accept: application/json
sca-commerce-user: eyJhbGciOiJSUzI1NiJ9.eyJzY2EtY29tbWVyY2UtdXNlciI6IntcImlkXCI6MTEsXCJ1c2VybmFtZVwiOlwiZWRkaWVAcXEuY29tXCJ9IiwianRpIjoiMWU1MGI2ZWYtNmUzOS00YmY2LWJlMjktZDc4NWU3NWQyNmY1IiwiZXhwIjoxNjM1OTU1MjAwfQ.P7GxZuMUrgiMUbD4dNYzQiV3A6YkaFpvlzg8cpBdu_hvxqDsVEuuiYODQSzZPQeN3xTQPbJ70zkSY084HV7Vsk929en1lqNiX_dpQEuGSbz2JSPqyJuLZ6v7hRX9GI32sPrZAnaKVXMdeHUXCMMmaS1L3osimSvAlaoDE0n2UukDLgu83xRlL3bddHIJbmFD5BrV6Y-u9d-blqXPOpxFEYkdwS_XrljYiULTH7Olr71TAwODUPdttnmVhHPXB0_dnOG5DZMOC0OxqokHGZJ7CC86paE4TvdNPwqotB6u6zh_d_YCCBWM3t1LmKYB6E_bnz2taL5Q4AYHlRaZZotaAA
token: edcode ## HeaderTokenGatewayFilter

###

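8.4.1.6 Zipkin Web UI

Basic usage

Open http://192.168.3.250:9411

The default filter is "all"; run the search directly to list every request, then click one of them

You will then see all the services the request passed through and how long each took

You can also take a trace ID from the terminal, for example 669b59f38adf2c38, and use it to follow the call chain

Trace ID search box: 669b59f38adf2c38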
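The same lookup by trace ID can be done without the UI through Zipkin's HTTP API. The sketch below builds a `GET /api/v2/trace/{traceId}` request with the JDK HTTP client types; the class `ZipkinTraceLookupSketch` is hypothetical, and the base URL is the address used in this article, so adjust it to your environment. The request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: builds the request that fetches one trace from Zipkin by its ID. */
public class ZipkinTraceLookupSketch {

    /** Returns a GET request for /api/v2/trace/{traceId} on the given Zipkin server. */
    static HttpRequest traceRequest(String zipkinBaseUrl, String traceId) {
        return HttpRequest.newBuilder(URI.create(zipkinBaseUrl + "/api/v2/trace/" + traceId))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = traceRequest("http://192.168.3.250:9411", "669b59f38adf2c38");
        System.out.println(request.method() + " " + request.uri());
        // Sending it with java.net.http.HttpClient#send should return the spans
        // of that trace as a JSON array, provided the trace exists in storage.
    }
}
```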

How to view the dependencies between services

Click [Dependencies] at the top

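8.5.1 Spring Cloud Sleuth: Sampling Rate and Sampling Strategies

8.5.1.1 Sampling in Spring Cloud Sleuth

  • Collecting trace information is a double-edged sword and calls for a trade-off
    • The more trace information is collected, the better it reflects how the system actually behaves
    • Under high concurrency, the large volume of requests produces a massive amount of trace data, and the performance overhead becomes too high

A high sampling rate is fine in development and test environments, but it is not recommended in production.

  • You are free to choose between the two sampling strategies that come with Zipkin Brave
    • ProbabilityBasedSampler (probability sampling)
      • The default strategy; it collects trace information for a configured percentage of requests. Its default value is 0.1, meaning 10% of requests are traced
      • spring.sleuth.sampler.probability=0.5
    • RateLimitingSampler (rate-limited sampling)
      • Caps the number of traces sampled per second; it takes precedence when configured
      • spring.sleuth.sampler.rate=10 ## at most 10 traces are sampled per second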
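The difference between the two strategies can be illustrated with a small, self-contained sketch. The classes below are hypothetical simplifications written for this article; they are not Brave's or Sleuth's actual implementations, which differ in detail. The probability sampler keeps a fixed share of requests regardless of load, while the rate limiter keeps at most N per second regardless of how many requests arrive.

```java
/** Simplified illustration of the two sampling ideas; not Brave's real algorithms. */
public class SamplerSketch {

    /** Keeps a fixed percentage of requests, spread evenly and deterministically. */
    static final class ProbabilitySampler {
        private final int percent; // 0..100
        private long seen;

        ProbabilitySampler(int percent) {
            this.percent = percent;
        }

        synchronized boolean isSampled() {
            // Sample whenever the running quota crosses the next whole number.
            boolean keep = ((seen + 1) * percent) / 100 > (seen * percent) / 100;
            seen++;
            return keep;
        }
    }

    /** Keeps at most maxPerSecond requests in each one-second window. */
    static final class RateLimitSampler {
        private final int maxPerSecond;
        private long windowStartNanos;
        private int usedInWindow;
        private boolean started;

        RateLimitSampler(int maxPerSecond) {
            this.maxPerSecond = maxPerSecond;
        }

        synchronized boolean isSampled(long nowNanos) {
            if (!started || nowNanos - windowStartNanos >= 1_000_000_000L) {
                started = true;
                windowStartNanos = nowNanos; // open a new one-second window
                usedInWindow = 0;
            }
            if (usedInWindow < maxPerSecond) {
                usedInWindow++;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ProbabilitySampler probability = new ProbabilitySampler(10);
        RateLimitSampler rate = new RateLimitSampler(10);
        int keptByProbability = 0;
        int keptByRate = 0;
        long now = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            if (probability.isSampled()) keptByProbability++;
            if (rate.isSampled(now)) keptByRate++; // all 1000 arrive in the same window
        }
        System.out.println("probability 10%: kept " + keptByProbability + " of 1000");
        System.out.println("rate 10/s: kept " + keptByRate + " of 1000");
    }
}
```

Under a burst, the probability sampler's output grows with traffic while the rate limiter's stays bounded, which is one reason a rate limit is attractive in production.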

8.5.1.2 Configuring Sleuth in bootstrap.yml

sca-commerce-alibaba-nacos-client

spring:
  sleuth:
    sampler:
      # RateLimitingSampler strategy: once rate limiting is set, spring.sleuth.sampler.probability has no effect
      rate: 100 # number of traces accepted per second
      # Probability sampling strategy
      probability: 1.0 # sampling ratio; 1.0 means 100%, default: 0.1

8.5.1.3 Configuring Sleuth in code

package com.edcode.commerce.sampler;

import brave.sampler.RateLimitingSampler;
import brave.sampler.Sampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author eddie.lee
 * @blog blog.eddilee.cn
 * @description Sets the sampling strategy through configuration code (choose one of the two)
 */
@Configuration
public class SamplerConfig {

    /**
     * Rate-limited sampling (recommended)
     */
    @Bean
    public Sampler sampler() {
        return RateLimitingSampler.create(100);
    }

//    /**
//     * Probability sampling, the default strategy; the default value is 0.1
//     */
//    @Bean
//    public Sampler defaultSampler() {
//        return ProbabilityBasedSampler.create(0.5f);
//    }
}

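Configure sampling either in code or in the YAML file, not both; YAML is clearly the simpler and more concise option.

Whether you use code or YAML, rate-limited sampling and probability sampling are likewise an either-or choice.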

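8.6.1 Spring Cloud Sleuth + Zipkin Distributed Log Tracing: Summary

8.6.1.1 Spring Cloud Sleuth + Zipkin logical architecture diagram

  • Three components (modules) are involved in tracing and collection: Sleuth, Zipkin, and Brave
  • How the three components relate
    • Brave is a tracer library that provides the tracer API
    • Sleuth uses Brave as its tracer library
    • Sleuth can be used without Zipkin

[Figure: Spring Cloud Sleuth + Zipkin logical architecture diagram]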

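8.6.1.2 Understanding Brave

  • Brave's two most basic and most central concepts

    • trace: the entire chain of one logical execution (it can be viewed as a tree)
    • span: the basic unit of work tracked within a trace
  • Commonly used Brave data structures

    • Tracing: utility class used to create Tracer instances
    • Tracer: also a utility class, used to create Spans
    • Span: the class that actually records the execution information of each unit of work
    • TraceContext: the class that holds metadata about a trace during its execution
    • Propagation: utility for passing the TraceContext along when tracing in a distributed environment or across processes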
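What Propagation does for HTTP can be sketched with plain maps standing in for request headers. The header names below are the B3 multi-header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`); everything else, including the classes `B3PropagationSketch` and `Context`, is a hypothetical simplification and not Brave's API. The sketch assumes exact-case header keys, whereas real HTTP headers are case-insensitive.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of injecting and extracting trace context via B3 headers; not Brave's API. */
public class B3PropagationSketch {

    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";
    static final String SAMPLED = "X-B3-Sampled";

    /** Minimal stand-in for the metadata a TraceContext carries. */
    static final class Context {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for a root span
        final boolean sampled;

        Context(String traceId, String spanId, String parentSpanId, boolean sampled) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.sampled = sampled;
        }
    }

    /** Caller side: copies the context into the outgoing request headers. */
    static void inject(Context ctx, Map<String, String> headers) {
        headers.put(TRACE_ID, ctx.traceId);
        headers.put(SPAN_ID, ctx.spanId);
        if (ctx.parentSpanId != null) {
            headers.put(PARENT_SPAN_ID, ctx.parentSpanId);
        }
        headers.put(SAMPLED, ctx.sampled ? "1" : "0");
    }

    /** Callee side: rebuilds the context, or returns null if no trace was sent. */
    static Context extract(Map<String, String> headers) {
        String traceId = headers.get(TRACE_ID);
        String spanId = headers.get(SPAN_ID);
        if (traceId == null || spanId == null) {
            return null; // no incoming trace: the callee would start a new one
        }
        return new Context(traceId, spanId, headers.get(PARENT_SPAN_ID),
                "1".equals(headers.get(SAMPLED)));
    }

    public static void main(String[] args) {
        // The gateway's root span, then a child span for the downstream call.
        Context gateway = new Context("353ea734cc43d6ee", "353ea734cc43d6ee", null, true);
        Context child = new Context(gateway.traceId, "c85be2c1bb127558",
                gateway.spanId, gateway.sampled);

        Map<String, String> headers = new HashMap<>();
        inject(child, headers);
        Context received = extract(headers);
        System.out.println("trace=" + received.traceId + " span=" + received.spanId
                + " parent=" + received.parentSpanId + " sampled=" + received.sampled);
    }
}
```

Because the trace ID is copied unchanged while each hop gets its own span ID, the gateway and the downstream service log the same trace ID but different span IDs.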

8.6.1.3 How Spring Cloud Sleuth traces across services

  • Cross-service trace tracking in Spring Cloud Sleuth
    • Spring Cloud Sleuth and Brave support many distributed frameworks and protocols, such as gRPC, Kafka, and HTTP
