Cloud Native Logging Notes

Cloud Native Logging

An application produces a steady stream of messages describing its behavior over time. These logs capture events in the system, such as successes, failures, health information, and so on. Logging tools collect, store, and analyze these logs.

Collecting, storing, and analyzing logs is a key part of building a modern platform. Some logging tools handle every step from collection to analysis, while others focus on a single aspect.

With the rise of container environments, log processing in cloud native environments differs greatly from traditional log processing; Fluentd is the only cloud native logging tool in the CNCF.

Buzzwords | CNCF Projects
Logging | Fluentd (graduated)

FluentBit

/ | Fluentd | Fluent Bit
Scope | Containers / Servers | Embedded Linux / Containers / Servers
Language | C & Ruby | C
Memory | ~40MB | ~650KB
Performance | High Performance | High Performance
Dependencies | Built as a Ruby Gem, it requires a certain number of gems. | Zero dependencies, unless some special plugin requires them.
Plugins | More than 1000 plugins available | Around 70 plugins available
License | Apache License v2 | Apache License v2

Concepts

Event or Record

Every log that Fluent Bit collects is an event or record.

Each event has two parts, a timestamp and a message:

[TIMESTAMP, MESSAGE]

For example, each of the following journal lines becomes one event:

Jan 18 12:52:16 flb systemd[2222]: Starting GNOME Terminal Server
Jan 18 12:52:16 flb dbus-daemon[2243]: [session uid=1000 pid=2243] Successfully activated service 'org.gnome.Terminal'
Jan 18 12:52:16 flb systemd[2222]: Started GNOME Terminal Server.
Jan 18 12:52:16 flb gsd-media-keys[2640]: # watch_fast: "/org/gnome/terminal/legacy/" (establishing: 0, active: 0)

Filtering

The process of modifying the content of an event is called filtering.
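
For illustration (not from the original notes), a grep filter can keep only events whose log field matches a pattern; the field name and pattern below are just examples:

[FILTER]
    Name  grep
    Match *
    # Keep only events whose "log" field starts with ERROR
    Regex log ^ERROR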

Tag

Every event collected by Fluent Bit is assigned a tag. Tags are mostly set through configuration; if no tag is set manually, the event is tagged with the name of the input plugin that produced it.

Structured Messages

Fluent Bit processes events into structured data encoded as MessagePack, which can be thought of as a binary form of JSON:

# No structured message
"Project Fluent Bit created on 1398289291"
# Structured message
{"project": "Fluent Bit", "created": 1398289291}

Buffering & Storage

Fluent Bit supports both in-memory and filesystem buffering, balancing performance against data safety.

Chunks, Memory, Filesystem and Backpressure

Chunks

Collected logs are grouped into chunks of roughly 2 MB each. By default, all chunks are created in memory.

Buffering and Memory

If only memory buffering is configured, output backpressure or a poor network can make memory usage spike. The input plugin's mem_buf_limit option caps the memory an input may use for ingestion; once the limit is reached, the plugin pauses ingestion until buffered data has been flushed, which can cause data loss and delays.

mem_buf_limit exists mainly to bound memory usage and keep the service alive.
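
A minimal sketch of such a limit on a tail input (the path and size are illustrative):

[INPUT]
    Name          tail
    Path          /var/log/app/*.log
    # Pause ingestion once this input holds more than 5 MB in memory
    Mem_Buf_Limit 5MB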

Filesystem buffering to the rescue

When filesystem buffering is enabled, each chunk gets a backing copy on the filesystem at creation time while staying up in memory.

By default Fluent Bit keeps up to 128 chunks in memory, configurable via storage.max_chunks_up. Chunks in the up state are ready to be processed; chunks in the down state live only on the filesystem.

When an input plugin sets both mem_buf_limit and storage.type filesystem, new data is buffered on the filesystem once memory usage exceeds the limit.

Limiting Filesystem space for Chunks

Chunks stored on the filesystem are delivered to different destinations by the configured output plugins. Each chunk exists only once; the output plugins merely hold references to it. An output plugin's storage.total_limit_size option caps how much chunk data is kept on the filesystem for that output.
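
A sketch of a full filesystem-buffering setup (paths and sizes are illustrative):

[SERVICE]
    storage.path          /var/log/flb-storage/
    # Maximum number of chunks kept up in memory (default 128)
    storage.max_chunks_up 128

[INPUT]
    Name         tail
    Path         /var/log/app/*.log
    storage.type filesystem

[OUTPUT]
    Name                     stdout
    Match                    *
    # Discard the oldest chunks once this output's on-disk data exceeds 500M
    storage.total_limit_size 500M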

Memory Estimating

(input mem_buf_limit + output) * 1.2
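
For example (illustrative numbers): an input capped at mem_buf_limit 100MB plus outputs buffering roughly 20MB gives an estimate of (100 + 20) * 1.2 = 144MB.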

Configuration

Units

Suffix | Description | Example
/ | When a suffix is not specified, the value is assumed to be in bytes. | Specifying a value of 32000 means 32000 bytes
k, K, KB, kb | Kilobyte: a unit of memory equal to 1,000 bytes. | 32k means 32000 bytes.
m, M, MB, mb | Megabyte: a unit of memory equal to 1,000,000 bytes | 1M means 1000000 bytes
g, G, GB, gb | Gigabyte: a unit of memory equal to 1,000,000,000 bytes | 1G means 1000000000 bytes

Commands

Command | Prototype | Description
@INCLUDE | @INCLUDE FILE | Include a configuration file
@SET | @SET KEY=VAL | Set a configuration variable
@SET my_input=cpu
@SET my_output=stdout

[SERVICE]
    Flush 1

[INPUT]
    Name ${my_input}

[OUTPUT]
    Name ${my_output}

Record Accessor

{
    "log": "some message",
    "stream": "stdout",
    "labels": {
        "color": "blue",
        "unset": null,
        "project": {
            "env": "production"
        }
    }
}
Format | Accessed Value
$log | "some message"
$labels['color'] | "blue"
$labels['project']['env'] | "production"
$labels['unset'] | null
$labels['undefined'] |

Inputs

Fluent Bit supports many kinds of log collection, and can also collect hardware metrics such as CPU and disk usage.

Systemd

Key | Description | Default
Path | Path to the journal directory; the default journal path is used when unset |
Systemd_Filter | Filters journal entries, e.g. _SYSTEMD_UNIT=td-agent-bit.service; can be specified multiple times |
Systemd_Filter_Type | How multiple Systemd_Filter entries are combined, And or Or | Or
DB | Where the journald cursor is stored |
Read_From_Tail | Whether to start reading from the tail of the journal | Off
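
A sketch of a systemd input using these keys (the unit and paths are illustrative):

[INPUT]
    Name           systemd
    Tag            host.*
    Systemd_Filter _SYSTEMD_UNIT=docker.service
    DB             /var/log/flb_journal.db
    Read_From_Tail On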

Tail

Key | Description | Default
Path | File paths to tail; separate multiple patterns with commas |
DB | Where file offsets are stored |
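
A sketch of a tail input (paths are illustrative):

[INPUT]
    Name tail
    Tag  app.*
    Path /var/log/app/*.log,/var/log/other/*.log
    DB   /var/log/flb_tail.db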

Filters

Rewrite Tag

The rewrite_tag filter re-tags matching records, emitting them as new records, and can be configured to keep or drop the originals.

Key | Description
Rule | KEY REGEX NEW_TAG KEEP: the key to match, a regex applied to its value, the new tag, and whether to keep the original record
Emitter_Mem_Buf_Limit | Memory limit for the emitter, 10M by default
Emitter_Storage.type | Storage type, memory or filesystem
Emitter_Name | The rewrite is performed by an internal Fluent Bit plugin; this sets that plugin's name
[SERVICE]
    Flush     1
    Log_Level info

[INPUT]
    Name  dummy
    Dummy {"tool": "fluent", "sub": {"s1": {"s2": "bit"}}}
    Tag   test_tag

[FILTER]
    Name         rewrite_tag
    Match        test_tag
    # Match records whose tool value is exactly "fluent", rewrite the tag,
    # and do not keep the original record
    Rule         $tool ^(fluent)$ from.$TAG.new.$tool.$sub['s1']['s2'].out false
    Emitter_Name re_emitted

[OUTPUT]
    Name  stdout
    Match from.*

Outputs

Fluent Bit supports many outputs: storage backends such as Loki, Elasticsearch, InfluxDB, and Kafka; cloud storage such as Amazon S3 and Azure; standard output; or forwarding to another Fluent Bit or Fluentd instance.

Loki

Key | Description | Default
host | Loki hostname or IP address | 127.0.0.1
port | Loki TCP port | 3100
labels | Stream labels, comma-separated; besides static labels, Record Accessor patterns can be used | job=fluentbit
line_format | Line format, json or key_value | json
auto_kubernetes_labels | Whether to derive labels from Kubernetes metadata automatically | off
label_keys | Like labels, but label names cannot be customized |

[OUTPUT]
    Name   loki
    Host   ${LOKI_HOST}
    Port   ${LOKI_PORT}
    Match  *
    Labels job=fluentbit-kube, container=$kubernetes['container_name'], namespace=$kubernetes['namespace_name'], host=$kubernetes['host'], stream=$stream
    # With label_keys, only forms like container=$kubernetes['container_name'] can be used
    Auto_Kubernetes_Labels off

Deploy

helm install fluentbit-operator -n logging charts/fluentbit-operator/ --set containerRuntime=docker
DaemonSet configuration:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.8.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 2020
        env:
        - name: LOKI_HOST
          value: "loki.logging.svc"
        - name: LOKI_PORT
          value: "3100"
        - name: STORAGE_PATH
          value: "/var/log/fluentbit/"
        volumeMounts:
        - name: storage-path
          mountPath: /var/log/fluentbit
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: storage-path
        hostPath:
          path: /var/log/fluentbit
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: logging
---
apiVersion: v1
kind: Service
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: metrics
    port: 2020
    protocol: TCP
    targetPort: 2020
ConfigMap configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush                     1
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020
        storage.path              ${STORAGE_PATH}
        storage.backlog.mem_limit 5M

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-loki.conf

  input-kubernetes.conf: |
    [INPUT]
        Name             tail
        Tag              kube.*
        Path             /var/log/containers/*.log
        Parser           docker
        DB               /var/log/flb_kube.db
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On
        Refresh_Interval 10
        storage.type     filesystem

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

  output-loki.conf: |
    [OUTPUT]
        Name                   loki
        Host                   ${LOKI_HOST}
        Port                   ${LOKI_PORT}
        Match                  *
        Labels                 job=fluentbit-kube, container=$kubernetes['container_name'], namespace=$kubernetes['namespace_name'], host=$kubernetes['host'], stream=$stream
        Auto_Kubernetes_Labels off

  parsers.conf: |
    [PARSER]
        Name        apache
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        apache2
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name        nginx
        Format      regex
        Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

Issues

Fluent Bit gets stuck, repeatedly logging re-schedule retry=0x7f012a85a298 0 in the next 8 seconds
The Loki output plugin fails to push logs correctly after Loki recovers from an outage


Loki

Loki supports multiple agents (clients), including Promtail, the Docker logging driver, Fluent Bit, Fluentd, and Logstash.

Its main features: it indexes only metadata (labels) rather than full log content, supports multi-tenancy, offers the LogQL query language, and integrates natively with Grafana.

Architecture

Multi-tenancy

In multi-tenant mode, Loki distinguishes tenants by the X-Scope-OrgID request header on ingest. When multi-tenancy is disabled, the header is ignored and the tenant ID is hard-coded to fake.

[auth_enabled: <boolean> | default = true]
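
For example (an illustrative query), a tenant-scoped request passes the header explicitly:

curl -G -H "X-Scope-OrgID: tenant-a" \
  --data-urlencode 'query={job="fluentbit"}' \
  http://loki:3100/loki/api/v1/query_range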

Loki offers three deployment modes, selected with the -target flag.

Monolithic mode

Configured with -target=all, this is the default mode: all of Loki's components and services run in a single process.

This mode suits read/write volumes of up to roughly 100 GB per day. To scale horizontally, configure memberlist_config so instances share a ring.
Traffic is routed to the instances round-robin, while query parallelism is governed by the number of instances and their configuration.

Simple scalable deployment mode

This mode suits a few hundred GB per day. The read and write paths can also be separated with -target=read and -target=write to handle up to TBs per day.

Separating the paths lets reads and writes scale independently and gives the write path higher availability.

Microservices mode

-target selects which individual component to run, e.g. distributor, ingester, querier, query-frontend, ruler, or compactor.

Microservices mode is the most efficient way to run Loki, but also the most complex. It suits very large clusters and teams with demanding scaling and operational requirements.
It is also the mode best suited to Kubernetes deployments.

Components

Distributor

The distributor is the first stop for logs entering Loki. It receives log streams from clients, validates them, and enforces per-tenant and global limits, then forwards the validated data to the ingesters in parallel batches.

The distributor is stateless and communicates with ingesters over gRPC; its replica count can grow or shrink on demand, ideally behind a load balancer.

The distributor uses a consistent hash over (tenant ID, labels) to decide which ingesters should handle a stream (hash lookup).
It picks the ingester in the ring whose token is the smallest one greater than the stream's hash; when replication_factor is greater than 1, the next replication_factor - 1 ingesters in the ring also receive the stream. If fewer than quorum writes succeed, the distributor reports the write as failed.

quorum = floor(replication_factor / 2) + 1
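
With replication_factor = 3, for example, quorum = floor(3/2) + 1 = 2, so a write succeeds as long as at least two of the three ingesters acknowledge it.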

Ingester

Ingesters write data to long-term storage and serve queries for data still in memory. An ingester is in one of five states: PENDING, JOINING, ACTIVE, LEAVING, or UNHEALTHY.

The ingester appends data to in-memory chunks and periodically flushes them to the backend store.

A chunk is compressed and marked read-only (and a replacement chunk is created) when it fills up, when it has not been updated for too long, or when a flush is triggered.

When a chunk is written to the backend store, it is hashed by tenant ID, labels, and contents, so identical content is never written more than once.
If an ingester crashes unexpectedly, its in-memory data is lost; configuring the Write Ahead Log prevents this.
However, if the write above fails and is eventually retried, duplicate data can still be produced.

Loki accepts out-of-order writes by default. If that option is disabled, then within a single stream new entries must not be older than the most recent entry already accepted, and an entry with an identical timestamp is accepted only if its content differs.

Querier

The querier evaluates LogQL queries, fetching data from both the backend store and the ingesters. It deduplicates entries with identical timestamp, labels, and content.

Query frontend

This service is optional and stateless. It maintains an internal request queue that the queriers behind it pull from, providing per-tenant fair scheduling, retries on failure, and splitting of large queries into smaller ones.

The query frontend caches metric query results (log queries are not yet cached); the cache can be memcached, Redis, or another in-memory store.

Dynamo

Storage

Loki indexes only metadata; the logs themselves are compressed and kept in object storage, so Loki needs to store both indexes and chunks.

Supported index stores include Single Store (boltdb-shipper), Cassandra, Google Bigtable, and Amazon DynamoDB.

Supported chunk stores include object stores such as Amazon S3, Google Cloud Storage, and Azure Blob Storage, as well as Cassandra and the local filesystem.

Filesystem

Advantages: simple and cheap to run, with no external dependencies, which suits small deployments.

Disadvantages: bound by the capacity and durability of a single node's disks, and it does not scale horizontally.

Retention

The Compactor only supports the boltdb-shipper store, while the Table Manager supports all stores.

Compactor

The compactor runs as a singleton.

The compactor first marks index entries as expired; the underlying data is actually deleted only after retention_delete_delay has passed.

compactor:
  working_directory: /data/retention
  shared_store: gcs
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
schema_config:
  configs:
  - from: "2020-07-31"
    index:
      period: 24h
      prefix: loki_index_
    object_store: gcs
    schema: v11
    store: boltdb-shipper
storage_config:
  boltdb_shipper:
    active_index_directory: /data/index
    cache_location: /data/boltdb-cache
    shared_store: gcs
  gcs:
    bucket_name: loki

Compactor-based retention requires index.period to be set to 24h.

Retention can also be configured per stream and per tenant:

...
limits_config:
  retention_period: 744h
  retention_stream:
  - selector: '{namespace="dev"}'
    priority: 1
    period: 24h
  per_tenant_override_config: /etc/overrides.yaml
...
overrides:
  "29":
    retention_period: 168h
    retention_stream:
    - selector: '{namespace="prod"}'
      priority: 2
      period: 336h
    - selector: '{container="loki"}'
      priority: 1
      period: 72h
  "30":
    retention_stream:
    - selector: '{container="nginx"}'
      priority: 1
      period: 24h

Table Manager

Loki can store indexes and chunks in table-based stores, with tables split by time period.

schema_config:
  configs:
  - from: 2019-01-01
    store: dynamo
    schema: v10
    index:
      prefix: loki_
      period: 168h
  - from: 2019-04-15
    store: dynamo
    schema: v11
    index:
      prefix: loki_
      period: 168h
table_manager:
  retention_deletes_enabled: true
  retention_period: 336h

Note that retention_period and index.period must both be whole multiples of 24h.

number_of_tables_to_keep = floor(retention_period / table_period) + 1
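
For example, with retention_period = 336h and a table period of 168h, floor(336 / 168) + 1 = 3 tables are kept.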

Single Store (boltdb-shipper)

Configuration

# Whether multi-tenancy is enabled
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  retention_enabled: true
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
  wal:
    dir: /data/loki/wal
limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 16
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 360h
schema_config:
  configs:
  - from: "2020-10-24"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

Loki supports both HTTP and gRPC, configured under the server block; gRPC is not enabled by default.
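
A minimal sketch of that block (9095 is the conventional gRPC port; values are illustrative):

server:
  http_listen_port: 3100
  grpc_listen_port: 9095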

Deploy Loki

helm upgrade --install --namespace=logging loki grafana/loki --set "persistence.enabled=true,persistence.storageClassName=nfs-storage,persistence.size=30Gi,config.compactor.retention_enabled=true,config.compactor.retention_delete_delay=2h,config.compactor.retention_delete_worker_count=150,config.limits_config.retention_period=360h,replicas=1,config.limits_config.ingestion_burst_size_mb=16"

Helm configuration options

LogQL

Several parsers are available for extracting fields from log lines: json, logfmt, pattern, regexp, and unpack.

Pattern

{container="nginx-ingress-controller"} | json | line_format "{{.log}}"
192.168.1.39 - - [08/Dec/2021:05:56:31 +0000] "GET /api/datasources/proxy/15/loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=%7Bcontainer%3D%22nginx-ingress-controller%22%7D%20%7C%20json%20%7C%20line_format%20%7B%7B.log%7D%7D&start=1638939392000000000&end=1638942993000000000&step=2 HTTP/1.1" 400 75 "http://grafana.minei.test/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Loki%22,%7B%22expr%22:%22%7Bcontainer%3D%5C%22nginx-ingress-controller%5C%22%7D%22,%22hide%22:false%7D%5D" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36" 825 0.003 [monitoring-grafana-http] [] 10.233.108.29:3000 75 0.002 400 d41b0936872b9d350a054b06658db682
# <label_name> captures the matched text into a label with that name
# <_> is a placeholder that matches but captures nothing
<ip> - - [<time>] "<method> <uri> <scheme>" <status> <request_length> "<referer>" "<ua>" <response_length> <duration> <_>
{container="nginx-ingress-controller"} | json | line_format "{{.log}}" | pattern "<ip> - - [<time>] \"<method> <uri> <scheme>\" <status> <request_length> \"<referer>\" \"<ua>\" <response_length> <duration> <_>"

Log Stream Selector

As in Prometheus, four label matchers are supported: =, !=, =~, and !~.
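
For instance (an illustrative selector):

{namespace="dev", container=~"nginx.*", stream!="stderr"}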

Line Filter Expression

Loki can also filter on the log line itself, with four operators: |= (line contains), != (line does not contain), |~ (line matches a regex), and !~ (line does not match a regex).

line_format reformats the log line; {{.key}} refers to fields extracted by a parser, for example:

{job="fluentbit-kube"} | json | line_format "{{.log}}"

Label Filter Expression

Label filters can also filter on labels produced by a parser:

{job="fluentbit"} | json | PRIORITY=4

Loki automatically infers four value types: string, duration, number, and bytes.

Multiple expressions can be combined with and / or; apart from strings, the other three types support == = != > >= < <=

{job="fluentbit"} | json | PRIORITY=4 and SYSLOG_FACILITY=3 and _HOSTNAME=~"local.*"

| label_format can rename, modify, and add labels.

# Add a host label whose value is that of the hostname label
# Set the job label's value to f
# Rename the job label to newjob
{job="fluentbit"} | label_format host="{{.hostname}}" | label_format job="f" | label_format newjob=job

Labels can also be added when parsing the log:

# the label's value must be taken from the corresponding key in the parsed result
{job="fluentbit"} | json p="PRIORITY"

Metric queries

Loki also supports metric queries over logs, similar to Prometheus.
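
For example (an illustrative query), counting log lines per container over a five-minute window:

sum by (container) (count_over_time({job="fluentbit-kube"}[5m]))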

Loki Promtail

Deploy

helm upgrade --install --namespace=logging promtail grafana/promtail --set "config.lokiAddress=http://loki.logging.svc:3100/loki/api/v1/push"

Configuration

# Promtail server settings
[server: <server_config>]

# Loki address, tenant ID, and other client settings
clients:
  - [<client_config>]

# Where read positions (file offsets) are recorded
[positions: <position_config>]

# Log scraping configuration
scrape_configs:
  - [<scrape_config>]

[target_config: <target_config>]

scrape_configs entries are organized by job and support Prometheus-style service discovery (e.g. static_configs, kubernetes_sd_configs, file_sd_configs) as well as journal and syslog targets.

server:
  http_listen_port: 9080

positions:
  filename: /root/jn/prometail-positions.yaml

clients:
- url: http://loki.minei.test/loki/api/v1/push

scrape_configs:
- job_name: journal
  journal:
    max_age: 1h
    labels:
      job: promtail-systemd-journal
      machine: 192.168.1.125
  relabel_configs:
  - source_labels: ['__journal__systemd_unit']
    target_label: 'system_unit'
  - source_labels: ['__journal___machine_id']
    target_label: 'machine_id'

Stages

Parsing stages (e.g. docker, cri, regex, json): extract data from the log line.

Transform stages (e.g. template, pack): transform the extracted data.

Action stages (e.g. timestamp, labels, output, metrics, tenant): act on the extracted data, such as setting the entry's timestamp, labels, or output line.

Filtering stages (e.g. match, drop): apply stages conditionally or drop entries. A sketch of a pipeline combining these stage types follows.
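
A sketch of a pipeline combining these stage types (the job name, path, and regex are illustrative):

scrape_configs:
- job_name: app
  static_configs:
  - targets: [localhost]
    labels:
      job: app
      __path__: /var/log/app/*.log
  pipeline_stages:
  # Parsing stage: extract fields from the line
  - regex:
      expression: '^(?P<level>\w+) (?P<msg>.*)$'
  # Action stage: promote the extracted field to a label
  - labels:
      level:
  # Filtering stage: drop debug lines
  - drop:
      source: level
      value: debug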