Log shippers compared: Filebeat, Fluentd, and Vector for the ELK stack

When first setting up an ELK stack, developers tend to think “any log shipper will do, just pick Filebeat because it also comes from Elastic”. Once the pipeline has 200 services, 30 different log formats, and a requirement to enrich with GeoIP and drop PII before indexing, you realize the log shipper drives most of the operating cost of ELK (around 70% by my estimate). Choose wrong and you end up rewriting the pipeline every 6 months.

This article compares three main options: Filebeat (Go, from Elastic), Fluentd (Ruby, a CNCF project), and Vector (Rust, open source, started at Timber.io and now maintained by Datadog). There is no absolute winner, only the tool that fits your team, your traffic, and the kind of pipeline you run.

After reading, you should:

Understand the architecture and resource profile of each tool
Be able to compare the pipeline languages (YAML processors vs config DSL vs VRL)
Know the real bottlenecks when scaling to 50k events/sec
Know which tool to choose for a startup, an enterprise, or edge IoT
Have copy-paste configs for each typical case

Context and quick comparison table

All three tools do the same core job: read logs from a source, parse, transform, and send them to a sink (Elasticsearch, Kafka, S3, and so on). The differences lie in philosophy, runtime, and how much pipeline logic each one can carry.

Criterion	Filebeat	Fluentd	Vector
Runtime language	Go	Ruby (CRuby)	Rust
Maintainer	Elastic	CNCF (originated at Treasure Data)	Datadog (originated at Timber.io; open source)
Idle RAM footprint	30-80 MB	60-150 MB	25-60 MB
Max throughput (1 core)	15-25k EPS	8-15k EPS	40-60k EPS
Pipeline language	YAML processors	Config DSL + Ruby plugins	VRL (Vector Remap Language)
Plugin ecosystem	Closed (Elastic)	1000+ community plugins	80+ built-in sinks/transforms
Windows support	Yes	Yes, but weaker	Yes, native
Config reload	Partial (external input/module files)	Yes (SIGUSR2)	Yes (SIGHUP or --watch-config)
Kubernetes pattern	DaemonSet (official)	DaemonSet + sidecar	DaemonSet + Aggregator
TLS, mTLS, auth	Full	Full, via plugins	Built-in
License	Elastic License 2.0 (default build; OSS build is Apache 2.0)	Apache 2.0	Mozilla Public License 2.0

Quick summary:

Filebeat: lightweight, built to run as a DaemonSet and ship raw logs to ES or Logstash. Not meant for complex transforms.
Fluentd: the most flexible in terms of plugins (1000+), but the Ruby runtime is slower. A good fit when the pipeline is complex and the team already knows Ruby.
Vector: the fastest of the three, with a pipeline language (VRL) that is easy to debug and a low footprint. Still young, but production-ready.

Internal architecture

Filebeat

Filebeat runs as a single binary. Pipeline:

inputs (filestream/journald/k8s) -> processors -> queue -> outputs

Processors run in-process (Go). The queue is in memory or on disk. The output is usually Elasticsearch or Logstash. Filebeat is not designed to act as a central aggregator; it is an edge shipper.

Fluentd

Built around a plugin model. Each pipeline is a config file:

<source> -> <filter> -> <match>

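Every block is a plugin, mostly written in Ruby. Because of the GIL, each worker process is effectively limited to one core (multi-process workers exist for scaling out). For the edge there is Fluent Bit, a separate lightweight sibling written in C. Buffering is file-based or in memory.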

Vector

Vector uses a DAG model. Each node is a source, a transform, or a sink. Several inputs can feed one transform, and one transform can fan out to several outputs.

sources -> transforms (DAG) -> sinks

The runtime is async, multi-threaded Rust. Vector supports end-to-end acknowledgements so events are not lost on a crash.

Real-world resource benchmark

Tested on 1 vCPU + 2 GB RAM, ingesting 10k EPS from an nginx access log:

Shipper	CPU avg	RAM peak	Latency P99
Filebeat 8.13	35%	180 MB	120 ms
Fluentd 1.16 (CRuby)	78%	420 MB	350 ms
Fluent Bit 2.2	22%	95 MB	80 ms
Vector 0.36	18%	110 MB	65 ms

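Pure-Ruby Fluentd is the most CPU-hungry: the interpreter adds overhead per event, and the GIL keeps each worker from using more than one core. Production setups usually run Fluent Bit at the edge with Fluentd as the aggregator. Vector can fill both roles.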
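To make the table easier to compare with the per-core throughput ranges quoted earlier, here is a small Python sketch that extrapolates each CPU figure to a full core. It assumes CPU cost scales linearly with event rate, which is an assumption of this sketch and rarely holds exactly near saturation, so treat the results as rough orientation only.

```python
# CPU figures copied from the benchmark table above (percent of 1 vCPU at 10k EPS).
INGEST_EPS = 10_000
CPU_PERCENT = {
    "Filebeat 8.13": 35,
    "Fluentd 1.16 (CRuby)": 78,
    "Fluent Bit 2.2": 22,
    "Vector 0.36": 18,
}


def implied_eps_per_core(cpu_percent: float, eps: int = INGEST_EPS) -> float:
    """Linear extrapolation (assumption): EPS one fully used core would handle."""
    return eps / (cpu_percent / 100)


for name, cpu in CPU_PERCENT.items():
    print(f"{name}: ~{implied_eps_per_core(cpu):,.0f} EPS per core")
```

The extrapolated Fluentd and Vector numbers fall inside the ranges in the comparison table, which is a useful sanity check on both sets of figures.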

Pipeline languages compared

The same task in each tool: read the nginx access log, parse it, drop health checks, and send to Elasticsearch.

Filebeat

filebeat.inputs:
  - type: filestream
    id: nginx-access
    paths:
      - /var/log/nginx/access.log
    parsers:
      - ndjson:
          target: ""
          overwrite_keys: true

processors:
  - drop_event:
      when:
        contains:
          url.path: "/healthz"
  - add_fields:
      target: ""
      fields:
        service.name: "edge-proxy"

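output.elasticsearch:
  hosts: ["https://es.example.com:9200"]
  api_key: "${ES_API_KEY}"
  index: "nginx-access-%{+yyyy.MM.dd}"

# A custom index name requires a matching template name and pattern.
setup.template.name: "nginx-access"
setup.template.pattern: "nginx-access-*"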

Pros: simple, anyone can read it. Cons: the processor chain is limited, and conditions are declarative key/value matchers (equals, contains, regexp), so complex logic is hard to express.

Fluentd

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx.pos
  tag nginx.access
  <parse>
    @type json
  </parse>
</source>

<filter nginx.access>
  @type grep
  <exclude>
    key url_path
    pattern ^/healthz$
  </exclude>
</filter>

<filter nginx.access>
  @type record_transformer
  <record>
    service_name "edge-proxy"
  </record>
</filter>

<match nginx.access>
  @type elasticsearch
  host es.example.com
  port 9200
  scheme https
  logstash_format true
  logstash_prefix nginx-access
  <buffer>
    @type file
    path /var/log/td-agent/buffer/nginx
    flush_interval 5s
  </buffer>
</match>

Pros: extremely flexible, with 1000+ plugins available. Cons: verbose syntax, hard to debug (errors surface as Ruby stack traces), no type checking.

Vector

[sources.nginx]
type = "file"
include = ["/var/log/nginx/access.log"]

[transforms.parse]
type = "remap"
inputs = ["nginx"]
source = '''
  . = parse_json!(.message)
  if .url.path == "/healthz" {
    abort
  }
  .service.name = "edge-proxy"
'''

[sinks.es]
type = "elasticsearch"
inputs = ["parse"]
endpoints = ["https://es.example.com:9200"]
api_version = "v8"
auth.strategy = "basic"
auth.user = "${ES_USER}"
auth.password = "${ES_PASS}"
bulk.index = "nginx-access-%Y.%m.%d"

Pros: VRL has type checking, explicit error handling, and the vector vrl REPL for testing expressions. Cons: it is its own language, so expect 1-2 days of ramp-up.
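When porting a pipeline between these tools, it helps to pin down the intended behaviour once, outside any shipper. The Python sketch below is such a reference, written for this comparison rather than taken from any of the tools: the function name `process` is hypothetical, and it follows the Vector version (exact match on a nested `url.path`). Note that the three configs above are not strictly identical: Filebeat's `contains` is a substring match, and the Fluentd filter reads a flat `url_path` key.

```python
import json
from typing import Optional


def process(line: str) -> Optional[dict]:
    """Parse one JSON log line, drop health checks, tag the service."""
    event = json.loads(line)

    # Nested lookup, mirroring `.url.path` in the VRL example.
    url = event.get("url")
    path = url.get("path") if isinstance(url, dict) else None
    if path == "/healthz":
        return None  # dropped, like `abort` / drop_event / grep exclude

    service = event.get("service")
    if not isinstance(service, dict):
        service = {}
    service["name"] = "edge-proxy"
    event["service"] = service
    return event


print(process('{"url": {"path": "/healthz"}}'))
print(process('{"url": {"path": "/api/users"}, "status": 200}'))
```

Feeding the same sample lines through this function and through each shipper config is a cheap way to spot semantic drift between implementations.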

Pitfalls seen in practice

Filebeat resets the log offset to 0 after a pod restart

When running Filebeat as a DaemonSet on Kubernetes, if you mount the log path but give the registry no persistent volume, Filebeat rescans every file from the beginning each time the pod restarts. I once saw a cluster where 200 error log lines were duplicated 50 times in Kibana for exactly this reason. Fix:

filebeat.registry.path: /var/lib/filebeat/registry

Mount this path on a hostPath or a PVC.
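The mechanism behind this pitfall can be shown with a toy tailer. The Python sketch below is not Filebeat's actual registry format; `read_new_lines` and the JSON offset file are hypothetical stand-ins that only illustrate why losing the stored offset causes a full re-read.

```python
import json
import os
import tempfile


def read_new_lines(log_path: str, registry_path: str) -> list:
    """Return lines appended since the last call, tracking the offset on disk."""
    offset = 0
    if os.path.exists(registry_path):
        with open(registry_path) as reg:
            offset = json.load(reg).get(log_path, 0)

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
        new_offset = log.tell()

    with open(registry_path, "w") as reg:
        json.dump({log_path: new_offset}, reg)
    return data.decode("utf-8").splitlines()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")
        registry_path = os.path.join(tmp, "registry.json")

        with open(log_path, "w") as log:
            log.write("error 1\nerror 2\n")
        first = len(read_new_lines(log_path, registry_path))

        with open(log_path, "a") as log:
            log.write("error 3\n")
        second = len(read_new_lines(log_path, registry_path))

        # Simulate a pod restart where the registry was not persisted.
        os.remove(registry_path)
        third = len(read_new_lines(log_path, registry_path))
        return first, second, third


print(demo())  # (2, 1, 3): the third read re-ships all three lines
```

With the registry intact, the second read ships only the new line. Once the registry is gone, everything is shipped again, which is the duplication seen in Kibana.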

Fluentd OOM with the default chunk_limit_size

The default chunk_limit_size of 8m (the memory buffer default) is too large at ingest peaks. With one service logging 200 KB JSON events at 50k EPS, the Fluentd buffer ballooned to 4 GB of RAM and the process was OOM-killed. Fix:

<buffer>
  chunk_limit_size 2m
  total_limit_size 1g
  flush_interval 1s
  overflow_action drop_oldest_chunk
</buffer>

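Important: overflow_action drop_oldest_chunk keeps tail latency down by discarding the oldest buffered chunk when the buffer is full, instead of blocking the whole pipeline. The trade-off is data loss under overload, so only use it where dropping logs is acceptable.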
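A quick back-of-the-envelope calculation shows why the limits matter. The Python sketch below takes the figures above at face value and assumes the worst case where the sink is stalled (zero drain). `BoundedChunkQueue` is a hypothetical toy model of the drop-oldest policy, not Fluentd's buffer implementation.

```python
from collections import deque


def seconds_until_full(total_limit_bytes: int, event_bytes: int, eps: int,
                       drain_bytes_per_sec: int = 0) -> float:
    """Time until the buffer hits its limit at a given ingest and drain rate."""
    net_rate = event_bytes * eps - drain_bytes_per_sec
    if net_rate <= 0:
        return float("inf")  # the sink keeps up, the buffer never fills
    return total_limit_bytes / net_rate


class BoundedChunkQueue:
    """Toy model of a buffer that drops its oldest chunk when full."""

    def __init__(self, max_chunks: int) -> None:
        self.max_chunks = max_chunks
        self.chunks = deque()
        self.dropped = 0

    def push(self, chunk) -> None:
        if len(self.chunks) >= self.max_chunks:
            self.chunks.popleft()  # oldest data is lost here
            self.dropped += 1
        self.chunks.append(chunk)


# total_limit_size 1g, 200 KB events, 50k EPS, sink stalled.
print(seconds_until_full(1024**3, 200 * 1024, 50_000))

# total_limit_size 1g / chunk_limit_size 2m = 512 chunks at most.
queue = BoundedChunkQueue(max_chunks=(1024**3) // (2 * 1024**2))
for i in range(600):
    queue.push(i)
print(len(queue.chunks), queue.dropped)
```

At that ingest rate the 1 GB limit is reached in a fraction of a second once the sink stalls, so the overflow policy is what actually decides the behaviour under load.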

Vector schema mismatch with Elasticsearch ECS

By default Vector uses its own schema (.message, .host, .timestamp). When shipping into an ES index template that follows ECS (Elastic Common Schema), the host field conflicts: ECS expects an object, Vector emits a string. Fix: use the global log_schema config:

[log_schema]
host_key = "host.name"
message_key = "message"
timestamp_key = "@timestamp"

Alternatively, remap in VRL before the sink.
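The shape of that remap is easy to see in isolation. The Python sketch below only illustrates the transformation an equivalent VRL program would perform; `to_ecs` is a hypothetical helper, and the field handling is limited to the two conflicts mentioned above.

```python
def to_ecs(event: dict) -> dict:
    """Reshape a Vector-style event so it fits an ECS index template."""
    out = dict(event)

    # ECS expects `host` to be an object; a bare string becomes host.name.
    host = out.get("host")
    if isinstance(host, str):
        out["host"] = {"name": host}

    # ECS uses `@timestamp` rather than `timestamp`.
    if "timestamp" in out and "@timestamp" not in out:
        out["@timestamp"] = out.pop("timestamp")
    return out


print(to_ecs({"host": "web-1", "message": "GET /", "timestamp": "2024-01-01T00:00:00Z"}))
```

Events whose host is already an object pass through unchanged, so the function is safe to apply to mixed input.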

When to choose what

Choose Filebeat when

The whole stack is already Elastic (ES, Kibana, Logstash)
The pipeline is mostly raw shipping with minimal transforms
The team does not want to learn a new pipeline language
You need pre-built modules (nginx, mysql, system) with parsing included

Choose Fluentd / Fluent Bit when

You already live in the CNCF ecosystem (Kubernetes-native)
You need a plugin no other tool has (for example, shipping to a niche SaaS)
The team is strong in Ruby and willing to write custom plugins
You want the edge (Fluent Bit) + aggregator (Fluentd) pattern for complex routing

Choose Vector when

Performance is critical: 100k+ EPS on few vCPUs
The pipeline is complex: GeoIP enrichment, PII removal, routing by tag
You have many sinks: ES, S3, Kafka, and Datadog at once
Testability matters: Vector has a built-in unit test framework for transforms (vector test)

Hybrid is fine too

In practice I have seen these setups:

Filebeat reading raw logs -> Kafka -> Vector aggregator -> ES + S3 + Slack alert
Fluent Bit edge -> Fluentd aggregator -> ES
Vector all the way from edge to sink

Hybrid is not a failure. It reflects the specific trade-offs at each layer.

Migration story

An internal project I worked on in early 2026 migrated from Fluentd to Vector. Context: a 40-node cluster, 2 million events per minute (roughly 33k EPS), with Fluentd consuming 6 CPU cores just to filter and route.

Migration steps:

Spike-test one namespace with a Vector DaemonSet running alongside Fluentd, double-writing into two separate ES indices.
Diff the event counts in Kibana every 5 minutes using a saved search. If they differ by more than 0.1%, investigate.
Convert the Fluentd config to VRL one filter at a time, testing with the vector vrl REPL.
Cut over namespace by namespace, no big bang.
Shut down Fluentd after 2 stable weeks.
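The count comparison in the second step is simple enough to automate. The Python sketch below shows the threshold check only; how the two counts are fetched (saved search, count API, or anything else) is left out, and the function names are hypothetical.

```python
THRESHOLD = 0.001  # 0.1%, the tolerance used in the migration above


def drift_ratio(baseline_count: int, candidate_count: int) -> float:
    """Relative difference between the old and new pipeline's event counts."""
    if baseline_count == 0:
        return 0.0 if candidate_count == 0 else float("inf")
    return abs(candidate_count - baseline_count) / baseline_count


def needs_investigation(baseline_count: int, candidate_count: int,
                        threshold: float = THRESHOLD) -> bool:
    return drift_ratio(baseline_count, candidate_count) > threshold


# 2 million events/minute over a 5-minute window is 10 million events,
# so a 0.1% tolerance allows a gap of 10,000 events per window.
window_events = 2_000_000 * 5
print(needs_investigation(window_events, window_events - 10_000))
print(needs_investigation(window_events, window_events - 10_001))
```

Expressing the tolerance in absolute events per window makes it easier to judge whether a gap is timing skew between the two indices or real loss.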

Results: CPU usage dropped from 6 cores to 1.2 cores. P99 latency dropped from 380 ms to 70 ms. Buffer disk usage dropped 80%, which we attributed to Vector compressing chunks more effectively.

Lesson: do not migrate because of hype. Migrate when you can measure a real delta and have a clear rollback path.

Quick start for each tool

Filebeat in 5 minutes

docker run -d --name=filebeat \
  --user=root \
  -v "$(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  -v /var/log:/var/log:ro \
  docker.elastic.co/beats/filebeat:8.13.0 \
  filebeat -e --strict.perms=false

Fluentd in 5 minutes

docker run -d --name=fluentd \
  -v $(pwd)/fluent.conf:/fluentd/etc/fluent.conf \
  -p 24224:24224 \
  fluent/fluentd:v1.16-1

Vector in 5 minutes

docker run -d --name=vector \
  -v "$(pwd)/vector.toml:/etc/vector/vector.toml:ro" \
  -v /var/log:/var/log:ro \
  timberio/vector:0.36.0-debian \
  --config /etc/vector/vector.toml

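All three images are under 200 MB and start in under 2 seconds. Note that the stock fluent/fluentd image does not bundle fluent-plugin-elasticsearch, so the Fluentd config shown earlier needs a custom image with that plugin installed.

Cheat sheet

Task	Filebeat	Fluentd	Vector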
Tail file	`filestream` input	`tail` source	`file` source
Parse JSON	`parsers: ndjson`	`parse: json`	`parse_json!()` VRL
Drop event	`drop_event` processor	`grep exclude` filter	`abort` in VRL
Add field	`add_fields`	`record_transformer`	`.field = value`
Test config	`filebeat test config`	`fluentd --dry-run`	`vector validate`
Reload	`reload.enabled` (external input/module files only)	SIGUSR2	SIGHUP
Metrics	`:5066/stats`	`:24220/api/plugins.json`	`:9598/metrics` Prometheus (`prometheus_exporter` sink)
Buffer	memory or disk queue	file/memory buffer	disk or memory, with acknowledgements

Wrapping up

There is no “best log shipper”, only the best fit for your stack and your team. Filebeat is the safest choice when you do not want to think much about it. Fluentd wins when you need a niche plugin. Vector is the way forward when you need performance, many sinks, and complex pipelines.

Final tip: whichever tool you choose, keep the log shipper config in git and treat it like code, with review, CI lint, and rollback. The next article continues this idea at the dashboard-as-code layer: NDJSON, Git, and CI/CD instead of manual export/import.
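As one possible shape for the CI lint step, the Python sketch below wraps the validation commands from the cheat sheet. The file layout and the function names `validate_command` and `lint` are assumptions for illustration. A tool that is not installed on the CI runner is reported as skipped rather than failed.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def validate_command(tool: str, config_path: str) -> List[str]:
    """Build the config-check command for one shipper."""
    commands = {
        "filebeat": ["filebeat", "test", "config", "-c", config_path],
        "fluentd": ["fluentd", "--dry-run", "-c", config_path],
        "vector": ["vector", "validate", config_path],
    }
    return commands[tool]


def lint(configs: Dict[str, str]) -> Dict[str, Optional[bool]]:
    """Run each check; True/False for pass/fail, None when the binary is missing."""
    results: Dict[str, Optional[bool]] = {}
    for tool, path in configs.items():
        cmd = validate_command(tool, path)
        if shutil.which(cmd[0]) is None:
            results[tool] = None
            continue
        results[tool] = subprocess.run(cmd, capture_output=True).returncode == 0
    return results


if __name__ == "__main__":
    # Hypothetical repository layout.
    print(lint({"vector": "vector.toml", "filebeat": "filebeat.yml"}))
```

In a real pipeline the job would fail on any False result. Treating None as a failure too is the stricter choice if every runner is expected to have the binaries.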