Easy log forwarding on K8s using Filebeat CRDs

Roberto Javier Yudice Monico
Oct 16, 2020

Log forwarding is an essential part of any Kubernetes cluster. Due to the ephemeral nature of pods, you want to persist all logging data somewhere outside the pod, so that it can be viewed beyond the pod’s lifetime, and also outside your worker nodes, as they can die too.

Filebeat is a well-established log shipper. In the Kubernetes ecosystem you have other options such as Fluent Bit and Fluentd (the CNCF-backed option), but I found Filebeat easier to set up than either of them, at least to get it running, and it is just as feature-rich.

To make the setup easier we are going to use Elastic’s ECK, which is a set of CRDs for the ELK stack on Kubernetes. I won’t go into too much detail on CRDs, but in short they allow you to extend Kubernetes beyond the typical resources such as Pods, Services, etc., so that you can define your own custom resources in a declarative way. This streamlines your cluster setup, since you can have everything centralized in one place.

Creating a test cluster

In case you haven’t heard about k3d, this would be a good time to use it. K3d can help you set up a lightweight Kubernetes cluster on your computer within seconds, and it supports multiple platforms, including Windows.
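For example, something like the following creates a throwaway cluster (the cluster name and agent count here are just illustrative choices, not requirements of this setup):

```shell
# Create a small local cluster with one server and two agent nodes
k3d cluster create logging-demo --agents 2

# Verify that kubectl is now pointing at the new cluster
kubectl cluster-info
```

When you are done experimenting, `k3d cluster delete logging-demo` tears everything down again.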

Creating the CRDs

The first step is to create the ECK CRDs in your Kubernetes cluster. To do that, run the following command, taken from ECK’s documentation. For more information, refer to the ECK docs.

kubectl apply -f https://download.elastic.co/downloads/eck/1.2.1/all-in-one.yaml

Creating the Role and Service Account

Next we need the ClusterRole and ServiceAccount, plus the ClusterRoleBinding that ties them together, which will be used by Filebeat.

First let's create the namespace where we will create all of our filebeat resources:

kubectl create ns elastic-system

Now onto the YAML. Before applying it, ensure that you are in the “elastic-system” namespace we created earlier, or pass the namespace to kubectl with the “-n elastic-system” flag.
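Either approach works; as a sketch (the file name `filebeat-rbac.yaml` is just a placeholder for wherever you saved the manifest below):

```shell
# Option 1: switch your current context's default namespace once…
kubectl config set-context --current --namespace=elastic-system

# Option 2: …or pass the namespace explicitly on each command
kubectl apply -f filebeat-rbac.yaml -n elastic-system
```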

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - namespaces
      - pods
    verbs:
      - get
      - watch
      - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elastic-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io

Creating the filebeat resource

Now that we have the CRDs and the service account created, we can define a Beat resource using the following YAML. Again, remember this should be applied in the “elastic-system” namespace:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: 7.9.1
  image: docker.elastic.co/beats/filebeat-oss:7.9.1
  config:
    output.elasticsearch:
      hosts: ["https://yourelasticsearch"]
      index: "k8s-%{[agent.version]}-%{+yyyy.MM.dd}"
    setup.template.name: "k8s"
    setup.template.pattern: "k8s-*"
    setup.ilm.enabled: false
    filebeat:
      autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            hints:
              enabled: true
              default_config:
                type: container
                paths:
                  - /var/log/containers/*${data.kubernetes.container.id}.log
    processors:
      - add_cloud_metadata: {}
      - add_host_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows to provide richer host metadata
        containers:
          - name: filebeat
            securityContext:
              runAsUser: 0
              # If using Red Hat OpenShift uncomment this:
              #privileged: true
            volumeMounts:
              - name: varlogcontainers
                mountPath: /var/log/containers
              - name: varlogpods
                mountPath: /var/log/pods
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
            env:
              - name: NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
        volumes:
          - name: varlogcontainers
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers

A few things to highlight about the YAML:

  1. I changed the image to the “filebeat-oss” image. This is the image of the open-source distribution of Filebeat; the default image (which doesn’t end in “-oss”) will look for an Elasticsearch license and fail if you don’t have one. For a list of images you can go to http://docker.elastic.co/.
  2. Replace “https://yourelasticsearch” with your Elasticsearch endpoint.
  3. All logs will go into a new Elasticsearch index matching the index pattern “k8s-*”.

About Autodiscovery

In the Filebeat configuration that is part of the Beat resource we created above, you’ll notice we are using autodiscover. This simplifies things for us: Filebeat automatically finds all pods and forwards their logs without any per-pod setup, and it also enriches the log data with Kubernetes metadata, such as the pod’s namespace and labels, which makes the logs much easier to search and filter.
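Because we enabled hints (`hints.enabled: true`), individual pods can also override the default behavior through `co.elastic.logs/*` annotations. A sketch, using a hypothetical pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app          # hypothetical example pod
  annotations:
    # Treat this pod's stdout as JSON and lift the parsed keys to the top level
    co.elastic.logs/json.keys_under_root: "true"
    # Or opt a noisy pod out of log collection entirely:
    # co.elastic.logs/enabled: "false"
spec:
  containers:
    - name: my-app
      image: my-app:latest
```

This lets application teams tune log collection per workload without touching the central Filebeat configuration.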

Troubleshooting

If you aren’t seeing any logs in Elasticsearch, check the logs of the DaemonSet that was created when you applied the Beat resource YAML. It should be named “filebeat-beat-filebeat”; you can use the command below to view its logs and look for errors.

kubectl logs ds/filebeat-beat-filebeat -n elastic-system

Wrap up

So now you have a basic log forwarding setup that should serve as a starting point for something more robust. This setup is missing log parsing: you will often want to enrich your log data as much as you can, which means parsing the messages themselves, for example to extract which logger emitted the message in the case of a web application. You can look at Elasticsearch Ingest Node for that.
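As a rough sketch of that direction: you could define an ingest pipeline in Elasticsearch and point Filebeat’s output at it. The pipeline name, endpoint, and grok pattern below are all assumptions to adapt to your own log format:

```shell
# Create a hypothetical "k8s-logs" pipeline that parses lines shaped like
# "2020-10-16T12:00:00Z INFO my.app.Logger something happened"
curl -X PUT "https://yourelasticsearch/_ingest/pipeline/k8s-logs" \
  -H 'Content-Type: application/json' -d'
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:log.timestamp} %{LOGLEVEL:log.level} %{NOTSPACE:log.logger} %{GREEDYDATA:log.message}"]
      }
    }
  ]
}'
```

Then reference it from the Beat resource by adding `pipeline: "k8s-logs"` under `output.elasticsearch`, so every document is run through the pipeline before it is indexed.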
