
Replace Jenkins with Drone CI

Jenkins is free and open source CI/CD software, widely used in all kinds of scenarios. Its main advantage is a vast collection of plugins capable of all sorts of jobs, including deploying with SCP or Ansible, analyzing code with Cppcheck, and reporting job status through Telegram or DingTalk.

I previously used Jenkins to automate numerous jobs as well, for example rebuilding my Docker images, deploying the blog you're visiting right now, and even automatically signing in to Genshin Impact.

But Jenkins is a CI with a long history: its predecessor Hudson was released back in 2005. As a result, Jenkins runs job commands directly on the Worker host instead of using modern approaches such as containers. This means that whether a CI pipeline succeeds depends largely on the environment of the Worker host. For example, when I rented a dedicated server with higher specs and rebuilt my whole environment on it, I was greeted with a number of weird issues, which took me about a week to track down and fix.

In addition, Jenkins is written in Java and consumes a lot of memory. A Jenkins instance can take as much as 1 GB of memory, which makes it impractical for a low-spec server to run even the simplest tasks. It's also hard to use all the awesomeness of Jenkins plugins from a configuration file. Many plugins don't support setting their parameters from a Jenkinsfile; they can only be configured one by one in the Jenkins web UI, which is a complicated and error-prone process.

By comparison, Drone is a container-based CI that takes a more modern approach. Drone recommends its Docker-based Worker (called a Runner by the Drone folks). Because containers are used as execution environments, Drone fully exploits their main advantage: consistency. As long as the container image stays the same, you can be sure that the CI commands run in the same environment every time, so the output should be stable. Of course, if your script cannot run in a container by any means, Drone also has runners that execute commands directly on the host or on a DigitalOcean cloud server.

Drone has plenty of other advantages too. It is written in Go and uses about one tenth of the memory of Jenkins. Its configuration files are written in YAML or Jsonnet rather than the special language of a Jenkinsfile. And although Drone has far fewer plugins than Jenkins, all of them are Docker containers and can be used from the configuration file.

| | Jenkins | Drone |
| --- | --- | --- |
| Environment | Worker's host | Your choice: Docker container, Worker's host, or DigitalOcean cloud server |
| Config syntax | Special language: Jenkinsfile | Generic YAML/Jsonnet |
| Plugins | More, 1836 (at the time of writing) | Fewer, 102 |
| Plugin config | Web-based, some available through config file | All in config file |
| Programming language | Java | Go |
| Memory usage | More, around 1 GB | Less, around 100 MB |

Install Drone

As a containerized CI, Drone itself runs as a Docker container and is configured through environment variables. Drone can be connected to GitHub, GitLab, Gitea or Bitbucket; please refer to the official documentation for setup guides. However, one Drone instance can only connect to one of them. If, like me, you need CI on both GitHub and your own Gitea instance, you will need two Drone instances.

If you plan to use Drone for deployment, you will need some way to pass your deployment keys to Drone. I use Vault, a secret management tool that Drone officially supports. Of course, you can simply store your secrets in Drone itself, but not through its web UI: you must use Drone's command-line tool for that.
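For the command-line route, a minimal sketch could look like the following. It assumes the drone CLI is installed and that its `secret add` subcommand accepts `--repository`, `--name` and `--data` (with `@file` reading the value from a file). The wrapper function, the `DRONE_CLI` override, the repository name `lantian/blog` and the key path are hypothetical and only serve as illustration.

```shell
# Hypothetical wrapper around "drone secret add".
# DRONE_CLI can be overridden (for example with "echo") to preview the command.
add_repo_secret() {
    repo="$1"   # e.g. "lantian/blog" (hypothetical repository name)
    name="$2"   # secret name referenced by from_secret in the pipeline
    file="$3"   # file whose content becomes the secret value
    "${DRONE_CLI:-drone}" secret add \
        --repository "$repo" \
        --name "$name" \
        --data "@$file"
}

# Usage sketch; the CLI reads the server address and a personal token
# from the environment:
#   export DRONE_SERVER=https://ci.lantian.pub
#   export DRONE_TOKEN='***personal token***'
#   add_repo_secret lantian/blog id_ed25519 "$HOME/.ssh/id_ed25519"
```

Setting `DRONE_CLI=echo` before calling the function prints the command instead of running it, which is a cheap way to check the arguments before touching a real server.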

Here is my configuration with Vault and Drone for reference:

version: '2.4'
services:
    # Secret management, Vault instance and plugin for Drone
    vault:
        image: vault
        container_name: vault
        restart: unless-stopped
        command: 'server'
        labels:
            - com.centurylinklabs.watchtower.enable=false
        volumes:
            - './conf/vault:/vault/config:ro'
            - './data/vault:/vault/file'

    drone-vault:
        image: drone/vault
        container_name: drone-vault
        restart: unless-stopped
        environment:
            DRONE_DEBUG: 'true'
            DRONE_SECRET: '***drone-vault secret***'
            VAULT_ADDR: 'https://vault.lantian.pub'
            VAULT_TOKEN: '***Vault secret***'
        depends_on:
            - vault

    # Drone #1 for my own Gitea
    drone:
        image: drone/drone:2
        container_name: drone
        restart: unless-stopped
        environment:
            DRONE_GITEA_SERVER: 'https://git.lantian.pub'
            DRONE_GITEA_CLIENT_ID: '***Gitea OAuth ID***'
            DRONE_GITEA_CLIENT_SECRET: '***Gitea OAuth Secret***'
            DRONE_RPC_SECRET: '***Drone Runner Secret, generate with openssl rand -hex 16***'
            DRONE_SERVER_HOST: ci.lantian.pub
            DRONE_SERVER_PROTO: https
            DRONE_USER_CREATE: username:lantian,admin:true # Admin account
            DRONE_JSONNET_ENABLED: 'true'
            DRONE_STARLARK_ENABLED: 'true'
        volumes:
            - './data/drone:/data'

    # Drone #1's Docker Runner
    drone-runner-docker:
        image: drone/drone-runner-docker:1
        container_name: drone-runner-docker
        restart: unless-stopped
        environment:
            DRONE_RPC_PROTO: https
            DRONE_RPC_HOST: ci.lantian.pub
            DRONE_RPC_SECRET: '***Drone Runner Secret, same as DRONE_RPC_SECRET above***'
            DRONE_RUNNER_CAPACITY: 4 # Max parallel jobs
            DRONE_RUNNER_NAME: drone-docker
            DRONE_SECRET_PLUGIN_ENDPOINT: http://drone-vault:3000
            DRONE_SECRET_PLUGIN_TOKEN: '***drone-vault secret***'
        volumes:
            - '/var/run:/var/run'
            - '/cache:/cache'
        depends_on:
            - drone
            - drone-vault

    # Drone #2 for GitHub
    drone-github:
        image: drone/drone:2
        container_name: drone-github
        restart: unless-stopped
        environment:
            DRONE_GITHUB_CLIENT_ID: '***GitHub OAuth ID***'
            DRONE_GITHUB_CLIENT_SECRET: '***GitHub OAuth Secret***'
            DRONE_RPC_SECRET: '***Drone Runner Secret, generate with openssl rand -hex 16***'
            DRONE_SERVER_HOST: ci-github.lantian.pub
            DRONE_SERVER_PROTO: https
            DRONE_USER_CREATE: username:xddxdd,admin:true # Admin account
            DRONE_REGISTRATION_CLOSED: 'true' # Disallow new user registration
            DRONE_JSONNET_ENABLED: 'true'
            DRONE_STARLARK_ENABLED: 'true'
        volumes:
            - './data/drone-github:/data'

    # Drone #2's Docker Runner
    drone-github-runner-docker:
        image: drone/drone-runner-docker:1
        container_name: drone-github-runner-docker
        restart: unless-stopped
        environment:
            DRONE_RPC_PROTO: https
            DRONE_RPC_HOST: ci-github.lantian.pub
            DRONE_RPC_SECRET: '***Drone Runner Secret, same as DRONE_RPC_SECRET above***'
            DRONE_RUNNER_CAPACITY: 4 # Max parallel jobs
            DRONE_RUNNER_NAME: drone-docker
            DRONE_SECRET_PLUGIN_ENDPOINT: http://drone-vault:3000
            DRONE_SECRET_PLUGIN_TOKEN: '***drone-vault secret***'
        volumes:
            - '/var/run:/var/run'
            - '/cache:/cache'
        depends_on:
            - drone-github
            - drone-vault
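
Each DRONE_RPC_SECRET in the configuration above is a shared secret between a Drone server and its runner, and the placeholder suggests generating it with openssl rand -hex 16. A small sketch of that step follows; the function name and the /dev/urandom fallback are my own assumptions for hosts without openssl, not part of the setup above.

```shell
# Prints a 32-character hexadecimal secret (16 random bytes).
generate_rpc_secret() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 16
    else
        # Fallback for hosts without openssl (an assumption, not from the setup above).
        head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
        echo
    fi
}

# Generate one secret per Drone server and paste it into both the server
# and the matching runner configuration.
generate_rpc_secret
```

Since the two Drone instances are independent, it is reasonable to generate a separate secret for each server and runner pair.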

Basic Drone CI/CD

After setting up Drone, the next step is to add a task. Here I'll use the example of deploying my Hexo blog.

I already have a set of deployment scripts for the following tasks:

  • Install node_modules
  • hexo generate
  • hexo deploy to GitHub Pages (as a backup)
  • Convert all images to WebP, and Gzip and Brotli compress all static resources
  • Rsync generated files to all of my nodes with Ansible
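
To give an idea of what such a script step can look like, here is a rough sketch of the Gzip and Brotli compression step. It is not my actual deployment script: the function name, the list of file extensions and the compression levels are assumptions chosen for illustration, and Brotli is skipped when the tool is not installed.

```shell
# Hypothetical helper: writes FILE.gz (and FILE.br when brotli is available)
# next to every text-like file under DIR, keeping the originals.
precompress() {
    dir="$1"
    find "$dir" -type f \( -name '*.html' -o -name '*.css' -o -name '*.js' \) |
    while IFS= read -r file; do
        gzip -9 -c "$file" > "$file.gz"
        # brotli is optional here; skip it when the tool is not installed.
        if command -v brotli >/dev/null 2>&1; then
            brotli -q 11 -f -o "$file.br" "$file"
        fi
    done
}

# Usage sketch: precompress public
```

Keeping the original files next to the compressed copies lets the web server fall back to the uncompressed version for clients that support neither encoding.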

In addition, since my blog uses Dependabot to update dependencies automatically, Dependabot creates pull requests from time to time. Obviously, pull requests must not be deployed to my nodes; for them, the CI should only run hexo generate and check whether it fails.

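So here comes the most basic form of our configuration, written to .drone.yaml (Drone looks for .drone.yml by default, so the file name must match the one set in the repository's settings in Drone):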

kind: pipeline
type: docker
name: default

trigger:
    branch:
        - master

steps:
    - name: hexo generate
      image: node:15-alpine
      commands:
          # Not all packages are needed: this is to be consistent with following steps
          - apk add --no-cache build-base bash git openssh wget python3 gzip brotli zstd parallel imagemagick
          - npm install
          - node_modules/hexo/bin/hexo generate

    - name: hexo deploy
      image: node:15-alpine
      commands:
          # Install packages
          - apk add --no-cache build-base bash git openssh wget python3 gzip brotli zstd parallel imagemagick
          - node_modules/hexo/bin/hexo deploy
      # Don't deploy Dependabot's PRs
      when:
          event:
              exclude:
                  - pull_request

    # Some subsequent steps are skipped

This config will generate the static files and attempt hexo deploy, but the deployment will fail since the container doesn't have the SSH keys. For obvious reasons, I don't recommend adding your SSH key directly to the config. Instead, add it to Vault (or Drone's secret storage) and reference it from the config file:

# Fetch SSH key from Vault, the repository must be set to Trusted in Drone
kind: secret
name: id_ed25519
get:
    # This path is shown as kv/ssh in Vault. "data" must be added.
    path: kv/data/ssh
    name: id_ed25519

---
kind: pipeline
type: docker
name: default

# ...

steps:
    # ...
    - name: hexo deploy
      image: node:15-alpine
      environment:
          # Use the SSH key fetched from Vault, set as environment variable
          SSH_KEY:
              from_secret: id_ed25519
      commands:
          # Install SSH key
          - mkdir -p /root/.ssh/
          - echo "$SSH_KEY" > /root/.ssh/id_ed25519
          - chmod 600 /root/.ssh/id_ed25519

          # Configure SSH, mainly disable host key verification, or login will fail
          - |
              cat <<EOF >/root/.ssh/config
              StrictHostKeyChecking no
              UserKnownHostsFile=/dev/null
              VerifyHostKeyDNS yes
              LogLevel ERROR
              EOF

          # Install packages... redacted

Now the CI container has the SSH key and can connect to GitHub or other deployment targets via SSH.

But another problem exists: every build starts from a clean container without node_modules, so a considerable amount of time is spent downloading this black hole again.

The good news is that a Drone plugin can cache intermediate directories and restore them on the next build:

# ...
steps:
    # Restore last cache
    - name: restore cache
      image: meltwater/drone-cache:dev
      settings:
          backend: 'filesystem'
          restore: true
          cache_key: 'volume'
          archive_format: 'gzip'
          filesystem_cache_root: '/cache'
          # Cache these two folders
          mount:
              - 'node_modules'
              - 'img_cache'
      volumes:
          - name: cache
            path: /cache

    - name: hexo generate
      # ...

    # Cache result generated this time
    - name: rebuild cache
      image: meltwater/drone-cache:dev
      settings:
          backend: 'filesystem'
          rebuild: true
          cache_key: 'volume'
          archive_format: 'gzip'
          filesystem_cache_root: '/cache'
          # Cache these two folders
          mount:
              - 'node_modules'
              - 'img_cache'
      volumes:
          - name: cache
            path: /cache

# Cache files are stored to /cache on host, need repo set to Trusted in Drone
volumes:
    - name: cache
      host:
          path: /cache

We can also have Telegram notifications on build failures:

# Fetch Telegram token and target account from Vault
kind: secret
name: tg_token
get:
    path: kv/data/telegram
    name: token

---
kind: secret
name: tg_target
get:
    path: kv/data/telegram
    name: target

---
# ...
steps:
    # ...

    # Handle notification on failure
    - name: telegram notification for failure
      image: appleboy/drone-telegram
      settings:
          token:
              from_secret: tg_token
          to:
              from_secret: tg_target
      when:
          status:
              - failure

    # Handle notification on success, not sent when triggered from cron job
    - name: telegram notification for success
      image: appleboy/drone-telegram
      settings:
          token:
              from_secret: tg_token
          to:
              from_secret: tg_target
      when:
          branch:
              - master
          status:
              - success
          event:
              exclude:
                  - cron

Now we have a Drone configuration with deployments, caching and Telegram notifications.

Matrix Build

Sometimes we need to test our programs in different environments, such as Python 2.7/3.6/3.7/3.8/3.9, GCC/Clang, etc. Drone supports the Jsonnet configuration format, which lets us define such jobs in batches.

Take my route-chain project as an example (some contents are removed or simplified for demonstration):

// Define a "function" to create a pipeline
local DebianCompileJob(image, kernel_headers) = {
  "kind": "pipeline",
  "type": "docker",
  "name": image,
  "steps": [
    {
      "name": "build",
      "image": image,
      "commands": [
        "apt-get update",
        "DEBIAN_FRONTEND=noninteractive apt-get -y --no-install-recommends install build-essential " + kernel_headers,
        "make"
      ]
    },
    {
      "name": "telegram notification",
      "image": "appleboy/drone-telegram",
      "settings": {
        "token": {
          "from_secret": "tg_token"
        },
        "to": {
          "from_secret": "tg_target"
        }
      }
    }
  ]
};

[
  // Telegram token and target account
  {
    "kind": "secret",
    "name": "tg_token",
    "get": {
      "path": "kv/data/telegram",
      "name": "token"
    }
  },
  {
    "kind": "secret",
    "name": "tg_target",
    "get": {
      "path": "kv/data/telegram",
      "name": "target"
    }
  },
  // Call DebianCompileJob in batches to create jobs for different images and linux-headers packages
  DebianCompileJob('debian:jessie', 'linux-headers-amd64'),
  DebianCompileJob('debian:stretch', 'linux-headers-amd64'),
  DebianCompileJob('debian:buster', 'linux-headers-amd64'),
  DebianCompileJob('debian:bullseye', 'linux-headers-amd64'),
  DebianCompileJob('debian:unstable', 'linux-headers-amd64'),
  DebianCompileJob('ubuntu:xenial', 'linux-headers-generic'),
  DebianCompileJob('ubuntu:bionic', 'linux-headers-generic'),
  DebianCompileJob('ubuntu:focal', 'linux-headers-generic'),
]

Save the config to .drone.jsonnet, change the configuration file name in the repository's settings in Drone from .drone.yaml to .drone.jsonnet, and you're good to go.

Aborting Build Early

Sometimes we don't need to run all jobs in a Matrix Build. For example, I don't need to rebuild all Docker images on every commit to my Dockerfiles repository, which would mean 14 images times 8 architectures, or 112 jobs.

Fortunately, Drone supports aborting a pipeline early: just make a step exit with code 78, like this:

# ...
steps:
    # ...
    - name: skip build
      image: alpine
      commands:
          - ./should_build.sh && exit 0 || exit 78

An actual example can be found at this commit in my Dockerfiles repo.

But since Drone runs builds in containers, and containers are somewhat slow to start, just going through 112 pipelines takes tens of minutes, even if every job quits immediately. Therefore, I adjusted the Dockerfiles repo configuration to run one pipeline per architecture and to determine the images to build from the commit message. This way only 8 pipelines are needed, and an empty job doesn't take too long to finish.
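
As an illustration of the commit-message idea, here is a simplified sketch rather than the script from my repository. It assumes a hypothetical convention where an image is built whenever its name appears as a whole word in the commit message, which Drone exposes to steps as the DRONE_COMMIT_MESSAGE environment variable. The function names are made up for this example.

```shell
# Succeeds when IMAGE appears as a whole word in MESSAGE.
should_build() {
    image="$1"
    message="$2"
    # Put each word of the message on its own line, then look for an exact match.
    printf '%s\n' "$message" | tr -s '[:space:]' '\n' | grep -qxF -- "$image"
}

# Prints the exit code a Drone step should use:
# 0 to continue with the build, 78 to stop the pipeline early.
build_exit_code() {
    if should_build "$1" "$2"; then
        echo 0
    else
        echo 78
    fi
}

# Usage sketch inside a step's commands:
#   exit "$(build_exit_code nginx "$DRONE_COMMIT_MESSAGE")"
```

Matching whole words avoids building an image just because its name happens to be a substring of another word in the message.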