Engineering · Full-time · Indonesia-remote (Jakarta preferred for occasional DC visits)

Site Reliability Engineer

Own the production fabric: the Pingora reverse proxy (Rust), the Incus x99 cluster, the deploy pipeline, observability, and the on-call rotation. This is the role that keeps uptime.dalang.io's numbers honest.

This job description is presented in English. You may submit your application in English or Indonesian.

Responsibilities

  • Run the Pingora proxy that front-doors all dalang.io traffic. Maintain the cargo build/release pipeline, the systemd unit, routing rules, and IP-forwarding hygiene; changes here have site-wide blast radius.
  • Operate the Incus cluster (x99 remote): node maintenance, capacity planning, networking, snapshot and backup discipline.
  • Maintain systemd-based deploy units (dalang-test-api, dalang-test-frontend; production replicas dalang-api@*, dalang-frontend@*).
  • Build and own observability: Grafana dashboards, log aggregation, the public uptime.dalang.io status page.
  • Lead the on-call rotation; write incident reports and blameless postmortems.
  • Anchor Dalang's own ISO 27001 readiness audit when the engineering team scales.

Requirements

  • 4+ years of SRE / DevOps / platform engineering experience.
  • Strong Linux internals: systemd, networking (iptables/nftables, BGP basics, CDN edge behavior), filesystems, kernel tuning.
  • Hands-on with at least one of LXD/Incus, Kubernetes operators, OpenStack, Proxmox.
  • Comfortable in Rust, or a strong willingness to debug a Rust-based proxy (Pingora is small and approachable).
  • Engineering judgment on observability: knows the difference between alert noise and real signal.

Nice to have

  • Pingora specifically.
  • ISO 27001 or SOC 2 audit experience.
  • Cloudflare API operations.
  • Performance forensics (flame graphs, eBPF, perf).

What success looks like in the first 90 days

  • On-call shadowing for the first two weeks; primary by week six.
  • One quantified improvement (alert noise, MTTR, or capacity headroom).
  • One deploy-pipeline improvement merged.

How to apply

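Please send your résumé and a short note (in English or Indonesian) telling us which two responsibilities you would prioritize and why. We read every application and reply within 7 days.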

Apply now → [email protected]