Responsibilities
- Run the Pingora proxy that front-doors all dalang.io traffic. Maintain the cargo build/release pipeline, the systemd unit, routing rules, and IP-forwarding hygiene; changes here have site-wide blast radius.
- Operate the Incus cluster (x99 remote): node maintenance, capacity planning, networking, snapshot and backup discipline.
- Maintain systemd-based deploy units (dalang-test-api, dalang-test-frontend; production replicas dalang-api@*, dalang-frontend@*).
- Build and own observability: Grafana dashboards, log aggregation, the public uptime.dalang.io status page.
- Lead the on-call rotation; write incident reports and blameless postmortems.
- Anchor Dalang's own ISO 27001 readiness audit when the engineering team scales.
Requirements
- 4+ years of SRE, DevOps, or platform engineering experience.
- Strong Linux internals: systemd, networking (iptables/nftables, BGP basics, CDN edge behavior), filesystems, kernel tuning.
- Hands-on with at least one of LXD/Incus, Kubernetes operators, OpenStack, Proxmox.
- Comfortable in Rust, or a strong willingness to debug a Rust-based proxy (Pingora is small and approachable).
- Engineering judgment on observability: knows the difference between alert noise and real signal.
Nice to have
- Hands-on Pingora experience.
- ISO 27001 or SOC 2 audit experience.
- Cloudflare API operations.
- Performance forensics (flame graphs, eBPF, perf).
Success criteria for the first 90 days
- On-call shadowing for the first two weeks; primary by week six.
- One quantified improvement (for example, lower alert noise, shorter MTTR, or more capacity headroom).
- One deploy-pipeline improvement merged.
