
391 points by whoishiring | 1 comment

Please state the location and include REMOTE for remote work, REMOTE (US) or similar if the country is restricted, and ONSITE when remote work is not an option.

Please only post if you personally are part of the hiring company—no recruiting firms or job boards. One post per company. If it isn't a household name, explain what your company does.

Please only post if you are actively filling a position and are committed to responding to applicants.

Commenters: please don't reply to job posts to complain about something. It's off topic here.

Readers: please only email if you are personally interested in the job.

Searchers: try http://nchelluri.github.io/hnjobs/, https://hnresumetojobs.com, https://hnhired.fly.dev, https://kennytilton.github.io/whoishiring/, https://hnjobs.emilburzo.com, or this (unofficial) Chrome extension: https://chromewebstore.google.com/detail/hn-hiring-pro/mpfal....

Don't miss these other fine threads:

Who wants to be hired? https://news.ycombinator.com/item?id=43243022

Freelancer? Seeking freelancer? https://news.ycombinator.com/item?id=43243023

1. ruuda | No.43244997
Chorus One | https://chorus.one/careers | Platforms Engineer | REMOTE (Switzerland ± 6 hours)

Chorus One operates validators on many proof-of-stake blockchains (the ones where security is based on a Byzantine fault-tolerant consensus algorithm rather than wasting energy). We are hiring for several roles, but the one I will highlight is what we call Platforms Engineer. Some companies call this Site Reliability Engineering or DevOps.
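For readers new to the term, "Byzantine fault-tolerant" has a concrete arithmetic core. Below is a minimal sketch of the classic bound, assuming n equally weighted validators (real proof-of-stake chains weight votes by stake, and the function names are illustrative, not taken from any client). With n >= 3f + 1 the network tolerates f Byzantine validators, and a quorum must be large enough that any two quorums share at least one honest validator.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f with n >= 3f + 1: how many arbitrarily faulty validators n can tolerate."""
    if n < 1:
        raise ValueError("need at least one validator")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest quorum q such that any two quorums overlap in at least f + 1 validators
    (2q - n >= f + 1), so every overlap contains at least one honest validator."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


if __name__ == "__main__":
    for n in (4, 7, 100):
        # Prints validator count, tolerated faults, quorum size.
        print(n, max_byzantine_faults(n), quorum_size(n))
```

For example, 4 validators tolerate 1 faulty one with quorums of 3, and 100 validators tolerate 33 with quorums of 67. The quorum also never exceeds n - f, so the honest validators alone can still make progress.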

The main thing we do is take upstream software, build it, run it on our infrastructure, and then monitor it and optimize that setup. Some things that make this interesting are:

    * Building automation that enables us to do this for many networks (70+ currently).
    * Doing this with high uptime, building automation for failover, etc.
    * Working with software that is cutting-edge and does interesting things (consensus algorithms, distributed systems, cryptography), but which for that same reason is often immature and not easy to operate and monitor. We often have to build custom tools and dive into the project's source code, and we contribute patches upstream when it makes sense.
    * Some of these projects push the limits of what a machine can do, so we have to do low-level investigation that requires understanding what the Linux kernel and network hardware are doing in order to properly identify what’s going on.

We do have a small cloud footprint, but run primarily on bare metal. We are looking for people who can not just configure services offered by the public clouds, but who deeply understand what lies below; people who could build their own cloud. That sounds a bit pretentious and it’s not exactly what we do, but it does involve many of the same aspects.

A very recent example of what I personally find fascinating: last week Ethereum’s Holesky testnet experienced a loss of liveness. This is a real-world, globally distributed system that implements Byzantine fault tolerance, with multiple independent implementations of the protocol. Several of these implementations had a bug in an update, which caused a split in the network. The protocol is designed to handle this situation in theory, but in practice the split is triggering previously unexplored failure modes in the implementations, ones that are hard to test for in small-scale synthetic tests. I think there are very few places where you get to be involved in a planet-scale distributed system exhibiting “interesting” behavior, especially one that is not controlled by a single entity. Of course, there is also a less fun side: the testnet is now broken, alerts are firing, and coordinating a fix is hard and chaotic when no single entity controls the network. Fortunately it’s a testnet.
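Why can a split stall a BFT network at all? A minimal sketch, assuming a simplified rule that a branch finalizes only when validators holding at least two thirds of the total stake vote for it (Ethereum's finality mechanism uses a two-thirds threshold; the function names and stake figures here are hypothetical, and epochs, attestations and recovery mechanisms are ignored):

```python
def has_supermajority(stake: int, total_stake: int) -> bool:
    """True if `stake` is at least two thirds of `total_stake`; integer math avoids rounding."""
    return 3 * stake >= 2 * total_stake


def finalizing_branches(branches: dict[str, int]) -> list[str]:
    """Return the branches of a split network whose stake meets the two-thirds threshold."""
    total = sum(branches.values())
    if total <= 0:
        return []
    return [name for name, stake in branches.items() if has_supermajority(stake, total)]


if __name__ == "__main__":
    # 60/40 split: neither side reaches two thirds, so nothing finalizes (loss of liveness).
    print(finalizing_branches({"branch_a": 60, "branch_b": 40}))
    # 70/30 split: the larger branch can still finalize.
    print(finalizing_branches({"branch_a": 70, "branch_b": 30}))
```

Because two disjoint branches cannot both hold two thirds of the stake, the function never returns more than one branch. That is the safety side of the same threshold that costs liveness in a split.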

Apply at https://careers.chorus.one/o/platforms-engineer-remote.