
15 points by jbryu | 2 comments

I'm hosting a turn-based multiplayer browser game on a single Hetzner CCX23 x86 cloud server (4 vCPU, 16GB RAM, 80GB disk). The backend is built with Node.js and Socket.IO and runs via Docker Swarm. I also use Traefik for load balancing.

Matchmaking uses a round-robin sharding approach: each room is always handled by the same backend instance, letting me keep game state in memory and scale horizontally without Redis.
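
Roughly like this (a simplified sketch with illustrative names, not my exact code):

  // Sketch of the room -> instance assignment. Each new room is pinned to
  // one backend instance via a round-robin counter, and every socket for
  // that room is routed to the same instance afterwards.
  const instances = ["game-1", "game-2", "game-3"]; // backend container IDs
  let nextInstance = 0;

  const roomAssignments = new Map<string, string>(); // roomId -> instanceId

  export const assignRoom = (roomId: string): string => {
    let instanceId = roomAssignments.get(roomId);
    if (!instanceId) {
      instanceId = instances[nextInstance % instances.length];
      nextInstance++;
      roomAssignments.set(roomId, instanceId);
    }
    return instanceId;
  };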

Here's the issue: at ~500 concurrent players across ~60 rooms (max 8 players/room), I see low CPU usage but high event loop lag. One feature of my game is typing during a player's turn: each throttled keystroke is broadcast to the other players in real time. If I remove this logic, I can handle 1000+ players without issue.
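
For reference, event loop lag can be sampled with Node's built-in perf_hooks delay monitor. A minimal sketch of that kind of measurement (not my exact instrumentation):

  // Sample event loop delay with perf_hooks; values are in nanoseconds.
  import { monitorEventLoopDelay } from "node:perf_hooks";

  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  setInterval(() => {
    const p50 = (histogram.percentile(50) / 1e6).toFixed(1);
    const p99 = (histogram.percentile(99) / 1e6).toFixed(1);
    console.log(`event loop lag p50=${p50}ms p99=${p99}ms`);
    histogram.reset();
  }, 5000);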

Scaling out backend instances on my single server doesn't help. I expected less load per backend instance to help, but I still hit the same limit around 500 players. This suggests to me that the bottleneck isn't CPU or app logic but something deeper in the stack, though I'm not sure what.

Some server metrics at 500 players:

- CPU: 25% per core (according to htop)

- PPS: ~3000 in / ~3000 out

- Bandwidth: ~100 KB/s in / ~800 KB/s out
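
(That works out to an average of roughly 800,000 bytes / 3000 packets ≈ 270 bytes per outbound packet, so the frames themselves are tiny.)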

Could 500 concurrent players just be a realistic upper bound for my single-server setup, or is something misconfigured? I know scaling out with new servers should fix the issue, but I wanted to check in with the internet first to see if I'm missing anything. I’m new to multiplayer architecture so any insight would be greatly appreciated.

1. toast0 ◴[] No.44390489[source]
What are your processes waiting on? In Linux top, show the WCHAN field. In FreeBSD top, look at the STATE field. Ideally, your service processes are waiting on I/O (epoll, select, kqread, etc.) or you're CPU limited.

Is there any cross-room communication? Can you spawn a process per room? Scaling that tops out at 25% CPU on a 4 vCPU node strongly suggests a locked section limiting you to effectively single-threaded performance. Multiple processes serving rooms should bypass that if you can't find it otherwise, but maybe there's something wrong in your load balancing, etc.
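
Node's stock cluster module would get you one independent event loop per core with very little ceremony. Rough sketch (sticky-routing each room's sockets to the right worker is the part you'd still have to solve):

  // One worker process per core; each worker owns a disjoint set of rooms.
  import cluster from "node:cluster";
  import { availableParallelism } from "node:os";

  if (cluster.isPrimary) {
    for (let i = 0; i < availableParallelism(); i++) {
      cluster.fork(); // each fork is an independent event loop
    }
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} died, restarting`);
      cluster.fork();
    });
  } else {
    // boot the Socket.IO server for this worker's rooms here
    console.log(`worker ${process.pid} serving its share of rooms`);
  }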

Personally, I'd rather run with fewer layers, because then you don't have to debug the layers when you have perf issues. Do matchmaking wherever with whatever layers, and let your room servers run in the host OS, no containers. But nobody likes my ideas. :P

Edit to add: your network load is tiny. This is almost certainly something in your software or in how you've set up your layers. Unless those vCPUs are ancient, you should be able to push a whole lot more packets.

replies(1): >>44391792 #
2. jbryu ◴[] No.44391792[source]
So when running `top`, WCHAN shows `ep_poll` most of the time and sometimes `-`. Even when the game starts lagging, this pattern stays pretty consistent.

There is no cross-room communication. I could spawn a process per room, but I was trying to address this issue with my current Docker setup, where I have multiple `game` containers that each run a single Node.js process, and each process can host multiple rooms.

Not having to use Docker does sound simpler, but that's where I'm at atm haha.

I agree that the network load feels very small. Maybe it's a Socket.IO-related issue where, when many broadcasts fire at once, some shared I/O step gets bottlenecked?
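
One thing I can try is timing each broadcast tick to see whether the emits are synchronously blocking the loop. Something like this (sketch only):

  // Wrap an interval callback to log ticks that block the event loop.
  // (performance is a global in modern Node.)
  const timed = (label: string, fn: () => void) => () => {
    const start = performance.now();
    fn();
    const elapsed = performance.now() - start;
    if (elapsed > 50) {
      console.warn(`${label} blocked the loop for ${elapsed.toFixed(1)}ms`);
    }
  };

  // usage: setInterval(timed("typing-broadcast", broadcastTick), 200);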

Here's my actual typing broadcast code. I was originally broadcasting from the socket event callback itself, but I found performance improved slightly by batching broadcasts per player in a setInterval loop (note that only one player in a given room can be typing at once, so batching broadcasts per room shouldn't address the bottleneck).

  type ConnectionId = `${UserId}:${PublicRoomId}`;

  /**
   * Used to handle very frequent typing events more gracefully, to avoid
   * overloading the CPU.
   */
  const TypingUsersMap = new Map<
    ConnectionId,
    {
      socketId: string | null; // doesn't exist for bots
      roomId: PublicRoomId;
      userId: UserId;
      currentInput: string;
    }
  >();

  // ! this should be the same as the client throttle interval
  const TYPING_BROADCAST_INTERVAL = 200;

  export let typingBroadcastInterval: NodeJS.Timeout | undefined = undefined;
  export const startTypingBroadcastJob = () => {
    typingBroadcastInterval = setInterval(() => {
      const freshTypingUsersMap = new Map(TypingUsersMap);
      TypingUsersMap.clear();

      if (freshTypingUsersMap.size === 0) return; // Nothing to do

      // Go through each user that has a pending update
      for (const [_connectionId, data] of freshTypingUsersMap.entries()) {
        const socket = data.socketId
          ? io.sockets.sockets.get(data.socketId)
          : undefined;

        // Use the data we stored to perform the broadcast
        if (socket) {
          // emit to other players
          socket
            .to(data.roomId)
            .volatile.emit(
              SOCKET_EVENT_NAMES.USER_TYPING_RES,
              data.userId,
              data.currentInput
            );
        } else {
          // bots emit to everyone
          io.to(data.roomId).volatile.emit(
            SOCKET_EVENT_NAMES.USER_TYPING_RES,
            data.userId,
            data.currentInput
          );
        }
      }
    }, TYPING_BROADCAST_INTERVAL);
  };

  export const stopTypingBroadcastJob = () => {
    if (typingBroadcastInterval) {
      clearInterval(typingBroadcastInterval);
      typingBroadcastInterval = undefined;
    }
  };

  // This is called from the USER_TYPING socket event callback, so effectively every throttled keystroke from the user gets queued.
  export const queueTypingEvent = ({
    socketId,
    roomId,
    userId,
    currentInput,
  }: {
    socketId: string | null;
    roomId: PublicRoomId;
    userId: UserId;
    currentInput: string;
  }) => {
    const connectionId: ConnectionId = `${userId}:${roomId}`;
    TypingUsersMap.set(connectionId, {
      socketId,
      roomId,
      userId,
      currentInput,
    });
  };
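
And for completeness, the client throttles keystrokes at the same 200ms interval. Roughly like this (simplified sketch, not my actual client code; `socket` here is the client-side Socket.IO connection):

  // Leading + trailing throttle so the final input state isn't dropped.
  // Must match TYPING_BROADCAST_INTERVAL on the server (200ms).
  const CLIENT_THROTTLE_INTERVAL = 200;

  let lastSentAt = 0;
  let pendingInput: string | null = null;
  let trailingTimer: ReturnType<typeof setTimeout> | undefined;

  const sendTyping = () => {
    lastSentAt = Date.now();
    if (pendingInput !== null) {
      socket.emit(SOCKET_EVENT_NAMES.USER_TYPING, pendingInput);
      pendingInput = null;
    }
  };

  export const onKeystroke = (currentInput: string) => {
    pendingInput = currentInput;
    const sinceLast = Date.now() - lastSentAt;

    if (sinceLast >= CLIENT_THROTTLE_INTERVAL) {
      sendTyping(); // leading edge: send immediately
    } else if (!trailingTimer) {
      trailingTimer = setTimeout(() => {
        trailingTimer = undefined;
        sendTyping(); // trailing edge: flush the latest input
      }, CLIENT_THROTTLE_INTERVAL - sinceLast);
    }
  };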