So they are only halfway correct about masking. The RFC does mandate that client-to-server communication be masked, but in practice that is only enforced by web browsers. If the client is anything else, just ignore masking. The RFC uses a bit in the frame header to identify whether a message is masked, and that bit is in no way associated with the client/server role of the sender, so there is no way to really mandate enforcement. So, just don't mask messages and nothing will break.
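To make the mask bit concrete, here is a minimal sketch of building a single unfragmented frame per the RFC 6455 header layout. The function name `encode_frame` and its parameters are my own for illustration, not from any particular library. Note that the mask bit is just the high bit of the second header byte, and nothing in the frame says which side sent it.

```python
import struct
from typing import Optional


def encode_frame(payload: bytes, opcode: int = 0x1,
                 mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented frame (FIN=1). Hypothetical helper.

    Pass a 4-byte mask_key to set the mask bit and XOR the payload;
    leave it as None to send the payload unmasked.
    """
    first = 0x80 | (opcode & 0x0F)               # FIN bit plus opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n < 126:
        header = bytes([first, mask_bit | n])
    elif n < 65536:
        # 126 signals a 16-bit extended length
        header = bytes([first, mask_bit | 126]) + struct.pack("!H", n)
    else:
        # 127 signals a 64-bit extended length
        header = bytes([first, mask_bit | 127]) + struct.pack("!Q", n)
    if mask_key is None:
        return header + payload
    # Masking is a byte-wise XOR with the key, repeating every 4 bytes
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked
```

Calling `encode_frame(b"Hello")` yields the 7 bytes `81 05 48 65 6c 6c 6f`, and passing the key `37 fa 21 3d` yields `81 85 37 fa 21 3d 7f 9f 4d 51 58`, matching the examples in the RFC. One caveat: the RFC text does say a server must close the connection on an unmasked client frame, so a strictly conforming peer may still reject unmasked messages.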
Fragmentation is completely unavoidable, though. The RFC allows messages to be fragmented at custom lengths in the protocol itself, and that kind is avoidable. However, TLS imposes its own message fragmentation. In some runtimes, messages sent at too high a frequency get concatenated, which requires splitting them by message length at the receiving end. Firefox sometimes sends frame headers detached from their frame bodies, which is another form of fragmentation.
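As a rough sketch of what handling this looks like on the receiving end, the buffer below accepts chunks of any size and only emits a frame once its full header and body have arrived. This is not my actual implementation; the class name `FrameReader` and its methods are hypothetical, and it deliberately ignores protocol-level continuation frames and control frames.

```python
class FrameReader:
    """Accumulates raw socket chunks and yields complete frames.

    Hypothetical sketch: copes with several frames arriving in one chunk
    and with a header arriving separately from its body.
    """

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Add a chunk; return a list of (fin, opcode, payload) tuples."""
        self._buf += chunk
        frames = []
        while True:
            frame = self._try_parse()
            if frame is None:
                return frames        # need more bytes before continuing
            frames.append(frame)

    def _try_parse(self):
        buf = self._buf
        if len(buf) < 2:
            return None
        fin = bool(buf[0] & 0x80)
        opcode = buf[0] & 0x0F
        masked = bool(buf[1] & 0x80)
        length = buf[1] & 0x7F
        offset = 2
        if length == 126:            # 16-bit extended length follows
            if len(buf) < 4:
                return None
            length = int.from_bytes(buf[2:4], "big")
            offset = 4
        elif length == 127:          # 64-bit extended length follows
            if len(buf) < 10:
                return None
            length = int.from_bytes(buf[2:10], "big")
            offset = 10
        key = None
        if masked:
            if len(buf) < offset + 4:
                return None
            key = bytes(buf[offset:offset + 4])
            offset += 4
        end = offset + length
        if len(buf) < end:
            return None              # header is here, body is not yet
        payload = bytes(buf[offset:end])
        if key is not None:
            payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        del buf[:end]                # drop the consumed frame, keep the rest
        return fin, opcode, payload
```

Feeding two frames glued together returns both at once, while feeding a bare 2-byte header returns an empty list until the body shows up. Every incoming chunk goes through this bookkeeping, which sending never has to do.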
You have to account for all of that fragmentation from outside the protocol, and it is very slow. In my own implementation, receiving messages took just under 11x longer to process than sending them on a burst of 10 million messages, largely irrespective of message body length. Even with that slowness, my WebSocket implementation proved to be almost 8x faster than HTTP/1 in real-world full-duplex use on a large application.