Ideally the "alert" text wouldn't be in the chat window. It would be on a large overlay at the top of the screen. The whole point is that any time you can hear a sound, you can also receive "alert" messages, regardless of what you're doing.
Anyway, my thinking on MIDI-based/sequential tone devices:
Tones (like Minecraft note blocks) that run on a server-dependent clock are irritating, half-assed implementations for their purpose.
Note blocks are super fun and cool in single-player Minecraft, for instance, because you've got a constant clock speed running the simulation on your own machine. If I design a note block device on my PC, I know it's going to play back at the same rate every time I use it.
The same note blocks end up being a stuttering, incomprehensible mess when you go online, because the timing of those pulses is inconsistent due to the way internet protocols work. A delay of a fraction of a second while a few packets lost in transit are resent, multiplied across multiple lines of execution, results in a stuttering and unpleasant experience that's like hearing someone with no rhythm try to play the notes. There's no way to guarantee the precise timing of server/client communications to the degree music requires.
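To make the stutter concrete, here is a rough Python toy model. It assumes, purely for illustration, that every note is triggered by its own network message and that each message picks up an independent random delay between 0 and jitter_ms. Real network delay is not uniformly distributed, so treat this as a sketch, not a measurement. The function names streamed_onsets and interval_errors are made up for this example.

```python
import random


def streamed_onsets(intervals_ms, jitter_ms, seed=0):
    """Toy model: each note arrives in its own message, so each onset gets
    its own independent random delay in [0, jitter_ms] (an assumption)."""
    rng = random.Random(seed)
    onsets = []
    t = 0.0
    for gap in intervals_ms:
        t += gap                                      # when the note SHOULD play
        onsets.append(t + rng.uniform(0, jitter_ms))  # when it actually plays
    return onsets


def interval_errors(intended_gaps, onsets):
    """How far each gap the listener hears is from the intended gap."""
    heard = [b - a for a, b in zip(onsets, onsets[1:])]
    return [h - g for h, g in zip(heard, intended_gaps[1:])]


if __name__ == "__main__":
    gaps = [0, 250, 250, 250, 500]  # hypothetical phrase: ms between note onsets
    print(interval_errors(gaps, streamed_onsets(gaps, jitter_ms=0)))   # [0.0, 0.0, 0.0, 0.0]
    print(interval_errors(gaps, streamed_onsets(gaps, jitter_ms=40)))  # nonzero, within ±40 ms
```

With jitter_ms=0 (the single-player case, where one local clock drives everything) every error is zero. With any jitter, each gap between notes wanders independently, which is exactly what a listener hears as bad rhythm.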
These minuscule timing errors are largely unnoticeable for most things, and online games use a lot of prediction, interpolation, and other tricks to smooth them out. The issue with music is that its nature prevents it from being predicted programmatically in the same way as movement, gravity, etc.
For this reason I don't think YOLOL, or indeed any server-based sound timing for sequences like this, is a good idea. Every game with "live" MIDI-style instruments has the same problem for the exact same reason. Music is heavily dependent on precise timing, and the Internet Protocol (IP) stack is engineered to sacrifice timing for connection stability.
If instead we had a device that could package a sequence of sounds and then transmit the ENTIRE phrase, complete with timing, you'd get a much better result. When you transmit the entire sequence, you can verify that the whole thing was sent and received before the client plays it, ensuring that it always plays back with exactly the timing it was intended to have. Instead of multiple notes each being .002 seconds off and piling up until the whole thing sounds janky, you just have a single .002 second delay before smooth playback is triggered, and it's not even noticeable.
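A minimal Python sketch of the "send the whole phrase" idea. The NoteEvent layout, the JSON encoding, and the SHA-256 checksum are all assumptions chosen for illustration, not anything Starbase or YOLOL actually provides (TCP already checks integrity in practice; the checksum here just stands in for "verify receipt of the entire thing"). The client only accepts the phrase if it arrived intact, then schedules every note relative to one local start time, so network delay shifts the whole phrase without changing the spacing between notes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NoteEvent:
    offset_ms: int    # when the note starts, relative to the start of the phrase
    note: int         # MIDI note number (0-127)
    duration_ms: int  # how long the note is held


def pack_phrase(events):
    """Serialize the entire phrase, timing included, plus a checksum of the payload."""
    body = json.dumps([asdict(e) for e in events], separators=(",", ":"))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"body": body, "sha256": digest})


def unpack_phrase(packet):
    """Return the events only if the payload matches its checksum, else None."""
    msg = json.loads(packet)
    if hashlib.sha256(msg["body"].encode("utf-8")).hexdigest() != msg["sha256"]:
        return None  # incomplete or corrupted: ask for a resend instead of playing
    return [NoteEvent(**d) for d in json.loads(msg["body"])]


def schedule(events, start_time_ms):
    """Turn relative offsets into absolute play times from ONE local start time."""
    return [start_time_ms + e.offset_ms for e in events]


if __name__ == "__main__":
    phrase = [NoteEvent(0, 72, 200), NoteEvent(250, 76, 200), NoteEvent(500, 79, 400)]
    received = unpack_phrase(pack_phrase(phrase))
    assert received == phrase
    print(schedule(received, start_time_ms=1037))  # [1037, 1287, 1537]
```

Whether the phrase starts at 1000 ms or 1037 ms, the 250 ms gaps between notes are identical, which is the whole point: the network only decides when the phrase starts, not its rhythm.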
If you use standard MIDI for something like this, the file size is really quite small, since you're not actually transmitting sound, just a MIDI note sequence. You could easily construct these kinds of alerts with such a device and have them always play back in a cohesive and pleasant manner. Hell, you could even use a version of this device that's essentially a soundboard, with various recorded words/phrases, if you wanted. The downside, of course, is that you'd need an interface separate from the existing interfaces and YOLOL to author the content.
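To put a number on "really quite small", the Python sketch below builds a format-0 Standard MIDI File by hand for a hypothetical four-note alert. The note numbers, the 480 ticks-per-quarter resolution, and the helper names vlq and build_smf are illustrative choices. The file contains no tempo event, so a player falls back to the SMF default of 120 BPM.

```python
import struct


def vlq(value):
    """Encode a non-negative int as a MIDI variable-length quantity:
    7 bits per byte, high bit set on every byte except the last."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_smf(notes, ticks_per_quarter=480, velocity=100):
    """Build a format-0 Standard MIDI File from (note, duration_ticks) pairs
    played back to back on channel 1."""
    track = bytearray()
    for note, dur in notes:
        track += vlq(0) + bytes([0x90, note, velocity])  # note on, immediately
        track += vlq(dur) + bytes([0x80, note, 0])       # note off after dur ticks
    track += vlq(0) + b"\xff\x2f\x00"                    # end-of-track meta event
    # Header chunk: length 6, format 0, one track, time division.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_quarter)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


if __name__ == "__main__":
    alert = [(72, 240), (76, 240), (79, 240), (84, 240)]  # hypothetical C-E-G-C, eighth notes
    data = build_smf(alert)
    print(len(data))  # 62
```

That whole alert comes to 62 bytes. For comparison, one second of uncompressed 16-bit mono audio at 44.1 kHz is 88,200 bytes.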