On March 21st, from 1:32:47 am to 7:20 am (GMT+1), our platform experienced connection issues. Approximately 21% of requests in this time frame were lost, mostly affecting 2 customers in America.
No data was lost, so you do not need to take any further action. This note is purely informational, since we want to be transparent.
Here is an explanation of the technical issues:
- One server had a problem with its hard-disk RAID array. We were unable to restore it, and the server became unstable.
- The OVH load balancer failed to automatically redirect all traffic to the other servers.
- We were notified by 2 customers who were having trouble accessing their sites. For unclear reasons, most of the traffic to their sites was still being directed to the failed server.
- 176,801 requests were served during those hours, compared with the 225,079 we would have expected, so roughly 21% of requests were lost.
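For reference, the share of lost requests follows directly from the two request counts above:

$$
\text{lost share} = \frac{225079 - 176801}{225079} = \frac{48278}{225079} \approx 21.4\%
$$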
We have an alert system that notifies us by phone if anything goes wrong, but it did not trigger because it was still receiving correct HTTP responses. Our status page at https://status.yclas.com/ did detect the problem, but it only notified us by email, and we were "happily" sleeping :(. Our support team did not notice the outage either, because their requests were being directed to the correct IP.
We are taking measures to prevent this from happening again: we are buying new servers (this was already in progress) and adding new notification systems, and we hope this issue will not repeat itself.
Remember that you can always check our status at https://status.yclas.com/, where our uptime still stands at 99.88% ;)
Thanks, and sorry for any inconvenience.