Chema here, founder of Yclas.
It's been a really long time since we last had to publish a service disruption post. But we believe in transparency, which is why our server status is publicly available at status.yclas.com
Last night, from 2019-08-10 00:59:12 to 2019-08-10 07:03:01 (GMT+1), our service was down for 6 hours and 3 minutes.
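For anyone who wants to check the duration, here is a minimal sketch of the arithmetic using only the two timestamps above. The variable names are illustrative and not part of our monitoring setup.

```python
from datetime import datetime, timedelta, timezone

# Timestamps as reported in this post, in GMT+1.
tz = timezone(timedelta(hours=1))
start = datetime(2019, 8, 10, 0, 59, 12, tzinfo=tz)
end = datetime(2019, 8, 10, 7, 3, 1, tzinfo=tz)

# Subtracting two datetimes gives a timedelta.
downtime = end - start

# Break the total seconds into hours, minutes, and seconds.
hours, remainder = divmod(int(downtime.total_seconds()), 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{hours}h {minutes}m {seconds}s")  # prints "6h 3m 49s"
```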
Even after this incident, our service has a 99.627% uptime over the last 30 days, which still makes it very reliable. Over the last year, since we moved all our services to Amazon AWS, we have usually offered around 99.8%.
We are extremely sorry about this downtime, and we are going to take measures so that it does not happen again.
This is a detailed log of what happened, a series of misfortunes, bad timing, and human error:
- At 2019-08-10 00:59:12, Javier (our 24/7 sysadmin) and I got a notification that the service was down.
- I had gone to sleep at 00:25, when all systems were perfectly fine. I received the notifications but did not hear them, since my phone was in another area (I am currently on holiday).
- Javier is on call 24/7 with our monitoring systems. He was sick, took some medication before going to sleep at 00:45, and fell into a deep sleep, so he did not hear the notifications either.
- Javier woke up at 06:00 and saw the notifications.
- He immediately realized that the pool of available servers in AWS had been reduced to 0.
- Amazon did not send any notification beforehand that would have let us prevent this from happening.
- Javier added 2 new, different server instances that are not auto-scalable.
- Services were back online after 1 hour of work.
- We contacted Amazon about this incident and filed a complaint about the lack of prior notification.
- We modified the number of servers in the availability pool (static resources).
- We added downtime notifications for an extra team member who is in another time zone.
I have to say that Javier has ALWAYS done wonderful work at Yclas. Since he started managing our systems, we have had the best uptime and reliability we have ever had. We are extremely grateful for that.
Thanks, and sorry again. Chema, Yclas Founder.
Service Disruption 2019-08-10