As most of you are aware, on Wednesday, June 12, we experienced an unplanned outage between 10:56 a.m. CDT (GMT-5) and 1:32 p.m. CDT (GMT-5). It is always our goal to provide uninterrupted service, and we sincerely regret the incident. Our CTO, Stephen Franklin, and I want to assure you that the LeanKit team takes the service of our customers very seriously.
The root cause of the interruption on Wednesday was a major service outage affecting our hosting provider’s Chicago and Dallas data centers. Full details can be found here. (I expect that more information will be added as their investigation of the outage continues.)
Throughout the incident, we were in constant contact with our hosting provider. As a result, we were able to restore service for LeanKit customers about three hours before our hosting provider fully resolved their network outage. At no time was there any risk of customer data loss; only network connectivity was affected.
Regardless of the root cause of the service interruption, we believe it is our responsibility to ensure system availability. We are currently taking steps to ensure that this particular type of service interruption is not repeated. As we continue to develop our infrastructure and disaster recovery capabilities, we will keep you up to date here, on our blog.
Again, Stephen and I will take every step to ensure that this type of service interruption is not repeated. Please do not hesitate to contact Stephen (firstname.lastname@example.org) or me (email@example.com) at any time. We welcome your direct feedback and questions.
Please join our partners at the Lean Software Institute for their next webinar, “Lean as an Organizational Learning System,” on October 4.
About the webinar:
We all admire companies like Toyota and Apple that really care about innovation and the relentless pursuit of perfection – and pull it off! That pursuit is of course never-ending, as customers expect more and as competitors continue to improve. As David Allen likes to say, “The better you get, the better you’d better get!”
None of these approaches is sufficient in itself, however, because they don’t explain what an organization actually DOES to get better. What are the actual processes that need to be in place? What are the biggest practical challenges? How do we track progress?
In this webinar we will discuss how Lean Management can help software executives mobilize their employees and managers to learn faster than the competition and deliver more value faster to customers and shareholders.
Our partners at the Lean Software Institute have just announced their new webinar series, “Fit for the Future: Lean and the Software Industry”, and have invited us to join them in presenting the first installment, “Beyond Kanban: Lean as an Operating System”.
September 7, 2011, 10:00 PST / 13:00 EST / 19:00 CET
Duration: One hour
In this webinar we will discuss how to go beyond Value Stream Mapping and Kanban Boards to create a representation of a business as a “system of systems”. You will learn about the Lean Software Institute’s five-dimensional model for describing business systems, including Product Development Systems. We show how this model can provide breakthrough insights into why organizations encounter performance challenges.
This webinar series is aimed at CXOs and other senior executives in the IT industry who are attempting to improve their organizations’ productivity, accelerate innovation, enhance financial performance, and improve employee engagement.
Frode L. Odegard, Founder & CEO, Lean Software Institute
Chris Hefley, Co-Founder & CEO, LeanKitKanban
Are you a software developer, devops, agile coach, or IT project manager in the Southeastern U.S.? If so, you should definitely head down to Chattanooga, TN this week for DevLink 2011. For the past several years, DevLink has consistently been one of the very best regional IT/software conferences in the country.
The LeanKit crew will be there, showing off our agile project management tool, LeanKit Kanban, on our big touchscreen and talking about Kanban in the Open Spaces. Stop by and see us, or come by the Open Spaces to learn more about Kanban.
LeanKit is also hiring in 2011 and 2012, so if you’re a developer, agile coach, project manager, or devops guy/gal in the area, we’d like to meet you. If you’ve ever wanted to work for a cutting-edge startup company, but don’t want to pick up and move to Silicon Valley to do it, then LeanKit may just be the place for you.
We are pleased to announce that we’ll be partnering with VersionOne to provide advanced Kanban system support in an integrated solution with their enterprise Agile Lifecycle Management tool.
Read more about this news update here.
Earlier today we again had reports of slow response times. In some instances the application was completely unresponsive. As we did yesterday, we performed routine corrective actions and were able to resolve the immediate issue. When the issue began to recur later in the day, we were able to quickly resolve it. But, of course, we weren’t satisfied with these temporary fixes.
Deeper analysis of our log files revealed a very large spike in network bandwidth utilization during the incidents, which we traced to an unusually large volume of API usage by a customer. We are working with this customer to ensure that their integration solutions do not risk system performance for other customers. To prevent this issue from recurring, we are working on measures to automatically throttle back API calls by any customer whose usage is causing a system-wide problem.
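For readers curious what per-customer throttling can look like, here is a minimal sketch using a token bucket. It is an illustration only and not a description of LeanKit's actual implementation: the class name `ApiThrottle`, the parameters `rate_per_sec` and `burst`, and the customer IDs are all hypothetical. For simplicity it limits every customer all the time, whereas the measure described above would apply only when usage is causing a system-wide problem.

```python
import time


class ApiThrottle:
    """Per-customer token-bucket limiter (hypothetical illustration)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second (assumed value)
        self.burst = burst         # maximum tokens a customer can accumulate
        self.clock = clock         # injectable so the limiter can be tested
        self.buckets = {}          # customer_id -> (tokens, last_update_time)

    def allow(self, customer_id):
        """Return True if this customer's API call may proceed right now."""
        now = self.clock()
        # A customer seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(customer_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[customer_id] = (tokens - 1, now)
            return True
        self.buckets[customer_id] = (tokens, now)
        return False
```

Because each customer has a separate bucket, one customer exhausting their allowance does not affect calls from anyone else. A rejected call would typically be answered with an HTTP 429 response so that the integration can retry later.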
Until this is in place, we will be actively monitoring the situation around the clock, looking for any indication of a problem so that we can take immediate corrective action before customers are affected. To ensure that we are immediately notified of other issues in the future, we have now implemented automated monitoring and alerting from the following global locations. We have also broadened the scope of this monitoring so that it covers all functional areas within the application.
New York, New York
We once again apologize for the problem and any inconvenience you may have suffered. We deeply appreciate those customers who provided us additional information to help isolate the problem.
CTO and Co-Founder
Earlier today, a number of customers reported slow response times when using LeanKit Kanban. When we investigated, we found that we could replicate the issue intermittently but not consistently. The majority of response times for our test account were a little bit worse than average but not outside of normal parameters. But for a few accounts, we confirmed that response times were extremely slow. Investigation into the hosting infrastructure did not reveal any obvious causes. Resource utilization metrics were above average for that time period but again not outside of normal parameters.
Although we didn’t have a clear root cause, we began to take standard corrective actions based on the most probable causes. Response times returned to normal based on our own testing and feedback from those customers who had initially reported the issue.
Even though the immediate issue is resolved, we are by no means done investigating. We are currently combing over all log information to uncover the root cause. However, this incident has already revealed some areas where we need to improve. Our current monitoring and alerting did not raise this issue quickly enough. We had to find out from customers. We know that’s not acceptable.
So, we are going to greatly expand our automated monitoring & alerting to include numerous test organizations instead of the one organization that we are using now. And we will expand the locations from which we monitor response time to provide better global coverage.
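As a rough illustration of why probing several test organizations from several locations helps, the sketch below groups response-time samples by location and organization and flags any pair whose median exceeds a threshold. Everything in it is an assumption made for the example: the function name `find_slow_probes`, the 2000 ms threshold, and the sample format are hypothetical and do not describe our actual monitoring setup.

```python
from statistics import median


def find_slow_probes(samples, threshold_ms=2000.0):
    """Return the (location, organization) pairs whose median response is slow.

    samples: iterable of (location, organization, response_ms) tuples,
    a hypothetical format chosen for this example.
    """
    grouped = {}
    for location, org, response_ms in samples:
        grouped.setdefault((location, org), []).append(response_ms)
    # Using the median means a single slow request does not trigger an alert.
    return sorted(
        key for key, values in grouped.items() if median(values) > threshold_ms
    )
```

With only one test organization, an account-specific slowdown like the one described above can go unnoticed, because that single probe may keep reporting normal response times.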
As we learn more, we will keep you posted. We appreciate your patience and sincerely apologize for any issues we may have caused for you.