Change Risk Expert

Identifying risks associated with change requests to avoid problems

Overview

“Never touch a running system” — unfortunately, running IT systems are not static systems. Applications need to be adapted, preventive changes carried out, bugs fixed, faulty configurations corrected, and updates applied. Changes bring with them the risk of failure. Seemingly innocent modifications can trigger cascading avalanches of service disruptions, in the worst case bringing entire companies to a standstill.

Changes are responsible for some 80% of all incidents that result in client outages. The more complex an IT system, the more difficult it becomes to estimate the effect of a change.

An essential aspect of an effective change management process is risk management, which aims to assess and mitigate the impact of changes to reduce the chance of failure. Today, IT service providers typically assess the risk of a change by risk categorization, performed either manually or through a questionnaire. The manual approach is highly subjective and, in the worst case, biased. The questionnaire approach applies the same set of questions regardless of the type of change request submitted. There is thus a need for a more accurate risk assessment method that takes into account the unique context of each change request.

A second issue is that the larger an IT organization, the harder it becomes for individual change requesters to stay abreast of the reasons for success and failure encountered by their colleagues.

Change Risk Expert

To address both issues, Change Risk Expert (CRE) employs an advanced classification method that goes beyond current change classification systems. It classifies changes at a fine granularity in order to define a unique change context for each request. This context supports accurate risk management and allows best practices and lessons learned to be shared in a targeted fashion, showing change requesters only the information relevant to them at a given time. Furthermore, CRE’s advanced real-time risk management capabilities ensure proper assessment and mitigation of change risks, thereby reducing the chance of failure.

For change ticket classification, we chose to implement regularized logistic regression (also known as a maximum entropy classifier), as it has been shown to provide outstanding predictive performance across a range of text classification tasks and corpora. Although this classifier yields very high classification accuracy, creating labeled tickets is costly. The cost is further exacerbated by the fact that classifiers trained for support groups in one location cannot readily be transferred to other support groups performing the same task, owing to variations in the language used in their tickets.
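
As an illustration of the technique named above, the following is a minimal sketch of L2-regularized logistic regression over bag-of-words features, written in plain Python. It is not CRE's implementation: the two-class setup, the toy tickets, the function names, and the hyperparameters (`l2`, `lr`, `epochs`) are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def featurize(text):
    """Bag-of-words counts over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, text):
    """Probability that a ticket belongs to the positive class."""
    x = featurize(text)
    return sigmoid(bias + sum(weights.get(tok, 0.0) * n for tok, n in x.items()))


def train(examples, l2=0.01, lr=0.5, epochs=200):
    """Batch gradient descent on the L2-regularized logistic loss.

    examples: list of (short_description, label) pairs, label in {0, 1}.
    """
    data = [(featurize(text), label) for text, label in examples]
    vocab = {tok for x, _ in data for tok in x}
    weights = dict.fromkeys(vocab, 0.0)
    bias = 0.0
    for _ in range(epochs):
        grad_w = defaultdict(float)
        grad_b = 0.0
        for x, label in data:
            z = bias + sum(weights[tok] * n for tok, n in x.items())
            err = sigmoid(z) - label
            grad_b += err / len(data)
            for tok, n in x.items():
                grad_w[tok] += err * n / len(data)
        for tok in vocab:
            # l2 * weight is the gradient of the penalty (l2 / 2) * ||w||^2.
            weights[tok] -= lr * (grad_w[tok] + l2 * weights[tok])
        bias -= lr * grad_b  # the bias is left unregularized
    return weights, bias


# Hypothetical short descriptions: 1 = database change, 0 = network change.
TICKETS = [
    ("apply security patch to oracle database", 1),
    ("increase tablespace on production database", 1),
    ("upgrade database schema for billing", 1),
    ("replace faulty switch in data center", 0),
    ("update firewall rule for vpn access", 0),
    ("reconfigure router vlan for new office", 0),
]

weights, bias = train(TICKETS)
p_db = predict_proba(weights, bias, "extend tablespace on oracle database")
p_net = predict_proba(weights, bias, "add firewall rule on router")
print(f"database-like ticket: {p_db:.2f}, network-like ticket: {p_net:.2f}")
```

The regularization term keeps weights small for words seen in only a handful of tickets, which matters when descriptions are short and the vocabulary is sparse. A production system would more likely rely on an established library and a multi-class formulation.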

To reduce labeling costs, we added active learning and experimented with transfer learning as well as a generalized expectation criteria classifier. Change tickets are classified using their short description, a human-written text of around 100 characters.
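
The text does not say which active learning strategy was used. As one common possibility, the sketch below shows entropy-based uncertainty sampling: the tickets whose predicted class distribution is least certain are sent to a human for labeling first. The ticket IDs, the probabilities, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, k=2):
    """Return the k ticket IDs whose predictions have the highest entropy.

    predictions: dict mapping ticket ID -> list of class probabilities,
    as produced by any probabilistic classifier.
    """
    ranked = sorted(predictions, key=lambda t: entropy(predictions[t]), reverse=True)
    return ranked[:k]


# Hypothetical classifier output for four unlabeled tickets and three classes.
predictions = {
    "CHG-001": [0.98, 0.01, 0.01],  # confident, little to gain from a label
    "CHG-002": [0.40, 0.35, 0.25],
    "CHG-003": [0.70, 0.20, 0.10],
    "CHG-004": [0.34, 0.33, 0.33],  # almost uniform, most informative
}

queue = select_for_labeling(predictions, k=2)
print(queue)
```

Under this strategy the labeling budget goes to the tickets the current model finds hardest, rather than to a random sample, which is how active learning can cut the number of labels needed.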

Publications

  1. Kadar, C., Wiesmann, D., Iria, J., Husemann, D., and Lucic, M., "Automatic Classification of Change Requests for Improved IT Service Quality," 2011 Annual SRII Global Conference (SRII), pp. 430-439, March 29-April 2, 2011.
  2. Güven, S. and Barbu, C.M., "A Real-Time Risk Assessment and Mitigation Engine Based on Dynamic Context," 2011 IEEE International Conference on Services Computing (SCC), pp. 663-670, July 4-9, 2011.
  3. Kadar, C. and Iria, J., "Domain Adaptation for Text Categorization by Feature Labeling," in Advances in Information Retrieval, Lecture Notes in Computer Science, pp. 424-435, 2011.
  4. Mazilu, S. and Iria, J., "L1 vs. L2 Regularization in Text Classification when Learning from Labeled Features," 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), vol. 1, pp. 166-171, 2011.
  5. Güven, S., Barbu, C., Husemann, D., and Wiesmann, D., "Change Risk Expert: Leveraging Advanced Classification and Risk Management Techniques for Systematic Change Failure Reduction," 2012 IFIP/IEEE NOMS, April 15-20, 2012.