Cognitive IT services

A better way to run IT

Cognitive IT services focuses on the application, adaptation and extension of machine learning and AI methods to improve how IT services are delivered. Using advanced NLP methods, we parse client requirements and match them to IT service capabilities. By mining system logs, we derive diagnostic and remediation plans for autonomic IT management and combine them with deep predictive models to prevent outages before they occur.

The IBM Services Platform with Watson combines these and other cognitive capabilities with various Watson application programming interfaces in a single cloud-delivered platform. The platform is used to design superior client solutions, achieve an exceptional level of autonomic service management, and maintain a healthy “always-on” environment. It continuously learns and optimizes IT performance to improve client business outcomes.


Design

Cognitive Solution Designer

Designing an IT service solution for large client engagements involves analyzing a considerable amount of unstructured information. This includes client requirements, available offerings, prior related solutions, and the exchange of further documents between the service provider and the client. Applying cognitive computing capabilities to this process presents a significant innovation opportunity.

Our vision for the Cognitive Solution Designer is to transform the existing human-intensive solution design process into one in which human activities are augmented by cognitive automation. A key technical challenge is to provide capabilities for understanding client requirements that are communicated in natural language, for example in requests for proposals (RFPs) and requests for services (RFSs).

Another challenge is to combine cognitive assistance and automation to accelerate process steps such as selecting solution components that fulfill the client requirements, configuring their detailed parameters, and preparing response documents. This requires digesting a rich set of knowledge sources, including solution modules, prior solutions, and internal and external products. High relevance and accuracy of the results are critical to ensure that the resulting responses truly address the client's needs.

The solution design phase is critical for the entire service delivery process. First, the solution must fulfill the client requirements at a reasonable price. Next, it must correspond to existing delivery capabilities within the planned budget to enable a smooth transition and transformation phase. Finally, it must enable efficient high-quality steady-state service delivery to achieve lasting client satisfaction.

Cognitive Solution Designer is being developed as a multi-disciplinary effort involving the IBM Research labs in Almaden, Beijing and Zurich as well as IBM business organizations, to ensure that the solution can and will be used in practice. Technologies for extracting functional and non-functional client requirements include linguistic and ontology-based features as well as classification using deep networks. For solution construction, we are moving towards text similarity based on linguistic features, ontologies and textual entailment methods.
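To make the idea of matching requirements to solution components by text similarity more concrete, the following is a minimal sketch only. It swaps in a plain bag-of-words cosine similarity, which is far simpler than the linguistic, ontology-based and entailment methods mentioned above. The catalog entries, the requirement sentence and all function names are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_components(requirement, catalog):
    """Return (name, score) pairs sorted from best to worst match."""
    req_vec = Counter(tokenize(requirement))
    scored = [
        (name, cosine_similarity(req_vec, Counter(tokenize(description))))
        for name, description in catalog.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog of solution components (not real offerings).
catalog = {
    "managed-backup": "daily backup and restore of database servers",
    "network-monitoring": "monitoring of routers and firewalls with alerting",
    "service-desk": "multilingual help desk for end users",
}
requirement = "The provider shall perform daily backup of all database servers"

ranking = rank_components(requirement, catalog)
print(ranking[0][0])  # best-matching component name
```

A word-overlap measure like this cannot recognize paraphrases or implied requirements, which is one reason richer linguistic and entailment-based methods are of interest for this task.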

Manage

Complex Event Automation

Managing IT infrastructure is complex and costly because of the many variation points at all levels of a system stack (hardware, network, middleware, application, etc.). Even with the best calibration of the system configuration, incidents happen due to unforeseen loads, hardware defects, application errors, and similar causes.

To minimize service outages from incidents, variables on all levels of the system stack (disk usage, database hit ratio, etc.) are continuously monitored to detect the buildup of incidents as early as possible. Furthermore, to minimize the time until an incident is resolved and to minimize costs, incident responses such as system clean-up, component re-start and re-configuration are partially automated.

We are developing various analytical tools to support monitoring and incident response automation:

  • Learn from automation execution data on some servers whether and how to transfer incident response automation to other (differently configured) servers,
  • Learn from human incident response data how to automate incident responses,
  • Learn from historical monitoring data which sets of monitoring alerts are symptoms with a common cause,
  • Learn from historical monitoring data how to adjust monitoring to minimize noise and maximize signal.
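As an illustration of the third item in the list above (finding sets of alerts that may share a common cause), here is a minimal sketch that counts how often two alert types fire on the same server within the same time window. This is a simple co-occurrence heuristic chosen for illustration, not the analytical tooling described here. The alert names, the tuple layout and the thresholds are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(alerts, window_seconds):
    """Count pairs of alert types seen on the same server in the same time bucket.

    Each alert is a (timestamp_seconds, server, alert_type) tuple.
    """
    buckets = {}
    for timestamp, server, alert_type in alerts:
        key = (server, timestamp // window_seconds)
        buckets.setdefault(key, set()).add(alert_type)

    pair_counts = Counter()
    for alert_types in buckets.values():
        for pair in combinations(sorted(alert_types), 2):
            pair_counts[pair] += 1
    return pair_counts


def related_pairs(alerts, window_seconds=300, min_count=2):
    """Return alert-type pairs that co-occur at least min_count times."""
    counts = cooccurrence_counts(alerts, window_seconds)
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Hypothetical monitoring history, invented for illustration.
alerts = [
    (10, "srv1", "disk_full"),
    (40, "srv1", "db_write_error"),
    (1000, "srv2", "disk_full"),
    (1100, "srv2", "db_write_error"),
    (1150, "srv2", "high_cpu"),
    (5000, "srv1", "high_cpu"),
]

pairs = related_pairs(alerts)
print(pairs)
```

Fixed time buckets can split two nearby alerts that fall on either side of a bucket boundary, and frequent co-occurrence alone does not establish a common cause, so a result like this is at most a candidate list for further analysis.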

To support incident response automation at scale, we are also developing an optimized representation of the diagnosis and remediation process that serves as the incident response and is amenable to both human engineering and machine learning. This process representation features optimized decision trees and automated planners.
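One way to picture a diagnosis and remediation process expressed as a decision tree is the sketch below. It is an assumed, deliberately small data structure, not the representation under development: inner nodes test a monitored metric, leaves name a remediation action, and the evaluation records the path taken so that an engineer can review it. The metric names, thresholds and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Action:
    """Leaf node: the remediation step to execute."""
    name: str


@dataclass
class Check:
    """Inner node: a diagnostic test with one branch per outcome."""
    description: str
    test: Callable[[Dict[str, float]], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Action, Check]


def diagnose(node, metrics):
    """Walk the tree and return (action name, list of (check, outcome))."""
    path = []
    while isinstance(node, Check):
        outcome = node.test(metrics)
        path.append((node.description, outcome))
        node = node.if_true if outcome else node.if_false
    return node.name, path


# Hypothetical tree for a disk-space incident.
tree = Check(
    "disk usage above 90%",
    lambda m: m["disk_usage_pct"] > 90,
    if_true=Check(
        "temporary files above 10 GB",
        lambda m: m["temp_gb"] > 10,
        if_true=Action("clean_temp_files"),
        if_false=Action("escalate_to_engineer"),
    ),
    if_false=Action("no_action"),
)

action, path = diagnose(tree, {"disk_usage_pct": 95.0, "temp_gb": 12.5})
print(action)
for description, outcome in path:
    print(description, outcome)
```

Keeping checks and actions as explicit nodes means the same structure can be edited by hand or, in principle, produced by a learning procedure, which matches the stated goal of being amenable to both.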

Optimize

PASIR: Predictive Analytics for Server Incident Reduction

To optimize IT performance, we apply deep predictive analytics that use statistical and data mining techniques to analyze historic and current data, and create rules and models to predict future critical events. These techniques enable smart asset management to realize lifecycle cost reduction and improve decision making, which in turn provides benefits along four dimensions:

  • Improved customer satisfaction and reliability,
  • Reduced total cost of ownership by prioritization of maintenance activities,
  • Better route planning and optimization of field support,
  • Improved overall compliance.

Recently, we have developed Predictive Analytics for Server Incident Reduction (PASIR), a novel, automated approach for selecting appropriate server modernization actions based on actual server behavior. Our methodology adopts the rationale used in condition-based and predictive maintenance for asset-intensive industries, which infers the time and type of maintenance from historical asset monitoring signals. PASIR identifies high-impact server incidents that signal either unavailability or performance degradation, and correlates them with server properties and utilization measures through multivariate methods.

The goal is to identify problematic server configurations and, via a what-if analysis, to prescribe the best-fitting modernization measures, so as to reduce both the number of troubled servers and the volume of critical incidents. Since 2014, PASIR has been broadly deployed in more than 130 IBM-serviced IT environments across various industries. The predictions have been used to plan refresh programs and to identify at-risk application environments and servers suited for cloud migration. They have also contributed to proposal cost penalty analyses for problematic servers.
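The what-if idea can be sketched in a few lines, under strong simplifying assumptions. The example below replaces PASIR's multivariate methods with plain per-configuration averages: it estimates the incident rate for each observed configuration and compares a server's current configuration with the configuration it would have after a candidate modernization action. All field names, configuration values, action names and incident counts are hypothetical.

```python
from collections import defaultdict


def incident_rate_by_config(servers):
    """Average incident count per (os, hw_age_band) configuration."""
    totals = defaultdict(lambda: [0, 0])  # config -> [incident sum, server count]
    for server in servers:
        key = (server["os"], server["hw_age_band"])
        totals[key][0] += server["incidents"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def what_if(server, change, rates):
    """Estimated change in incidents if `change` is applied to `server`.

    Returns None when either configuration was never observed.
    """
    modified = {**server, **change}
    before = rates.get((server["os"], server["hw_age_band"]))
    after = rates.get((modified["os"], modified["hw_age_band"]))
    if before is None or after is None:
        return None
    return after - before


# Hypothetical server inventory with incident counts over a fixed period.
servers = [
    {"name": "a", "os": "os_v1", "hw_age_band": "old", "incidents": 8},
    {"name": "b", "os": "os_v1", "hw_age_band": "old", "incidents": 6},
    {"name": "c", "os": "os_v2", "hw_age_band": "old", "incidents": 3},
    {"name": "d", "os": "os_v2", "hw_age_band": "new", "incidents": 1},
    {"name": "e", "os": "os_v1", "hw_age_band": "new", "incidents": 4},
]
# Hypothetical modernization actions expressed as property changes.
actions = {
    "upgrade_os": {"os": "os_v2"},
    "refresh_hardware": {"hw_age_band": "new"},
}

rates = incident_rate_by_config(servers)
deltas = {}
for action_name, change in actions.items():
    delta = what_if(servers[0], change, rates)
    if delta is not None:
        deltas[action_name] = delta

best = min(deltas, key=deltas.get)  # most negative delta = largest reduction
print(best, deltas[best])
```

Grouped averages of this kind ignore confounding factors such as workload, so they only illustrate the shape of a what-if comparison; they are not a substitute for the multivariate modeling described above.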

PASIR has been extended to predict failures for network devices (PAND) and security services (PASS). Both PAND and PASS have been deployed on more than 10,000 Cisco and Juniper devices (e.g., routers, firewalls, load balancers) across various IT environments. Early failure triggers are identified and then fed back into the ticketing system to enable support teams to schedule maintenance operations proactively. In addition, the predictions contribute to roadmap planning for end-of-life products, including their migration to an efficient strategic platform, and inform recommendations on platform configuration and best practices for consolidation and deployment relative to usage patterns.
