Friday, April 27, 2012

[OpsMgr 2007] How to troubleshoot Event ID 2115 in Operations Manager

Article ID: 2681388 - Last Review: April 17, 2012 - Revision: 1.1
 
Microsoft has published KB2681388, which explains how to troubleshoot Event ID 2115 on your RMS.

Symptoms :
In Operations Manager, one of the performance concerns surrounds Operations Manager Database and Data Warehouse insertion times. The following is a description to help identify and troubleshoot problems concerning Database and Data Warehouse data insertion.

Examine the Operations Manager Event log for the presence of Event ID 2115 events. These events typically indicate that performance issues exist on the Management Server or the Microsoft SQL Server that is hosting the OperationsManager or OperationsManager Data Warehouse databases. Database and Data Warehouse write action workflows run on the Management Servers and these workflows first retain the data received from the Agents and Gateway Servers in an internal buffer. They then gather this data from the internal buffer and insert it into the Database and Data Warehouse. When the first data insertion has completed, the workflows will then create another batch.

The size of each batch depends on how much data is available in the buffer when the batch is created, up to a maximum of 5000 data items per batch.  If the incoming data item rate increases, or the insertion throughput to the Operations Manager and Data Warehouse databases is reduced, the buffer accumulates more data and the batch size grows.  Several write action workflows run on a Management Server.  These workflows handle data insertion to the Operations Manager and Data Warehouse databases for different data types.  For example:
  • Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
  • Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
  • Microsoft.SystemCenter.DataWarehouse.CollectEventData
  • Microsoft.SystemCenter.CollectAlerts
  • Microsoft.SystemCenter.CollectEntityState
  • Microsoft.SystemCenter.CollectPublishedEntityState
  • Microsoft.SystemCenter.CollectDiscoveryData
  • Microsoft.SystemCenter.CollectSignatureData
  • Microsoft.SystemCenter.CollectEventData
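
The batching behavior described above can be illustrated with a minimal Python sketch. It is not the Health Service implementation; the names take_batch and MAX_BATCH_SIZE are hypothetical, and only the 5000-item limit comes from the article. It shows how a backlog in the buffer leads to full-size batches until the buffer drains.

```python
from collections import deque

# Maximum number of data items per batch, as stated in the article.
MAX_BATCH_SIZE = 5000


def take_batch(buffer, max_batch_size=MAX_BATCH_SIZE):
    """Remove and return up to max_batch_size items from the front of the buffer.

    Hypothetical helper: the batch is whatever is in the buffer at the time
    the batch is created, capped at the maximum batch size.
    """
    batch = []
    while buffer and len(batch) < max_batch_size:
        batch.append(buffer.popleft())
    return batch


# A backlog of 12000 items drains as two full batches and one partial batch.
buffer = deque(range(12000))
sizes = []
while buffer:
    sizes.append(len(take_batch(buffer)))
print(sizes)  # [5000, 5000, 2000]
```

When insertion is slow, more items accumulate between batches, so each new batch is closer to the 5000-item limit.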

When a Database or Data Warehouse write action workflow on a Management Server experiences slow data batch insertion, for example insertion times in excess of 60 seconds, it begins logging Event ID 2115 to the Operations Manager Event log. This event is logged every minute until the data batch is inserted into the Database or Data Warehouse, or the data is dropped by the write action workflow module. Event ID 2115 therefore reflects latency in inserting data into the Database or Data Warehouse. Below is an example of the event logged when data is dropped by the write action workflow module:

Event Type: Error
Event Source: HealthService
Event Category: None
Event ID: 4506
Computer: <RMS NAME>
Description:
Data was dropped due to too much outstanding data in rule "Microsoft.SystemCenter.OperationalDataReporting.SubmitOperationalDataFailed.Alert" running for instance <RMS NAME> with id:"{F56EB161-4ABE-5BC7-610F-4365524F294E}" in management group <MANAGEMENT GROUP NAME>.


Event ID 2115 contains two significant pieces of information: the name of the workflow that is experiencing the problem, and the elapsed time since the workflow began inserting the last batch of data.

For example:

Log Name: Operations Manager
Source:        HealthService
Event ID:      2115
Level:         Warning
Computer:      <RMS NAME>
Description:
A Bind Data Source in Management Group <MANAGEMENT GROUP NAME> has posted items to the workflow, but has not received a response in 300 seconds.  This indicates a performance or functional problem with the workflow.
 Workflow Id : Microsoft.SystemCenter.CollectPublishedEntityState
 Instance    : <RMS NAME>
Instance Id : {88676CDF-E284-7838-AC70-E898DA1720CB}


This particular Event ID 2115 message indicates that the workflow Microsoft.SystemCenter.CollectPublishedEntityState, which writes Entity State data to the Operations Manager database, started inserting a batch of Entity State data 300 seconds ago and the insertion has not yet finished.  Normally, inserting a batch of data should complete within 60 seconds.  If the Workflow Id contains "DataWarehouse", the problem concerns the Operations Manager Data Warehouse.  Otherwise, the problem concerns inserting data into the Operations Manager database.
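
As a rough illustration of reading these two pieces of information, the following Python sketch extracts the workflow name and the elapsed time from an Event ID 2115 description and applies the "DataWarehouse" rule above. The function name parse_event_2115 and the returned dictionary keys are hypothetical, and the regular expressions assume the description follows the layout of the example event shown earlier.

```python
import re


def parse_event_2115(description):
    """Extract workflow id, elapsed seconds, and target database from the
    description text of an Event ID 2115 message.

    Assumes the wording of the sample event; returns None if the expected
    fields are not found.
    """
    elapsed = re.search(r"has not received a response in (\d+) seconds", description)
    workflow = re.search(r"Workflow Id\s*:\s*(\S+)", description)
    if not elapsed or not workflow:
        return None
    workflow_id = workflow.group(1)
    # Per the article: a Workflow Id containing "DataWarehouse" points at the
    # Data Warehouse; anything else points at the Operations Manager database.
    if "DataWarehouse" in workflow_id:
        target = "Data Warehouse"
    else:
        target = "Operations Manager database"
    return {
        "workflow_id": workflow_id,
        "elapsed_seconds": int(elapsed.group(1)),
        "target": target,
    }


SAMPLE = """A Bind Data Source in Management Group <MANAGEMENT GROUP NAME> has posted items to the workflow, but has not received a response in 300 seconds.  This indicates a performance or functional problem with the workflow.
 Workflow Id : Microsoft.SystemCenter.CollectPublishedEntityState
 Instance    : <RMS NAME>
 Instance Id : {88676CDF-E284-7838-AC70-E898DA1720CB}"""

result = parse_event_2115(SAMPLE)
print(result["workflow_id"], result["elapsed_seconds"], result["target"])
```

Applied to a set of exported events, a helper like this makes it easier to see which workflow is affected most often and whether the pending time keeps growing.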

Cause :

As the description of Event ID 2115 states, this may indicate a database performance problem or too much data incoming from the agents. Event ID 2115 simply indicates a backlog in inserting data into the Operations Manager or Operations Manager Data Warehouse database. These events can originate from a number of possible causes, for example a large amount of Discovery data, a database connectivity issue, a full database condition, or disk or network constraints.

In Operations Manager, Discovery data insertion is a relatively expensive process. We define a burst of data as a short period of time in which a significant amount of data is received by the Management Server. Such bursts can trigger Event ID 2115, but because Discovery data insertion should occur infrequently, these events should be occasional. If Event ID 2115 consistently appears for Discovery data collection, this can indicate either a Database or Data Warehouse insertion problem, or Discovery rules in a Management Pack that collect too much discovery data.

Operations Manager configuration updates caused by Instance Space changes or Management Pack imports have a direct effect on CPU utilization on the Database Server and this can impact Database insertion times. Following a Management Pack import or a large instance space change, it is expected to see Event ID 2115 messages. For more information on this topic please see the following:

2603913 - How to detect and troubleshoot frequent configuration changes in Operations Manager (http://support.microsoft.com/kb/2603913)

If the Operations Manager or Operations Manager Data Warehouse databases are out of space or offline, it is expected that the Management Server will continue to log Event ID 2115 messages to the Operations Manager Event log and the pending time will grow higher.

If the write action workflows cannot connect to the Operations Manager or Operations Manager Data Warehouse databases, or they are using invalid credentials to establish their connection, the data insertion will be blocked and Event ID 2115 messages will be logged accordingly until this situation is resolved.

In Operations Manager, expensive User Interface queries can impact resource utilization on the Database which can lead to latency in Database insertion times. When a user is performing an expensive User Interface operation it is possible to see Event ID 2115 messages logged.


Event ID 2115 messages can also indicate a performance problem if the Operations Manager Database and Data Warehouse databases are not properly configured. Performance problems on the database servers can lead to Event ID 2115 messages. Some possible causes include the following:
  • The SQL Log or TempDB database is too small or out of space.
  • The network link from the Operations Manager and Data Warehouse database to the Management Server is bandwidth constrained or has high latency. In this scenario we recommend that the Management Server be on the same LAN as the Operations Manager and Data Warehouse server.
  • The data disk hosting the Database, logs or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a functional problem. In this scenario we recommend leveraging RAID 10 and enabling battery-backed Write Cache on the Array Controller.
  • The Operations Manager Database or Data Warehouse server does not have sufficient memory or CPU resources.
  • The SQL Server instance hosting the Operations Manager Database or Data Warehouse is offline.

It is recommended that the Management Server reside on the same LAN as the Operations Manager and Data Warehouse database server.

Event ID 2115 messages can also occur if the disk subsystem hosting the Database, logs or TempDB used by the Operations Manager and Data Warehouse databases is slow or experiencing a functional problem. In this scenario we recommend leveraging RAID 10 and enabling battery-backed Write Cache on the Array Controller.

Resolution :
Microsoft proposes six scenarios to resolve the issue.



This posting is provided "AS IS" with no warranties.
