Part 1 of this blog describes the ULS and event log improvements in SharePoint 2010 and what you need to know about them. The new changes simplify troubleshooting and give administrators more flexibility in managing different aspects of ULS logging. I will expand on this topic in Part 2, where I'll discuss the Developer Dashboard. A special shout-out goes to Dan Winter for the knowledge transfer and technical review.
Correlation ID
A deep investment has been made in correlation IDs in SharePoint 2010. Events such as page loads and other common operations within the ULS logs carry a corresponding correlation ID, which groups them with the other events from the same conversation. For example, User A hits a WFE and accesses a site; all the events in the ULS logs tied to User A's request share the same correlation ID. This makes mining the ULS logs easy, because you can filter on the correlation ID. If the conversation hits multiple servers in the farm, those servers maintain the exact same correlation ID; a common scenario is a WFE calling a web service on an application server. It's also possible to filter on correlation ID in a SQL Profiler trace, so you can get a truly farm-wide picture of what happened at each level of the conversation. SharePoint error pages also expose the correlation ID along with a timestamp of the error, so it's now simple to find the conversation in the servers' corresponding ULS logs.
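For example, once an error page hands you a correlation ID, you can pull that single conversation out of the logs from every server in the farm. A minimal sketch using the Merge-SPLogFile cmdlet covered later in this post (the GUID and path are placeholders; substitute the ID shown on the error page):

# Gather one conversation from all servers in the farm into a single local file.
Merge-SPLogFile -Path "C:\Logs\Correlation.log" -Correlation "b66db71a-3257-4470-adf9-5c01dc59ecb3"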
Event Throttling
ULS and event logging now have richer category-based management: you can set specific subcategories to a unique level of logging while throttling other subcategories that belong to the same root category.
For example, when troubleshooting the query processor, you can turn Verbose logging up on just the Query Processor component in the Diagnostic Logging page.
A root category is displayed in bold when unique logging has been set within it, which flags it for the administrator. Note also the different logging levels that can be set on each subcategory.
When the issue has been resolved or custom logging levels are no longer required, you can set logging back to the default levels from the same page.
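The same throttling can be scripted. A minimal sketch using the Set-SPLogLevel and Clear-SPLogLevel cmdlets, both covered in more detail later in this post:

# Turn up trace logging on just the Query Processor category.
Set-SPLogLevel -TraceSeverity Verbose -Identity "WSS Search Diagnostics:Query Processor"
# When finished troubleshooting, reset the category back to its default levels.
Clear-SPLogLevel -Identity "WSS Search Diagnostics:Query Processor"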
Log File Retention
In SharePoint 2007, the default log file retention policy generated a new log file every 30 minutes and kept a total of 96 log files at any given time, which equates to 48 hours. ULS logging has been improved by reducing log file size by at least 50%. This allows a completely new approach to log file retention: the policy is now expressed as a number of days, with a default of 14. A new option also exists to restrict trace logs to a fixed disk size; it is unchecked by default.
Once day 15 begins, log files from day 1 are pruned. If the restrict trace log setting is also enforced, then both settings apply; the order in which they are enforced depends on which condition is hit first.
For example:
Number of days to store log files: 14
Restrict trace log disk space usage: 2 GB
Say a SharePoint administrator is chasing a profile provisioning problem and cranks up some verbose logging. This causes the total ULS log size to hit 2 GB on day 3. In this case, the oldest logs are pruned to honor the restrict trace log setting. These settings are also configurable via the Set-SPDiagnosticConfig cmdlet.
Note: The highest value you can set for trace log disk space usage is 1 TB.
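As a sketch, assuming the SharePoint 2010 parameter names (check the current values first with Get-SPDiagnosticConfig), the two retention settings above could be applied like this:

# Review the current diagnostic configuration before changing it.
Get-SPDiagnosticConfig
# Keep 14 days of logs and cap total trace log disk usage at 2 GB.
Set-SPDiagnosticConfig -DaysToKeepLogs 14 -LogMaxDiskSpaceUsageEnabled -LogDiskSpaceUsageGB 2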
Event Log Flood Protection
The name says it all: this feature prevents a single event from flooding the event log. It is enabled by default in diagnostic logging.
With this option enabled, if the same event fires 5 times within 2 minutes, we suppress the event from logging and throw a summary event at the 2-minute mark with a nice description of how many times the event actually fired. This is the default behavior, and it can be customized using PowerShell via the Set-SPDiagnosticConfig cmdlet.
Note: To qualify as the same event, an event must have the same event source and ID. The event severity and/or text does not apply and is not checked.
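The thresholds themselves are tunable. A sketch, assuming the SharePoint 2010 flood protection parameter names on Set-SPDiagnosticConfig:

# Suppress a repeating event after 10 occurrences in a 5-minute window
# instead of the default 5 occurrences in 2 minutes.
Set-SPDiagnosticConfig -EventLogFloodProtectionThreshold 10 -EventLogFloodProtectionTriggerPeriod 5
# Flood protection can also be disabled entirely if needed:
# Set-SPDiagnosticConfig -EventLogFloodProtectionEnabled:$false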
Manage logging using PowerShell
A more hardcore SharePoint administrator is likely to use PowerShell for all administration, including managing ULS logs. The product group has given us a very rich set of cmdlets to do things like get/set diagnostic levels and retention policies. It's also possible to mine ULS logs using PowerShell for monitoring or troubleshooting purposes. I'll discuss each area below and provide some examples.
Retrieve logging levels
The following cmdlet retrieves logging levels:
Get-SPLogLevel – Displays a list of objects or diagnostic levels
The cmdlet pulls all logging levels, but what if you want to filter the result by area or by a specific category?
Examples:
To retrieve all logging categories by area “WSS Search Diagnostics”:
Get-SPLogLevel -Identity "WSS Search Diagnostics:*" | FL
To retrieve a specific logging category, like Query Processor within the WSS Search Diagnostics area:
Get-SPLogLevel -Identity "WSS Search Diagnostics:Query Processor" | FL
Set logging levels
The following cmdlet sets logging levels:
Set-SPLogLevel – Sets trace logging and event logging levels for a category
Note: No parameters are marked as required, but you still need to supply parameters to set logging levels correctly.
Examples:
To set the entire "WSS Search Diagnostics" area to an event log level of Warning and a trace log level of Verbose:
Set-SPLogLevel -TraceSeverity Verbose -EventSeverity Warning -Identity "WSS Search Diagnostics:*"
It's also possible to set a specific category within an area. For example, to set Verbose trace logging on just Query Processor within WSS Search Diagnostics:
Set-SPLogLevel -TraceSeverity Verbose -EventSeverity Warning -Identity "WSS Search Diagnostics:Query Processor"
Note: Logging levels can be reset back to their default levels using Clear-SPLogLevel
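For example (a small sketch; running Clear-SPLogLevel with no parameters resets every category, while -Identity scopes the reset):

# Reset just the search diagnostics categories to their default levels.
Clear-SPLogLevel -Identity "WSS Search Diagnostics:*"
# Or reset everything back to defaults.
Clear-SPLogLevel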
Creating and Merging Log files
It's possible to force a new ULS log to be created by running New-SPLogFile. This ends the current log file and starts a new one, which is valuable when you are attempting to reproduce a problem and want to minimize the total size of the log file (see the sketch below). Merge-SPLogFile combines trace log entries from all farm computers into a single log file on the local computer. It does this by creating a one-time timer job and executing it to create the merged log. This is extremely valuable if you are tracking an issue and suspect it's hitting multiple servers in a farm. Since the conversation across multiple servers will contain the exact same correlation ID, Merge-SPLogFile becomes a powerful cmdlet for troubleshooting.
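A sketch of that repro workflow with New-SPLogFile:

# Cut over to a fresh trace log just before reproducing the problem...
New-SPLogFile
# ...reproduce the issue, then cut over again so the repro is
# isolated in a single, small log file.
New-SPLogFile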
Some examples of Merge-SPLogFile:
Merge-SPLogFile -Path "C:\Logs\FarmMergedLog.log" -Overwrite
The preceding example merges the last hour of log data from all farm computers with no filtering.
Merge-SPLogFile -Path "C:\Logs\FarmMergedLog.log" -Overwrite -Area Search
The preceding example merges the last hour of log data from the Search area.
Additional parameters are available to tune the results. For example, if you are looking for something specific in the ULS message text, you can filter results using the Message parameter.
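A sketch of that, assuming the Message filter accepts wildcards (the pattern here is a placeholder):

# Merge only entries whose message text mentions a timeout.
Merge-SPLogFile -Path "C:\Logs\FarmMergedLog.log" -Overwrite -Message "*timeout*"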
Mining ULS Logs via PowerShell
It's possible to view ULS log entries using the Get-SPLogEvent cmdlet. As of Beta 2, running this cmdlet by itself retrieves every event from every ULS log in the directory hosting the ULS logs. Some examples of running Get-SPLogEvent:
Example 1: If the ULS log or logs have been moved to a different directory named "newpath", run the following:
Get-SPLogEvent -Directory "C:\newpath"
Example 2: To retrieve all the events within a certain time period, run the following:
Get-SPLogEvent -StartTime "12/04/2007 17:00" -EndTime "12/04/2007 18:00"
Example 3: Retrieve the first 10 events and the last 2 events from the logs directory:
Get-SPLogEvent | select -First 10 -Last 2
Example 4: Retrieve all events with the same correlation ID:
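A minimal sketch of such a filter, matching on the Correlation property of the returned log entries (the GUID is a placeholder):

Get-SPLogEvent | Where-Object {$_.Correlation -eq "b66db71a-3257-4470-adf9-5c01dc59ecb3"}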
Notice the warning in the output: "column 'Message' does not fit into the display and was removed."
You can include the message by performing the following:
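A sketch, selecting the columns explicitly (the column list is an assumption; adjust to taste):

# Keep the Message column by selecting columns explicitly and letting the table wrap.
Get-SPLogEvent | ?{$_.Correlation -eq "b66db71a-3257-4470-adf9-5c01dc59ecb3"} | ft Timestamp, Level, Message -Wrap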
Note: The more columns you include, the more difficult it is to read the message because it wraps.
Example 5: Retrieve all events at a specified level:
Get-SPLogEvent | ?{$_.Level -eq "Critical"}
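And a closing variation (a sketch) that matches several levels at once:

# Pull both Critical and Unexpected entries in one pass.
Get-SPLogEvent | ?{$_.Level -match "Critical|Unexpected"}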