,

Troubleshooting Security Only Crawl

Posted by

What is Security Only Crawl?



Before troubleshooting Security Only Crawl’s, let’s start out with what is a security only crawl?  Security Only crawl’s take place when users are added/removed from Sharepoint groups and/or explicitly added/removed from a list.  When incremental crawl starts, these security changes, “Updated ACL’s”, must be pushed down to all affected items within the index.  We security trim these results.


For Example, User A is able to search on all items in a specific document library named HR Docs.  You don’t want User A to be able to search and discover documents which were crawled previously from Document Library named HR Docs.  You remove User A from the only Sharepoint group that has permissions to view these documents and iniate an incremental crawl.  What will happen is that the security change will be discovered and the updated ACL will be processed.  The gatherer will commit the security changes to all items within HR Docs library.  When the security only crawl is completed, User A will not return results for querying items within HR Docs library.


Note:   The items within the document library are not fetched but the security of the item is. 


Troubleshooting Security only crawls?


This is an interesting topic and one that is rarely documented which is the primary reason for this blog.  What would necessarily lead you down the path to troubleshooting security only crawls?  Well there are a few scenarios such as:


1.)  User A is still able to search and return items which he shouldn’t be due to being removed from Sharepoint groups.


2.)  User A is not able to search items even though he was recently added to a Sharepoint group with rights to the items in question.


3.)  Incremental crawls are intermittently taking a long time to crawl very few items.


Scenario three is more frequent and tends to cause the most confusion for Sharepoint Administrators so I’ll discuss this in more detail.  What you will learn from scenario three should help you in troubleshooting scenarios one and two.   
 


Scenario 3:  My incremental crawls are taking longer than expected to crawl a very small # of items.


You perform an incremental crawl which takes 30 minutes.  You perform an incremental crawl the following day and crawl the same # of items and it takes 3 hours.  This starts to become a trend and typically raises a red flag.   The other symptom is that the crawl status page within Central Administrator will appear as crawling with no items updated for a long period of time.  Eventually, the crawl status will update with crawl complete.  The assumption is that the crawler appears to be getting hung or stuck during these long incremental crawls.  The problem is that you really don’t know if the crawler is still running or not.  I typically see the above symptoms when security only crawls are taking place.  Please keep in mind their are several moving parts to the incremental crawl process and I’m just identifying a common one.   


Note:  There is no easy way to tell if a security only crawl is running.  The reason is because the search status page doesn’t log any security related events.  The only thing you will see is that it’s still crawling.


How can you verify that security only crawls are running? 


I found two methods to assists in determining if security only crawls are occurring. 


 


Method 1:


Whenever a document is added/removed or User is added/removed from a group, this change gets logged in the change log “eventcache table” within that associated content database.  Please see my original blog in determining which content database you should be checking the change log post incremental crawl period.  The key is comparing the change log between short/long incremental crawls to see what’s different between the two. 


The eventtypes column within Eventcache table contains the type of changes that the crawler will enumerate through during incremental crawl time. 
This is one way to determine if security only crawls are contributing to your long crawl times.  You will need to query the eventtype column for all events that were part of your long running incremental crawl. 


You would be looking for event types 4194304 and 2097152.
 
2097152 – “A user is added to a Sharepoint Group”
4194304 – “A user is deleted from a Sharepoint Group”


Steps are the following:


A.) Dump the MSSCrawlHistory from Search database table and look for the latest Crawl’s with Status 11 (One long running and one that ran quickly)


B.)  The screen shot shows me Crawl ID 8162 and 8165 “I’ll use these crawl ID’s in my example”


C.) Next, open the MSSChangeLogCookies table within the Search DB and you’ll see something like the following for the corresponding crawl ID’s gathered from reviewing the MSSCrawlHistory table:


ChangeLogCookie_old will contain:


1;0;888eef75-d584-4edf-b242-f5161d4c3c44;633686171531600000;2564


ChangeLogCookie_new will contain:


1;0;888eef75-d584-4edf-b242-f5161d4c3c44;633686187284500000;2577


It’s important to understand the bold values above.  The long Guid “888eef75-d584-4edf-b242-f5161d4c3c44” is the actual database were crawling against. This is important because this is the content database where you’ll need to query the eventcache table for the batch of changes processed during this crawl.  To find the actual database value you can run something like the following against Config DB:


select * from objects with (NOLOCK) where ID = ‘888eef75-d584-4edf-b242-f5161d4c3c44’


The last 4 digits 2564 and 2577 are the changes from the eventcache table that the crawler processed.   


D.)    Next, You want to run a query against the eventcache table and output to a file:


Select * from eventcache with (NOLOCK) where ID BETWEEN 2564 AND 2577


E.) Perform the above steps for both short/long running incremental crawls and compare the differences from the eventcache output via Step D.  


Conclusion from Method 1:  If you see eventtypes 4194304 and 2097152 only during your long running incremental crawls, it’s a fairly safe bet to say that the security only crawls are the issue.  I would combine both methods to confirm that 99 %. 



 


Method 2:


When the crawl log stops reporting events and appears hung again you should open the ULS logs on the indexer and see a ton of events like the following:


04:21.0  mssearch.exe (0x092C)     0x09A8 Search Server Common    GathererSql   0    Medium      CGatherer::CommitTransaction succeeded, URL sts3://site/siteurl=/siteid={ee7f629c-a36c-4327-b5f1-0652878fa4a3}/weburl=docs/webid={ba2b56ba-27d1-4cfb-90b4-e638d3f3b0ed}/listid={4d587236-3886-440f-a0d7-0d33622724cd}/folderurl=/itemid=27, CrawlID 51, SourceDocID 229 – File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:9449

This event tells us that the gatherer is busy committing transactions to the index.  You’ll notice some subtle differences between the events.  For Example: Itemid and SourceDocID will be different for every event.


The goal is finding out the following about the event: 


1.) What exactly is being committed to the index?
2.) Was this an item that was actually added/changed?    


To figure out which item is being hit you would run the following SQL queries based on the SourceDocID, List ID, and ItemID.  I use the above event as an example:


Query 1 against SearchDB:


select * from MSSCrawlURL where docID = ‘229’


“Look at the DisplayURL and see what URL it’s hitting as this usually includes a file name”


Query 2 against ContentDB:


select * from AllDocs where listid = ‘4d587236-3886-440f-a0d7-0d33622724cd’ and doclibrowid = ’27’


“Look at the LeafName and confirm this matches the same item as Query 1”


Both of these queries should result in the same file.  I would run the above two queries on multiple events. Maybe grab like 10/20 items and then hit the Crawl logs after the crawl completed and see if these items were crawled. If not, then it’s very safe to say that security only crawls were taking place in addition to the method I provided earlier.



Question:


Why do security only crawls take so long?


Answer:  


The time difference in crawl can be attributed to expansion of the SharePoint Group and also that the group is at the site collection level and affects items beyond the list.  If a Sharepoint group has several thousand users at site collection level, you can see how this can be very expensive.  Also, a large number of items within that site collection can add to the delay because new ACL changes will be pushed down to every item affected by the security change.   


Question: 


How can I work around this and prevent security only crawls from affecting incremental crawl times?


Answer:  


Instead of users explicitly added to Sharepoint groups, add AD groups instead.   Managing adding/removing users from Active Directory security groups will not cause ACL changes within Sharepoint.  Because of this, no security only crawls will occur.