
SharePoint 2013: The Truth Behind Shredded Storage


I’ve had many people ask me about how Microsoft SharePoint 2013 Shredded Storage and Remote BLOB Storage work together, and how AvePoint supports this from a storage optimization perspective. Bill Baer, Senior Product Marketing Manager and Microsoft Certified Master for SharePoint, posted about this topic during SharePoint Conference 2012 and hopefully clarified a lot of the confusion around Shredded Storage. SharePoint MVP Dan Holme, AvePoint Enterprise Trainer & Evangelist Randy Williams, and I put together an article for Dan’s weekly column on Shredded Storage a few weeks ago which is also worth a read.

In my opinion, the name is the most confusing thing about it. The word “storage” implies that this feature is for storage optimization but, as Bill Baer rightly points out, the functionality was focused entirely on file I/O optimization, although there are some storage savings too, as we’ll talk about below.

Shredded Storage comes in two parts: part one is about getting the document to the Microsoft SQL Server and part two is about storing it in SQL. Part one is an enhancement of the Cobalt feature introduced in SharePoint 2010 and part two is a new feature introduced to store deltas of documents in SQL. The feature set is available only in SharePoint 2013, and works with SQL 2008 R2 + patch and SQL 2012.

In a nutshell, the Cobalt feature “shreds” the BLOB sent from the client machine (e.g. the user’s desktop PC) to the Web Front End (WFE) server and then directly through to the SQL Server. The SharePoint 2010 Cobalt framework sent the “shred” from the client to the WFE server ONLY. In SharePoint 2013, it continues to support only Office XML document types, but sends the shreds all the way to the SQL server. Why is this a good thing? Well, because the shred only went as far as the WFE server in Cobalt v1, the WFE server had to fetch the whole BLOB binary file from the SQL server, merge the shred into it on the WFE, and then send the result back to the SQL server, which meant a lot of file I/O duplication and network hopping.

The confusion begins with the second part, storage in SQL. SQL doesn’t do a merge and create a new whole BLOB binary for the new version of the document; it only stores the shreds or “deltas”, as discussed in the SharePoint Conference 2012 keynote. When you open the latest version of the document, SQL combines the shreds and returns the result to the WFE, which in turn sends it to the client. The shreds are stored in a new table called DocStreams in the SQL Content Database allocated for the site collection, and a separate table keeps a list of all the pointers that make up the overall BLOB for that version.
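
To make the shred-and-pointer idea concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not SharePoint's actual algorithm or schema: the fixed 4-byte shred size, the `ShredStore` class, and its `save` and `fetch` methods are hypothetical names invented for illustration. It shows a second version writing only the shred that changed, while each version keeps its own list of pointers.

```python
from dataclasses import dataclass, field

CHUNK = 4  # toy shred size in bytes (hypothetical, far smaller than any real setting)


def shred(data: bytes, size: int = CHUNK) -> list[bytes]:
    """Split a BLOB into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class ShredStore:
    """Toy stand-in for a shred table plus a per-version pointer table."""
    shreds: list[bytes] = field(default_factory=list)        # stored pieces
    versions: list[list[int]] = field(default_factory=list)  # pointers per version

    def save(self, data: bytes) -> int:
        """Store a new version and return how many new shreds were written."""
        previous = self.versions[-1] if self.versions else []
        pointers, written = [], 0
        for pos, piece in enumerate(shred(data)):
            if pos < len(previous) and self.shreds[previous[pos]] == piece:
                pointers.append(previous[pos])         # unchanged: reuse the old shred
            else:
                self.shreds.append(piece)              # changed: store a new shred
                pointers.append(len(self.shreds) - 1)
                written += 1
        self.versions.append(pointers)
        return written

    def fetch(self, version: int) -> bytes:
        """Reassemble a version by following its pointers."""
        return b"".join(self.shreds[i] for i in self.versions[version])


store = ShredStore()
first = store.save(b"AAAABBBBCCCC")   # version 0: every shred is new
second = store.save(b"AAAAXXXXCCCC")  # version 1: only the middle shred changed
```

In this toy run the first save writes three shreds and the second writes one; fetching either version rebuilds it by joining the shreds its pointers refer to.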

Supported Formats

The Cobalt aspects of Shredded Storage still work only with Office XML documents. The storage-to-SQL shredding works with all documents we’ve tested: Office XML documents (2010/2013 format), Office Binary Documents (pre-2007 format), PDF, JPEG, etc. But as discussed later, Office XML Documents have some benefits over others. This is interesting because all non-Office XML documents are shredded on the SQL server itself, as it receives the entire binary BLOB file. So in the case of non-Office XML documents, Shredded Storage is purely a storage optimization benefit.

Differentials

For each version of the BLOB saved in SharePoint, only the differential shreds are stored. SharePoint does not touch the existing shreds, but “magically” works out which differential shreds to store. Interestingly, it shreds all documents over the defined size regardless of whether versioning is turned on in the library; if versioning is turned on, it stores deltas for each successive version of that library item. This is a huge saving for companies that have lots of document versions in their SharePoint libraries, but there is obviously no storage benefit from shredding a document if versioning is not enabled. What we have found already is that deltas save significantly less storage than de-duplication does when RBS is enabled.

One thing to point out, though, is that if I have 50 copies of the same document across multiple SharePoint sites, SharePoint does not work out differentials across those copies; it only does so at the document (item) scope. So there is no saving in this scenario, either.
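
The difference between item-scoped storage and storage that recognises identical content across items can be sketched in a few lines of Python. This is only an illustration: the function names, the site paths, and the 1000-byte document are hypothetical, and the SHA-256 lookup stands in for whatever a de-duplicating storage layer really does.

```python
import hashlib


def item_scoped_bytes(items: dict[str, bytes]) -> int:
    """Each item keeps its own copy, so nothing is shared between items."""
    return sum(len(blob) for blob in items.values())


def deduplicated_bytes(items: dict[str, bytes]) -> int:
    """Content-addressed storage: identical BLOBs are stored only once."""
    unique = {hashlib.sha256(blob).hexdigest(): len(blob) for blob in items.values()}
    return sum(unique.values())


document = b"x" * 1000  # hypothetical 1000-byte document
copies = {f"site{i}/report.docx": document for i in range(50)}

scoped_total = item_scoped_bytes(copies)   # 50 separate copies
dedup_total = deduplicated_bytes(copies)   # one stored copy
```

With 50 identical copies, the item-scoped total is 50 times the document size, while the content-addressed total is the size of a single copy.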

One nice feature that was not included in prior versions of SharePoint is that if I just edit the metadata in the list item within SharePoint without editing the attached Office XML Document, it doesn’t create a new version of the BLOB in the SQL table. Note this isn’t the case with non-Office XML documents. This will result in tremendous storage capacity savings for some customers.

Shred Size

It appears to shred any BLOB, and, to date, our research has shown that the shred size is inconsistent and varies depending on the file format. For example, a 156K JPEG file had 6 shreds in version 0.1, and a 1Mb .docx had 12 shreds, as shown in the screenshot below. Please note that in this example, the sum total of the shreds is in fact LARGER than the original 1Mb document, so shredding here costs storage rather than saving it.

2012-12-12-ShreddedStorage-01.png

There are some variables in the API that can be set at content database level; the default maximum shred size (the FileWriteChunkSize property) is 64320 bytes, or roughly 64Kb. If the file is smaller than the maximum size set, then it simply won’t be shredded at all. More details are available in Bill’s post.

Existing Data

A key issue to point out is that if you upgrade your existing SharePoint 2010 Content Databases to SharePoint 2013, they will not benefit from Shredded Storage until a new document version is created.

Turning Off Shredded Storage

Shredded Storage can be turned off at the web application, site collection, and site (web) level; the default setting is AlwaysDirectToShredded. If you turn off Shredded Storage, SharePoint goes back to acting like it did in SharePoint 2010, Cobalt v1 style. This means that you have potentially higher file I/O between the WFE and the SQL server, and no storage savings on deltas of versioned files.

What happens when you enable RBS?

When you turn on RBS with a content database that has Shredded Storage enabled, the real-time RBS provider receives each shredded BLOB individually. These shreds are extremely small and as our RBS research in 2010 proved with our white paper, storing BLOBs outside of the SQL database that are less than 1Mb is, in general, inefficient. This is why we recommend setting up RBS rules that leave files less than 1Mb in the content database.
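
As a rough illustration of such a rule, here is a minimal Python sketch. It is not DocAve's or any RBS provider's API: `should_externalize` and the sample sizes are hypothetical, and the only thing taken from the text is the 1Mb threshold.

```python
ONE_MB = 1024 * 1024  # threshold from the recommendation above


def should_externalize(size_bytes: int, threshold: int = ONE_MB) -> bool:
    """Return True when a BLOB is large enough to move out of the content database."""
    return size_bytes >= threshold


shred_size = 64 * 1024     # a shred of roughly 64 KB (illustrative value)
large_file = 10 * ONE_MB   # a whole 10 MB document

shred_goes_external = should_externalize(shred_size)
file_goes_external = should_externalize(large_file)
```

Under this rule an individual shred of about 64 KB stays in the content database, while a whole 10 MB BLOB would be externalized.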

Our scheduled RBS product (DocAve Storage Manager) will work fine with Shredded Storage, because when Storage Manager calls SharePoint to externalize a file, we get the full BLOB. We can also apply more sophisticated business rules to decide whether to externalize it with RBS.

By adding the RBS Provider into the mix, when I’m fetching the 69th version of a document, it’s going to get REAL chatty with the RBS provider fetching all the individual shreds. The shred size can potentially be changed up to 1Mb to be more efficient from an RBS perspective, but until we get more data from our labs we have no concrete guidance here yet. Some preliminary performance stats are available below.
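
To get a feel for how chatty this could become, here is a back-of-the-envelope Python sketch. It rests on loud assumptions that the text does not confirm: shreds are treated as fixed-size, every shred of the requested version is assumed to live in RBS, and each shred costs one fetch. Since observed shred sizes vary by file format, read the result as an order of magnitude only. `estimated_fetches` is a hypothetical helper, and the 10 MB file is an illustrative size.

```python
import math


def estimated_fetches(file_bytes: int, shred_bytes: int) -> int:
    """Estimate provider fetches, assuming one fetch per fixed-size shred."""
    return math.ceil(file_bytes / shred_bytes)


TEN_MB = 10 * 1024 * 1024

default_fetches = estimated_fetches(TEN_MB, 64320)       # shreds of 64320 bytes
tuned_fetches = estimated_fetches(TEN_MB, 1024 * 1024)   # shreds of 1 MB
```

Under these assumptions a 10 MB file needs 164 fetches with 64320-byte shreds and 10 fetches with 1 MB shreds, which is the intuition behind raising the shred size when RBS is in play.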

Fetch Performance

From a performance perspective, suppose I save a 10Mb document 100 times and store each version, randomly changing a few paragraphs all over the document. To fetch version #69, or even the latest version, SharePoint must merge all of the relevant shreds, and do so entirely in the SQL software layer. This concerns me A LOT, as it will be a huge performance overhead compared with simply fetching the entire BLOB version as in SharePoint 2010!

The table below illustrates the time it took to perform a full SP-Export on the entire site collection, based on the different configurations with exactly the same content data set:

| Shredded Storage | DB size (Mb) | RBS size (Gb) | Export time (secs) |
|---|---|---|---|
| Off | 24724.88 | Off | 1477 |
| Off | 54.58 | 23.40 | 1882 |
| Default – 64Kb chunk | 6000.31 | Off | 2471 |
| Default – 64Kb chunk | 103.25 | 6.35 | 3502 |
| 1Mb chunk | 6749.30 | Off | 2005 |
| 1Mb chunk | 95.19 | 6.25 | 3309 |
| 1Gb chunk | 13349.81 | Off | 1745 |
| 1Gb chunk | 74.00 | 12.40 | 2096 |

From this you can see that switching Shredded Storage on (default chunk size, no RBS) raises the export time from 1477 to 2471 seconds, about 67% longer, while the content database in this sample set shrinks by about 76%. This will differ a lot depending on the type of content you are versioning. You’ll note that with Shredded Storage and Remote BLOB Storage both on, the export takes 3502 seconds, about 137% longer than the baseline. More notably, the export takes only about 27% longer (1882 seconds) if only Remote BLOB Storage is enabled with de-duplication switched on, which dramatically reduces the externalized BLOBs.
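
The percentages can be recomputed from the export times and database sizes in the table. The following is a minimal Python sketch that measures each increase against the 1477-second baseline (Shredded Storage off, RBS off); the dictionary labels and the `pct_increase` helper are illustrative names only.

```python
BASELINE = 1477  # export seconds with Shredded Storage off and RBS off

export_secs = {
    "RBS only": 1882,
    "Shredded Storage (default chunk)": 2471,
    "Shredded Storage (default chunk) + RBS": 3502,
}


def pct_increase(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100


increases = {name: round(pct_increase(secs, BASELINE))
             for name, secs in export_secs.items()}

# Database size saving with Shredded Storage on (default chunk, no RBS)
db_saving = round((1 - 6000.31 / 24724.88) * 100)
```

Measured this way the increases come out at about 27%, 67% and 137%, and the database saving at about 76%. Dividing the same time differences by the slower run instead of the baseline gives about 22%, 40% and 58%.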

These initial performance tests were done on virtualized hardware based on recommendations on TechNet and NetApp infrastructure for the externalized content.

Our Current Recommendation

The main reason that Microsoft built Shredded Storage was to overcome the file i/o problems in Office 365 – SharePoint Online. This problem may not exist in your farm, and if you are simply looking for storage optimization you should consider the same technique that was common in SharePoint 2010 with a Remote BLOB Storage provider and de-duplication in your attached storage.

De-duplication will also work across all externalized BLOBs, not just at library item scope, and with all document types. De-duplication is also a hardware file I/O operation rather than a software operation, as it’s built into the bare metal of the attached storage devices. From speaking to various infrastructure vendors, a general rule of thumb is that you can realize an 88% storage capacity saving by doing this.

A key point to note here is that if you do externalize your BLOBs using RBS and have de-duplication on, you will immediately realize the storage optimization savings unlike Shredded Storage, which requires you to create a new version of the document before you start receiving the benefits.

The Future

We are working hard in a lab environment right now to produce our own figures taking into account our DocAve platform. A white paper will be published shortly with these findings, so please keep a look out on DocAve.com.

Categories: SPF 2013; SQL; Performance and Optimization

Comments

Ben

FileWriteChunkSize

Hi Jeremy, Cheers for the excellent article. A minor correction (I think) - the FileWriteChunkSize property is 64320 bytes by default as opposed to Kb. Ben

Posted 20-Dec-2012 by Ben
Talbott Crowell

Shredded Storage Test Results

Hi Jeremy, Nice post. I've been doing some testing, and I've found that if you have versioning turned on and the only changes you are making are to the document properties (metadata), then the storage used by 2013 and by previous versions of SharePoint differs significantly. -Talbott
 

Posted 06-Jan-2013 by Talbott Crowell
Alex Dean

Great Analysis and summary!!!

Thanks mate! The stats above show RBS with I assume De-Dupe not active, as the size seems to have remained?
 
One gotcha I would add concerns Shredded Storage and site quotas. The calculations in the background for the amount of storage used do not take shredding or de-duplication into account. So when you set quotas based on how much space a division is allowed to consume, revise them against the actual storage used.

Posted 18-Jun-2013 by Alex Dean
Bob Cummings

Shredded disaster...

Hi. A previous employee decided that the best way for staff to be able to load images and docs through a SharePoint CMS backend would be by using attachments to list items. This worked by using a view written in SQL Server in a database and using contentdb tables. This worked so well in SharePoint 2010 that no-one even thought about changing it (the more astute readers will already have seen where this is going...). Although the table structure and names have changed in 2013 (docstream and docs view), the view used in the asp.net coded CMS still works and the images which are already there display just as they should. If a new image is added, it unsurprisingly errors. Having ignored all Microsoft recommendations regarding interfering with backend tables - can anyone think of a really clever way of enabling those images to be once more used as attachments to be displayed via asp.net? The file chunk size has already been increased to a meg, which means there is now only one row entry per list item, but this has failed to cure the problem. Any suggestions gratefully received. Bob

Posted 15-Aug-2013 by Bob Cummings
