
Copying documents between libraries with metadata - including version history


– “How do you move (or copy) documents from one library to another while keeping version history intact?” –

This is one of those tasks that sounds pretty straightforward and simple, but is actually not one of the easiest things to take care of.

From what I've been able to find, there are really only a few different approaches that most folks take:

  • Explorer view (or Network place) copy/paste
  • Save site as a template – including content (*only works if library size is less than 10MB)
  • Or the new "Manage Content Structure" page offered in MOSS.

The first option, as proved numerous times by many users, does indeed copy the file(s) to the new library, but it doesn't preserve any version history. (Note: apparently, for some users, doing a "move" rather than a "copy" has successfully brought over the version history. I have never gotten this to work, and realistically I don't prefer this approach because I don't care for the idea of losing the original source document in this scenario.)

Option two is also not a reliable approach because of the size limit on the content you're saving (*default limit only; the size can be modified through STSADM, as mentioned here: http://blogs.microsoft.co.il/blogs/meronf/archive/2006/08/22/2617.aspx). It's also kind of a “kludgy” approach in the first place, because it's really just a workaround to move documents one time. It's not a true “archiving” method, since you can't make updates to the original documents and then apply those changes to the “templated” library. Obviously that isn't the point of this method, but some users may want this functionality and won't have it if they follow this approach.

Option three relies on your having MOSS installed.
Ok, fine…but what about those of us working with an instance of WSS only and not MOSS?

After researching all the possible approaches (minus paying someone an outrageous amount of money for a program that promised to do this), I decided to investigate whether I could do this programmatically through the SharePoint object model… which I'm happy to report was not quite as difficult as I had thought (I ran into a few interesting logic problems along the way, but did manage to get past them).

Version history? What the Heck???

The first thing we need to understand is just how SharePoint deals with versions. Once you turn on versioning on a document library, you're enabling the use of a new “virtual” directory set aside for the sole purpose of providing a “web” interface to previous versions of a document, all of which are stored in the content database. This new directory is called “_vti_history”, and each document's URL under it includes a number that signifies its actual version. It's also important to note that all the documents accessed in the virtual folder are previous versions of the document only, not the current version that is displayed in the document library itself (this will be important to remember later when programming on versions).

An example of these URLs for document versions would be something similar to:

  • http://www.mydomain.com/_vti_history/1/Shared%20Documents/Test.doc
  • http://www.mydomain.com/_vti_history/2/Shared%20Documents/Test.doc
  • http://www.mydomain.com/_vti_history/3/Shared%20Documents/Test.doc
  • http://www.mydomain.com/_vti_history/512/Shared%20Documents/Test.doc
  • http://www.mydomain.com/_vti_history/1024/Shared%20Documents/Test.doc
  • http://www.mydomain.com/_vti_history/1025/Shared%20Documents/Test.doc
  • http://www.mydomain.com/Shared%20Documents/Test.doc

(The last URL listed does not contain a number or the “_vti_history” path because it is the current version of the document.)

You'll notice the number immediately following the “_vti_history/” part of each URL. This number specifies exactly what the version number is for the document:

  • URL number “1” = version “0.1”
  • URL number “2” = version “0.2”
  • URL number “3” = version “0.3”
  • URL number “512” = version “1.0”
  • URL number “1024” = version “2.0”
  • URL number “1025” = version “2.1”
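
As a small illustration of the URL layout described above, here is a minimal sketch that pulls the number out of a version URL. The class and method names (VersionUrl, GetHistoryId) are made up for this example and are not part of the SharePoint object model; it only assumes the number is the path segment right after “_vti_history/”, as in the sample URLs.

```csharp
using System;

public static class VersionUrl
{
    // Returns the number that follows "_vti_history/" in a version URL,
    // or -1 when the URL has no history segment (i.e. the current version).
    public static int GetHistoryId(string url)
    {
        const string marker = "_vti_history/";
        int start = url.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return -1;
        }

        start += marker.Length;
        int end = url.IndexOf('/', start);
        string digits = end < 0 ? url.Substring(start) : url.Substring(start, end - start);

        int id;
        return int.TryParse(digits, out id) ? id : -1;
    }

    public static void Main()
    {
        // prints 512
        Console.WriteLine(GetHistoryId("http://www.mydomain.com/_vti_history/512/Shared%20Documents/Test.doc"));
        // prints -1 (current version, no history segment)
        Console.WriteLine(GetHistoryId("http://www.mydomain.com/Shared%20Documents/Test.doc"));
    }
}
```

Unlike stripping the library and file name with regular expressions, this approach does not depend on what characters those names contain.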

So, by looking at these numbers, we can start to see a pattern forming (which again will become very important later when we begin coding). You'll notice that the minor versions (numbers to the right of the decimal point) simply count up by one, whereas the major versions (numbers to the left of the decimal point) go up in increments of 512 (I like to call this a “base-512” counting system).

For example, let's say we have a document that is version “14.7”. Following the pattern and the base-512 counting system, we'd come up with a number of “7175” (512 * 14 + 7), making the URL http://www.mydomain.com/_vti_history/7175/Shared%20Documents/Test.doc.

Now, I do have to state that I absolutely despise mathematics. I hate it with a passion. It always has been, and will continue to be, my worst subject and is the constant source of many-a-migraine. This particular base-512 system threw me for a bit of a loop when attempting some of the coding for this, but in the end I was able to tame it somewhat and come up with a workable solution for the logic it was confusing me with.
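
To make the arithmetic concrete, here is a minimal sketch of the conversion in both directions. The class and method names (VersionNumber, Encode, Decode, IsMajor) are hypothetical helpers for illustration only, and the sketch assumes the minor part always stays below 512, which is what the pattern above implies.

```csharp
using System;

public static class VersionNumber
{
    private const int MajorStep = 512;

    // Combines major and minor parts into the number used in the history URL.
    public static int Encode(int major, int minor)
    {
        return major * MajorStep + minor;
    }

    // Turns the URL number back into a "major.minor" label.
    public static string Decode(int historyId)
    {
        int major = historyId / MajorStep; // integer division drops the remainder
        int minor = historyId % MajorStep; // the remainder is the minor part
        return major + "." + minor;
    }

    // A major version has no minor part, so it is an exact multiple of 512.
    public static bool IsMajor(int historyId)
    {
        return historyId % MajorStep == 0;
    }

    public static void Main()
    {
        Console.WriteLine(Encode(14, 7));  // prints 7175
        Console.WriteLine(Decode(1025));   // prints 2.1
        Console.WriteLine(IsMajor(1024));  // prints True
    }
}
```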

Logic? More of a fad if you ask me.

So, now that we know how SharePoint deals with document versions, let's take a quick look at the logic that will be involved in copying the contents of one library to another (then we'll jump into the code and get this post over with :-) ).

First and foremost, since we'll be programming against SharePoint, we'll need access to the objects available in its object model, so make sure your web project has a reference to Microsoft.SharePoint.dll (located in the ISAPI folder of the 12 hive).

The steps the program will take are this:

  1. Enumerate all sites in the current site collection and populate the “source site” and “target site” dropdown lists.
    1. For this, I'm adding in a parsed version of the site's URL into the text field of the list that has the domain portion of the URL removed only (makes it easier to read).
    2. In the “value” attribute I'm adding in the full URL.
  2. On selection of the source or target site, the corresponding source or target “library” dropdowns get populated with a list of all the document libraries in the selected site.
    1. Allows us to build the connection for the document library to be copied and its destination.
    2. Additionally, I'm trimming the list of available libraries down to (standard) user-accessible libraries only (removing libraries such as the “Master Page Gallery”, “Workflows”, etc.)
  3. Next, once the source and target have been selected, on the button click event we format the URLs we'll need, set up our connections, then enumerate the documents in the source library and process them.
    1. During the document processing, there's a specific approach that had to be taken due to how the documents are stored and accessed as follows:
    2. For each document in the library, check to see if it has any versions and, if it does, grab its URL, parse out the version number (the “base-512” number mentioned earlier), convert it to an integer, and add it to a new SortedList object as a key with no value (we'll use this as a comparison object later).
      1. Converting to an integer is actually a rather important step that threw me for a while until I figured out what was happening. The SortedList object in .NET does perform automatic sorting on its keys (part of the appeal of the object), but it treats the keys differently depending on the type being entered. Since it's essentially a combination of an Array and a HashTable, it will take any object type you want to add, but adding the parsed URL number as a string has an inherent problem.

      If we add, for example, the values “1”, “2”, “512”, “1024”, they will actually be sorted in the list as “1”, “1024”, “2”, “512”. This is not good! Since we're working with versions, they must be in a specific order, and as described earlier in this post, the numbers follow a set pattern. Sorting them as strings means our copied versions will go to the new library in the wrong order (very bad :-( ). So, to have them sorted in the correct “numerical” order, convert the string number to an integer and all will be fine (finding this out took almost as long as writing the entire program itself).

    3. Next, for each of the keys we just added, we'll again loop through all the versions, again parse out the version number from the URL and find the one that matches the key.
    4. Once we have a match, we'll send the parsed out number (converted to an integer again) to a custom “version checking” method that looks to see if it is a minor or major version and adds it into the target document library as the appropriate version.
      1. In this method, a couple actions take place.
        1. In order to have an adequate checking system that would let me pass in any version number regardless of how large, I had to process the integer through a simple base-512 check. (I say “simple” only because, after the time it took me to figure out how to do it, it wound up being only three small lines of code to process the number and then a simple “if/else” to check whether it was major or minor. Did I mention earlier how much I hate math?!?! :-( )
        2. Once the number has been processed and deemed either major or minor, it is then added to the target document library with the appropriate version. (Since versioning will be turned on in the target library, as I add in each successive version, the number will increment accordingly.)
    5. Once all the versions have been added, the last step is to then add the current version of the document.
      1. Since we already have the context of the current document during this entire process, this part is simply running a check to see if the current version is major or minor version then adding it to the target library and publishing it if it's a major version.
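
The sorting pitfall and the resulting copy order from the steps above can be sketched without any SharePoint objects. This is only an illustration: the class and method names (CopyOrder, SortedKeys, Plan) are invented for the example, and the “add as draft” / “add and publish” labels stand in for the real calls against the target library.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CopyOrder
{
    // Adds the keys to a SortedList and returns them in the order the list stores them.
    public static List<string> SortedKeys(IEnumerable keys)
    {
        SortedList list = new SortedList();
        foreach (object key in keys)
        {
            list.Add(key, ""); // key only, empty value, as in the steps above
        }

        List<string> result = new List<string>();
        foreach (object key in list.Keys)
        {
            result.Add(key.ToString());
        }
        return result;
    }

    // Exact multiples of 512 are major versions and would be published after adding.
    public static string Plan(int historyId)
    {
        return historyId % 512 == 0 ? "add and publish" : "add as draft";
    }

    public static void Main()
    {
        string[] asStrings = { "512", "1", "1024", "2" };
        int[] asInts = { 512, 1, 1024, 2 };

        // String keys sort character by character: 1, 1024, 2, 512
        Console.WriteLine(string.Join(", ", SortedKeys(asStrings).ToArray()));

        // Integer keys sort numerically: 1, 2, 512, 1024
        Console.WriteLine(string.Join(", ", SortedKeys(asInts).ToArray()));

        // Walk the versions in numeric order and decide how each would be copied.
        foreach (string key in SortedKeys(asInts))
        {
            Console.WriteLine(key + ": " + Plan(int.Parse(key)));
        }
    }
}
```

The current version of the document is not part of this list; as in the last step above, it would be copied after all of the previous versions.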

So, that's it. Not too complicated, just a logical step by step process to grab each document, process it for a few checks, then copy it to a target library. Here's the code to do all this:

(Note – There are of course, ways in which the code can be streamlined and made more efficient, but for the sake of this exercise, it should serve as the foundation for what you can build on to suit your needs.)

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
<title>Copy document library contents</title>
</head>
<body>
  <form id="form1" runat="server">
    <table id="tblMain" runat="server">
      <tr>
        <td>
          <asp:Label ID="lblSourceSite" runat="server" Text="Source Site:" Width="115px"></asp:Label>
        </td>
        <td>
          <asp:Label ID="lblSourceLib" runat="server" Text="Source Library:" Width="115px"></asp:Label>
        </td>
      </tr>
      <tr>
        <td>
          <asp:DropDownList ID="ddlSourceSite" runat="server" AutoPostBack="True" OnSelectedIndexChanged="ddlSourceSite_SelectedIndexChanged">
            <asp:ListItem>– Choose Site –</asp:ListItem>
          </asp:DropDownList>
        </td>
        <td>
          <asp:DropDownList ID="ddlSourceLib" runat="server" AutoPostBack="True"></asp:DropDownList>
        </td>
      </tr>
      <tr>
        <td>
          <asp:Label ID="lblTargSite" runat="server" Text="Target Site:" Width="115px"></asp:Label>
        </td>
        <td>
          <asp:Label ID="lblTargLib" runat="server" Text="Target Library:" Width="115px"></asp:Label>
        </td>
      </tr>
      <tr>
        <td>
          <asp:DropDownList ID="ddlTargSite" runat="server" AutoPostBack="True" OnSelectedIndexChanged="ddlTargSite_SelectedIndexChanged">
            <asp:ListItem>– Choose Site –</asp:ListItem>
          </asp:DropDownList>
        </td>
        <td>
          <asp:DropDownList ID="ddlTargLib" runat="server" AutoPostBack="True"></asp:DropDownList>
        </td>
      </tr>
      <tr>
        <td>
          <asp:Button ID="btnStart" runat="server" OnClick="btnStart_Click" Text="Copy Files" />
        </td>
        <td>
          <asp:Button ID="btnReset" runat="server" Text="Reset Fields" OnClick="btnReset_Click" />
        </td>
      </tr>
    </table>
  </form>
</body> 
</html>

The corresponding “Default.aspx.cs” file will be as follows:

using System;
using System.Collections;
using System.Text.RegularExpressions;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;
using Microsoft.SharePoint.Utilities;
using Microsoft.SharePoint.WebControls;
using System.Web;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;

public partial class _Default : System.Web.UI.Page
{
     #region GlobalVars
     string sourceFolder; //source site
     string sourceDocLib; //source library
     string destFolder; //target site
     string destDocLib; //target library
     string destURL = ""; //target site + library + filename
     SPSite siteCollection;
     SPFolder srcFolder;
     SPFileCollection destFiles; 
     SPWebApplication webApp; 
     byte[] verFile; //document to be copied
     #endregion

     /// <summary>
     /// Page load event that calls method to populate dropdownlists
     /// </summary>
     /// <param name="sender"></param>
     /// <param name="e"></param>
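     protected void Page_Load(object sender, EventArgs e) 
     {
          //only populate on the first request; on postbacks view state restores the items,
          //so calling GetSites() again would add duplicate entries
          if (!IsPostBack)
          {
               GetSites();
          }
     }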
     /// <summary>
     /// Populates source and target (site) dropdownlists
     /// </summary>
     public void GetSites() 
     {
          SPSite mySite = SPContext.Current.Site; 
          SPWebCollection subSites = mySite.AllWebs;
          foreach (SPWeb site in subSites) 
          {
               //regex to strip out the domain name from the URL – makes it display nice in the dropdownlist
               //also, make sure to use your own domain name and escape the "/" at the end of the domain
               //(verbatim @"..." strings are needed so that "\/" is not read as a C# escape sequence)
               ddlSourceSite.Items.Add(new ListItem(Regex.Replace(site.Url, @"yourdomain.com\/", ""), site.Url));
               ddlTargSite.Items.Add(new ListItem(Regex.Replace(site.Url, @"yourdomain.com\/", ""), site.Url)); 
          }
     }

     /// <summary>
     /// Populates list of source document libraries and filters out "admin" libraries
     /// </summary>
     /// <param name="sender"></param>
     /// <param name="e"></param>
     protected void ddlSourceSite_SelectedIndexChanged(object sender, EventArgs e) 
     {
          ddlSourceLib.Items.Clear();
          using (SPSite curSite = new SPSite(ddlSourceSite.SelectedItem.Value)) 
          {
               using (SPWeb curWeb = curSite.OpenWeb()) 
               {
                    foreach (SPList list in curWeb.Lists) 
                    {
                         if (list.Title == "List Template Gallery" || list.Title == "Master Page Gallery" || list.Title == "Site Template Gallery" || list.Title == "Web Part Gallery" || list.Title == "Workflows")
                         { 
                         }
                         else if (list.GetType().ToString() == "Microsoft.SharePoint.SPDocumentLibrary") 
                         {
                              ddlSourceLib.Items.Add(new ListItem(list.Title)); 
                         }
                    }
               }
          }
     } 
     /// <summary>
     /// Populates list of target document libraries and filters out "admin" libraries
     /// </summary>
     /// <param name="sender"></param>
     /// <param name="e"></param>
     protected void ddlTargSite_SelectedIndexChanged(object sender, EventArgs e) 
     {
          ddlTargLib.Items.Clear();
          using (SPSite curSite = new SPSite(ddlTargSite.SelectedItem.Value)) 
          {
               using (SPWeb curWeb = curSite.OpenWeb()) 
               {
                    foreach (SPList list in curWeb.Lists) 
                    {
                         if (list.Title == "List Template Gallery" || list.Title == "Master Page Gallery" || list.Title == "Site Template Gallery" || list.Title == "Web Part Gallery" || list.Title == "Workflows") 
                         {
                         }
                         else if (list.GetType().ToString() == "Microsoft.SharePoint.SPDocumentLibrary") 
                         {
                              ddlTargLib.Items.Add(new ListItem(list.Title)); 
                         }
                    }
               }
          }
     } 
     /// <summary>
     /// Gathers selections from dropdownlists and fires off copy process
     /// </summary>
     /// <param name="sender"></param>
     /// <param name="e"></param>
     protected void btnStart_Click(object sender, EventArgs e) 
     {
          //set paths – parses out site name from url in dropdownlists
          string sf = ddlSourceSite.SelectedItem.Text; 
          int index0 = sf.LastIndexOf("/");
          string newSf = sf.Substring(index0 + 1); 
          sourceFolder = newSf;
          string sd = ddlSourceLib.SelectedItem.Text; 
          int index1 = sd.LastIndexOf("/");
          string newSd = sd.Substring(index1 + 1); 
          sourceDocLib = newSd;
          string df = ddlTargSite.SelectedItem.Text; 
          int index2 = df.LastIndexOf("/");
          string newDf = df.Substring(index2 + 1); 
          destFolder = newDf;
          string dd = ddlTargLib.SelectedItem.Text; 
          int index3 = dd.LastIndexOf("/");
          string newDd = dd.Substring(index3 + 1); 
          destDocLib = newDd;
          //connection info and context
          siteCollection = SPControl.GetContextSite(Context); 
          //source folder (document library)
          srcFolder = siteCollection.AllWebs[sourceFolder].Folders[sourceDocLib];
          //destination folder (document library)
          destFiles = siteCollection.AllWebs[destFolder].Folders[destDocLib].Files;
          //Admin web application object
          webApp = siteCollection.WebApplication;
          //temporarily disables "security validation" to get around the
          //"The security validation for this page is invalid." error message
          webApp.FormDigestSettings.Enabled = false; 
          //enumerate source library
          foreach (SPFile file in srcFolder.Files) 
          {
               //standard check
               if (file.Exists) 
               {
                    //custom sorted list used to reorder versions
                    SortedList myList = new SortedList(); 
                    ICollection items = myList.Keys; 
                    //destination URL – path that will be used for copied file to target library including filename
                    destURL = destFiles.Folder.Url + "/" + file.Name; 
                    //checks to see if file has versions
                    if (file.Versions.Count != 0) 
                    {
                         //enumerate versions
                         foreach (SPFileVersion ver in file.Versions) 
                         {
                              string tempKey = ""; 
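                              //parses version number from previous versions URL
                              //(Regex.Escape keeps characters such as "." or "(" in names from being read as regex syntax)
                              tempKey = Regex.Replace(ver.Url, "_vti_history/", ""); 
                              tempKey = Regex.Replace(tempKey, "/" + Regex.Escape(sourceDocLib), "");
                              tempKey = Regex.Replace(tempKey, "/" + Regex.Escape(file.Name), ""); 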
                              //converts string number to int in order to be sorted correctly
                              //adds to Sorted list as a new key
                              myList.Add(int.Parse(tempKey), ""); 
                         }
                    }
                    
                    //since items in sorted list are now actually sorted correctly
                    //we start with this list in order to process the versions
                    //to copy them in the correct order
                    foreach (object key in items) 
                    {
                         //as we iterate the keys in the sorted list (the version numbers)
                         //we then run a comparison on the actual versions to find which one matches
                         //the key so we can process each one in order
                         foreach (SPFileVersion newVer in file.Versions) 
                         {
                              string temp = ""; 
                              //parses version number from previous versions URL again 
                              //in order to compare it to key stored in SortedList.
                              //(Regex.Escape keeps characters such as "." or "(" in names from being read as regex syntax)
                              temp = Regex.Replace(newVer.Url, "_vti_history/", ""); 
                              temp = Regex.Replace(temp, "/" + Regex.Escape(sourceDocLib), "");
                              temp = Regex.Replace(temp, "/" + Regex.Escape(file.Name), ""); 
                              //checks to see if version matches key 
                              if (temp == key.ToString()) 
                              {
                                   //opens file for processing and calls method to determine major/minor status
                                   verFile = newVer.OpenBinary();
                                   SetVersion(int.Parse(temp)); 
                              }
                         }
                    }
                    //Last step which copies current version
                    byte[] binFile;
                    SPFile copyFile; 
                    binFile = file.OpenBinary();
                    //checks if current version is major/minor version and publishes accordingly
                    if (file.Level.ToString() == "Published") 
                    {
                         copyFile = destFiles.Add(destURL, binFile, true);
                         copyFile.Publish(""); 
                    }
                    else
                    {
                         copyFile = destFiles.Add(destURL, binFile, true); 
                    }
               }
          }
          //re-enables "security validation"
          webApp.FormDigestSettings.Enabled = true; 
     }

     /// <summary>
     /// method to determine major/minor status
     /// of version and publish accordingly
     /// </summary>
     /// <param name="num">parsed version number from file's URL</param>
     public void SetVersion(int num) 
     {
          int baseNum = 512; 
          decimal d = num / baseNum;
          int i = (int)Math.Floor(d) * 512; 
          //major publish (eg 1.0, 2.0, 3.0)
          if (num == i) 
          {
               SPFile copFileVers = destFiles.Add(destURL, verFile, true);
               copFileVers.Publish(""); 
          }
          //minor (eg 0.1, 1.1, 2.3)
          else
          {
               SPFile copFileVers = destFiles.Add(destURL, verFile, true); 
          }
     }

     /// <summary>
     /// Resets dropdowns
     /// </summary>
     /// <param name="sender"></param>
     /// <param name="e"></param>
     protected void btnReset_Click(object sender, EventArgs e) 
     {
          Response.Redirect("default.aspx"); 
     }
}

Conclusion

In order to get this to work:

  • Modify the “GetSites” method to strip out your domain instead of the “yourdomain.com” sample listed.
  • Build and deploy using your preferred method ("layouts" directory, IIS site and view with page viewer, etc.)
  • Run the app and enjoy.

Due to the length of this post it is also very possible that I may have inadvertently skipped something, or possibly even typo'd in some area that may be rather important. If you see something obvious, please add a comment and I'll make sure and fix it as soon as I can. Also, if anyone has any issue with getting the code to work, again please post a comment to that effect and we'll work through it to see if we can get things going.

Until next time,

- Dessie

Categories: SharePoint; Metadata; Libraries and Lists; MOSS; WSS; 2007

Comments

Tom Resing

Great Question.

Dessie,
 
I have to agree this is a Frequently Asked Question. In fact, I asked a very similar question on sharepoint.stackexchange.com almost 2 years ago and got some great answers, some similar to yours.
 
One option we found was that move in explorer view does retain metadata including version history even though copy doesn't.
Also, if you have access to the server, I think you'll find the import and export commands in PowerShell will reduce the amount of code (or script) that you have to write.
 
-Tom

Posted 24-Sep-2011 by Tom Resing

Theo

You've duplicated the versions, but not the version "history"

Your code runs through and makes a copy of the versions of the document; however, they do not retain the created by, modified by, created date, or modified date of each version. You've got all of the versions of the document in the end result, but it's not a complete version history duplication. The object model actually has a bug in it in regards to Document Libraries. With Lists you can stamp the created/modified data, forcing them to be their old values. With Libraries, however, you'll find that trying to overwrite these values via the object model results in errors. What I suspect is that these 3rd party applications, which can copy documents exactly with versioning intact, use a hybrid of the object model and direct database access: they copy the documents via the object model, but then massage the raw database records for the type of data that the object model can't handle.

Posted 08-Oct-2013 by Theo
