SQL Azure Data Sync: Initialization of a large database

Hi, this week I encountered an issue while trying to initialize a synchronization between two databases using SQL Azure Data Sync.

My scenario is the following:

  • One database is located in West Europe (B1)
  • One database is located in North Europe (B2 – this database is kept in sync in case of failure)
  • The database is about 28 GB and contains roughly 28,000,000 rows.

The problem with Data Sync is the initialization, because of the size of the database.
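For reference, figures like these can be obtained directly on the database with something along these lines (a rough sketch using the standard sp_spaceused procedure and catalog views):

-- Overall size of the database (reserved, data and index sizes).
EXEC sp_spaceused;

-- Total number of rows across all user tables (heap or clustered index only,
-- so each row is counted once).
SELECT SUM(p.rows) AS total_rows
FROM sys.partitions p
JOIN sys.tables t ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1);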

There are a couple of obvious ways to approach this. The first scenario is:

  1. Create the schema on database B1
  2. Load all the data into B1 (these two steps can be a single one if you import the database from an export or a backup)
  3. Create the schema on database B2 (because Data Sync is not able to create all the constraints of the schema)
  4. Configure the sync
  5. Launch the first sync

This results in a long transfer of data between B1 and B2 (in my case about 15–20 hours, maybe more).

The second scenario is:

  1. Create the schema on database B1
  2. Load all the data into B1 (these two steps can be a single one if you import the database from an export or a backup)
  3. Create the schema on database B2 (because Data Sync is not able to create all the constraints of the schema)
  4. Load all the data into B2 (again, these two steps can be a single import)
  5. Configure the sync
  6. Launch the first sync

This results in an analysis of all the data present in the tracking table of each database and a merge of the data according to the conflict resolution priority (hub or spoke wins; in my case, the hub). This scenario is extremely long: it took more than 36 hours and created some latency on the database.

So I decided to try one last solution: make the two databases identical by importing the same backup, then configure Data Sync (without scheduling the sync)…


…and then bypass the first initialization by forcing the service to think that the data has already been synchronized.
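Since the whole idea relies on the two databases being strictly identical after the import, it is worth double-checking before going further. A minimal sanity check (just a sketch based on the catalog views) is to compare the per-table row counts on B1 and B2:

-- Run on both B1 and B2: the per-table row counts should match exactly
-- before the sync is configured.
SELECT schema_name(t.schema_id) AS table_schema,
       t.name                   AS table_name,
       SUM(p.rows)              AS row_count
FROM sys.tables t
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY schema_name(t.schema_id), t.name
ORDER BY table_schema, table_name;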

For this scenario, the steps are:

  1. Create the schema on database B1
  2. Import the database into B1
  3. Import the same database into B2
  4. Configure the sync
  5. When the status of the two databases on the portal is "Ready to sync", run the initialization SQL script
  6. Launch the first sync and be happy to see it finish in just a couple of seconds with 0 changes. After this you are in an incremental situation with a 25 GB database and you can schedule the sync.

The initialization script is the following:



DECLARE @table_name nvarchar(255)
DECLARE @message varchar(255)
DECLARE @totalrow int
DECLARE @numberofrowtoupdate int = 10000; -- Change this number to update more rows per transaction

-- List every Data Sync tracking table (schema ends with "datasync",
-- table name ends with "dss_tracking").
DECLARE datasync_table_cursor CURSOR FOR
SELECT name FROM sys.tables WITH (NOLOCK)
WHERE schema_name(schema_id) LIKE '%datasync' AND name LIKE '%dss_tracking'

OPEN datasync_table_cursor

FETCH NEXT FROM datasync_table_cursor
INTO @table_name

DECLARE @totalsumrow int
SET @totalsumrow = 0

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT ' '
    SELECT @message = '----- Start of update table: ' + @table_name
    PRINT @message
    SET @totalsumrow = 0

    -- Count the rows of the current tracking table (the dynamic SQL assumes
    -- the schema is literally named "DataSync", like the UPDATE below).
    DECLARE @SelectStatement NVARCHAR(2000)
    DECLARE @ParameterList NVARCHAR(500)
    SET @ParameterList = '@TotalRowsSelected INT OUTPUT'
    SET @SelectStatement = 'SELECT @TotalRowsSelected = count(*) FROM DataSync.' + @table_name
    EXECUTE sp_executesql @SelectStatement, @ParameterList, @TotalRowsSelected = @totalrow OUTPUT
    SELECT @message = '----- Table rows number ' + @table_name + ' : ' + CAST(@totalrow AS NVARCHAR(10))
    PRINT @message

    -- Update the table in batches of @numberofrowtoupdate rows, each batch in
    -- its own transaction, until no row is left with a non-zero timestamp.
    WHILE (2 > 1)
    BEGIN
        BEGIN TRANSACTION

        DECLARE @UpdateStatement NVARCHAR(2000)
        DECLARE @rowcount int
        SET @UpdateStatement = ' UPDATE TOP ( ' + CAST(@numberofrowtoupdate AS nvarchar(6)) + ' ) DataSync.' + @table_name
            + ' SET [local_update_peer_timestamp] = 0, [last_change_datetime] = getdate()'
            + ' WHERE local_update_peer_timestamp <> 0 '

        EXECUTE sp_executesql @UpdateStatement
        SELECT @rowcount = @@ROWCOUNT

        SET @totalsumrow = @totalsumrow + @rowcount
        SELECT @message = 'number of modified rows = ' + CAST(@totalsumrow AS NVARCHAR(10)) + ' / ' + CAST(@totalrow AS NVARCHAR(10))
        PRINT @message

        -- Nothing left to update: commit and move on to the next table.
        IF @rowcount = 0
        BEGIN
            COMMIT TRANSACTION
            BREAK
        END
        COMMIT TRANSACTION
    END -- WHILE

    SELECT @message = '----- End of table update: ' + @table_name
    PRINT @message

    FETCH NEXT FROM datasync_table_cursor
    INTO @table_name
END
CLOSE datasync_table_cursor;
DEALLOCATE datasync_table_cursor;


The aim is to:

  • list the tracking tables (using a cursor)
  • set local_update_peer_timestamp to 0, because this is the value the first sync expects, using a loop and an UPDATE TOP so that you can follow the progress and avoid any trouble with the transaction log (you cannot update millions of rows in a single transaction: the log fills up and errors occur) – a quick progress check is shown just below
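To follow the progress from another session, or to confirm that nothing is left before launching the first sync, you can run a quick check against the same tracking tables. This is only a sketch and assumes the same naming convention as the cursor above:

-- For each tracking table, count the rows the script still has to touch
-- (local_update_peer_timestamp <> 0). One result set per table.
DECLARE @check_table nvarchar(512)
DECLARE @check_sql nvarchar(1000)

DECLARE check_cursor CURSOR FOR
SELECT schema_name(schema_id) + '.' + name
FROM sys.tables
WHERE schema_name(schema_id) LIKE '%datasync' AND name LIKE '%dss_tracking'

OPEN check_cursor
FETCH NEXT FROM check_cursor INTO @check_table
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @check_sql = 'SELECT ''' + @check_table + ''' AS tracking_table,'
        + ' COUNT(*) AS rows_left FROM ' + @check_table
        + ' WITH (NOLOCK) WHERE local_update_peer_timestamp <> 0'
    EXECUTE sp_executesql @check_sql
    FETCH NEXT FROM check_cursor INTO @check_table
END
CLOSE check_cursor
DEALLOCATE check_cursor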

I will update this code next week to add a summary report at the end of the script.
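In the meantime, here is a possible shape for that summary report (a hypothetical sketch, not the final version): accumulate one row per tracking table in a table variable inside the cursor loop, then select it once the cursor is closed.

-- Hypothetical summary report: declared with the other variables at the top
-- of the script.
DECLARE @summary TABLE
(
    tracking_table nvarchar(255),
    rows_updated   int,
    total_rows     int,
    finished_at    datetime
)

-- Inside the cursor loop, just before fetching the next table:
-- INSERT INTO @summary VALUES (@table_name, @totalsumrow, @totalrow, GETDATE())

-- After CLOSE / DEALLOCATE of the cursor:
SELECT tracking_table, rows_updated, total_rows, finished_at
FROM @summary
ORDER BY tracking_table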

This script completed in 30 minutes for 11,000,000 rows (2.2 GB).

For my 25 GB database (approximately 12 GB without indexes), the results are:

  • Provisioning: 2,637.14 sec (West) and 2,797.84 sec (North)
  • Script on West Europe: 1h 26min 29s
  • Script on North Europe: 2h 41min 20s
  • Init sync: 19.84 seconds, 0 uploaded, 0 downloaded

I saved up to a day or more ;)
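One caveat reported by a reader (see the comment below): the UPDATE assumes local_update_peer_timestamp is a regular updatable column. If, in your provisioning, it is a timestamp (rowversion) column, SQL Server refuses the update with "Cannot update a timestamp column". A quick way to check before running anything (a sketch using the catalog views):

-- Check the data type of local_update_peer_timestamp in every tracking table.
-- If it shows up as "timestamp" (rowversion), the UPDATE above cannot work,
-- because SQL Server does not allow updating that column type.
SELECT schema_name(t.schema_id) AS table_schema,
       t.name                   AS table_name,
       ty.name                  AS column_type
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types ty ON ty.user_type_id = c.user_type_id
WHERE schema_name(t.schema_id) LIKE '%datasync'
  AND t.name LIKE '%dss_tracking'
  AND c.name = 'local_update_peer_timestamp'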


1 Response to SQL Azure Data Sync: Initialization of a large database

  1. ciccio says:

    Hi,
    I tried to run this script but it gives me this error at this line:
    set [local_update_peer_timestamp] = 0
    The error is: Cannot update a timestamp column.
    Any workaround?
    Thanks
