A while back I was working on doing a TFS upgrade and we decided to put the databases on a SQL 2014 AG. This is the standard out-of-the-box recommendation for HADR with SQL based applications now, so I figured we would get it done. Having had previous experience with TFS Backups and continuity, I also wanted to rework how the backups were running.
I’ve copied the contents of the document, some of the links will be broken. 😦
SQL Availability Group Backups
With Respect to TFS
SQL AG and TFS: SQL AG and TFS are supported my MS and the option of creating a redundant data tier for TFS decreases the chances of the TFS instance become unavailable. Considering the investment made in the development staff the extra time in setup seemed reasonable.
The TFS databases were then spread over a two server cluster using SQL 2014 AG for redundancy. This presented a few problems because the standard TFS backup tool would work, but would cause slowness and only perform full backups on the TFS databases.
Project Goals: The project goals were to create a TFS backup system which would minimize the loss of data assuming the primary database node fails and/or a SAN failure causes both nodes to go down. Special considerations needed to be made to accommodate the special needs of TFS.
TFS Backup Special Need: TFS backup requires the databases to have a consistent restoration point. In order to ensure a consistent restoration point a marked transaction is created in all databases per the TFS manual backup guide in MSDN (https://msdn.microsoft.com/en-us/library/ms253070.aspx).
AG Special Backup Needs: SQL AG has special backup needs as well. The distributed nature of the databases produces partial backup history pictures in the MSDB backup log. If the LSN chain isn’t preserved then the possibility of restoration becomes problematic. Also, the secondary database can be used for transaction log backups, but the primary should be used for full backups. With respect to full backups, the “Copy Only” option can be exercised against the secondary, however the MSDN backup history log is skipped when a copy only backup is created.
Copy_Only Backups: Copy_Only backups are a nice option that was added to SQL 2005. Copy_Only backups create a database backup that does not affect the LSN in MSDB. The downside of this is that differential backups cannot be made following a Copy_Only backup. If the solution does not ever plan to use differential backups, the the Copy_Only option performed at the secondary database is perfectly fine, and subsequent log backups can be applied to create an intact database. That mentioned, the Copy_Only backup is truly for making backups out of sequence and shouldn’t be part of the standard backup routine. (http://www.mssqltips.com/sqlservertip/1075/copyonly-backups-with-sql-server/)
Implementation of SQL AG and TFS
Backup Solution Design: The backup solution design is focused on being able to restore the databases to a single point in time. Each database will have a marked transaction within, and can therefor be restored to that named mark. Viewing transaction marks can be accomplished with the following: (if the user has permissions)
Transaction markers can only occur on the primary node of the AG. Also, full DB backups and differentials can only occur on the primary, so having them loosely tied in time becomes feasible. In the case of TFS backups, the transactions are marked 5 minutes prior to the full database backup execution and 5 minutes prior to each transaction log backup. This achieves the desired goal of a maximum of 60 minutes of work lost in the event of a major failure on the database system.
Backup Location: The backup location is a common location for all backups at \\<%ServerName%>\TFSBackup$. Permissions are set on this share to the TFS Backup user (<%YourDomain%>\TFSBackup) and the DBA team.
Backup Devices: Each database has a backup device registered. Viewing of the backup devices can be accomplished via:
Primary Node (<%ServerName%>\TFS2013SQL1): The primary node is used to create transaction markers and perform full database backups. If these backups become too long then differentials can be created provided a transaction marker is also ensured.
Secondary Node (<%ServerName%>\TFS2013SQL2): The secondary node is used to create transaction log backups.
SQL Jobs: The SQL jobs specific to the backup creations are:
Weekly Full TFS Backup: This job checks to see which node is it presently running on and then either creates a full backup of each database or exits when not running on the primary node.
Daily TFS Differential Backup: This job performs a differential backup of each database to be contained within the same media group as the full backup.
TFS Transaction Marker: Creates the transaction markers in each database via a stored procedure which performs a cross database transaction.
TFS Database Transaction Log Backups: This job checks to see which node it’s running on and creates transaction log backups or exits, based on the node role.
Each job is on each node of the cluster. This is by design, and each job is scheduled at exactly the same time. This way if the primary node fails and the secondary is online, the backups will continue to run.
Determining What Node the Code is Executing Against: The determination can be made in a couple of ways. One way to isolate the backup node is to use the configuration to create a preferred backup node. (sys.fn_hadr_is_preferred_replica) The other way is to check the sys.dm_hadr_availability_replica_states and sys.availability_replicas views to determine the node.
Declare @Servername nvarchar(128);
Declare @RoleDesc nvarchar(128);
Declare @ConnectedState nvarchar(128);
SELECT @Servername = @@SERVERNAME, @RoleDesc = N’PRIMARY’, @ConnectedState = N’CONNECTED’;
FROM sys.dm_hadr_availability_replica_states RS
INNER JOIN sys.availability_replicas AR ON
AR.replica_id = RS.replica_id
RS.role_desc = @RoleDesc
AND RS.connected_state_desc = @ConnectedState
AND AR.replica_server_name = @Servername)
This is the surefire way to interrogate the system and see what node the code is running against. Anytime the job needs to decide to proceed something of this sort should be used. It’s possible to embed something like this into the master DB and simply access it when required. (that didn’t happen for this project though)
Schedule: A full backup is made of the databases weekly. Thereafter differential backups are taken nightly, and transaction logs run every 3 hours beginning at 4am and running until 9pm every night of the week (M-F).
Retention: Backups are retained for 7 days, beginning on Sunday (Saturday night @5 min past midnight). It is possible to recover to within an hour during the retention timeframe.
Compression: Backups created with the jobs and scripts are compressed to save disk. While this can extend the recovery time, and also increase the CPU during the backup process, it was deemed required for this project.
Windows Account: YourDomain\tfsbackup was used to configure all backup operations within all databases.
Permissions: The above account needs to be given access to not only the server (As a login) but also the databases (as a user). Public permission is acceptable and then specific permissions need to be assigned within the databases. Be sure to pay special attention to the msdb database because the special roles used for SQL Agent need to be accessed. Refer to TFSBackupJobPermissions.sql for information on the setup of permissions within SQL Server, and note the special operation per node in the AG. The user must also be created as a CREDENTIAL and a PROXY is created within the SQL Agent to allow for execution as that account within the AGENT jobs.
***** Special Case ****
The user above must be given permission to “log on as a batch” to all nodes in the AG. This provides access to the external share for the CREDENTIAL created within SQL and its PROXY into the SQL Agent.
How to do this,.. (https://technet.microsoft.com/en-us/library/cc957131.aspx)
Restoring the Databases
Restore: Restoring the databases become a relatively easy task either on the same SQL Server of on a new server. The same backup devices should be installed per the paths and names originally listed on the old server. If those are not available, the device names are arbitrary. What is important is the path to the backup file itself and it’s contents.
Example: As an example I created a backup device on my local SQL 2014 Server instance: sys.sp_addumpdevice @devtype = ‘disk’, — varchar(20)
@logicalname = TfsConfiguration, — sysname
@physicalname = N’\\<%ServerName%>\TfsBackup$\Configuration.bak’, — nvarchar(260)
@cntrltype = 0, — smallint
@devstatus = ” — varchar(40)
After the device is created, which points at the backup file I created a local database called Tfs_Configuration. The UI in SSMS accomplishes this quickly. Once the database is in place, right click on the database and select tasks, then restore from the resultant menu. The UI will appear and be empty. In the source section, select the “device” radio button and the box containing (…) to display a list of backup devices, click add. In the popup window containing a dropdown select “backup devices” and then another small popup window will result, containing a drop down of the backup device names, select the appropriate name.
The UI will then scan through the file and decide what databases are available. The “database” drop down on the main UI will display the available databases in the backup device. Once the database name is selected, the interface will scan for the most recent backups and display those in the “backup sets to restore” box.
At this point you have the option of restoring the database to its most recent copy, or another by clicking on the “timeline” button. It should be noted that within the file are the backups from two separate servers, and the LSN’s are matched in sequence. This follows the strategy outlined above for backups on the AG.
And below you can see the LSN’s and sequencing.
Only Log restored support going to a marked transaction. Select all but the last log file and then go to the options page. Select the “overwrite existing” and leave the database in a state of norecovery. Also, be sure to uncheck the Tail-log backup section.
The result of the restore should be this:
The database will be in recovery and show in the Object Explorer as (Restoring,…)
Right click on the database and select tasks then restore and restore log. Again, select the radio button that says “from device” and use the (…) to go through the same process to get the backup device you did in the previous example.
I’ve amended this post to provide the screenshots of the restore process. Thanks for checking back to see what I was trying to describe.
Healthy Data to you,.
Data Chef out,..