With the arrival of SQL Server 2008 we gained access to a new feature called Change Data Capture (CDC). CDC enables you to constantly stream change data to a consuming process, most notably Extract, Transform, and Load (ETL) applications. In the SQL Server environment the ETL application used is of course SQL Server Integration Services (SSIS).
the first obvious constraint in regard to SQL Server 2008’s CDC feature is that it is only available to SSIS applications that source their data from an OLTP system built on SQL Server 2008. In the field of implementing Microsoft BI there are many relational database systems we have to source data from that are not SQL Server based (and if they are SQL Server based there is a good chance it is not using the 2008 version). All major relational database systems use some form of a transaction log to record the history of its actions in case of rollback or hardware failure. I would imagine that the actual contents of each database vendor’s log is different, however they have the same basic requirement to temporarily persist database actions.
So how to enable Change Data Capture on a database, and on a table, and how to keep track of Data Definition Language changes on a table.
Let’s create a database named SqlHero as shown below.
USE [master] GO /*** Object: Database [SqlHero] ***/ IF EXISTS (SELECT name FROM sys.databases WHERE name = N'SqlHero') DROP DATABASE [SqlHero] GO USE [master] GO /*** Object: Database [SqlHero] ***/ CREATE DATABASE [SqlHero] GO Step 2
Now let’s create a table named MyTable on the SqlHero database, as shown Below.
USE [SqlHero] GO /*** Object: Table [dbo].[MyTable] ***/ IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MyTable]') AND type in (N'U')) DROP TABLE [dbo].[MyTable] GO USE [MyDataBase] GO /*** Object: Table [dbo].[MyTable] ***/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO CREATE TABLE [dbo].[MyTable]( [ID] [int] NOT NULL, [Name] [varchar](100) NULL, CONSTRAINT [MyTable_PK] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_PADDING OFF GO Step 3
In order to track changes on the table, we need to enable the Change Data Capture feature on the database. We can enable the Change Data Capture feature using the following Transact SQL command, as shown below.
Until this point, the only schema that exists on the database is dbo. Once we enable Change Data Capture, a new schema with a bunch of objects will be created.
USE [SqlHero] GO EXEC sys.sp_cdc_enable_db_change_data_capture GO
The following CDC tables are created under the CDC schema, as shown below
cdc.captured_columns cdc.change_tables cdc.ddl_history cdc.index_columns cdc.lsn_time_mapping
When you query these tables, you will see only zero number of rows.
select * from cdc.captured_columns select * from cdc.change_tables select * from cdc.ddl_history select * from cdc.index_columns select * from cdc.lsn_time_mapping
(0 row(s) affected) (0 row(s) affected) (0 row(s) affected) (0 row(s) affected) (0 row(s) affected)
In order to track changes on the table, we need to enable the Change Data Capture feature on the table as well. Let’s enable the Change Data Capture feature using the following Transact SQL command, as shown below.
USE [SqlHero] GO EXEC sys.sp_cdc_enable_table_change_data_capture @source_schema = 'dbo', @source_name = 'MyTable', @role_name = 'cdc_MyTable' GO
Job 'cdc.MyDataBase_capture' started successfully. Job 'cdc.MyDataBase_cleanup' started successfully.
By reading the result, we can easily understand that SQL Server Agent is a must to do the capture and cleanup. We can see these jobs are actually created as SQL Server Scheduled jobs. [Refer Fig 1.6, 1.7]
Now let’s execute the following Transact SQL to see if any data has been inserted on any of the CDC tables.
select * from cdc.captured_columns select * from cdc.change_tables select * from cdc.index_columns
object_id, column_name, column_id, column_type, column_ordinal, is_computed 389576426, ID, 1, int, 1, 0 389576426, Name, 2, varchar, 2, 0 (2 row(s) affected) object_id, version, source_object_id, capture_instance, start_lsn, end_lsn, supports_net_changes, has_drop_pending, role_name, index_name, filegroup_name, create_date 389576426, 0, 53575229, dbo_MyTable, NULL, NULL, 0, NULL, cdc_MyTable, MyTable_PK, NULL, (1 row(s) affected) object_id, column_name, index_ordinal, column_id 389576426, ID, 1, 1 (1 row(s) affected)
You can see that the CDC schema stores metadata information about which tables and columns are being tracked by Change Data Capture. It also stores information about what Index the table that has been tracked has.
We can check to see if Change Data Capture is enabled on a database by using the following transact SQL statement.
SELECT is_cdc_enabled FROM sys.databases WHERE name = 'SqlHero'
is_cdc_enabled -------------- 1 (1 row(s) affected)
We can check to see if Change Data Capture is enabled on a table, by using the following transact SQL statement.
SELECT is_tracked_by_cdc FROM sys.tables WHERE name = 'MyTable'
is_tracked_by_cdc ----------------- 1 (1 row(s) affected)
Now let’s make some changes to table structure and see if Change Data Capture captures the changes. Execute the following query as shown below.
Alter Table MyTable add Address varchar(500)
Alter Table MyTable add Salary money
Alter Table MyTable add Bonus money
Query the cdc table ddl_history as shown below.
select * from cdc.ddl_history
source_object_id, object_id, required_column_update, ddl_command, ddl_lsn, ddl_time 53575229, 389576426, 0, Alter Table MyTable add Address varchar(500) , 0x0000001C000001350001, 53575229, 389576426, 0, Alter Table MyTable add Salary money , 0x0000001C000001370018, 53575229, 389576426, 0, Alter Table MyTable add Bonus money , 0x0000001C0000013D0018, (3 row(s) affected)