Change Data Capture -Part2

June 17, 2009

How To Process Change Data Capture (CDC) in SQL Server Integration Services SSIS 2008

Assume that we have a customer table and we want to store all changes to our customer table in a customer_audit table.  We’ll write an SSIS package to query the CDC data and copy it to the customer_audit table.  We would like the option of running the SSIS package on demand and/or on a schedule (e.g. a SQL Agent job).  Each time we run the SSIS package we want to pickup whatever changed since the last time we ran the package.

Customer Tables

We will use the following customer table:

create table dbo.customer (

customer_id int identity primary key not null

, name nvarchar(50) not null

, sales_rep nvarchar(50) not null

, region nvarchar(50) not null

, credit_limit int not null

)

We will use the following customer_audit table to store changes to the customer table:

create table dbo.customer_audit (

customer_audit_id int identity primary key not null

, customer_id int not null

, name nvarchar(50) not null

, sales_rep nvarchar(50) not null

, region nvarchar(50) not null

, credit_limit int not null

, effective_date datetime not null

, __$start_lsn binary(10) not null

, __$seqval binary(10) not null

, __$operation int not null

, __$update_mask varbinary(128) not null

)

Enabling CDC

As we see before.

Logging

In order to allow our SSIS package to pickup just the changes since the last time it was run, we’ll populate a log table with the data we need.  We’ll use the following log table:

create table dbo.cdc_capture_log (

cdc_capture_log_id int identity not null

, capture_instance nvarchar(50) not null

, start_time datetime not null

, min_lsn binary(10) not null

, max_lsn binary(10) not null

, end_time datetime null

, insert_count int not null default 0

, update_count int not null default 0

, delete_count int not null default 0

, status_code int not null default 0

)

We’ll create two stored procedures to maintain the log:

  • init_cdc_capture_log will create a new row.
  • end_cdc_capture_log will update the row.

The init_cdc_capture_log is called at the beginning of our SSIS package.  It is shown below:

create procedure dbo.init_cdc_capture_log

@capture_instance nvarchar(50)

as

begin

set nocount on;

declare

@begin_lsn binary(10)

, @end_lsn binary(10)

, @prev_max_lsn binary(10)

— get the max LSN for the capture instance from

— the last extract

select @prev_max_lsn = max(max_lsn)

from dbo.cdc_capture_log

where capture_instance = @capture_instance

— if no row found in cdc_capture_log get the min lsn

— for the capture instance

if @prev_max_lsn is null

set @begin_lsn = sys.fn_cdc_get_min_lsn(@capture_instance)

else

set @begin_lsn = sys.fn_cdc_increment_lsn(@prev_max_lsn)

— get the max lsn

set @end_lsn = sys.fn_cdc_get_max_lsn()

insert into dbo.cdc_capture_log

(capture_instance,start_time,min_lsn,max_lsn)

values

(@capture_instance,getdate(),@begin_lsn,@end_lsn)

select cast(scope_identity() as int) cdc_capture_log_id

end

The end_cdc_capture_log stored procedure updates the row created  by the init_cdc_capture_log stored procedure.  It is shown below:

create procedure dbo.end_cdc_capture_log

@cdc_capture_log_id int

, @insert_count int

, @update_count int

, @delete_count int

as

begin

set nocount on;

update dbo.cdc_capture_log set

end_time = getdate()

, insert_count = @insert_count

, update_count = @update_count

, delete_count = @delete_count

, status_code = 1

where cdc_capture_log_id = @cdc_capture_log_id

end

Creating the SSIS Package

We will create an SSIS package that has the following control flow:

clip_image001

The main points about the above control flow are:

  • Init Log is an Execute SQL task; it calls the init_cdc_capture_log stored procedure (described above) and saves the identity value of the created cdc_capture_log row in a package variable.
  • Process Changes is a Data Flow task that retrieves the latest changes from the CDC table and copies them to our audit table.
  • End Log is an Execute SQL task that calls the end_cdc_capture_log stored procedure (described above) to update the cdc_capture_log row.

The Process Changes Data Flow task is implemented as shown below:

clip_image002

The main points about the above data flow are:

  • Extract Customer Changes is an OLE DB Source that executes the stored procedure extract_customer_capture_log to retrieve the customer changes since the last run.
  • Count Inserts Updates and Deletes is a Script Component Transform task that just counts the number of inserts, updates and deletes in the change data.
  • Save Customer Changes to Custom Audit Table is an OLE DB Destination used to insert each change row into the customer_audit table.

The extract_customer_capture_log stored procedure is shown below:

create procedure dbo.extract_customer_capture_log

@cdc_capture_log_id int

as

begin

set nocount on;

declare

@begin_lsn binary(10)

, @end_lsn binary(10)

— get the lsn range to process

select

@begin_lsn = min_lsn

, @end_lsn = max_lsn

from dbo.cdc_capture_log

where cdc_capture_log_id = @cdc_capture_log_id

— extract and return the changes

select m.tran_end_time modified_ts, x.*

from cdc.fn_cdc_get_all_changes_customer_all(

@begin_lsn, @end_lsn, ‘all’

) x

join cdc.lsn_time_mapping m

on m.start_lsn = x.__$start_lsn ;

end

The main points about the above stored procedure are:

  • The cdc_capture_log_id parameter value is the value returned from the call to the init_cdc_capture_log stored procedure (described above in the Logging section).
  • Retrieve the LSN range from the cdc_capture_log table row.  The LSN range represents all changes that have occurred since the last run of the SSIS package.
  • The cdc.fn_cdc_get_all_changes_customer_all function is generated when you enable CDC.  The function name includes the capture instance.  The function returns the changes that occurred in the LSN range.
  • The cdc.lsn_time_mapping table is populated by CDC with the mapping of transaction times to LSNs.  The join retrieves the transaction time.  This alleviates the need to manually track this in the source table.

Testing the SSIS Package

Before running the SSIS package, we need to execute a script that performs some inserts, updates and deletes.  We’ll use the following script:

use SqlHero

go

insert into dbo.customer

(name,sales_rep,region,credit_limit)

values

(N’BGM Systems’, N’Cane’, N’South’, 2500)

update dbo.customer

set sales_rep = N’Smith’

where [name] = N’BGM Systems’

update dbo.customer

set credit_limit = 3000

where [name] = N’BGM Systems’

delete dbo.customer

where [name] = N’BGM Systems’

delete dbo.customer

where [name] = N’BGM Systems’

After running the above script, execute the SSIS package, then check the customer_audit table to see that there are four rows; one for each change made in the script.  The partial contents of the customer_audit table are shown below:

clip_image003

The main points about the above table are:

  • effective_date is the date and time of the transaction, as retrieved from the cdc.lsn_time_mapping table.
  • Row 1 shows the insert; __$operation=2 for an insert.
  • Row 2 shows the update of the sales_rep; __$operation=4 for the update showing the values after the update.
  • Row 3 shows the update of the credit_limit.
  • Row 4 shows the delete; __$operation=1 for a delete.

The effective_date column provides the ability to query the customer_audit table and see the customer values at any point in time by filtering on the maximum effective_date that is less than or equal to some value.


SQL Server Recovery Model Types

June 16, 2009

SQL Server has 3 recovery model types which control types of backups so let’s discuss in more details all recovery types

when you do these operation sequential

right click any database—> properties –> select Pages –> options

you will see the dialog bellow

untitled

as you see in fig above you have 3 types

the relationship between backup types(Rows Headers) and Recovery Models(Columns Header) are

  Full Bulk-logged Simple
Full Backup Yes Yes Yes
Differential Backup Yes Yes Yes
Transaction Backup Yes Yes No

it’s very clear that you can’t make trasaction log backup in simple mode.

so what is the Differential between Full mode and Bulk-Logged mode?

Full mode log any transaction in Transaction log file but Bulk-logged can’t log huge transaction to log file like bulk insert or create index in huge tables

you can use Bulk logged for more performance to insert bulk data to database because in this mode you will not log in  transaction log.

you can use Full mode to log any small things to transaction log file to feel safe when to return to end point of disaster


SQL Server Major Backup Types

June 15, 2009

we have many types of backups but in this article i just discuses 3 types of backups

  1. Full Backup:- Backup all data and database objects
  2. Differential Backup:- Backup all data and database objects from last full backup
  3. Transaction Log Backup:- Backup all transaction from Transaction log file from last official backup official backup can be(Full backup or differential backup or transaction log backup)

let’s discuss fig bellow

Backup

F1 is full backup from database 

T1 is Transaction Log backup with difference from F1

D1 is differential backup with difference from F1

T2 is Transaction Log backup with difference from D2

D2 is differential backup with difference from F1

T3 is Transaction Log backup with difference from D2

D3 is differential backup with difference from F1

T4 is Transaction Log backup with difference from D3

D4 is differential backup with difference from F1

T5 is Transaction Log backup with difference from D4

D5 is differential backup with difference from F1


Partition Table in SQL Server 2008

June 13, 2009

1. What is Partition table?

Data is partitioned horizontally by range

Partition Table

2. How to Create Partition Table?

You can create partition table by 2 ways

  1. programmability
  2. wizard in SQL Server 2008 New

But you must understand 2 things before creation Partition Tables

Partition Function: Partition functions define partition boundaries and Boundary values can be assigned to LEFT or RIGHT

clip_image004

Partition Scheme: A partition scheme assigns partitions to filegroups

clip_image006

3. How to Create Partition Table by wizard in SQL Server 2008?

Right click on table then go to storage then select Create Partition as in fig bellow

1

Select a column you want to use it in partition as in fig bellow

2

Create Partition function as in fig bellow

3

Create Partition Scheme as in fig bellow

4

Determine ranges as in fig bellow

5

Create Partition by select run immediately as in fig bellow

6    

Congratulation your table now partitioned

4. How to Create Partition Table programmability?

/***************************************************

1)Create database TestDb

****************************************************/

CREATE DATABASE [TestDb] ON PRIMARY

( NAME = N’TestDb’, FILENAME = N’C:\db\TestDb.mdf’ , SIZE = 3072KB , FILEGROWTH = 1024KB ),

FILEGROUP [fg1]

( NAME = N’f1′, FILENAME = N’C:\db\f1.ndf’ , SIZE = 3072KB , FILEGROWTH = 1024KB ),

FILEGROUP [fg2]

( NAME = N’f2′, FILENAME = N’C:\db\f2.ndf’ , SIZE = 3072KB , FILEGROWTH = 1024KB ),

FILEGROUP [fg3]

( NAME = N’f3′, FILENAME = N’C:\db\f3.ndf’ , SIZE = 3072KB , FILEGROWTH = 1024KB ),

FILEGROUP [fg4]

( NAME = N’f4′, FILENAME = N’C:\db\f4.ndf’ , SIZE = 3072KB , FILEGROWTH = 1024KB )

LOG ON

( NAME = N’TestDb_log’, FILENAME = N’C:\db\TestDb_log.ldf’ , SIZE = 1024KB , FILEGROWTH = 10%)

GO

USE [TestDb]

GO

/***************************************************

3)Create Partition Function

****************************************************/

Create Partition Function pf_Date(Date)

as range right

For values(’01/01/2006′,’01/01/2007′,’01/01/2008′)

go

/***************************************************

3)Create Partition scheme

****************************************************/

Create Partition Scheme ps_Date

AS Partition pf_Date

to(fg1,fg2,fg3,fg4)

go

/***************************************************

3)Create Partition Contact Table

****************************************************/

CREATE TABLE [dbo].[Contact](

[ContactId] [int] IDENTITY(1,1) NOT NULL,

[FullName] [nvarchar](50) NULL,

[CreationDate] [date] NULL

)on ps_Date(CreationDate)

GO

/***************************************************

4)Insert rows to table

****************************************************/

insert into dbo.Contact values(‘amosa’,’01/01/2006′)

insert into dbo.Contact values(‘mmosa’,’01/01/2007′)

insert into dbo.Contact values(‘zemo’,’01/01/2008′)

insert into dbo.Contact values(‘test1′,’05/01/2005’)

insert into dbo.Contact values(‘test2′,’05/01/2006’)

insert into dbo.Contact values(‘test3′,’05/01/2007’)

insert into dbo.Contact values(‘test4′,’05/01/2008’)

insert into dbo.Contact values(‘test5′,’05/01/2009’)

go

/***************************************************

5)Query Table to know where did rows save

****************************************************/

Select *,$partition.pf_Date(CreationDate)FileGroupNo From dbo.Contact


Be Ready for Microsoft SQL Server 2008 R2

June 4, 2009

to get more information about Microsoft SQLServer 2008 R2 you can visit SQLServer2008 R2


Change Data Capture – Part I

May 31, 2009

With the arrival of SQL Server 2008 we gained access to a new feature called Change Data Capture (CDC). CDC enables you to constantly stream change data to a consuming process, most notably Extract, Transform, and Load (ETL) applications. In the SQL Server environment the ETL application used is of course SQL Server Integration Services (SSIS). 

the first obvious constraint in regard to SQL Server 2008’s CDC feature is that it is only available to SSIS applications that source their data from an OLTP system built on SQL Server 2008. In the field of implementing Microsoft BI there are many relational database systems we have to source data from that are not SQL Server based (and if they are SQL Server based there is a good chance it is not using the 2008 version). All major relational database systems use some form of a transaction log to record the history of its actions in case of rollback or hardware failure. I would imagine that the actual contents of each database vendor’s log is different, however they have the same basic requirement to temporarily persist database actions.

So how to enable Change Data Capture on a database, and on a table, and how to keep track of Data Definition Language changes on a table.

Step 1

Let’s create a database named SqlHero as shown below.

USE [master]
GO

/*** Object:  Database [SqlHero]   ***/
IF  EXISTS (SELECT name FROM sys.databases WHERE name = N'SqlHero')
DROP DATABASE [SqlHero]
GO
USE [master]
GO

/*** Object:  Database [SqlHero]   ***/
CREATE DATABASE [SqlHero]
GO

Step 2

Now let’s create a table named MyTable on the SqlHero database, as shown Below.

USE [SqlHero]
GO

/*** Object:  Table [dbo].[MyTable]   ***/
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MyTable]')
	AND type in (N'U'))
DROP TABLE [dbo].[MyTable]
GO
USE [MyDataBase]
GO

/*** Object:  Table [dbo].[MyTable]   ***/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[MyTable](
	[ID] [int] NOT NULL,
	[Name] [varchar](100) NULL,
 CONSTRAINT [MyTable_PK] PRIMARY KEY CLUSTERED
(
	[ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
	IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON,
	ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Step 3

In order to track changes on the table, we need to enable the Change Data Capture feature on the database. We can enable the Change Data Capture feature using the following Transact SQL command, as shown below.

Until this point, the only schema that exists on the database is dbo. Once we enable Change Data Capture, a new schema with a bunch of objects will be created.

USE [SqlHero]
GO
EXEC sys.sp_cdc_enable_db_change_data_capture
GO

The following CDC tables are created under the CDC schema, as shown below

cdc.captured_columns
cdc.change_tables
cdc.ddl_history
cdc.index_columns
cdc.lsn_time_mapping

When you query these tables, you will see only zero number of rows.

select * from cdc.captured_columns
select * from cdc.change_tables
select * from cdc.ddl_history
select * from cdc.index_columns
select * from cdc.lsn_time_mapping

Result

(0 row(s) affected)
 (0 row(s) affected)
(0 row(s) affected)
 (0 row(s) affected)
 (0 row(s) affected)

 

Step 3

In order to track changes on the table, we need to enable the Change Data Capture feature on the table as well. Let’s enable the Change Data Capture feature using the following Transact SQL command, as shown below.

USE [SqlHero]
GO
EXEC sys.sp_cdc_enable_table_change_data_capture
@source_schema = 'dbo',
@source_name = 'MyTable',
@role_name = 'cdc_MyTable'
GO

Result

Job 'cdc.MyDataBase_capture' started successfully.
Job 'cdc.MyDataBase_cleanup' started successfully.

 

By reading the result, we can easily understand that SQL Server Agent is a must to do the capture and cleanup. We can see these jobs are actually created as SQL Server Scheduled jobs. [Refer Fig 1.6, 1.7]

Now let’s execute the following Transact SQL to see if any data has been inserted on any of the CDC tables.

select * from cdc.captured_columns
select * from cdc.change_tables
select * from cdc.index_columns

Result

object_id, column_name, column_id, column_type, column_ordinal, is_computed
389576426, ID, 1, int, 1, 0
389576426, Name, 2, varchar, 2, 0

(2 row(s) affected)

object_id,  version,  source_object_id, capture_instance, start_lsn, end_lsn, supports_net_changes, has_drop_pending,
	role_name, index_name, filegroup_name, create_date
389576426, 0, 53575229, dbo_MyTable, NULL, NULL, 0, NULL, cdc_MyTable, MyTable_PK, NULL,
(1 row(s) affected)

object_id, column_name, index_ordinal, column_id
389576426, ID, 1, 1

(1 row(s) affected)

You can see that the CDC schema stores metadata information about which tables and columns are being tracked by Change Data Capture. It also stores information about what Index the table that has been tracked has.

Step 4

We can check to see if Change Data Capture is enabled on a database by using the following transact SQL statement.

SELECT is_cdc_enabled FROM sys.databases WHERE name = 'SqlHero'

Result

is_cdc_enabled
--------------
1

(1 row(s) affected)

Step 5

We can check to see if Change Data Capture is enabled on a table, by using the following transact SQL statement.

SELECT is_tracked_by_cdc FROM sys.tables WHERE name = 'MyTable'

Result

is_tracked_by_cdc
-----------------
1

(1 row(s) affected)

Step 6

Now let’s make some changes to table structure and see if Change Data Capture captures the changes. Execute the following query as shown below.

USE [SqlHero]

GO

Alter Table MyTable add Address varchar(500)

GO

Alter Table MyTable add Salary money

GO

Alter Table MyTable add Bonus money

GO

 

Query the cdc table ddl_history as shown below.

select * from cdc.ddl_history

Result

source_object_id, object_id, required_column_update, ddl_command, ddl_lsn, ddl_time
53575229, 389576426, 0, Alter Table MyTable add Address varchar(500)
, 0x0000001C000001350001,
53575229, 389576426, 0, Alter Table MyTable add Salary money
, 0x0000001C000001370018, 53575229, 389576426, 0, Alter Table MyTable add Bonus money
, 0x0000001C0000013D0018,
(3 row(s) affected)


Using Policy management in SQL Server 2008 to manage table name

May 31, 2009

 

introduction

Policy-Based Management facet

A set of logical properties that model the behavior or characteristics for certain types of managed targets. The number and characteristics of the properties are built into the facet and can be added or removed by only the maker of the facet. A target type can implement one or more management facets, and a management facet can be implemented by one or more target types. Some properties of a facet can only apply to a specific version. For example, the Database Mail property of the Surface Area Configuration facet only applies to SQL Server 2005 and later versions.

Policy-Based Management condition

A Boolean expression that specifies a set of allowed states of a Policy-Based Management managed target with regard to a management facet.

Policy-Based Management policy

A Policy-Based Management condition and the expected behavior, for example, evaluation mode, target filters, and schedule. A policy can contain only one condition. Policies can be enabled or disabled.

Example

1) Create New Condition

image

image

2)Create New Policy

image

image

3)Enable Policy

image

Now try to create any table in any database if you start table by tbl creation will work otherwise creation will through exception