SQL Server: May 2013

Friday, May 31, 2013

Normalization in SQL Server

Defination : Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. There are several benefits for using Normalization in Database.
Benefits :

Eliminate data redundancy
Improve performance
Query optimization
Faster update due to less number of columns in one table
Index improvement

There are diff. - diff. types of Normalizations form available in the Database. Lets see one by one.
1. First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Remove repetative groups
Create Primary Key


Name	State	Country	Phone1	Phone2	Phone3
John	101	1	488-511-3258	781-896-9897	425-983-9812
Bob	102	1	861-856-6987
Rob	201	2	587-963-8425	425-698-9684
PK			[ Phone Nos ]
?				?
ID	Name	State	Country	Phone
1	John	101	1	488-511-3258
2	John	101	1	781-896-9897
3	John	101	1	425-983-9812
4	Bob	102	1	861-856-6987
5	Rob	201	2	587-963-8425
6	Rob	201	2	425-698-9684

2. Second Normal Form (2NF)Second normal form (2NF) further addresses the concept of removing duplicative data:

· Meet all the requirements of the first normal form.

· Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

· Create relationships between these new tables and their predecessors through the use of foreign keys.

Remove columns which create duplicate data in a table and related a new table with Primary Key – Foreign Key relationship

ID	Name	State	Country	Phone
1	John	101	1	488-511-3258
2	John	101	1	781-896-9897
3	John	101	1	425-983-9812
4	Bob	102	1	861-856-6987
5	Rob	201	2	587-963-8425
6	Rob	201	2	425-698-9684


ID	Name	State	Country		PhoneID	ID	Phone
1	John	101			1	1	488-511-3258
2	Bob	102			2	1	781-896-9897
3	Rob	201			3	1	425-983-9812
					4	2	587-963-8425
					5	3	587-963-8425
					6	3	425-698-9684

3. Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

· Meet all the requirements of the second normal form.

· Remove columns that are not dependent upon the primary key.

Country can be derived from State also… so removing country

ID	Name	State	Country
1	John	101	1
2	Bob	102	1
3	Rob	201	2

4. Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

· Meet all the requirements of the third normal form.

· A relation is in 4NF if it has no multi-valued dependencies.

If PK is composed of multiple columns then all non-key attributes should be derived from FULL PK only. If some non-key attribute can be derived from partial PK then remove it

The 4NF also known as BCNF NF

TeacherID	StudentID	SubjectID	StudentName
101	1001	1	John
101	1002	2	Rob
201	1002	3	Bob
201	1001	2	Rob


TeacherID	StudentID	SubjectID	StudentName
101	1001	1	X
101	1002	2	X
201	1001	3	X
201	1002	2	X

Thursday, May 30, 2013

SQL SERVER – ACID (Atomicity, Consistency, Isolation, Durability)

Atomicity is an all-or-none proposition.

Consistency guarantees that a transaction never leaves your database in a half-finished state.

Isolation keeps transactions separated from each other until they’re finished.

Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.

Tuesday, May 28, 2013

RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE

SQL Server 2005 has total of 4 ranking function. Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic.

ROW_NUMBER () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

RANK () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the rank of each row within the partition of a result set with gap in rank.

DENSE_RANK () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the rank of rows within the partition of a result set, without any gaps in the ranking.

NTILE (integer_expression) OVER ([<partition_by_clause>] <order_by_clause>)
Distributes the rows in an ordered partition into a specified number of groups.

SELECT
ROW_NUMBER() OVER(ORDER BY sal DESC) AS 'row number',
RANK() OVER(ORDER BY sal DESC) AS 'rank',
DENSE_RANK() OVER(ORDER BY sal DESC)AS 'dense rank',
NTILE(3)OVER(ORDER BY sal DESC)AS 'Quartile'
,sal
from tbl_groupingSet

Result will display as

Saturday, May 25, 2013

Merge Statement

MERGE is a new feature that provides an efficient way to perform multiple DML operations. In previous versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions, but now, using MERGE statement we can include the logic of such data modifications in one statement that even checks when the data is matched then just update it and when unmatched then insert it.

One of the most important advantage of MERGE statement is all the data is read and processed only once.
There is no maximum number limitation on merge statements in a stored procedure
Example:

Let’s create Student Details and StudentTotalMarks and inserted some records.

This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place.

Student Details:



CREATE TABLE StudentDetails

(

StudentID INTEGER PRIMARY KEY,

StudentName VARCHAR(15)

)

GO

INSERT INTO StudentDetails

VALUES(1,'SMITH')

INSERT INTO StudentDetails

VALUES(2,'ALLEN')

INSERT INTO StudentDetails

VALUES(3,'JONES')

INSERT INTO StudentDetails

VALUES(4,'MARTIN')

INSERT INTO StudentDetails

VALUES(5,'JAMES')

GO

StudentTotalMarks:

CREATE TABLE StudentTotalMarks

(

StudentID INTEGER REFERENCES StudentDetails,

StudentMarks INTEGER

)

GO

INSERT INTO StudentTotalMarks

VALUES(1,230)

INSERT INTO StudentTotalMarks

VALUES(2,255)

INSERT INTO StudentTotalMarks

VALUES(3,200)

GO

MERGE

 StudentTotalMarks AS stm

USING (SELECT StudentID,StudentName FROM StudentDetails) AS sd

ON stm.StudentID = sd.StudentID

WHEN MATCHED AND stm.StudentMarks > 250 THEN DELETE

WHEN MATCHED THEN UPDATE SET stm.StudentMarks = stm.StudentMarks + 25

WHEN NOT MATCHED THEN

INSERT(StudentID,StudentMarks)

VALUES(sd.StudentID,25);

Thursday, May 23, 2013

Common Table Expressions

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
A CTE can be used to:

Create a recursive query.
Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
Reference the resulting table multiple times in the same statement.

The basic syntax structure for a CTE is:

WITH expression_name [ ( column_name [,...n] ) ]

AS

( CTE_query_definition )

The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
The statement to run the CTE is:

SELECT <column_list>

FROM expression_name;

Example:------

WITH cte_data(country,sal)
AS
(
SELECT country,SUM(sal)
FROM tbl_groupingSet
GROUP BY country
)

SELECT * FROM cte_data

CTE with multiple references

;WITH cte_data_1(country,sal)
AS
(
SELECT country,SUM(sal)
FROM tbl_groupingSet
GROUP BY country
),
cte_data_2(country,sal)
AS
(
SELECT country,SUM(sal)
FROM tbl_groupingSet
GROUP BY country
)
SELECT * FROM cte_data_1 CROSS JOIN cte_data_2

Bydefault Recursion level is 100

CTEs recursion level crosses the limit then following error is thrown by SQL Server engine:
Msg 530, Level 16, State 1, Line 11
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

you can change the level by using the MAXRECURSION option/hint. The recursion level ranges from 0 and 32,767

DECLARE
    @startDate DATETIME,
    @endDate DATETIME

SET @startDate = '11/10/1985'
SET @endDate = '03/25/2012'

; WITH CTE AS (
    SELECT
        YEAR(@startDate) AS 'yr',
        MONTH(@startDate) AS 'mm',
        DATENAME(mm, @startDate) AS 'mon',
        DATEPART(d,@startDate) AS 'dd',
        @startDate 'new_date'
    UNION ALL
    SELECT
        YEAR(new_date) AS 'yr',
        MONTH(new_date) AS 'mm',
        DATENAME(mm, new_date) AS 'mon',
        DATEPART(d,@startDate) AS 'dd',
        DATEADD(d,1,new_date) 'new_date'
    FROM CTE
    WHERE new_date < @endDate
    )
SELECT yr AS 'Year', mon AS 'Month', count(dd) AS 'Days'
FROM CTE
GROUP BY mon, yr, mm
ORDER BY yr, mm
OPTION (MAXRECURSION 32767)
Reverse a string without using TSQL’s REVERSE() function
DECLARE @StringToReverse VARCHAR(55)
SET @StringToReverse = 'Sandeep'

;WITH cte AS (
      SELECT @StringToReverse AS string, CAST('' AS VARCHAR(55)) AS revStr, LEN(@StringToReverse) AS ln
      UNION ALL
      SELECT SUBSTRING(string,0,ln) AS string, CAST(revStr + SUBSTRING(string,ln,1) AS VARCHAR(55)) AS revStr, ln-1 AS ln
      FROM cte
      WHERE ln >= 1)
SELECT @StringToReverse AS String, revStr,ln
FROM cte
WHERE ln = 0

CTE /Temp Table/Table variable

CTE:(Common Table Expression)
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

Temp tables

1. The table "exists" - that is, it's materialized as a table, at least in memory, which contains the result set and can be reused.
2. In some cases, performance can be improved or blocking reduced when you have to perform some elaborate transformation on the data - for example, if you want to fetch a 'snapshot' set of rows out of a base table that is busy, and then do some complicated calculation on that set, there can be less contention if you get the rows out of the base table and unlock it as quickly as possible, then do the work independently. In some cases the overhead of a real temp table is small relative to the advantage in concurrency.

Table Variables

A table variable goes out of scope immediate after the batch ends -- just like regular variables. This means you don't have to explicitly drop them at the end of scripts.

Table variable and Temp table both resides in TempDB.

It is not required to drop Table variable after the usage.

Table variables can not have Non-Clustered Indexes
You can not create constraints in table variables
You can not create default values on table variable columns
Statistics can not be created against table variables

Advantages of derived tables:
1. A derived table is part of a larger, single query, and will be optimized in the context of the rest of the query. This can be an advantage, if the query optimization helps performance (it usually does, with some exceptions). Example: if you populate a temp table, then consume the results in a second query, you are in effect tying the database engine to one execution method (run the first query in its entirety, save the whole result, run the second query) where with a derived table the optimizer might be able to find a faster execution method or access path.
2. A derived table only "exists" in terms of the query execution plan - it's purely a logical construct. There really is no table.

Pages