Friday, May 31, 2013

Normalization in SQL Server

Defination : Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. There are several benefits for using Normalization in Database.
Benefits :
  1. Eliminate data redundancy
  2. Improve performance
  3. Query optimization
  4. Faster update due to less number of columns in one table
  5. Index improvement
There are diff. - diff. types of Normalizations form available in the Database. Lets see one by one.
1. First Normal Form (1NF)
 First normal form (1NF) sets the very basic rules for an organized database:
  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
    • Remove repetative groups
    • Create Primary Key

           
Name State Country Phone1 Phone2 Phone3
John 101 1 488-511-3258 781-896-9897 425-983-9812
Bob 102 1 861-856-6987    
Rob 201 2 587-963-8425 425-698-9684  
 PK

                 [ Phone Nos ]
   ?


?  
ID Name State Country Phone  
1 John 101 1 488-511-3258  
2 John 101 1 781-896-9897  
3 John 101 1 425-983-9812  
4 Bob 102 1 861-856-6987  
5 Rob 201 2 587-963-8425  
6 Rob 201 2 425-698-9684  
           
2. Second Normal Form (2NF)Second normal form (2NF) further addresses the concept of removing duplicative data:
·         Meet all the requirements of the first normal form.
·         Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
·         Create relationships between these new tables and their predecessors through the use of foreign keys.   
Remove columns which create duplicate data in a table and related a new table with Primary Key – Foreign Key relationship

ID Name State Country Phone


1 John 101 1 488-511-3258


2 John 101 1 781-896-9897


3 John 101 1 425-983-9812


4 Bob 102 1 861-856-6987


5 Rob 201 2 587-963-8425


6 Rob 201 2 425-698-9684


















ID Name State Country
PhoneID ID Phone
1 John 101  
1 1 488-511-3258
2 Bob 102
2 1 781-896-9897
3 Rob 201
3 1 425-983-9812





4 2 587-963-8425





5 3 587-963-8425





6 3 425-698-9684

3. Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
·         Meet all the requirements of the second normal form.
·         Remove columns that are not dependent upon the primary key.
  Country can be derived from State also… so removing country
  ID   Name   State   Country
  1   John    101       1
  2   Bob    102       1
  3   Rob    201       2

4. Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
·         Meet all the requirements of the third normal form.
·         A relation is in 4NF if it has no multi-valued dependencies.

If PK is composed of multiple columns then all non-key attributes should be derived from FULL PK only. If some non-key attribute can be derived from partial PK then remove it

The 4NF also known as BCNF NF

   TeacherID StudentID SubjectID  StudentName
     101   1001   1   John
     101   1002   2   Rob
     201   1002   3   Bob
     201   1001   2   Rob








   TeacherID    StudentID   SubjectID   StudentName
  101   1001   1          X
  101   1002   2          X
  201   1001   3          X
  201   1002   2         X

Thursday, May 30, 2013

SQL SERVER – ACID (Atomicity, Consistency, Isolation, Durability)

Atomicity is an all-or-none proposition.
Consistency guarantees that a transaction never leaves your database in a half-finished state.
Isolation keeps transactions separated from each other until they’re finished.
Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
OR
Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.

Tuesday, May 28, 2013

RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE

SQL Server 2005 has total of 4 ranking function. Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic.

ROW_NUMBER () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
RANK () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the rank of each row within the partition of a result set with gap in rank.
DENSE_RANK () OVER ([<partition_by_clause>] <order_by_clause>)
Returns the rank of rows within the partition of a result set, without any gaps in the ranking.
NTILE (integer_expression) OVER ([<partition_by_clause>] <order_by_clause>)
Distributes the rows in an ordered partition into a specified number of groups.

SELECT
ROW_NUMBER() OVER(ORDER BY sal DESC) AS 'row number',
RANK() OVER(ORDER BY sal DESC) AS 'rank',
DENSE_RANK() OVER(ORDER BY sal DESC)AS 'dense rank',
NTILE(3)OVER(ORDER BY sal DESC)AS 'Quartile'
,sal
from tbl_groupingSet

Result will display as

Saturday, May 25, 2013

Merge Statement

MERGE is a new feature that provides an efficient way to perform multiple DML operations. In previous versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions, but now, using MERGE statement we can include the logic of such data modifications in one statement that even checks when the data is matched then just update it and when unmatched then insert it.
One of the most important advantage of MERGE statement is all the data is read and processed only once.
There is no maximum number limitation on merge statements in a stored procedure
Example:
Let’s create Student Details and StudentTotalMarks and inserted some records.

This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place.  

Student Details:

CREATE TABLE StudentDetails
(
StudentID INTEGER PRIMARY KEY,
StudentName VARCHAR(15)
)
GO
INSERT INTO StudentDetails
VALUES(1,'SMITH')
INSERT INTO StudentDetails
VALUES(2,'ALLEN')
INSERT INTO StudentDetails
VALUES(3,'JONES')
INSERT INTO StudentDetails
VALUES(4,'MARTIN')
INSERT INTO StudentDetails
VALUES(5,'JAMES')
GO
StudentTotalMarks:
CREATE TABLE StudentTotalMarks
(
StudentID INTEGER REFERENCES StudentDetails,
StudentMarks INTEGER
)
GO
INSERT INTO StudentTotalMarks
VALUES(1,230)
INSERT INTO StudentTotalMarks
VALUES(2,255)
INSERT INTO StudentTotalMarks
VALUES(3,200)
GO

MERGE StudentTotalMarks AS stm
USING
(SELECT StudentID,StudentName FROM StudentDetails) AS sd
ON stm.StudentID = sd.StudentID
WHEN MATCHED AND stm.StudentMarks > 250 THEN DELETE
WHEN
MATCHED THEN UPDATE SET stm.StudentMarks = stm.StudentMarks + 25
WHEN NOT MATCHED THEN
INSERT
(StudentID,StudentMarks)
VALUES(sd.StudentID,25);
 
 

Thursday, May 23, 2013

Common Table Expressions

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
A CTE can be used to:
  • Create a recursive query.
  • Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
  • Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
  • Reference the resulting table multiple times in the same statement.

    The basic syntax structure for a CTE is:

    WITH expression_name [ ( column_name [,...n] ) ]
    AS
    ( CTE_query_definition )

    The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
    The statement to run the CTE is:
    SELECT <column_list>
    FROM expression_name;

    Example:------

    WITH cte_data(country,sal)
    AS
    (
    SELECT country,SUM(sal)
    FROM tbl_groupingSet
    GROUP BY country
    )

    SELECT * FROM cte_data

    CTE with multiple references

    ;WITH cte_data_1(country,sal)
    AS
    (
    SELECT country,SUM(sal)
    FROM tbl_groupingSet
    GROUP BY country
    ),
    cte_data_2(country,sal)
    AS
    (
    SELECT country,SUM(sal)
    FROM tbl_groupingSet
    GROUP BY country
    )
    SELECT * FROM cte_data_1 CROSS JOIN cte_data_2

    Bydefault Recursion level is 100

    CTEs recursion level crosses the limit then following error is thrown by SQL Server engine:
    Msg 530, Level 16, State 1, Line 11
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

    you can change the level by using the MAXRECURSION option/hint. The recursion level ranges from 0 and 32,767 

    DECLARE
        @startDate DATETIME,
        @endDate DATETIME

    SET @startDate = '11/10/1985'
    SET @endDate = '03/25/2012'

    ; WITH CTE AS (
        SELECT
            YEAR(@startDate) AS 'yr',
            MONTH(@startDate) AS 'mm',
            DATENAME(mm, @startDate) AS 'mon',
            DATEPART(d,@startDate) AS 'dd',
            @startDate 'new_date'
        UNION ALL
        SELECT
            YEAR(new_date) AS 'yr',
            MONTH(new_date) AS 'mm',
            DATENAME(mm, new_date) AS 'mon',
            DATEPART(d,@startDate) AS 'dd',
            DATEADD(d,1,new_date) 'new_date'
        FROM CTE
        WHERE new_date < @endDate
        )
    SELECT yr AS 'Year', mon AS 'Month', count(dd) AS 'Days'
    FROM CTE
    GROUP BY mon, yr, mm
    ORDER BY yr, mm
    OPTION (MAXRECURSION 32767) 
    Reverse a string without using TSQL’s REVERSE() function
    DECLARE @StringToReverse VARCHAR(55)
    SET @StringToReverse = 'Sandeep'
     
    ;WITH cte AS (
          SELECT @StringToReverse AS string, CAST('' AS VARCHAR(55)) AS revStr, LEN(@StringToReverse) AS ln
          UNION ALL
          SELECT SUBSTRING(string,0,ln) AS string, CAST(revStr + SUBSTRING(string,ln,1) AS VARCHAR(55)) AS revStr, ln-1 AS ln
          FROM cte
          WHERE ln >= 1)
    SELECT @StringToReverse AS String, revStr,ln
    FROM cte
    WHERE ln = 0
  •  
CTE /Temp Table/Table variable


 
CTE:(Common Table Expression)
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.


Temp tables 
1. The table "exists" - that is, it's materialized as a table, at least in memory, which contains the result set and can be reused.
2. In some cases, performance can be improved or blocking reduced when you have to perform some elaborate transformation on the data - for example, if you want to fetch a 'snapshot' set of rows out of a base table that is busy, and then do some complicated calculation on that set, there can be less contention if you get the rows out of the base table and unlock it as quickly as possible, then do the work independently. In some cases the overhead of a real temp table is small relative to the advantage in concurrency.
Table Variables
 A table variable goes out of scope immediate after the batch ends -- just like regular variables. This means you don't have to explicitly drop them at the end of scripts.
Table variable and Temp table both resides in TempDB.
It is not required to drop Table variable after the usage.
  • Table variables can not have Non-Clustered Indexes
  • You can not create constraints in table variables
  • You can not create default values on table variable columns
  • Statistics can not be created against table variables

Advantages of derived tables: 
1. A derived table is part of a larger, single query, and will be optimized in the context of the rest of the query. This can be an advantage, if the query optimization helps performance (it usually does, with some exceptions). Example: if you populate a temp table, then consume the results in a second query, you are in effect tying the database engine to one execution method (run the first query in its entirety, save the whole result, run the second query) where with a derived table the optimizer might be able to find a faster execution method or access path.
2. A derived table only "exists" in terms of the query execution plan - it's purely a logical construct. There really is no table.