Friday 3 August 2018

SQL Server Index Architecture

SQL server Indexes are implemented as type of BALANCED TREE, or B-tree (absolute difference between heights of left sub-tree and right sub-tree is not greater than 1) structure similar more concepts tree, the B-tree diagram consists of roots non-leaf level and leaf-level, however in the B-tree diagram, the tree analogy is inverted, indexes have a single root page.


SQL server always uses this (root page) as starting point for traversing an index. In an Index tree all the index levels above the leaf-levels include in the root known as non-leaf levels. The leaf-level is the bottom level of index structure. It contains the key value index entries either reference rows in the data pages or complete data rows.


SQL server supports two types of indexes, clustered and non-clustered, both types of the indexes have similar B-tree structures. In a clustered index the leaf-level is the actual data page, for a table with clustered index the data is physically stored on a data page in ascending order, the order of the value is in the index page is also ascending.


SQL server internally maintains uniqueness of key values for a clustered index, even if the column data is not unique. It does this automatically by generating the value that stored with the duplicate key value. For example, in the first instance of “Jones” is added, all key values are still unique, when subsequent values of “Jones” are added, SQL server generates an internal number that system uses to maintains uniqueness among these duplicate keys, this enables SQL server to differentiate between all instances of “Jones”.


Table:  member
Column Name
Data Type
member_no
Int
Lastname
Varchar(50)
Firstname
Varchar(50)
Salary
Decimal(10,2)

Let us take a close look at how the SQL server navigates a clustered index, in this example a clustered index is build on lastname column, the query, searching for rows where lastname equal to “Rudd”.
CREATE TABLE [dbo].[member](
       [member_no] [int] IDENTITY(1,1) NOT NULL,
       [lastname] [varchar](50) NULL,
       [firstname] [varchar](50) NULL,
      [salary] [decimal](10,2) NULL
)
GO   

CREATE CLUSTERED INDEX [IX_member_lastname] ON [member]
(
       [lastname] ASC
)
WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
SELECT lastname, firstname FROM member WHERE lastname = 'Rudd'




SQL Server starts at the root page, the value with “Rudd” >= the key value “Akthar”, the comparison of value is true, and sql server moves the next key value, SQL server value with “Rudd” is >= the key value “Martin” the comparison of value which is true and this(Martin) is the last key value on the page.





SQL server moves to the page at next level where the key value “Martin” points, SQL server a value with “Rudd” is >= the key value “Martin”, the comparison of value which is true and SQL server moves the next key value, SQL server a value with “Rudd” is >= the key value “Smith”, the comparison of value which is false.





SQL server uses the previous key value “Martin” and moves to the page at next level where the key value “Martin” points, SQL server reads through the data page until it finds “Rudd” and then returns to the query processors.





Now let us take a look at non-clustered indexes, non-clustered indexes have the same B-tree structure as clustered indexes with one significant difference, the leaf-level of a non-clustered index contains key values not the actual data. These key values map to pointers or clustering keys the locate rows in the data pages,





the implementation of a non-clustered index depends on whether the data pages of a table are managed as a heap or as a clustered index, in a non-clustered index built on a heap SQL server uses pointers in a leaf-level index pages the pointer row in the data pages, if you have a table with clustered index, SQL server builds the non-clustered indexes on top of the clustered index structure, in a non-clustered index built on a clustered index, SQL server uses clustering keys in the leaf-level pages of the non-clustered indexes to point to the clustered index



Now let us take a closer look at how SQL server navigates on non-clustered index on a heap, in this example the non-clustered index has been created on the lastname column. The query is searching for rows where lastname =”Rudd” from member table
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[member]') AND type in (N'U'))
       DROP TABLE [dbo].[member]
GO

CREATE TABLE [dbo].[member](
       [member_no] [int] IDENTITY(1,1) NOT NULL,
       [lastname] [varchar](50) NULL,
       [firstname] [varchar](50) NULL,
       [salary] [decimal](10,2) NULL
)
GO   

CREATE NONCLUSTERED INDEX [IX_member_lastname] ON [member]
(
       [lastname] ASC
)
WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO


SELECT lastname, firstname FROM member WHERE lastname = 'Rudd'





SQL server starts at the root level of the non-clustered index, the value with “Rudd” >= the key value “Akthar”, the comparison of value is true, and SQL server moves the next key value, SQL server value with “Rudd” is >= the key value “Martin” the comparison of value which is true and Martin is the last key value on the page and the value with true as the result, SQL server moves to the page at next level where the key value “Martin” points,  





SQL server a value with whether “Rudd” is >= the key value “Martin”, the comparison of value which is true and SQL server moves the next key value, SQL server a value with whether “Rudd” is >= the key value “Smith”, the comparison of value which is false.





 SQL server uses the previous key value “Martin” and moves to the page at next level where the key value “Martin” points, SQL server reads through the leaf page to find “Rudd” and it uses the pointers to locate the data page and fetch the key value is = “Rudd”, SQL server returns to this row to the query processors.



Now let us take a look at how SQL server navigates a non-clustered index using the clustering key, in this example, a non-clustered index has been created on member_no column, the table has a clustered index to find on the lastname column, the query is searching for rows the member_no is = 6078,
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[member]') AND type in (N'U'))
       DROP TABLE [dbo].[member]
GO

CREATE TABLE [dbo].[member](
       [member_no] [int] IDENTITY(1,1) NOT NULL,
       [lastname] [varchar](50) NOT NULL,
       [firstname] [varchar](50) NULL,
       [salary] [decimal](10,2) NULL
)
GO   

CREATE CLUSTERED INDEX [IX_member_lastname] ON [member]
(
       [lastname] ASC
)
WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

/*
ALTER TABLE [member] ADD CONSTRAINT [PK_member_lastname] PRIMARY KEY --- > CLUSTERED
(
       [lastname] ASC
      
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF,  
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
*/

CREATE NONCLUSTERED INDEX [IX_member_member_no] ON [member]
(
       [member_no] ASC
)
WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

SELECT lastname, firstname FROM member WHERE member_no = 6078
SQL server starts at root level of the non-clusterd index the value with whether 6078 is >= 1234, the comparison is true, then SQL server moves to the next key value, SQL server the value with whether 6078 is >= the key value 5678, the comparison is true, 5678 is the last key value on the page and value with true as the result. 
SQL server moves to the next levelwhere the key value 5678 points, SQL server evaluates whether 6078 is >= the key value 5678, the comparison is true, and SQL server moves to the next key value, SQL server evaluates the key value whether 6078 is >= 7678, the comparison is false, SQL server uses the previous key value 5678, and moves to the page at the next level where the key value 5678 points, SQL server reads through the leaf-level pages until it finds the key value 6078, this key value maps to the clustering key value “Rudd”, SQL server uses this value (“Rudd”) to navigate through the clustered index from the root level to retrieve the row in which “Rudd” is the lastname, 



In summary the leaf-level of a clustered index is the actual data page, data is physically stored on a data page in ascending order, the leaf-level of a non-clustered index contains key values not the actual data, these key values map to pointers or clustering key that locates rows in data pages.





CLUSTERD INDEX vs NONCLUSTERD INDEX

What are the differences between CLUSTERD INDEX and NONCLUSTERD INDEX [CLUSTERD INDEX vs NONCLUSTERD INDEX


CLUSTERED INDEX
NONCLUSTERED INDEX
1
The leaf-level of a CLUSTERED INDEX is the actual data page; data is physically stored on a data page in ascending order.
The leaf-level of a NONCLUSTERED INDEX contains key values not the actual data, these key values map to pointers or clustering key that locates rows in data pages.
2
Table can have only one CLUSTERED INDEX.
Table can have more than one NONCLUSTERED INDEX.
SQL server 2008 and later versions, Maximum 999 Non Clustered Indexes can be created on a table.
3
When you create a PRIMARY KEY on a table, SQL server creates a CLUSTERED INDEX automatically on the table unless specified “CLUSTERED or NON CLUSTERED”

NOTE: Create the PRIMARY Key column with CLUSTERED INDEX  by specifying “CLUSTERED”)

EXAMPLE:

CREATE TABLE [dbo].[member](
[member_no] [int] IDENTITY(1,1) NOT NULL,
[lastname] [varchar](50) NOT NULL,
[firstname] [varchar](50) NULL,
[salary] [decimal](10,2) NULL
)
GO   


ALTER TABLE [member] ADD CONSTRAINT [PK_member_lastname] PRIMARY KEY --NONCLUSTERED   
(
[lastname] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
When you create a PRIMARY KEY on table, SQL server creates a NONCLUSTERED INDEX automatically by specifying “NONCLUSTERED”, it is not recommended.





EXAMPLE:

CREATE TABLE [dbo].[member](
[member_no] [int] IDENTITY(1,1) NOT NULL,
[lastname] [varchar](50) NOT NULL,
[firstname] [varchar](50) NULL,
[salary] [decimal](10,2) NULL
)
GO   


ALTER TABLE [member] ADD CONSTRAINT [PK_member_lastname] PRIMARY KEY NONCLUSTERED
(
[lastname] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF,ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
4
Data retrieval is faster than NONCLUSTERED INDEX,because the leaf level is the bottom level of CLUSTERED INDEX contains the data pages or complete data rows.
Data retrieval is slower than CLUSTERED INDEX, because the leaf-level is the bottom level of a NONCLUSTERED INDEX contains key values not the actual data. These key values map to pointers or clustering keys the locate rows in the data pages.
5
Do not need extra space to store logical structure.

CLUSTERED INDEX stores the base table data in same physical order as index’s logical order, so it does not require additional storage space.
Use extra space to store logical structure.


In a NONCLUSTERED INDEX, the index is stored in a separate location which requires additional storage space.


What are the differences between Table Scan, Index Scan and Index Seek?

TABLE SCAN:

The SQL Server optimizer's job is to choose the best way to execute a query. The optimizer uses indexes to improve query execution time. When you query a table that doesn't have indexes, or if the optimizer decides not to use an existing index or indexes, the system performs a table scan.

In a table scan, SQL Server sequentially (row by row) reads the table's data pages to find rows that belong to the result set.

EXAMPLE:

-- Create a Table with out Index
CREATE TABLE [dbo].[ScanSeek](
        [ScanSeekID] [int] IDENTITY(1,1) NOT NULL,
        [ScanSeekValue] [varchar](100) NULL
)
GO

-- Insert 1 Lac Records to ScanSeek table.
INSERT INTO ScanSeek (ScanSeekValue)
VALUES (CONVERT (VARCHAR(100), (SELECT ISNULL(MAX(ScanSeekID),0) + 1  FROM [ScanSeek] WITH (NOLOCK))))
GO 100000

-- Select all the records from ScanSeek table.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
GO

-- Select a record from ScanSeek table where ScanSeekID = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE [ScanSeekID] = 100
GO






From the above two select queries, either to fetch all the records from table OR to fetch only one the matched record from table where ScanID=100, while seeing the execution plan the Cost is 100%, so it is obviouly bottleneck for the performance,

From the second SELECT query, the query optimizer used the table scan which means it reads/scans 100000 (1Lac) records sequentially row by row of the table for fetching the matched row.

How to overcome this issue OR to fetch only the matched records from table?
Adding/creating appropriate indices on table that helps for retriving the data fast.



INDEX SCAN:

An index scan is, scan the data or index pages to find the appropriate records, an index scan retrieves all the rows from the table, it means that all the leaf-level of the index was searched to find the information for the query,

There are two types of Index Scan.

  • Index Scan on Clustered Index
  • Index Scan On Non-Clustred Index

1.     Index Scan on Clustered Index -  In a table, when the index is a clustered index and search key column is not indexed, in this case, SQL server navigates from root level to leaf-level (data pages)  until the data match, which means that  scanning the data page by using the clustring key value as a pointer/reference.


-- Create a Table without Index
CREATE TABLE [dbo].[ScanSeek](
    [ScanSeekID] [int] IDENTITY(1,1) NOT NULL,
    [ScanSeekValue] [varchar](100) NULL
)
GO

-- Insert 1 Lac Records to ScanSeek table.
INSERT INTO ScanSeek (ScanSeekValue)
VALUES (CONVERT (VARCHAR(100), (SELECT ISNULL(MAX(ScanSeekID),0) + 1 
FROM [ScanSeek] WITH (NOLOCK))))
GO 100000

-- Create a Clustered index on ScanSeek table.
CREATE CLUSTERED INDEX [PK_ScanSeek_ScanSeekID] ON ScanSeek(ScanSeekID)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

-- Select all the records from ScanSeek table.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
GO

-- Select a record from ScanSeek table where ScanSeekValue = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE ScanSeekValue = '100'
GO








From the above two select queries, either to fetch all the records from table OR to fetch only one the matched record from table where ScanID=100, while seeing the execution plan the Cost is 100%, search column [ScanSeekValue ] is not Indexed on table,

From the both SELECT queries, the query optimizer used the clustered Index scan which means retrieving the data by just using the index tree from root level to leaf-level (data pages) .

2.     Index Scan On Non-Clustred Index – a table has a cluster and  a non-clustered index and search key column is not specifed, retrieving the data by just using the index tree from root level to leaf-level pages (key values not the actual data), which means that  scanning the leaf-level pages by using the non-clustring key value as a pointer to the data pages/complete rows.

A non-cluster index built on top of the cluster index structure. SQL server uses the clustring keys in the leaf-level pages of the non-cluster index to point to the cluster index.



-- Create a Table without Index
CREATE TABLE [dbo].[ScanSeek](
    [ScanSeekID] [int] IDENTITY(1,1) NOT NULL,
    [ScanSeekValue] [varchar](100) NULL
)
GO

-- Insert 1 Lac Records to ScanSeek table.
INSERT INTO ScanSeek (ScanSeekValue)
VALUES (CONVERT (VARCHAR(100), (SELECT ISNULL(MAX(ScanSeekID),0) + 1  FROM [ScanSeek] WITH (NOLOCK))))
GO 100000

-- Create a Clustered index on ScanSeek table.
CREATE CLUSTERED INDEX [PK_ScanSeek_ScanSeekID] ON ScanSeek(ScanSeekID)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

-- Create a Non Clustered index on ScanSeek table.
CREATE NONCLUSTERED INDEX [IX_ScanSeek_ScanSeekValue] ON ScanSeek(ScanSeekValue)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

-- Select all the records from ScanSeek table.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
GO





The Clustered Index Scan operator scans the clustered index specified in the Argument column of the query execution plan. When an optional WHERE:() predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDER clause, the query processor has requested that the output of the rows be returned in the order in which the clustered index has sorted it. If the ORDER clause is not present, the storage engine scans the index in the optimal way, without necessarily sorting the output.

INDEX SEEK:

Seek uses the index to pinpoint the records that are needed to satisfy the query.

There are two types of Index Seek.

1.     Index Seek on Clustered Index -  a table has clustered index, retrieving the data by using cluster index column, In Index tree, SQL server navigates from root level to leaf-level (data pages)  until the data match, which means that  pointing the data pages by using the clustring key value as a pointer/reference.

-- Create a Table without Index
CREATE TABLE [dbo].[ScanSeek](
   [ScanSeekID] [int] IDENTITY(1,1) NOT NULL,
   [ScanSeekValue] [varchar](100) NULL
)
GO

-- Insert 1 Lac Records to ScanSeek table.
INSERT INTO ScanSeek (ScanSeekValue)
VALUES (CONVERT (VARCHAR(100), (SELECT ISNULL(MAX(ScanSeekID),0) + 1  FROM [ScanSeek] WITH (NOLOCK))))
GO 100000

-- Create a Clustered index on ScanSeek table.
CREATE CLUSTERED INDEX [PK_ScanSeek_ScanSeekID] ON ScanSeek(ScanSeekID)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)


-- Select a record from ScanSeek where ScanSeekID = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE [ScanSeekID] = 100
GO







2.     Index Seek on Non-Clustered Index -  a table has clustered index and non cluster index, retrieving the data by using non-cluster index column, In Index tree, SQL server navigates from root level to data pages until the data match, which means, a non-cluster index built on top of the cluster index structure. SQL server uses the clustring keys in the leaf-level pages (key values not the actual data)  of the non-cluster index to point to the data pages.


-- Create a Table without Index
CREATE TABLE [dbo].[ScanSeek](
    [ScanSeekID] [int] IDENTITY(1,1) NOT NULL,
    [ScanSeekValue] [varchar](100) NULL
)
GO

-- Insert 1 Lac Records to ScanSeek table.
INSERT INTO ScanSeek (ScanSeekValue)
VALUES (CONVERT (VARCHAR(100), (SELECT ISNULL(MAX(ScanSeekID),0) + 1  FROM [ScanSeek] WITH (NOLOCK))))
GO 100000

-- Create a Clustered index on ScanSeek table.
CREATE CLUSTERED INDEX [PK_ScanSeek_ScanSeekID] ON ScanSeek(ScanSeekID)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

-- Create a Non Clustered index on ScanSeek table.
CREATE NONCLUSTERED INDEX [IX_ScanSeek_ScanSeekValue] ON ScanSeek(ScanSeekValue)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
-- Select a record from ScanSeek where ScanSeekID = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE [ScanSeekID] = 100
GO
-- Select a record from ScanSeek table where ScanSeekValue = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE ScanSeekValue = '100'
GO






LOOKUP HEAP:

A table has only non cluster index, this case SQL server manages the table as Heap,  retrieving the data by using non-cluster index column, In Index tree, SQL server navigates from root level to data pages until the data match, which means, SQL server uses the leaf-level pages (key values not the actual data)  of the non-cluster index to point to the Heap.

-- Create a Table without Index
CREATE TABLE [dbo].[ScanSeek](
    [ScanSeekID] [int] IDENTITY(1,1) NOT NULL,
    [ScanSeekValue] [varchar](100) NULL
)
GO

-- Insert 1 Lac Records to ScanSeek table.
INSERT INTO ScanSeek (ScanSeekValue)
VALUES (CONVERT (VARCHAR(100), (SELECT ISNULL(MAX(ScanSeekID),0) + 1  FROM [ScanSeek] WITH (NOLOCK))))
GO 100000

-- Create a Non Clustered index on ScanSeek table.
CREATE NONCLUSTERED INDEX [IX_ScanSeek_ScanSeekValue] ON ScanSeek(ScanSeekValue)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

-- Select a record from ScanSeek where ScanSeekID = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE [ScanSeekID] = 100
GO
-- Select a record from ScanSeek table where ScanSeekValue = 100.
SELECT [ScanSeekID], [ScanSeekValue]
FROM [dbo].[ScanSeek] WITH (NOLOCK)
WHERE ScanSeekValue = '100'