Monitoring MDW purge job

During one of my latest investigations on SQL Server MDW, I had to learn a few things the hard way – several stored procedures that run on the central MDW server, such as sp_purge_data, are designed to report what they are doing, but you need to enable that output manually. Most of these stored procedures have PRINT/RAISERROR statements in their code, and in the case of the purge procedure I wanted to see what is going on behind the scenes. The sp_purge_data procedure is called via a SQL Agent job called mdw_purge_data. So in my case I was lucky – all I needed to do was set an output file on the job step (called Step 1).
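If you prefer T-SQL over clicking through the job step properties, the same change can be made with msdb.dbo.sp_update_jobstep. This is just a minimal sketch – the job name is the one from my environment and the output file path is only an example, so adjust both to your setup:

EXEC msdb.dbo.sp_update_jobstep
    @job_name         = N'mdw_purge_data',
    @step_id          = 1,
    @output_file_name = N'C:\Temp\mdw_purge_data_step1.log',  -- example path
    @flags            = 2;  -- 2 = append to the output file instead of overwriting it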


Then just save the job (or run the statement) and start it – the log file is created immediately and you can see what is going on.


Fixing a long running purge in Management Data Warehouse

Recently I had a situation where one of the SQL Servers we support had over 1 TB of tempdb log, and after a short dig around, one of my colleagues noticed that this was caused by a long-running purge in SQL Server MDW (Management Data Warehouse). The purge job (mdw_purge_data) had been running for more than 15 days before we noticed the problem. That was quite strange to me and I started troubleshooting it. The purge job should be quick – as far as my experience goes, it runs in several minutes at most. That was not the case here, though. So I looked first at what exactly the job does – it runs a stored procedure in the MDW database – core.sp_purge_data.

Looking at the definition, the procedure does a lot of things, and starting from the beginning would have taken me quite a lot of time and effort to troubleshoot. Luckily, the procedure was still running and I was able to trace where exactly the purging got stuck. Ultimately it was one of the stored procedures called from within the purge job – [core].[sp_purge_orphaned_notable_query_text]. This procedure's definition is quite simple – it deletes records from the MDW database in batches of 500 (the default batch size), using the following statement:

DELETE TOP (@delete_batch_size) snapshots.notable_query_text 
         FROM snapshots.notable_query_text AS qt, #tmp_notable_query_text AS tmp
        WHERE tmp.[sql_handle] = qt.[sql_handle]

OK, now I was getting somewhere – the #tmp_notable_query_text temporary table had to be involved somehow. I checked how this table is populated:

SELECT qt.[sql_handle]
INTO #tmp_notable_query_text
FROM snapshots.notable_query_text AS qt
WHERE NOT EXISTS (
   SELECT snapshot_id 
   FROM snapshots.query_stats AS qs
   WHERE qs.[sql_handle] = qt.[sql_handle])

Obviously this statement was causing the headaches – I tried to execute the SELECT on its own to see how many records it would return. I have to say that my impression so far was that everything around MDW – collection, display and purging – is optimized, and that I would not be suffering from unoptimized stored procedures/code. Well, it seems that is not the case – my SELECT statement had not completed after more than 2 minutes, so I immediately cancelled it and looked at the estimated execution plan. What worried me there was that there were two clustered index scans, and the one on the snapshots.query_stats table had a relative cost of 99%. Looking at the snapshots.query_stats table, it held more than a million records (the purge had not been running for some days now) and the scan was not effective at all, especially since both of the default indexes were clustered and included almost all of the columns in the tables. Based on the query above, I went with creating a non-clustered index on the sql_handle column in both tables – snapshots.query_stats and snapshots.notable_query_text.
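For reference, the two indexes were nothing fancy – a minimal sketch along these lines (the index names here are made up, pick whatever fits your naming convention):

-- supporting indexes for the purge lookups on sql_handle
CREATE NONCLUSTERED INDEX IX_notable_query_text_sql_handle
    ON snapshots.notable_query_text ([sql_handle]);

CREATE NONCLUSTERED INDEX IX_query_stats_sql_handle
    ON snapshots.query_stats ([sql_handle]);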

After doing that, I ran the SELECT statement again and the result came back immediately. So I ran the purge procedure right away and it completed within minutes. The job history speaks for itself – the last successful runs had taken almost 4 days each, and the last one, which we cancelled, had been running for 17 days. After the optimization the purge ran for 32 minutes, and since then it has not taken more than a minute.
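If you want to pull those durations from the job history yourself instead of reading them in SSMS, a sketch along these lines works (the job name is the one from my environment; run_duration is an HHMMSS-encoded integer and step_id = 0 is the job outcome row):

SELECT TOP (10)
       j.name,
       h.run_date,
       h.run_time,
       h.run_duration,   -- HHMMSS
       h.run_status      -- 1 = succeeded, 0 = failed, 3 = cancelled
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE j.name = N'mdw_purge_data'
  AND h.step_id = 0
ORDER BY h.instance_id DESC;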


So be careful with MDW and monitor the purge job :)

Edit 27.03.2013 – one of my colleagues just implemented the same NC index on another server and the purge job went from 1 day and 3 hours down to 36 seconds!! :-)

How to check if your database has been upgraded

A month ago I noticed a very neat post about finding out whether a given SQL Server database has been upgraded or not. The original post shows what to use and where to look for the info. In short, DBCC DBINFO reveals two fields – dbi_createversion and dbi_version – which can ultimately help us understand whether our database has been upgraded or not.
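For a single database the check is just the DBCC command itself – something like this (run it with sufficient privileges and replace master with the database you are interested in):

DBCC DBINFO ('master') WITH TABLERESULTS;
-- look for the dbi_createversion and dbi_version fields in the output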

However, imagine if you need to do this for all your databases (usually a lot more than 5-6 :) ). This would not be an easy task, so I took the script I am using for checking the last successful run of DBCC (original post here) and modified it a bit, so now the output is a lot more readable.


The entire script you can find below:

CREATE TABLE #temp (
       Id INT IDENTITY(1,1),
       ParentObject VARCHAR(255),
       [Object] VARCHAR(255),
       Field VARCHAR(255),
       [Value] VARCHAR(255)
)

INSERT INTO #temp
EXECUTE sp_MSforeachdb 'DBCC DBINFO (''?'') WITH TABLERESULTS';

;WITH CHECKDB1 AS
(
    SELECT [Value],
           ROW_NUMBER() OVER (ORDER BY Id) AS rn1
    FROM #temp
    WHERE Field = 'dbi_dbname'
),
CHECKDB2 AS
(
    SELECT [Value],
           ROW_NUMBER() OVER (ORDER BY Id) AS rn2
    FROM #temp
    WHERE Field = 'dbi_createVersion'
),
CHECKDB3 AS
(
    SELECT [Value],
           ROW_NUMBER() OVER (ORDER BY Id) AS rn3
    FROM #temp
    WHERE Field = 'dbi_version'
)
SELECT CHECKDB1.[Value] AS DatabaseName
     , CHECKDB2.[Value] AS DatabaseCreateVersion
     , CHECKDB3.[Value] AS DatabaseCurrentVersion
     , CASE
           WHEN CHECKDB2.[Value] < CHECKDB3.[Value] THEN 'Yes'
           ELSE 'No'
       END AS DatabaseUpgradedStatus
FROM CHECKDB1
INNER JOIN CHECKDB2 ON rn1 = rn2
INNER JOIN CHECKDB3 ON rn2 = rn3

DROP TABLE #temp

You are probably wondering why you would need to know this info – to be honest, so far I do not have a reason besides that it is cool and geeky. Have fun!

How to check for active trace flags on multiple SQL Server instances

Following one of my old posts, I would like to share another multiserver query. You can use this script with your CMS (Central Management Server) as a multiserver query to check all active trace flags on all SQL Server instances.

DECLARE @TraceFlags table (TraceFlag smallint, status bit, global bit, session bit)
insert into @TraceFlags execute('DBCC TRACESTATUS(-1)')
select * from @TraceFlags

Note that I am using a table variable and not a temporary table (my own preference :-) ), so if you need to, you can easily transform the statement.
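For completeness, the temporary table flavour of the same statement would look roughly like this:

create table #TraceFlags (TraceFlag smallint, status bit, global bit, session bit)
insert into #TraceFlags execute('DBCC TRACESTATUS(-1)')
select * from #TraceFlags
drop table #TraceFlags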

Change compatibility level on multiple databases and multiple instances

Following my previous post on mass changing the PAGE_VERIFY option, I want to share another script that you can use, if you are using a Central Management Server and multi-server queries, to change the compatibility level of multiple databases across multiple instances.

With the script below you can change the compatibility level of all databases; in my example I am setting compatibility_level = 100.

--mass change the compatibility level option for all instances and 
--databases that are not read only and have current comp. set to 80 or 90
EXEC
sp_MSforeachdb '
use [?]
if (select count(*) 
	from sys.databases 
	where compatibility_level <100 and 
	name=''?'' and database_id>4 and
	is_read_only = 0
)=1
begin
alter database [?] set compatibility_level = 100
PRINT ''compatibility changed to 100 for database = '' + DB_NAME()
end'
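To review the current levels across the estate (before or after the change), a quick multi-server check against sys.databases is enough – a minimal sketch:

--current compatibility level of all user databases
SELECT name, compatibility_level, is_read_only
FROM sys.databases
WHERE database_id > 4
ORDER BY name;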

 

How to change PAGE_VERIFY on multiple databases and multiple instances

In recent days I wanted to change the PAGE_VERIFY option on all databases and make sure I am using CHECKSUM everywhere. The first question, I guess, would be "Why?". The answer lies here.

The article gives a very good example of how to change PAGE_VERIFY for all databases on one instance, but how do we do that when we use a CMS and multi-server queries? I managed to write this simple statement, which does the work. It needs a lot of polishing, though (for example, narrowing down the list of databases where we do not need the page verification mechanism), but it is a really good starting point for the task.

--mass change the PAGE_VERIFY option for all instances and databases 
--that are not read only and have current page_verify = NONE or TORN_PAGE
EXEC
sp_MSforeachdb '
use [?]
if (select count(*) 
	from sys.databases 
	where page_verify_option < 2 and 
	name=''?'' and database_id>4 and
	is_read_only = 0
)=1
begin
alter database [?] set page_verify CHECKSUM WITH NO_WAIT
PRINT '': PAGE_VERIFY changed to CHECKSUM for database = '' + DB_NAME()
end'
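To verify the result (or to see the starting point), sys.databases exposes the setting directly – a minimal sketch:

--page verification setting per database
SELECT name, page_verify_option_desc, is_read_only
FROM sys.databases
WHERE database_id > 4
ORDER BY name;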

 

Disabling a clustered index blocks access to the underlying table

Did you know that disabling a clustered index actually blocks access to the underlying table? Below is a script you can use to test it, if you do not believe me :)

--First create the test table with a clustered index

USE [FTest]
GO

/****** Object:  Table [dbo].[FTable]    Script Date: 11/21/2012 2:02:56 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FTable](
	[FTableID] [int] IDENTITY(1,1) NOT NULL,
	[LargeString] [nvarchar](600) NULL,
	[LargeInt] [bigint] NULL,
 CONSTRAINT [clIX_PK_FTable_FTableID] PRIMARY KEY CLUSTERED 
	(
		[FTableID] ASC
	) ON [PRIMARY]
) ON [PRIMARY]

GO

--now test the insert of one row

INSERT INTO [dbo].[FTable] ([LargeString],[LargeInt]) VALUES('This is a test row','123')
GO

--disable the clustered index

ALTER INDEX [clIX_PK_FTable_FTableID] 
ON [dbo].[FTable] DISABLE
GO

--try inserting one more row. Note the error: "The query processor 
--is unable to produce a plan because the index 'clIX_PK_FTable_FTableID' 
--on table or view 'FTable' is disabled." 

INSERT INTO [dbo].[FTable] ([LargeString],[LargeInt]) VALUES('This is a test row','123')
GO

--rebuild the clustered index so the table to be accessible again
ALTER INDEX [clIX_PK_FTable_FTableID] ON [dbo].[FTable] REBUILD PARTITION = ALL 
GO

--insert one more row to validate the access

INSERT INTO [dbo].[FTable] ([LargeString],[LargeInt]) VALUES('This is a test row','123')
GO

If you go and try to disable the clustered index from SSMS, you even get a nice warning message.

You are probably wondering why that is – well, as you know, the clustered index actually contains the table data in its leaf level, so if you disable it, the SQL Server Query Optimizer is not able to use that index to build a plan for accessing the data, which ultimately prevents access to the table itself.
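By the way, if you want to check whether an index has been left disabled on a table, sys.indexes exposes an is_disabled flag – a minimal sketch against the test table from above:

--list the indexes on the test table and whether they are disabled
SELECT i.name AS index_name, i.type_desc, i.is_disabled
FROM sys.indexes AS i
WHERE i.[object_id] = OBJECT_ID(N'dbo.FTable');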

 

Enable additional logging for SQL Server management data warehouse collection sets

Today I finally found some spare time to dig into some SQL Server Management Data Warehouse issues we were experiencing. The collection jobs were failing for no visible reason and with (believe me) no meaningful error in the logs. So I wanted to enable some additional logging and, while searching, I found this (I lost the link, so I cannot reference it :\ … )

If you want to enable additional logging for any of the collection sets you are running, you can use the below statement.

update msdb.dbo.syscollector_collection_sets
set logging_level = 2
where collection_set_id = <set id>

If you are wondering which collection_set_id to use, the default ones are:

  • Server Activity – 1
  • Disk Activity – 2
  • Query Activity – 3
  • Utility information – 4

The rest can be discovered using the msdb.dbo.syscollector_collection_sets view.
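For example, a quick look at that view gives you the IDs and the current logging level of each collection set:

SELECT collection_set_id, name, logging_level
FROM msdb.dbo.syscollector_collection_sets
ORDER BY collection_set_id;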

The difference is that with the advanced logging enabled you will be able to see the output of each individual component of the SSIS packages used to run the upload jobs, for example.

(Before and after screenshots of the collection set log output.)

 

SQL Server 2012 Virtual labs

If you do not have your own lab and you want to try out SQL Server 2012 state-of-the-art features (such as AlwaysOn), you can easily use Microsoft Virtual Labs. I have to say it is really convenient :) And slick :) I had never used it before, but I am just taking the AlwaysOn lab and I got 3 VMs for a period of 150 minutes, lab scenarios and additional materials.

Generate script for mass setting database in read only mode

The following script will help you generate a script which you can use to set all databases (or a certain list) to READ_ONLY mode. Yes, this is quite a trivial task; however, directly setting a database to read only does not always work, as there may be user connections blocking the operation. So I am using the following logic – set the database to single user, make it read only and then alter it once again for multi-user access. Just keep in mind that when using the script, you should get the result as text (Ctrl+T). You should also be careful with any distribution databases you have on the instance, so you do not set them to a read only state – a variant that skips them is sketched after the script.

select 
'ALTER DATABASE [' + name + '] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE [' + name + '] SET READ_ONLY WITH NO_WAIT
GO
ALTER DATABASE [' + name + '] SET MULTI_USER
GO'
from sys.databases
where database_id > 4
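As mentioned above, distribution databases should be left alone – here is a minimal variation of the same generator that skips them (and databases that are already read only), relying on the is_distributor and is_read_only columns of sys.databases:

select 
'ALTER DATABASE [' + name + '] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE [' + name + '] SET READ_ONLY WITH NO_WAIT
GO
ALTER DATABASE [' + name + '] SET MULTI_USER
GO'
from sys.databases
where database_id > 4
and is_distributor = 0 --skip distribution databases
and is_read_only = 0   --skip databases that are already read only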