How to set up and use the AMR Tool in SQL Server 2014

The Analysis, Migrate and Report (AMR) tool is part of SQL Server Management Studio 2014 and it aims to help you evaluate your workloads and estimate how effective and how time-consuming a transformation to in-memory OLTP would be.

Under the covers the tool is nothing more than a couple of reports in SSMS, integrated as part of the Management Data Warehouse (MDW) feature. So before you start using it, you must enable the MDW and start the collection jobs.

In SQL Server 2014 the collection sets are completely rewritten: instead of the well-known Server Activity, Disk Activity and Query Activity collections, we now have Stored Procedure Analysis and Table Usage Analysis collectors:

AMR_01
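If you prefer T-SQL over the wizard for starting the collection sets, a minimal sketch looks like the following (the collection set names are assumptions based on what the SSMS UI shows – verify them on your instance first with the SELECT):

USE msdb;
GO
-- List the configured collection sets and check whether they are running
SELECT collection_set_id, name, is_running
FROM dbo.syscollector_collection_sets;

-- Start the two new collection sets
-- (names assumed from the UI; adjust to what the SELECT above returns)
EXEC dbo.sp_syscollector_start_collection_set @name = N'Stored Procedure Usage Analysis';
EXEC dbo.sp_syscollector_start_collection_set @name = N'Table Usage Analysis';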

Once this is in place, the AMR reports are accessible through the MDW reports in SSMS:

AMR_02

Note: MDW is my Management Data Warehouse database.

Now let me tell you a bit more about the “tool” – there are three reports that aim to help you identify the tables and procedures you can plan to transform to in-memory OLTP.

The Transaction Performance Analysis Overview report:

AMR_1

This report is simply the starting point from which you can begin exploring :)

 

The Stored Procedure analysis report:

AMR_2

This report shows you the most expensive stored procedures in terms of worker time; those are suggested as the best candidates for migration to in-memory OLTP.
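The report has its own logic behind it, but if you want a rough, hedged approximation of the same idea straight from the DMVs (this is not necessarily what the report runs), something like this lists the most expensive procedures by worker time since the last plan cache/service reset:

-- Top procedures by total worker time, from the procedure stats DMV
SELECT TOP (10)
       DB_NAME(ps.database_id) AS database_name,
       OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
       ps.execution_count,
       ps.total_worker_time
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;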

The Table Usage report:

AMR_3

This report is one of my favorites – it graphically shows you the best candidate tables to transform to in-memory OLTP. On the vertical axis you have the “Gain”, i.e. how much you will gain if you migrate the table. On the horizontal axis you have the work that needs to be done in order to migrate. So – the best candidates for in-memory OLTP are the ones in the top-right corner!

The Table contention analysis report:

AMR_4

This one is basically the same as the previous one, but it shows you the tables that are most affected by contention. Again you have migration work and gain on the axes.

Well, that’s the AMR tool – nothing more than a couple of reports on top of an activated MDW.

I am now really interested in the logic behind the reports and I’ll start investigating – especially how Microsoft decided how much work it will take to migrate a table/SP and what the gain would be. I suppose there are some magic numbers in there as well :) But that is for another post ;-)

 

How to check if your databases are part of replication

If you need to check whether any database on your SQL Server instance is involved in replication, you have a couple of different options – from manually checking through SSMS to various T-SQL approaches. Here I will share a simple script you can use if you need to quickly check whether your databases are part of replication.

The script itself exposes information from sys.databases, so you will need the proper permissions to query it (I guess you are a sysadmin already :) )

SELECT 
	name as [Database name],
	CASE is_published 
		WHEN 0 THEN 'No' 
		ELSE 'Yes' 
		END AS [Is Published],
	CASE is_merge_published 
		WHEN 0 THEN 'No' 
		ELSE 'Yes' 
		END AS [Is Merge Published],
	CASE is_distributor 
		WHEN 0 THEN 'No' 
		ELSE 'Yes' 
		END AS [Is Distributor],
	CASE is_subscribed 
		WHEN 0 THEN 'No' 
		ELSE 'Yes' 
		END AS [Is Subscribed]
FROM sys.databases
WHERE database_id > 4 -- exclude the system databases

Once you have the output you will have to dig a little deeper to see the details about the replication your database(s) take part in. The script just points you in the right direction. Here is a sample output:

replication_script

 

Note that you have two different columns for the publishing part – one for merge and one for transactional/snapshot replication.

If you want to write your own version of the script, you can also use the DATABASEPROPERTYEX function - http://technet.microsoft.com/en-us/library/ms186823.aspx
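For example, a minimal per-database check with DATABASEPROPERTYEX could look like this (MyDatabase is just a placeholder name):

-- Each property returns 1, 0, or NULL if the database name or property is invalid
SELECT
    DATABASEPROPERTYEX('MyDatabase', 'IsPublished')      AS [Is Published],
    DATABASEPROPERTYEX('MyDatabase', 'IsMergePublished') AS [Is Merge Published],
    DATABASEPROPERTYEX('MyDatabase', 'IsSubscribed')     AS [Is Subscribed];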

What are the minimum permissions for Activity Monitor

Activity Monitor is something that I do not use very often – it is just easier for me to work with the DMVs and the rest of the tools out there that show you what is really happening on your server. Although I am not using it, there are DBAs and support guys who rely heavily on the Activity Monitor in SQL Server Management Studio. And, as every DBA should, they all try to access it remotely instead of connecting to the database server over Remote Desktop/Terminal Services. So far so good. If you are a member of the Local Administrators group on the DB server and a sysadmin on the particular instance you want to monitor – everything is just great: you open Management Studio, connect to the instance remotely and start Activity Monitor. But what are you going to do if you are not a Local Administrator and you are still a junior DBA, not blessed with sysadmin permissions over the SQL Server instances? Well, here comes the interesting stuff. Here is how to enable Activity Monitor for remote connections for non-admins.
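Spoiler for the T-SQL part: the server-level permission Activity Monitor relies on is VIEW SERVER STATE. A minimal sketch is below (the login name is a placeholder, and the Windows-side configuration needed for the remote performance counters is a separate step):

USE master;
GO
-- Minimal SQL Server-side permission for Activity Monitor (placeholder login name)
GRANT VIEW SERVER STATE TO [DOMAIN\JuniorDBA];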

Getting NTFS cluster size using PowerShell

Hi folks,

These days I was working on an IO issue and, as with every IO performance issue, I started by checking the basics. One of those is to verify that the NTFS cluster size of the partitions holding the SQL Server data and log files is 64 KB (the Windows default is 4 KB, so this is a step everyone should take before deploying SQL Server in production).

So far the way to check that (or the way I used to do it) was the chkdsk command. That is a very tedious job, though, if you need to do it for 20 disks, for example… So I needed something better. I checked Google and found a very neat blog post describing exactly what I needed:

# Loop over all file system drives; get-ntfsinfo is a helper function
# from the referenced blog post (not a built-in cmdlet)
foreach ($drive in Get-PSDrive | where {$_.Provider.Name -eq "FileSystem"}) {
      if ($drive.Free -ne $null) {
            "$drive - $((get-ntfsinfo $drive.Name).bytes_per_cluster)"
      }
}

The output is great, so if you need to do this – you can count on PowerShell to help.

First impressions of Microsoft Data Explorer for Excel

Today, while I was going over my RSS feeds, I noticed that Microsoft released their Data Explorer as an official Preview version. You can read the announcement here. There you can also find a very slick video showing off the functionality of Data Explorer.

Download the tool from here.

I decided to play with it myself and it was real fun, I have to say. In short, I installed the tool (it embeds itself in Excel) and once you open MS Excel (I am running Office 2013) there is a new ribbon:

image

The sources you can use are basically the same as what you can use with Excel and PowerPivot, with one enormous exception – there is a driver for connecting to HDFS (the Hadoop Distributed File System) – well, that is now something you can explore :)

Anyway, at the moment I do not have a “big data” setup I can work with, so I will just show you how quickly I managed to gather information on country population and alcohol consumption using public web data (I did not search for alcohol consumption but for population density; this data set popped up anyway, so I decided it would be fun). Once you open the Data Explorer ribbon and hit Online Search (the leftmost button on the ribbon), you get a simple window where you just type what you are searching for and you get dozens of suggested datasets you can use:

image

When you hover the mouse over a suggested dataset, you get a preview of the data on the left of your screen and a little “USE” button just below the dataset description. If you hit it, the data is embedded in a new sheet in Excel and you can do lots of things with it. What I want to show you is how I used two datasets and combined them into a new set, joining them on the fly and ultimately producing a dataset for my particular purposes (without needing to VLOOKUP, import/export and so on).

So once you have the two sets of data in two different sheets (remember, the connection to the source of the data stays live), you can go to the Data Explorer window and press the Merge button – it will show you the following window:

image

You have a two-part window – on the top you select the source dataset and on the bottom the one you wish to join to it. You just mark the joining columns with the mouse (use Ctrl for multiple selection) on both datasets (top and bottom). Internally it builds a LEFT OUTER JOIN, so if your data does not match completely (just like mine) you will not get values from the bottom dataset for the unmatched rows (I’ve sketched the equivalent T-SQL join after the screenshot below). You can also see statistics on how many of the rows were matched, which helps you cleanse the data further. So here is what I had as a final report:

image
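For reference, here is roughly what the Merge step builds behind the scenes, expressed in T-SQL (the table and column names are made up purely for illustration):

-- Hypothetical tables standing in for the two imported datasets
SELECT p.Country,
       p.Population,
       a.AlcoholConsumption
FROM Population AS p
LEFT OUTER JOIN AlcoholConsumption AS a
    ON a.Country = p.Country;   -- unmatched rows keep NULLs on the right-hand side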

Of course this is not the entire functionality of Data Explorer, but it should get you started testing and playing with it yourself.

Enjoy ;)

Monitoring Data Quality activities on your server

In this short post I will show you how to check the activities that have run or are currently running on your Data Quality Services server.

First you need to open the Data Quality Client, found in the Start menu –> All Programs\Microsoft SQL Server 2012\Data Quality Services. Once you connect to the DQS server, go to the right-most panel (Administration) and click the Activity Monitoring button:

image

A new window opens where you can see a list of the activities your DQS instance is currently busy (or not) with.

image

In the monitor you can find the following information:

  • Activities that have been performed or are currently running on the system (including their start and end time, duration, type, and which Data Quality database they ran against)
  • Activity steps
  • Profiler output (what the activity did in terms of knowledge discovery, data cleansing, etc.)

SNAGHTML13f179b

How to enable PowerPivot and Power View in Excel 2013

I am working on a presentation for our newest local PASS BI chapter (Bulgaria BI & .NET Geeks), and a lot of questions are popping up around the subject I am preparing. So I decided to show you here how to do some of the stuff I will be presenting.

Before we start, though, we first have to enable PowerPivot and Power View in Microsoft Excel 2013. So that is exactly what I am going to share now.

So first you open a blank workbook in Excel and go to File –> Options

options

Once there, go to the Add-Ins page, select COM Add-ins from the Manage drop-down menu at the bottom and press Go.

add-options

 

After that a window opens where you can see all your COM add-ins (both enabled and disabled). Next, check the PowerPivot and Power View boxes.

add-ins

 

Hit OK and you are done. How can you tell that everything is all right? Check whether you have a Power View button in the INSERT ribbon and a PowerPivot ribbon on top:


powerpivot powerview

 

That is it. Now you are ready to use both add-ins and author reports you’ve never imagined you could do with Excel :)

Monitoring MDW purge job

During one of my latest investigations on SQL Server MDW, I had to learn a few things the hard way – there are several stored procedures that run on the central MDW server, such as sp_purge_data, which are designed to give you a constant update on what they are doing, but you need to enable that output manually. Most of these stored procedures have PRINT/RAISERROR statements in their code, and in the case of the purge SP I wanted to see what is going on behind the scenes. The sp_purge_data procedure is called via a SQL Agent job named mdw_purge_data. So in my case I was lucky – all I needed to do was set an output file for the job step (called Step 1) like this:

2
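If you prefer scripting the same change, a minimal sketch with msdb.dbo.sp_update_jobstep would look like this (the job name and output path are assumptions – on most installations the purge job name has the MDW database name appended, so check yours first):

USE msdb;
GO
EXEC dbo.sp_update_jobstep
    @job_name         = N'mdw_purge_data [MDW]',        -- assumed job name, verify on your instance
    @step_id          = 1,
    @output_file_name = N'C:\Temp\mdw_purge_data.log';  -- hypothetical output path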

 

Then just save the job and run it – the log is immediately created and now you can see what is going on:

3

Fixing a long running purge in Management Data Warehouse

Recently I had a situation where one of the SQL Servers we support had over 1 TB of tempdb log, and after a short dig around, one of my colleagues noticed that it was caused by a long-running purge in SQL Server MDW (Management Data Warehouse). The purge job (mdw_purge_data) had been running for more than 15 days before we noticed the problem. It was quite strange to me and I started troubleshooting. The purge job should be quick and, as far as my experience goes, it runs in several minutes tops. That was not the case here, though. So I looked first at what exactly the job does – it runs a stored procedure in the MDW database – core.sp_purge_data.

Looking at the definition, the procedure does a lot of things, and starting from the beginning would take me quite a lot of time and effort to troubleshoot. Luckily, the procedure was still running and I was able to trace where exactly the purging got stuck. Ultimately it was one of the stored procedures called from within the purge job - [core].[sp_purge_orphaned_notable_query_text]. Going further, this procedure’s definition is quite simple – it deletes records from the MDW database in batches of 500 (the default), using the following statement:

DELETE TOP (@delete_batch_size) snapshots.notable_query_text 
         FROM snapshots.notable_query_text AS qt, #tmp_notable_query_text AS tmp
        WHERE tmp.[sql_handle] = qt.[sql_handle]

OK, now I got it – the #tmp_notable_query_text table must be involved somehow. I checked how this table is populated:

SELECT qt.[sql_handle]
INTO #tmp_notable_query_text
FROM snapshots.notable_query_text AS qt
WHERE NOT EXISTS (
   SELECT snapshot_id 
   FROM snapshots.query_stats AS qs
   WHERE qs.[sql_handle] = qt.[sql_handle])

Obviously this statement was causing my headaches – I tried to execute the SELECT statement on its own to see how many records it would return. I have to say that my impression so far was that everything around MDW – collection, display and purging – is optimized, and that I would not be suffering from unoptimized stored procedures/code. Well, it seems that is not the case – my SELECT statement had not completed after more than 2 minutes, so I immediately cancelled it and looked at the estimated execution plan. What worried me there was that there were two clustered index scans, and the one on the snapshots.query_stats table had a relative cost of 99%. Looking at snapshots.query_stats, there were more than a million records in it (the purge had not run for some days now) and the scan was not effective at all, especially since both of the default indexes are clustered and include almost all of the columns in the tables. Based on the query at hand, I went with creating a non-clustered index on the sql_handle column in both tables – snapshots.query_stats and snapshots.notable_query_text.
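For reference, a sketch of the indexes I am talking about (the index names are my own – pick whatever fits your naming convention):

USE MDW;  -- your MDW database name
GO
CREATE NONCLUSTERED INDEX IX_query_stats_sql_handle
    ON snapshots.query_stats ([sql_handle]);

CREATE NONCLUSTERED INDEX IX_notable_query_text_sql_handle
    ON snapshots.notable_query_text ([sql_handle]);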

After doing that, I ran the SELECT statement again and the result was returned immediately. So I went and ran the purge procedure right away and waited some minutes for it to complete. You can see the results for yourselves – the last successful runs took almost 4 days each, and the one we cancelled had been running for 17 days. After the optimization the purge ran for 32 minutes, and since then it has not been running for more than a minute.

1

So be careful with MDW and monitor the purge job :)

Edit 27.03.2013 – one of my colleagues just implemented the same NC index on another server and the purge job went from 1 day and 3 hours down to 36 seconds!! :-)

How to check if your database has been upgraded

A month ago I noticed a very neat post about finding out whether a given SQL Server database has been upgraded or not. The original post shows what to use and where to look for the info. In short, DBCC DBINFO reveals two fields – dbi_createVersion and dbi_version – which can ultimately help us understand whether our database has been upgraded or not.
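For a single database, the check boils down to running the (undocumented) command below and looking for dbi_createVersion and dbi_version in the output:

-- Undocumented command; run it against the database you are interested in
DBCC DBINFO ('master') WITH TABLERESULTS;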

However, imagine you need to do this for all your databases (usually a lot more than 5-6 :) ). That would not be an easy task, so I took the script I use for checking the last successful run of DBCC CHECKDB (original post here) and modified it a bit, so now the output is a lot more readable.

DatabaseUpgradedOutput

You can find the entire script below:

-- Temp table to hold the DBCC DBINFO output for all databases
CREATE TABLE #temp (
       Id INT IDENTITY(1,1),
       ParentObject VARCHAR(255),
       [Object] VARCHAR(255),
       Field VARCHAR(255),
       [Value] VARCHAR(255)
)

INSERT INTO #temp
EXECUTE sp_MSforeachdb 'DBCC DBINFO (''?'') WITH TABLERESULTS';

;WITH CHECKDB1 AS
(
    SELECT [Value],
           ROW_NUMBER() OVER (ORDER BY Id) AS rn1
    FROM #temp WHERE Field = 'dbi_dbname'
),
CHECKDB2 AS
(
    SELECT [Value],
           ROW_NUMBER() OVER (ORDER BY Id) AS rn2
    FROM #temp WHERE Field = 'dbi_createVersion'
),
CHECKDB3 AS
(
    SELECT [Value],
           ROW_NUMBER() OVER (ORDER BY Id) AS rn3
    FROM #temp WHERE Field = 'dbi_version'
)
SELECT CHECKDB1.[Value] AS DatabaseName
     , CHECKDB2.[Value] AS DatabaseCreateVersion
     , CHECKDB3.[Value] AS DatabaseCurrentVersion
     , CASE
           -- compare as integers: the versions are numbers stored as strings
           WHEN CAST(CHECKDB2.[Value] AS INT) < CAST(CHECKDB3.[Value] AS INT) THEN 'Yes'
           ELSE 'No'
       END AS DatabaseUpgradedStatus
FROM CHECKDB1
INNER JOIN CHECKDB2
    ON rn1 = rn2
INNER JOIN CHECKDB3
    ON rn2 = rn3

DROP TABLE #temp

Probably you wonder why you need to know this info – to be honest, so far I do not have a reason beyond the fact that it is cool and geeky. Have fun!