First impressions of Microsoft Data Explorer for Excel

Today, while I was going over my RSS feeds, I noticed that Microsoft released their Data Explorer as official Preview version. You can read the announcement here. There you can also find and a very slick video showing off the functionality of Data Explorer.

Download the tool from here.

I decided to play with it by myself and it was real fun, I have to say. So in short I installed the tool (it is embedded in Excel) and once you open MS Excel (I am running Office 2013) there is a new ribbon:

image

The sources you can use are basically the same as what you can use with Excel and PowerPivot, with one enormous exception – there is a driver for connecting to HDFS (Hadoop File System) – well that is now something you can explore Smile

Anyway, at that moment I do not have a “big data” setup I can work with, so I will just show you how quickly I managed to gather information on country population and alcohol consumption using public web data (I did not search about alcohol consumption, but for population density, however this data set popped up, so I decided it will be fun). So once you open the Data Explorer ribbon and hit Online Search (the leftmost button on the ribbon), you got a simple window where you just type what are you searching for and you get dozens of suggested datasets which you can use:

image

When you hover the mouse over the provided dataset, you get a preview of the data on the left of your screen and a little button “USE” just below the dataset description. If you hit it, the data will be embedded in a new sheet in Excel and you can do lots of things with it. What I want to show you is how I used two datasets and combined them to a new set, joining them on the fly and ultimately producing dataset for my particular reasons (without needing to VLOOKUP, import/export and so on).

So once you have the two sets of data in two different sheets (remember, the connection to the source of data is life), you can go to the Data Explorer Window and press the Merge button – it will show you the following window:

image

You have a two-part window – on the top you select the source dataset and on the bottom – the one that you wish to join to the source one. You just mark the columns using the mouse (use Ctrl for multiple selection) on both datasets (top and bottom). Internally it is building an LEFT OUTER JOIN and if your data does not match completely (just as mine) you will not get values from the bottom dataset. You can also see the statistics of how many of the rows you can have matched, which will further help you cleanse the data further. So here is what I had as final report:

image

Of course this is solely is not the entire functionality of Data Explorer, but it will get you to start testing and playing with by yourself.

Enjoy Winking smile

Monitoring Data Quality activities on your server

In this short post I will show you how to check for the activities run or being run on your Data Quality Server.

First you need to open the Data Quality Client, placed in Start menu –> All programs\Microsoft SQL Server 2012\Data Quality Services. Once you connect to the DQS server, go to the right-most panel (Administration) and click on Activity monitoring button:

image

A new window opens and then you can see a list of the activities that your DQS instance is currently busy (or not) with.

image

In the monitor you can find the following information:

  • Activities, that are performed or currently running on the system (including their start and end time, duration, type, which Data Quality database they have been run against)
  • Activity steps
  • Profiler output (what the activity did in terms of knowledge discovery, data cleansing, etc.)

SNAGHTML13f179b

How to enable PowerPivot and Power View in Excel 2013

I am working on a presentation for our newest local PASS BI Chapter (Bulgaria BI & .NET Geeks) and around the subject I am preparing there are a lot of questions that are popping up. So I decided to show you here how to do some of the stuff I will be presenting.

Before we start though, first we will have to enable PowerPivot and Power View in Microsoft Excel 2013. So now I am going to share exactly this.

So first you open a blank workbook in excel and go to File->Options

options

Once done, go to Ad-ins page, from the add-ins drop-down menu select COM addins and press GO.

add-options

 

After that there is a window opened where you can see all you COM add-ins (both enabled and disabled). So next you have to check PowerPivot and Power View boxes.

add-ins

 

Hit OK and you are done. How to tell that everything is alright? You should check in the INSERT ribbon if you have a Power View button and also if you have a PowerPivot ribbon on top:

 

 

 

powerpivot powerview

 

That is it. Now you are ready to use both add-ins and author reports you’ve never imagined you can do with Excel :)

Monitoring MDW purge job

During one of my latest investigations on SQL Server MDW, I had to learn a few things the hard way – there are several stored procedures, that run on the central MDW server, such as the sp_purge_data which are designed to give you a constant update on what are they doing, but you need to manually enable it. Most of the stored procedures have PRINT/RAISEERROR statements in their code and in the case of the purge SP, I wanted to see what is going on behind the scenes. The sp_purge_data procedure is called via a SQL Agent job, called mdw_purge_data. So in my case I was lucky – what I needed to do is to simply set an output of the job step (called Step 1) like this:

2

 

Then just save the job and run it – the log is immediately created and now you can see what is going on:

3

Fixing a long running purge in Management Data Warehouse

Recently I had a situation where one of the SQL Servers we are supporting had over 1 TB of tempdb log and after a short digging around, one of my colleagues noticed that this is caused by a long running purge in SQL Server MDW (Management Data Warehouse). The purge job (mdw_purge_data) was running for more than 15 days until we noticed the problem. It was quite strange to me and I started troubleshooting it. The purge job should be quite quick and as far as my expperience goes, it runs in several minutes tops. But this was not the case, though. So I looked first at what exactly the job does – it is a stored procedure in the MDW database – core.sp_purge_data.

Looking at the definition, the procedure was doing a lot of things and starting from the beginning would take me quite a lot of time and effort to troubleshoot. Luckily, the procedure was still running and I was able to trace where exactly the purging got stuck. Ultimately it was one of the stored procedures called from within the purge job - [core].[sp_purge_orphaned_notable_query_text]. Going further, this procedure’s definition is quite simple – it is deleting by default records from mdw database on batches of 500 records, using the following statement:

DELETE TOP (@delete_batch_size) snapshots.notable_query_text 
         FROM snapshots.notable_query_text AS qt, #tmp_notable_query_text AS tmp
        WHERE tmp.[sql_handle] = qt.[sql_handle]

OK, now I got it – the #tmp_notable_query_text should be somehow involved. I checked how this table is populated:

SELECT qt.[sql_handle]
INTO #tmp_notable_query_text
FROM snapshots.notable_query_text AS qt
WHERE NOT EXISTS (
   SELECT snapshot_id 
   FROM snapshots.query_stats AS qs
   WHERE qs.[sql_handle] = qt.[sql_handle])

Obviously this statement was causing me the headaches – I tried to execute the select statement to see how many records would it return. I have to say, that my impression so far was that everything around MDW – collection, display and purging is optimized and I would not be suffering from not optimized stored procedures/code. Well, it is not the case it seems – my select statement was not completing in more than 2 minutes, so I immediately canceled it and looked at the estimated execution plan. What worried me there is that there were two clustered index scans and the one on core.query_stats table was with relative cost of 99%. Looking at the table snapshot.query_stats, the records there were more than a million (the purge was not running for some days now) and the scan was not effective at all, especially when both of the default created indexes were clustered and included almost all of the columns in the tables. Based on the query I have, I went with creating non-clustered index on sql_handle column in both tables – snapshots.query_stats and snapshots.notable_query_text.

After doing that, I ran the select statement again and the result was returned immediately.  So I went and ran the purge procedure right away and waited for some minutes to complete. The results you can see for yourselves – last successful runs took almost 4 days each and the last one, which we cancelled was running for 17 days. After the optimization – the purge ran for 32 minutes and have been running since then for not more than minute.

1

So be careful with MDW and monitor the purge job :)

Edit 27.03.2013 – one of my colleagues just implemented the same NC index on another server and the purge job ran from 1 day and 3 hours to 36 seconds!! :-)

How to check if your database has been upgraded

A month ago I noticed a very neat post about finding if a given SQL Server database has been upgraded or not. The original post shows what to use and where to look for the info. In short, DBCC DBINFO reveals two fields – dbi_createversion and dbi_version, which can ultimately help us understanding if our database has been upgraded or not.

However, imagine if you need to do this for all your databases (usually a lot more than 5-6 :) ). This would not be an easy task, so I took the script I am using for checking the last successful run of DBCC (original post here) and modified it a bit, so now the output is a lot more readable.

DatabaseUpgradedOutput

The entire script you can find below:

CREATE TABLE #temp (
       Id INT IDENTITY(1,1), 
       ParentObject VARCHAR(255),
       [Object] VARCHAR(255),
       Field VARCHAR(255),
       [Value] VARCHAR(255)
)

INSERT INTO #temp
EXECUTE SP_MSFOREACHDB'DBCC DBINFO ( ''?'') WITH TABLERESULTS';

;WITH CHECKDB1 AS
(
    SELECT [Value],
ROW_NUMBER() OVER (ORDER BY ID) AS rn1 
FROM #temp WHERE Field IN ('dbi_dbname'))
    ,CHECKDB2 AS ( SELECT [Value], 
ROW_NUMBER() OVER (ORDER BY ID) AS rn2 
FROM #temp WHERE Field IN ('dbi_createVersion'))
	,CHECKDB3 AS ( SELECT [Value], 
ROW_NUMBER() OVER (ORDER BY ID) AS rn3 
FROM #temp WHERE Field IN ('dbi_version')
)      
SELECT CHECKDB1.Value AS DatabaseName
        , CHECKDB2.Value AS DatabaseCreateVersion
		, CHECKDB3.Value AS DatabaseCurrentVersion
		, CASE 
			WHEN CHECKDB2.Value < CHECKDB3.Value THEN 'Yes'
			ELSE 'No' 
			END as DatabaseUpgradedStatus
FROM CHECKDB1 
INNER JOIN CHECKDB2
	ON rn1 =rn2
INNER JOIN CHECKDB3
	ON rn2=rn3

DROP TABLE #temp

Probably you wonder why do you need to know this info – to be honest, so far I do not have  a reason besides that it is cool and geeky. Have fun!

How to check for active trace flags on multiple SQL Server instances

Following one of my old posts, I would like to share one new multiserver query. This script you can use with your CMS (Central Management Server) as multiserver query and check all active trace flags on all SQL Server instances.

DECLARE @TraceFlags table (TraceFlag smallint, status bit,global bit, session bit) 
insert into @TraceFlags execute('DBCC TRACESTATUS(-1)')
select * from @TraceFlags

Note that I am using table variable and not temporary table (my own preference :-) ) so if you need – you can easily transform the statement.

Change compatibility level on multiple databases and multiple instances

Following my previous post on mass change of PAGE_VERIFY option, I want to share one other script that you can use if you are using a Centralized Management Server and multi-server queries in order to change the compatibility level of multiple databases and instances.

With the below script you can change the compatibility level of all databases and in my example I am setting it to compatibility_level = 100

--mass change the compatibility level option for all instances and 
--databases that are not read only and have current comp. set to 80 or 90
EXEC
sp_MSforeachdb '
use [?]
if (select count(*) 
	from sys.databases 
	where compatibility_level <100 and 
	name=''?'' and database_id>4 and
	is_read_only = 0
)=1
begin
alter database [?] set compatibility_level = 100
PRINT ''compatibility changed to 100 for database = '' + DB_NAME()
end'

 

How to change PAGE_VERIFY on multiple databases and multiple instances

In the recent days I wanted to change the PAGE_VERIFY option on all databases and make sure I am using CHECKSUM everywhere. First question I guess would be “Why?”. The answer lies here.

The article gives a very good example on how to change the PAGE_VERIFY for all databases on one instance, but how do we do that when we use a CMS and multi-server queries? I managed to write this simple statement, that is doing the work. It needs a lot of polishing though (for example narrowing down the list of databases, where we do not need the page verification mechanism). But anyway, it is really good point for starting the task.

--mass change the PAGE_VERIFY option for all instances and databases 
--that are not read only and have current page_verify = NONE or TORN_PAGE
Exec
sp_MSforeachdb '
use [?]
if (select count(*) from sys.databases where page_verify_option <2 and name=''?'' and database_id>4 and is_read_only = 0)=1
begin
alter database [?] set page_verify CHECKSUM WITH NO_WAIT
PRINT '': PAGE_VERIFY changed to CHECKSUM for database = '' + DB_NAME()
end'

 

Disabling clustered index blocks access to underlaying table

Did you know that disabling a clustered index actually blocks the access to the underlying table? Below is a script you can use to test it, if you do not believe me :)

--First create the test table with a clustered index

USE [FTest]
GO

/****** Object:  Table [dbo].[FTable]    Script Date: 11/21/2012 2:02:56 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FTable](
	[FTableID] [int] IDENTITY(1,1) NOT NULL,
	[LargeString] [nvarchar](600) NULL,
	[LargeInt] [bigint] NULL,
 CONSTRAINT [clIX_PK_FTable_FTableID] PRIMARY KEY CLUSTERED 
	(
		[FTableID] ASC
	) ON [PRIMARY]
) ON [PRIMARY]

GO

--now test the insert of one row

INSERT INTO [dbo].[FTable] ([LargeString],[LargeInt]) VALUES('This is a test row','123')
GO

--disable the clustered index

ALTER INDEX [clIX_PK_FTable_FTableID] 
ON [dbo].[FTable] DISABLE
GO

--try inserting one more row. Note the error: "The query processor 
--is unable to produce a plan because the index 'clIX_PK_FTable_FTableID' 
--on table or view 'FTable' is disabled." 

INSERT INTO [dbo].[FTable] ([LargeString],[LargeInt]) VALUES('This is a test row','123')
GO

--rebuild the clustered index so the table to be accessible again
ALTER INDEX [clIX_PK_FTable_FTableID] ON [dbo].[FTable] REBUILD PARTITION = ALL 
GO

--insert one more row to validate the access

INSERT INTO [dbo].[FTable] ([LargeString],[LargeInt]) VALUES('This is a test row','123')
GO

If you go and try to disable the clustered index from SSMS you even got a nice warning message:

Probably you are wondering why is that – well, as you know, the clustered index is actually containing the table data in its leaf level, so if you disable it, the SQL Server Query Optimizer will not be able to use this very index for creating a plan to access the data, which ultimately prevents the access to the table itself.