Today, while I was going over my RSS feeds, I noticed that Microsoft released their Data Explorer as official Preview version. You can read the announcement here. There you can also find and a very slick video showing off the functionality of Data Explorer.
Download the tool from here.
I decided to play with it by myself and it was real fun, I have to say. So in short I installed the tool (it is embedded in Excel) and once you open MS Excel (I am running Office 2013) there is a new ribbon:
The sources you can use are basically the same as what you can use with Excel and PowerPivot, with one enormous exception – there is a driver for connecting to HDFS (Hadoop File System) – well that is now something you can explore
Anyway, at that moment I do not have a “big data” setup I can work with, so I will just show you how quickly I managed to gather information on country population and alcohol consumption using public web data (I did not search about alcohol consumption, but for population density, however this data set popped up, so I decided it will be fun). So once you open the Data Explorer ribbon and hit Online Search (the leftmost button on the ribbon), you got a simple window where you just type what are you searching for and you get dozens of suggested datasets which you can use:
When you hover the mouse over the provided dataset, you get a preview of the data on the left of your screen and a little button “USE” just below the dataset description. If you hit it, the data will be embedded in a new sheet in Excel and you can do lots of things with it. What I want to show you is how I used two datasets and combined them to a new set, joining them on the fly and ultimately producing dataset for my particular reasons (without needing to VLOOKUP, import/export and so on).
So once you have the two sets of data in two different sheets (remember, the connection to the source of data is life), you can go to the Data Explorer Window and press the Merge button – it will show you the following window:
You have a two-part window – on the top you select the source dataset and on the bottom – the one that you wish to join to the source one. You just mark the columns using the mouse (use Ctrl for multiple selection) on both datasets (top and bottom). Internally it is building an LEFT OUTER JOIN and if your data does not match completely (just as mine) you will not get values from the bottom dataset. You can also see the statistics of how many of the rows you can have matched, which will further help you cleanse the data further. So here is what I had as final report:
Of course this is solely is not the entire functionality of Data Explorer, but it will get you to start testing and playing with by yourself.