The problem with Data Discovery

Despite the power of data discovery tools like Qlik, Tableau, and Power BI, a vast majority of businesses are failing in their efforts to become data driven. Qlik estimates that for their existing customers, there is only a 25% adoption rate (Qonnections 2016). That means that even for companies that have invested in becoming data driven, 75% of their employees are just winging it. Without the right data in the right format for each type of business user, these businesses can end up with entire departments that are not benefiting from data discovery.

There are 2 core reasons why this adoption rate is so low: 1) the right data is not available at all within the Qlik environment; and 2) the data that is available within Qlik is not refined in such a way that it can be useful to the 75%’ers.

If you are struggling with these challenges, you should check out TimeXtender (disclaimer: TimeXtender is an awesome company; disclaimer 2: I work for TimeXtender:). TimeXtender provides software that gets complex data into Qlik, and enables business intelligence (BI) teams to integrate, join, cleanse, denormalize, and otherwise refine the data for each business user. TimeXtender does the job of 3 tools: ETL software (like Informatica and SSIS), database management software (like SQL Server Management Studio), and semantic modeling (like Qlik Data Load Editor). Not only does TimeXtender do the work of these 3 tools, but it does it without required BI teams to write a single line of code.

If you are struggling to increase adoption of data discovery tools like Qlik within your organization, consider whether you are providing all of your users with the data they need and in the format that makes sense to them. If not, take a look at TimeXtender.



Getting Click Data from Twitter API

I was trying to figure out whether the Twitter API allows you to get “click” counts via the API. This information is available via Twitter app (Tweet activity as “Link clicks”), but I did not see it as part of the API.


It looks like Twitter has made an interesting decision to have this as a commercially, but not publicly, available part of the API (e.g., you can get it via Gnip).

REST API Support for Impressions Data.  Short answer: buy it.

I am hoping that there is some way that Twitter has provided to pull this data (perhaps the Analytics API?). But, as a short-term fix I am going to try this:

The Twitter API will give me the retweets and favorites, and the Bitly API will give me the clicks.

I will write a follow-up based on what I learn. I am curious to see whether stitching together this data from multiple APIs introduces some noise, and whether masking links with Bitly decreases clicks (e.g., a greater number of hops => increased time to hit the destination page, people may be afraid of masked Bitly links, etc).

If you have a better way, I would love to hear it.





Quick Qlik Tip: How to Automatically Identify Relationships in Data Models

Have you ever dreamed of being able to visually explore your data? If you have, Qlik is your dream come true. Qlik, and in particular Qlik Sense, allows you to perform data discovery on just about any data set with relative ease. In this post, I am going to walk you through my favorite way let Qlik help you create valid relationships between tables in an unfamiliar data set.  All told, this article should take 5 minutes to read, and less than 30 minutes if you are following along with the actual Qlik app.

Qlik’s First Question: to Add Data or Not to Add Data

Whenever you create a new Qlik application, you are presented with an option of “Add Data” and “Data load editor.”


The way I think about these differences is that “Add data” is easy, and “Data load editor” is for people who enjoy punishment (they will claim that they prefer greater control over the import process). Both approaches have their strengths, but if you are just looking to jump in with both feet and start exploring your data, I would start with Add data.

If you want to play with a simple dataset, you can download this Excel file that I created to demonstrate some of the fundamentals of adding data to Qlik Sense. I will be using this data for the rest of this post.

Associate Data in Qlik Sense

After you add data, you are going to be slapped in the face with a simple truth: Qlik Sense, which allows you to bring visual meaning to very complex data, has some very complex user interface (Qlik is radically improving this in Qlik Sense 3!). Unless you are just dealing with a single set of data, the first thing you need to do after adding data is associate the data you added. Here is a screen shot that you are presented with…do you see how to create the relationships between your data? Of course you do! It is that little tiny “Associations” button at the bottom of the screen. 🙂


Once you have clicked on the Associations tab, you are presented with an option to associate each of the tables together. For each table to table relationship (e.g., Customers – Theaters), you need to select either “No Association” or one of the recommended associations. For the Customers – Theaters combination, there is no relationship, so you select “No Association” and then select the down arrow to advance to the next table.


NOTE ABOUT OVER-JOINING: Notice how Qlik recommends the association between Customer and Theater with 100% confidence.  There is no relationship between customers and theaters (a person can purchase a ticket at any theater, and the only proper relationship is through the purchase entity). But, because Customer and Theater both have a city field and these have the same values, Qlik recommends a relationship 100%. I still love Qlik for recommending these joins, because it is a lot easier to see a recommendation and determine that it is invalid than having to try to imagine what relationships might exist – particularly on a large and complex data model.

Now that we advance to the Purchases – Theaters table, we see some pretty amazing smarts on display by Qlik: we have the purchaser’s name as “name” and the theater’s name as “name” – but a person’s name (e.g., Matt) is not the same as a theater’s name (e.g., Regal 8). Qlik recognizes that the proper join is between Purchase.Theater and Theater.Name and suggests this association with 100% confidence.


After you go through all of the possible associations, you should have a pretty good object model.


And if you switch over to the Data Load Editor, you will see a nice clean script that Qlik created for your data.

[Name] AS [Theater],
[City] AS [Theaters.City],
[State] AS [Theaters.State],
[Rating] AS [Theaters.Rating];
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Theaters);

[FirstName] AS [FirstName],
[City] AS [Customers.City],
[State] AS [Customers.State];
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Customers);

[Name] AS [FirstName],
[Movie] AS [Title],
[Theater] AS [Theater],
[Price] AS [Price];
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Purchases);

[Title] AS [Title],
[Movie Length] AS [Movie Length],
[Rating] AS [Movies.Rating];
[Movie Length],
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Movies);

If you have any questions, drop a comment below.



Learn Machine Learning with Datacamp (on the Side)

I have worked with Machine Learning off and on for the last couple of years. Unfortunately, it is not part of my day job, so I need to find ways to integrate machine learning into an already busy day and week.

I stumbled across Datacamp, and so far I love what I see. After creating a free account in about 10 seconds, I was watching my first sub-10 minute video. Yes, no biggie there, but that is where it gets great: Datacamp starts providing an interactive R session, along with detailed instructions and readily available datasets, where you can actually start doing your very own work – no software required. Pretty impressive.


I would love to know more about Datacamp: who are the founders, what is their business model, and how did they get so awesome? If you know the founders, please drop a comment below.

Thanks, and if you end up testing it out let me know what you think!


Mongo v Cassandra

Here is a post from my good friend, Dharshan Rangegowda, Founder and CEO at Scalegrid. Check it out. And if you are looking to host Mongo on AWS, give him a shout. He will take care of you!


Deploying TimeXtender on Azure

When I was an IT contractor for the Marine Corps in Washington DC, the summers could get pretty hot. The only respite from the heat was, you guessed it, the server room. Sometimes, I would just fire up one of the servers and do some research on a web browser to bask in that high-powered AC (please do not tell anybody). The biggest problem from my IT worker perspective was that when the AC went down or some other piece of hardware decided to die over the weekend, one of us would have to drive in and fix it.

Fast forward 10 years, and the days of the boutique server room seem to be drawing to a close. Companies are moving to public and private cloud at an accelerated rate. There are a number of benefits to moving to the cloud: shifting capex to opex, just-in-time capacity, and not having to come in on the weekend when the air conditioner goes down.

But what does that mean for on-premise software – software that used to operate inside that soon-to-be-shut-down boutique server room? One trend I am seeing is that companies are shifting on-premise software to the cloud with a Virtual Machine (VM) architecture. Granted, there are differences between a purpose-built cloud offering and a VM-based cloud offering, but it seems that the VM architecture is the optimal approach for companies that have invested time and money creating valuable on-premise software.

With a little bit of creativity, companies can realize all of the key benefits of the of the cloud (e.g., no on-premise footprint, opex, just-in-time capacity, subscription-/consumption-based pricing, and not having to come in on the weekend when the air conditioner goes down), and these benefits are realized without having to throw away a valuable piece of software that is already delivering value to customers.

Deploying TimeXtender on an Azure VM

Let’s get to the Technical Part. I assume that you already have an Azure account. The first step is to create a new SQL Server VM (not a SQL Database…that is for Azure SQL, Microsoft’s SQL as a Service offering). I chose SQL Server 2014 SP1 Standard on Windows Server 2012 R2.


Accept the defaults when you create this VM, but make sure that you write down the username and password that you use to create the VM, because you will log into the VM with this information. I did use a resource group instead of the Classic VM, because I think that is the direction Microsoft is heading.

For the VM sizing, I chose DS1_V2 mostly from a cost perspective. It costs me $104.16 a month, so I can leave it spinning on the cloud for demos without burning through too much budget.  This is not going to handle monster workloads, but particularly if you are just exploring TimeXtender and building super fast data warehouses/data discovery hubs, it will have enough muscle for your needs.

Azure VM Sizing for TimeXtender

For the SQL Settings, select Public – assuming that you are going to want to consume data from your data discovery hub from other machines (e.g., a Qlik application). For the port, you can stick with 1433 or you can be sneaky (just be sure to remember your sneakiness when trying to connect to your database).

Install TimeXtender Software

The easiest way to get the software installed on the VM is to download it to your desktop and then drag it onto the VM: (chose 64-bit server version).

For full details on the installation process,

Connect to a Data Source + Create Data Structures

Now let’s get started. My preferred approach is to grab a copy of the Data Discovery Hub template, Import the Template as a new project, and Run the Connection Wizard.

Let’s start by importing the project. First, download the Data Discovery Hub template at Place it on the desktop of your VM. Then open TimeXtender and select Import/Export, and then Import Project, like this:

Import TimeXtender Solution Template


As soon as you import the project, run the connection wizard. You will see that there are 4 different connections: ODS, Stage, MDW, and Dynamics. Dyanamics represents an existing Dynamics database. The other 3 (ODS, Stage, MDW) are part of the data discovery hub. This is the data repository that we are going to create to store our Dyanamics data.

Run TimeXtender Connection Wizard

In short, the data discovery hub includes the ODS (Operational Data Store) where we land non-transformed data right out of the source, the Stage Data Warehouse where we do our transformations and denormalizations, and the MDW (or Modern Data Warehouse) where we house the “business ready” data. Power Users can be granted access to the ODS so that they have access to all data across the enterprise and can conduct unfettered data discovery. The MDW is used to populate Qlik (or other data visualization/front-end systems). The data discovery hub is the most effective way that I have ever seen to get world class data to all employees within a company.

TimeXtender Data Discovery Hub

But I digress…let’s get back to setting up our system. When we run the wizard, it will first ask us where we want to store our ODS. Keep the defaults, but you have to click the “Create” button before hitting OK to actually create this database on your machine. Once you hit “Create” then you can hit OK.

Next, the wizard prompts us to connect to a Dynamics database. You can either connect to your own Dynamics database, or you can connect to a sample database we are hosting on Azure. To connect to our Dynamics database, the server name is, the SQL Username is TxSQL, and the SQL Password is TimeXtenderRocks! (with an exclamation point!). Your wizard should look something like the image below.

Connect to Remote Data Source

Notice that there is no option to create this database, because we are pulling data from an existing data source, not creating a repository where we can store data.

Run through MDW and Stage (do not forget to click Create for ODS, MDW, and Stage!), and then click the Test Connections button on the wizard. Everything should be green, like shown below.

TimeXtender Connection Wizard Green

For starters, I have just created the Data Discovery Hub architecture, and I will leave it to you to pull tables into your ODS, structure tables from your ODS into business ready data in your Stage DW, and then present business ready data to your business users in the MDW.

If you have any problems with any of these, let me know!



Creating a Map of Your Travels with Power BI

I have been hearing a lot of great things about Power BI recently, so I decided to take it for a spin. We use Hubspot at work for our CRM, and Hubspot does not have meaningful reporting (sorry Hubspot). For example, you cannot create a report for all of your active deals in a particular location (e.g., to figure out who you should visit during a trip), because deals do not have a location – they have an associated company that has a location, but there is no way to “chain” (or denormalize) data in Hubspot.

Download Power BI

Instead of lamenting, I decided to get to work with Power BI. Simply head to this URL: and select “Download” on the left.


A minute or so later, and I was ready to roll. Now, what I did was export my Deals data and then my Companies data from Hubspot, import this into Power BI, associate the Company>Name and Deal>Associated Company, and then add these to a map. Super easy, but too much for an intro to Power BI, so let’s just create some interesting data that we can visualize with Power BI.

Enter Your Travel Data

After my download completed, I opened Power BI and clicked on the Enter Data icon on the Home tab at the top.


This allowed me to enter the countries I visited, and the years I visited those countries, into an Excel like field.


Click on the Load button and I am ready to go.

Add Data to a Map

There are 2 types of maps in Power BI: bubble maps (or point maps) and fill maps (and shape maps). Microsoft has provided an excellent primer for how to add these maps to your Power BI application. I went with a bubble map, and simply dragged the Country column from Table 1 to the Location field and ended up with a map that looks like this:


How Did I Add Smarter Labels to Power BI Maps?

For all that I love about Power BI, this seemed to be a weakness: you cannot add content to the text of the pin. For example, when I hover over Ecuador, all I see is Ecuador. I cannot add the year that I visited Ecuador in any logical way. What I did to overcome this was to drag the year into the “color saturation” column, and this adds the year to the dropped pin hover text (it also changes the color intensity, but not in a way that is noticeable). You should end up with a view like this when you hover over Ecuador:


Finally, it is always good to have a listing of your data outside of your map. I did this by clicking on the table field and then adding the Year and the Country (note: I do not think that you can chose the order of these fields, so if you chose Country as the first column you will end up with an alphabetic listing of countries, and there is nothing you can do to change this…and that is why I put year first so that Power BI defaults to order by year…which kind of makes sense here)


And that is a pretty fun map that you can go ahead and share on social media, like I did here:


How to Change Data in Power BI

Everything is easy once you know how to do it, but when I first tried to change the data that I entered into Power BI’s “Enter Data” interface, I was a bit stuck. Here is what I did: click on the elipses next to Table1 on the right side under Fields, select Edit Query, and on the right side of the modal window that pops up, I selected the gear under Source, which is under Applied steps. Seems a bit tricky, so if you have a better way, lemme know!

Elipses   Source

Initial Impression of Power BI

Overall, I am impressed with Power BI. It is powerful, and it is affordable, and Microsoft has made it very easy to experiment with. If you end up creating a map of the places you have been, let me know so I can check it out!