The problem with Data Discovery

Despite the power of data discovery tools like Qlik, Tableau, and Power BI, a vast majority of businesses are failing in their efforts to become data driven. Qlik estimates that for their existing customers, there is only a 25% adoption rate (Qonnections 2016). That means that even for companies that have invested in becoming data driven, 75% of their employees are just winging it. Without the right data in the right format for each type of business user, these businesses can end up with entire departments that are not benefiting from data discovery.

There are 2 core reasons why this adoption rate is so low: 1) the right data is not available at all within the Qlik environment; and 2) the data that is available within Qlik is not refined in such a way that it can be useful to the 75%’ers.

If you are struggling with these challenges, you should check out TimeXtender (disclaimer: TimeXtender is an awesome company; disclaimer 2: I work for TimeXtender:). TimeXtender provides software that gets complex data into Qlik, and enables business intelligence (BI) teams to integrate, join, cleanse, denormalize, and otherwise refine the data for each business user. TimeXtender does the job of 3 tools: ETL software (like Informatica and SSIS), database management software (like SQL Server Management Studio), and semantic modeling (like Qlik Data Load Editor). Not only does TimeXtender do the work of these 3 tools, but it does it without required BI teams to write a single line of code.

If you are struggling to increase adoption of data discovery tools like Qlik within your organization, consider whether you are providing all of your users with the data they need and in the format that makes sense to them. If not, take a look at TimeXtender.




Software People need Copyrights and Contracts (not patents)

As a patent attorney, I have something to admit: understanding patents is not as valuable as understanding copyright and contract law for software people. This may have been different a decade ago, before cases like Alice put software patent law squarely in the “nobody knows the rules” bucket. But I think that the changes in case law have only accelerated the shift in importance, and that copyright and contract have always been the driving forces for software.

Why is Copyright Law Important for Software Companies

Did you know that whoever writes software, by default, has rights to that software under 17 USC Section 201(a)? You have a brilliant idea, tell a contractor or employee what you need to happen, pay him or her handsomely for their time, and…they own the rights to the software.

Unless you contractually (and up front) negotiate rights to the software, the person who wrote the code can end up in a position where they can sell the code to a 3rd party, such as your competitor. Bummer.

Why is Contract Law Important for Software Companies

Think of contracts as legal systems that operate between the parties of the contract. So, if you do not like the default behavior of the US Copyright laws, you can enter into a contract where you define how rights are shared between the members of the contract. Instead of petitioning Congress to create a smarter system, you write up your own rules.

The first thing you should do is, for all of your employees and contractors, have them (as part of their employment agreement or a specific engagement agreement or statement of work) assign their rights in any copyrighted material to the company. Because this is done before wages are earned or fees are paid, there is consideration for this transfer of rights from coder to company.

Second, you can agree that any work that cannot be assigned to the company was created as part of “work for hire” arrangement.  You can learn more about a work for hire in Pillsbury Law’s article entitled “Work Made for Hire does not Generally Apply to Computer Software.” In short, work for hire grants employers additional rights, but may also create rights beyond the expectations/desires of the parties (particularly in California, where an employee relationship can be created when works for hire contracts are in place).


In short, the most important priority to keep in mind when hiring employees or contractors to write software is to have a framework in place that AS CLEARLY AS POSSIBLE lays out the rights afforded to each party. Most software developers understand that, when they are working for a company, the company owns that software. By creating a contract that reflects this understanding up front, confusion down the road can be eliminated.

If you have ideas about how to protect either software companies or software employees, I would love to hear about them.






Getting Click Data from Twitter API

I was trying to figure out whether the Twitter API allows you to get “click” counts via the API. This information is available via Twitter app (Tweet activity as “Link clicks”), but I did not see it as part of the API.


It looks like Twitter has made an interesting decision to have this as a commercially, but not publicly, available part of the API (e.g., you can get it via Gnip).

REST API Support for Impressions Data.  Short answer: buy it.

I am hoping that there is some way that Twitter has provided to pull this data (perhaps the Analytics API?). But, as a short-term fix I am going to try this:

The Twitter API will give me the retweets and favorites, and the Bitly API will give me the clicks.

I will write a follow-up based on what I learn. I am curious to see whether stitching together this data from multiple APIs introduces some noise, and whether masking links with Bitly decreases clicks (e.g., a greater number of hops => increased time to hit the destination page, people may be afraid of masked Bitly links, etc).

If you have a better way, I would love to hear it.





Quick Qlik Tip: How to Automatically Identify Relationships in Data Models

Have you ever dreamed of being able to visually explore your data? If you have, Qlik is your dream come true. Qlik, and in particular Qlik Sense, allows you to perform data discovery on just about any data set with relative ease. In this post, I am going to walk you through my favorite way let Qlik help you create valid relationships between tables in an unfamiliar data set.  All told, this article should take 5 minutes to read, and less than 30 minutes if you are following along with the actual Qlik app.

Qlik’s First Question: to Add Data or Not to Add Data

Whenever you create a new Qlik application, you are presented with an option of “Add Data” and “Data load editor.”


The way I think about these differences is that “Add data” is easy, and “Data load editor” is for people who enjoy punishment (they will claim that they prefer greater control over the import process). Both approaches have their strengths, but if you are just looking to jump in with both feet and start exploring your data, I would start with Add data.

If you want to play with a simple dataset, you can download this Excel file that I created to demonstrate some of the fundamentals of adding data to Qlik Sense. I will be using this data for the rest of this post.

Associate Data in Qlik Sense

After you add data, you are going to be slapped in the face with a simple truth: Qlik Sense, which allows you to bring visual meaning to very complex data, has some very complex user interface (Qlik is radically improving this in Qlik Sense 3!). Unless you are just dealing with a single set of data, the first thing you need to do after adding data is associate the data you added. Here is a screen shot that you are presented with…do you see how to create the relationships between your data? Of course you do! It is that little tiny “Associations” button at the bottom of the screen. 🙂


Once you have clicked on the Associations tab, you are presented with an option to associate each of the tables together. For each table to table relationship (e.g., Customers – Theaters), you need to select either “No Association” or one of the recommended associations. For the Customers – Theaters combination, there is no relationship, so you select “No Association” and then select the down arrow to advance to the next table.


NOTE ABOUT OVER-JOINING: Notice how Qlik recommends the association between Customer and Theater with 100% confidence.  There is no relationship between customers and theaters (a person can purchase a ticket at any theater, and the only proper relationship is through the purchase entity). But, because Customer and Theater both have a city field and these have the same values, Qlik recommends a relationship 100%. I still love Qlik for recommending these joins, because it is a lot easier to see a recommendation and determine that it is invalid than having to try to imagine what relationships might exist – particularly on a large and complex data model.

Now that we advance to the Purchases – Theaters table, we see some pretty amazing smarts on display by Qlik: we have the purchaser’s name as “name” and the theater’s name as “name” – but a person’s name (e.g., Matt) is not the same as a theater’s name (e.g., Regal 8). Qlik recognizes that the proper join is between Purchase.Theater and Theater.Name and suggests this association with 100% confidence.


After you go through all of the possible associations, you should have a pretty good object model.


And if you switch over to the Data Load Editor, you will see a nice clean script that Qlik created for your data.

[Name] AS [Theater],
[City] AS [Theaters.City],
[State] AS [Theaters.State],
[Rating] AS [Theaters.Rating];
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Theaters);

[FirstName] AS [FirstName],
[City] AS [Customers.City],
[State] AS [Customers.State];
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Customers);

[Name] AS [FirstName],
[Movie] AS [Title],
[Theater] AS [Theater],
[Price] AS [Price];
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Purchases);

[Title] AS [Title],
[Movie Length] AS [Movie Length],
[Rating] AS [Movies.Rating];
[Movie Length],
FROM [lib://Desktop/Movies.xlsx]
(ooxml, embedded labels, table is Movies);

If you have any questions, drop a comment below.



Learn Machine Learning with Datacamp (on the Side)

I have worked with Machine Learning off and on for the last couple of years. Unfortunately, it is not part of my day job, so I need to find ways to integrate machine learning into an already busy day and week.

I stumbled across Datacamp, and so far I love what I see. After creating a free account in about 10 seconds, I was watching my first sub-10 minute video. Yes, no biggie there, but that is where it gets great: Datacamp starts providing an interactive R session, along with detailed instructions and readily available datasets, where you can actually start doing your very own work – no software required. Pretty impressive.


I would love to know more about Datacamp: who are the founders, what is their business model, and how did they get so awesome? If you know the founders, please drop a comment below.

Thanks, and if you end up testing it out let me know what you think!


Mongo v Cassandra

Here is a post from my good friend, Dharshan Rangegowda, Founder and CEO at Scalegrid. Check it out. And if you are looking to host Mongo on AWS, give him a shout. He will take care of you!


Deploying TimeXtender on Azure

When I was an IT contractor for the Marine Corps in Washington DC, the summers could get pretty hot. The only respite from the heat was, you guessed it, the server room. Sometimes, I would just fire up one of the servers and do some research on a web browser to bask in that high-powered AC (please do not tell anybody). The biggest problem from my IT worker perspective was that when the AC went down or some other piece of hardware decided to die over the weekend, one of us would have to drive in and fix it.

Fast forward 10 years, and the days of the boutique server room seem to be drawing to a close. Companies are moving to public and private cloud at an accelerated rate. There are a number of benefits to moving to the cloud: shifting capex to opex, just-in-time capacity, and not having to come in on the weekend when the air conditioner goes down.

But what does that mean for on-premise software – software that used to operate inside that soon-to-be-shut-down boutique server room? One trend I am seeing is that companies are shifting on-premise software to the cloud with a Virtual Machine (VM) architecture. Granted, there are differences between a purpose-built cloud offering and a VM-based cloud offering, but it seems that the VM architecture is the optimal approach for companies that have invested time and money creating valuable on-premise software.

With a little bit of creativity, companies can realize all of the key benefits of the of the cloud (e.g., no on-premise footprint, opex, just-in-time capacity, subscription-/consumption-based pricing, and not having to come in on the weekend when the air conditioner goes down), and these benefits are realized without having to throw away a valuable piece of software that is already delivering value to customers.

Deploying TimeXtender on an Azure VM

Let’s get to the Technical Part. I assume that you already have an Azure account. The first step is to create a new SQL Server VM (not a SQL Database…that is for Azure SQL, Microsoft’s SQL as a Service offering). I chose SQL Server 2014 SP1 Standard on Windows Server 2012 R2.


Accept the defaults when you create this VM, but make sure that you write down the username and password that you use to create the VM, because you will log into the VM with this information. I did use a resource group instead of the Classic VM, because I think that is the direction Microsoft is heading.

For the VM sizing, I chose DS1_V2 mostly from a cost perspective. It costs me $104.16 a month, so I can leave it spinning on the cloud for demos without burning through too much budget.  This is not going to handle monster workloads, but particularly if you are just exploring TimeXtender and building super fast data warehouses/data discovery hubs, it will have enough muscle for your needs.

Azure VM Sizing for TimeXtender

For the SQL Settings, select Public – assuming that you are going to want to consume data from your data discovery hub from other machines (e.g., a Qlik application). For the port, you can stick with 1433 or you can be sneaky (just be sure to remember your sneakiness when trying to connect to your database).

Install TimeXtender Software

The easiest way to get the software installed on the VM is to download it to your desktop and then drag it onto the VM: (chose 64-bit server version).

For full details on the installation process,

Connect to a Data Source + Create Data Structures

Now let’s get started. My preferred approach is to grab a copy of the Data Discovery Hub template, Import the Template as a new project, and Run the Connection Wizard.

Let’s start by importing the project. First, download the Data Discovery Hub template at Place it on the desktop of your VM. Then open TimeXtender and select Import/Export, and then Import Project, like this:

Import TimeXtender Solution Template


As soon as you import the project, run the connection wizard. You will see that there are 4 different connections: ODS, Stage, MDW, and Dynamics. Dyanamics represents an existing Dynamics database. The other 3 (ODS, Stage, MDW) are part of the data discovery hub. This is the data repository that we are going to create to store our Dyanamics data.

Run TimeXtender Connection Wizard

In short, the data discovery hub includes the ODS (Operational Data Store) where we land non-transformed data right out of the source, the Stage Data Warehouse where we do our transformations and denormalizations, and the MDW (or Modern Data Warehouse) where we house the “business ready” data. Power Users can be granted access to the ODS so that they have access to all data across the enterprise and can conduct unfettered data discovery. The MDW is used to populate Qlik (or other data visualization/front-end systems). The data discovery hub is the most effective way that I have ever seen to get world class data to all employees within a company.

TimeXtender Data Discovery Hub

But I digress…let’s get back to setting up our system. When we run the wizard, it will first ask us where we want to store our ODS. Keep the defaults, but you have to click the “Create” button before hitting OK to actually create this database on your machine. Once you hit “Create” then you can hit OK.

Next, the wizard prompts us to connect to a Dynamics database. You can either connect to your own Dynamics database, or you can connect to a sample database we are hosting on Azure. To connect to our Dynamics database, the server name is, the SQL Username is TxSQL, and the SQL Password is TimeXtenderRocks! (with an exclamation point!). Your wizard should look something like the image below.

Connect to Remote Data Source

Notice that there is no option to create this database, because we are pulling data from an existing data source, not creating a repository where we can store data.

Run through MDW and Stage (do not forget to click Create for ODS, MDW, and Stage!), and then click the Test Connections button on the wizard. Everything should be green, like shown below.

TimeXtender Connection Wizard Green

For starters, I have just created the Data Discovery Hub architecture, and I will leave it to you to pull tables into your ODS, structure tables from your ODS into business ready data in your Stage DW, and then present business ready data to your business users in the MDW.

If you have any problems with any of these, let me know!