Business Intelligence with Idera
This post will talk about how to build a business intelligence solution for Idera SQL Diagnostic Manager data using the Microsoft BI stack of software: SQL Server, Integration Services, Analysis Services and PowerBI. Since Idera comes with its own suite of software for displaying dashboards of its data, this project has a bit of overlap; however, it's a good source of data for learning the BI stack and for presenting information in ways that Idera may not be able to.
To get started, we are going to create a data warehouse in SQL Server. This will consist of some simple tables in a dedicated database, modeled after the Kimball design. For those unfamiliar with dimensional modeling, I suggest studying up on it; there's a great deal of material on the topic.
We will have one fact table for our measures and three dimension tables for our modeling. The fact table will consist of counters collected by the Idera instance (which can be thought of as machine data, by the way), along with the counter type ID, the datetime the counter was collected, and the IDs for the SQL instance and the database, if applicable. The counters themselves are the measures: values that will be calculated into averages, maximums and minimums for our analysis. The four other columns (the datetime, SQL Server ID, database ID and counter ID) will be used by the dimension tables for modeling and slicing.
The three dimension tables cover the counter types, the SQL Server and database instances, and the datetime values. The DimCounter table holds the counter category and name, while the DimSQLServer table holds the SQL Server and database names. The DimDate table is mostly a calculated table: it stores the datetime corresponding to the measure in the fact table, along with calculated columns for attributes such as year, month and day, derived from the datetime value.
Notice that there is a defined primary key on each of the dimension tables and a foreign key on each of the key columns in the fact table.
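Since the full DDL (including the staging and error tables mentioned below) is in the download at the end of this post, here is only a minimal sketch of what the star schema might look like; the table names match the post, but the column names and types are my own illustrative guesses:

```sql
-- Minimal sketch of the star schema; column names and types are
-- illustrative, the real DDL is in the download at the end of the post.
CREATE TABLE dbo.DimCounter (
    CounterID       INT IDENTITY(1,1) PRIMARY KEY,
    CounterCategory NVARCHAR(100) NOT NULL,
    CounterName     NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.DimSQLServer (
    SQLServerID  INT NOT NULL,
    DatabaseID   INT NOT NULL,
    ServerName   NVARCHAR(128) NOT NULL,
    DatabaseName NVARCHAR(128) NULL,  -- NULL for server-level counters
    CONSTRAINT PK_DimSQLServer PRIMARY KEY (SQLServerID, DatabaseID)
);

CREATE TABLE dbo.DimDate (
    CounterDateTime DATETIME PRIMARY KEY,  -- the collection datetime itself
    [Year]  INT NOT NULL,
    [Month] INT NOT NULL,
    [Day]   INT NOT NULL,
    [Hour]  INT NOT NULL
);

CREATE TABLE dbo.FactIdera (
    CounterDateTime DATETIME NOT NULL
        REFERENCES dbo.DimDate (CounterDateTime),
    SQLServerID     INT NOT NULL,
    DatabaseID      INT NOT NULL,
    CounterID       INT NOT NULL
        REFERENCES dbo.DimCounter (CounterID),
    CounterValue    DECIMAL(18, 4) NOT NULL,  -- the measure
    CONSTRAINT FK_FactIdera_DimSQLServer
        FOREIGN KEY (SQLServerID, DatabaseID)
        REFERENCES dbo.DimSQLServer (SQLServerID, DatabaseID)
);
```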
At the end of this post I’ve included the code for creating the database and the tables. I also create a staging table and an error table for use later.
Extract Transform Load (ETL)
The next step is to design the ETL package to use for getting data into the dimensional model. For this I use SQL Server Integration Services (SSIS), a tool designed for this purpose. I have connections defined for both the Idera database and the data warehouse itself, and I use one package as part of a master project. This package loads the dimension tables first and then the fact table. The date table is seeded from the fact table.
The most important thing to understand about SSIS is what it does well vs. what it doesn't do well. Generally, you want to perform set operations within the SQL Server database instance. Although I don't show it, I perform lookups for the DimCounter and DimSQLServer dimension tables in SSIS, because the data volume is not large and SSIS handles errors and data flow paths very well. For FactIdera, however, I use the staging table as a fast import target for SSIS to dump data into, and then use a T-SQL MERGE statement to merge the data into the real fact table. Doing this processing on the database side saves a great deal of time; depending on how much memory the machine running the package has, it could take minutes to hours to finish if SSIS had to cache the data to load into the fact table.
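To make the staging pattern concrete, here's a rough sketch of what the merge might look like, using the illustrative names from the schema sketch above and assuming SSIS has just fast-loaded a batch into a staging table that mirrors the fact table's columns:

```sql
-- Sketch of the staging-to-fact merge; StageIdera mirrors FactIdera's
-- columns, and DimDate is assumed to be seeded first (see below).
MERGE dbo.FactIdera AS tgt
USING dbo.StageIdera AS src
    ON  tgt.CounterDateTime = src.CounterDateTime
    AND tgt.SQLServerID     = src.SQLServerID
    AND tgt.DatabaseID      = src.DatabaseID
    AND tgt.CounterID       = src.CounterID
WHEN MATCHED THEN
    UPDATE SET tgt.CounterValue = src.CounterValue
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CounterDateTime, SQLServerID, DatabaseID, CounterID, CounterValue)
    VALUES (src.CounterDateTime, src.SQLServerID, src.DatabaseID,
            src.CounterID, src.CounterValue);

TRUNCATE TABLE dbo.StageIdera;  -- clear staging for the next run
```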
The DimDate table is seeded from the fact table, as I mentioned earlier. Because Idera is constantly writing data to its database, populating the date table directly from the source before loading the fact table can result in referential integrity errors if your package runs for a long time, since new datetimes will have arrived in the meantime.
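One way to implement the seeding, sketched below with the same illustrative names, is to insert any unseen datetimes from the staged rows just before the fact merge, so the foreign key on FactIdera is always satisfied:

```sql
-- Seed DimDate with any datetimes not seen before; run this just
-- before the fact merge so the foreign key is always satisfied.
INSERT INTO dbo.DimDate (CounterDateTime, [Year], [Month], [Day], [Hour])
SELECT DISTINCT
    s.CounterDateTime,
    DATEPART(YEAR,  s.CounterDateTime),
    DATEPART(MONTH, s.CounterDateTime),
    DATEPART(DAY,   s.CounterDateTime),
    DATEPART(HOUR,  s.CounterDateTime)
FROM dbo.StageIdera AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.DimDate AS d
    WHERE d.CounterDateTime = s.CounterDateTime
);
```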
The main data flow part of the package for the FactIdera table is below.
The hardest part to understand here is the data types of the data you are working with. If SSIS cannot convert between the data types in your various tables, it won't let you perform operations like lookups and joins. Solving this requires data conversion tasks, as you can see above. Generally, I find that a Unicode data type (NVARCHAR in SQL Server) is the easiest to work with for character data, but it depends on your data.
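One way to reduce the number of conversion tasks, if you control the source query, is to cast to the types you want right in the SQL that the source component runs; the Idera table and column names below are placeholders, not the real schema:

```sql
-- Casting in the SSIS source query so the pipeline receives consistent
-- Unicode and numeric types; table and column names are placeholders.
SELECT
    CAST(CounterName  AS NVARCHAR(100))  AS CounterName,
    CAST(InstanceName AS NVARCHAR(128))  AS InstanceName,
    CAST(CounterValue AS DECIMAL(18, 4)) AS CounterValue,
    CollectionTime
FROM dbo.CollectedCounters;
```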
Once the package is done and runs without error (verified via debugging), you can deploy it to your target SQL Server instance and schedule it with a SQL Server Agent job.
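If you deploy using the project deployment model, one option for the Agent job step is a T-SQL call into the SSIS catalog; the folder, project and package names below are placeholders for wherever you deployed:

```sql
-- Start the deployed package from SSISDB; names are placeholders.
DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
    @folder_name  = N'IderaBI',
    @project_name = N'IderaWarehouse',
    @package_name = N'LoadWarehouse.dtsx',
    @execution_id = @execution_id OUTPUT;

EXEC SSISDB.catalog.start_execution @execution_id;
```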
I’ve included the SSIS package below in the download link.
Online Analytical Processing (OLAP)
In the Microsoft world, SQL Server Analysis Services (SSAS) is the tool for the next part of our project: preparing the data for OLAP processing. OLAP processing, also referred to as cube building, pre-computes many of the common operations we want to perform against the data, including sums, maximums, minimums, averages and counts. This is part of what makes this method of reporting flexible and fast: the cube does some of the work up front, so that when it's presented as a data source, it's optimized for reporting.
There are two different approaches to OLAP processing. One, the multidimensional model, is the traditional approach; the other, the tabular model, is newer. Both serve roughly the same purpose, and each has pros and cons. For our purposes, we'll take a look at both.
There are various resources out in the wild that can teach you more about each method and when to use which one.
Multidimensional
The multidimensional model involves creating a data source and a data source view (which is just a view over your tables), as well as dimensions and cubes. Each dimension is almost a straight view of its table, but there are some additional tasks to complete. One important task is to create hierarchies, especially for your date dimension; this is what will enable you to drill down in your reports.
Besides hierarchies, attribute relationships can also be created, although I don’t do it here.
Once you’ve loaded your dimensions, you can build and process them, which you should do to see if there are any errors you need to correct. SSAS allows you to browse your data in the application, but this is mostly for checking to make sure your data is presented as you intended.
Most of your design work will be within your cube.
The two key aspects to review here are measures and calculations. There are others, as you can see, but we'll focus on those two. Measures are where you define your sums, maximums, minimums and so on; you should create as many measures as you need for your reports.
Calculations are where you can control the leaf cells of your cube. For us, that means defining an average calculation, since average is not among the aggregations you can choose from on the measures page.
Since an average is just a sum divided by a count, it's fairly easy to express. The multidimensional model in SSAS uses a language called MDX, which you can find resources online to learn about if you so desire.
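As an illustration, a calculated member for the average might look something like the MDX below, assuming the cube already has sum and count measures with these (hypothetical) names:

```mdx
// Hypothetical calculated member: average expressed as sum over count,
// guarding against division by zero.
CREATE MEMBER CURRENTCUBE.[Measures].[Counter Value Avg]
 AS IIF([Measures].[Counter Value Count] = 0,
        NULL,
        [Measures].[Counter Value Sum] / [Measures].[Counter Value Count]),
FORMAT_STRING = "#,##0.00",
VISIBLE = 1;
```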
Once your cube is done, you can process it to surface any errors, and after that, you can deploy it to an Analysis Services instance of SQL Server. To do that, you actually have to use a deployment wizard buried in your Start menu folder structure; there's no way that I can find to deploy from the application (it will only deploy locally).
I’ve included my multidimensional project in the download below.
Tabular
In response to the ever-decreasing cost of storage and memory, and to the somewhat steep learning curve of the multidimensional model, the tabular model was born. It is simpler to learn and runs on the same code base as the Microsoft Excel PowerPivot package; if you know one, you know the other.
Instead of cubes and dimensions, we now have something that looks a lot like Excel. As with multidimensional, we still have to create hierarchies, and as an added bonus, we have to create a concatenated column to join DimSQLServer to FactIdera, since the tabular model does not support relationships on more than one key column.
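One way to do this, sketched below with the illustrative names from earlier, is to add a matching computed column to both tables at the database layer and relate the model on that single column (you could also build it in the data source view, or as a calculated column in the model itself):

```sql
-- Illustrative concatenated key so the tabular model can relate
-- DimSQLServer to FactIdera on a single column.
ALTER TABLE dbo.DimSQLServer
    ADD ServerDatabaseKey AS
        (CAST(SQLServerID AS NVARCHAR(10)) + N'|' +
         CAST(DatabaseID  AS NVARCHAR(10))) PERSISTED;

ALTER TABLE dbo.FactIdera
    ADD ServerDatabaseKey AS
        (CAST(SQLServerID AS NVARCHAR(10)) + N'|' +
         CAST(DatabaseID  AS NVARCHAR(10))) PERSISTED;
```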
Other than those two things, it’s pretty straightforward. Deployment is actually specified in the beginning when you create your project. You can connect to a tabular instance or you can continue as an isolated project.
The biggest gotcha with tabular is that it's an in-memory application (as in, your data is processed and presented in memory). You can spill to disk, but performance will degrade. If you have a model with terabytes of data, tabular may not work for you.
I’ve included my tabular model project in the download below.
PowerBI Reports
The final part of this project is the fun part. PowerBI is great to use, and since you've spent all this time creating a dimensional model, your reward is that the data will respond quickly to your queries.
There are many resources out there for learning PowerBI, but the best suggestion I have is to play around and work with it yourself. Eventually you'll get the hang of it.
Connecting to a data source is easy. We can choose either our multidimensional model or our tabular model. It doesn’t matter in our case, but I chose the tabular model.
After successfully connecting, you'll see your measures and dimensions on the right side, and you can start dragging visualizations onto the canvas as you please. With some practice, you'll be able to create some pretty cool looking reports (I've hidden the slicers on the right showing server and database names).
Notice in the bottom right corner that we are connected live to the data. That means each time we click a slicer option or click on a report (to drill through), the data will update in real time. PowerBI is built on the technologies behind PowerPivot and PowerView in Excel, and uses the DAX query language to reach out to the tabular model.
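For example, a query PowerBI might send could look something like the DAX below; the table and measure names are illustrative, assuming an average measure like the one defined earlier exists in the tabular model:

```dax
// Illustrative query: average counter value grouped by server name.
EVALUATE
SUMMARIZECOLUMNS (
    DimSQLServer[ServerName],
    "Avg Counter Value", [Counter Value Avg]
)
```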
After having worked with it for a little while now, the best advice I can give you regarding PowerBI is to take advantage of your filters. You have report-, page- and visual-level filters, and they take effect in that order. If you try to work with all your data at once, you'll start running into issues, as the data won't fit on one screen.
That’s it for this project. Please click here for the projects and code. I’ve also included a brief presentation as well.