One of the biggest challenges I face as a Microsoft 365 solutions architect is deciding which data warehouse architecture is best. What's the best approach that conforms to Microsoft best practices? All architects should be future-proofing their designs, not just aiming to meet business requirements. It turns out there are a few levels of reusability that should be considered up front, during the initial stages of designing your data warehouse.
Level 1: If you're new to Power BI and just want to publish some reports for your employees (like converting a spreadsheet into a report), then Level 1 is a great start.
Level 2: If you have a system like SharePoint or a SQL Server database, you will naturally connect to that data source and import the data into the Power BI service, with a data refresh scheduled every hour or so. You may not know this, but a dataset in Power BI is an SSAS tabular model under the hood. That means there is already a decent data warehouse running your data model, with no expert tuning required.
Level 3: This is where life gets interesting, because traditionally SSAS (SQL Server Analysis Services, on-premises) or AAS (Azure Analysis Services, in the Azure cloud) has been used to create enterprise-scale data warehouses.
Here is a typical Azure data warehouse architecture using AAS:
This diagram shows a typical data-loading BI pattern for a data warehouse, with a data/semantic model built in AAS. Power BI is then used for visualisations such as reports and dashboards on top of that semantic model. This is a very complex solution: there are many moving parts, and therefore many possible points of failure. A typical solution would include:
- Azure PowerShell – To trigger refresh of table partitions.
- SQL Server Data Tools (SSDT) software – to maintain data partitions and models.
- SQL Server Management Studio (SSMS) software – to query the database created by each AAS model.
- Azure SQL Server instance – So AAS can create databases as required for each model.
- Azure Data Factory – Create ETL pipelines to load and cleanse the data for the AAS data warehouse.
- Azure LogicApps – Azure workflow tool to refresh the data models, and datasets for Power BI.
- Visual Studio, Partition Manager software – To create and manage partitions.
Make no mistake, you need expert-level understanding of all these tools, not to mention a deep knowledge of the Tabular Model Scripting Language (TMSL) to manage all the object partitions needed for incremental refresh – especially if you have millions of rows of data.
My point is that I rate AAS complexity as high because of the sheer volume of moving parts and possible points of failure, which therefore needs an expert data engineer to maintain. Unfortunately, there is no real reusability in AAS, just duplication of data into manageable partitions, not to mention all the software that needs to be installed just to maintain models and partitions.
Level 4: Power BI Dataflows do most of the things you read about in Level 3, plus a whole lot more, with nowhere near the level of complexity or expertise needed. Microsoft acknowledged that their BI stack was unnecessarily complex and cumbersome for getting even a basic data warehouse working. In fact, Microsoft is migrating SSAS/AAS features to Power BI, and over time plans to retire SSAS/AAS completely. They realised there was no need to have two competing solutions.
Today the entire BI market is led by products that excel at self-service capabilities. Dataflows are a huge step in that direction: they are created and managed in the Power BI service, directly in the browser.
You can have as many dataflows as you like, each representing a connection to a data source or an ETL data transformation. The example below connects to a Dynamics 365 Business Central OData data source.
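As a rough idea of what such a dataflow query looks like behind the scenes, here is a minimal Power Query (M) sketch. The endpoint URL, entity name and column names are placeholders, not a real tenant:

```powerquery
let
    // Placeholder endpoint – substitute your own Business Central OData URL
    Source = OData.Feed("https://api.businesscentral.dynamics.com/v2.0/<tenant>/production/ODataV4"),
    // Navigate to a hypothetical "Customers" entity in the feed
    Customers = Source{[Name = "Customers"]}[Data],
    // Keep only the columns the downstream reports need
    Result = Table.SelectColumns(Customers, {"No", "Name", "City"})
in
    Result
```

Each step is just a named expression; the dataflow editor generates this M for you as you click through the UI.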
Imagine you have billions of rows to load and transform, and then you load all of that into Power BI Desktop. You would run out of PC memory, fast! So leave the data in the Power BI service – in this case, in Dataflows – and only query what you need in your reports via DirectQuery mode. That is, you read the data live from your Dataflows.
If you're a DBA or data engineer, I know what you're thinking: "What about query performance, Tony, if I have billions of rows of data joined across many tables?" It turns out there are some really powerful performance-related workarounds you can apply to your models, namely:
- Incremental refresh (especially if you need live data)
- Composite Models
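For context, incremental refresh in Power BI hinges on two reserved datetime parameters, RangeStart and RangeEnd, which the service uses to generate and manage partitions for you. A minimal sketch, assuming a hypothetical Azure SQL source and an OrderDate column:

```powerquery
let
    // Hypothetical Azure SQL source – replace with your own
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Filtering on the reserved RangeStart/RangeEnd parameters is what
    // lets the service partition and refresh the table incrementally
    Filtered = Table.SelectRows(Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    Filtered
```

You then switch on the incremental refresh policy for that table in the service, and the partition ranges are managed automatically.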
In time, I will be releasing articles to address each of these as they have specific criteria for your model implementations.
Dataflows: No need for partition management
As I mentioned in Level 3, in a typical SSAS data warehouse you need to manually partition your big data tables by date range to help speed up your queries, and those partitions then have to be frequently refreshed, re-processed and merged. All this complexity goes away with Dataflow incremental refresh, because Dataflows take care of it all! However, if you really want to have a crack at optimising your Dataflow partitions, you can connect to the XMLA endpoint found in the Power BI service, under your workspace settings. You would need to install the SQL Server Data Tools software first.
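For reference, the XMLA endpoint address follows the pattern below (the workspace name is a placeholder); you paste it as the server name in your client tool:

```
powerbi://api.powerbi.com/v1.0/myorg/<Workspace Name>
```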
Just because you can, it does not mean you should.
Don’t tinker with the Dataflow partitions if you’re already happy with your query performance.
With the advent of Power Query Online, you can connect to an extended list of data sources, load their data, then apply your transformations over the top. No SQL coding is needed, though you might need to create some DAX expressions to add new columns or summarise data. Power Query Online can connect to any Dynamics 365 product, SQL Server, or even a Web API.
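Many of those "new column or summary" transformations can also be done directly in Power Query Online. A hedged sketch, assuming a Sales query with Revenue, Cost and Region columns already exists in the dataflow:

```powerquery
let
    Source = Sales,  // an earlier query in the dataflow (assumed)
    // Add a calculated column – no SQL required
    WithMargin = Table.AddColumn(Source, "Margin",
        each [Revenue] - [Cost], type number),
    // Summarise margin by region
    ByRegion = Table.Group(WithMargin, {"Region"},
        {{"Total Margin", each List.Sum([Margin]), type number}})
in
    ByRegion
```

Transformations done here run in the service during refresh, so the heavy lifting happens before the data ever reaches a report.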
The above diagram identifies the major components in your Dataflow architecture you need to be aware of.
If you'd like to learn more about Power BI capabilities and what it could do for your business, get in touch with us today to discuss your requirements and how to get started.
Email Fiona - f[email protected]
or call on 0415 870 135