In our previous Power BI blogs, we’ve introduced the Power BI platform and described why it’s such a great tool for developing custom business and clinical analytics reports for MEDITECH Data Repository. In this blog, we’ll discuss a vitally important concept at the heart of all Power BI report development: the data model.
Data models are essential in Power BI, as they provide a structured representation of the underlying data that allows users to easily analyze and visualize it. A data model is a logical representation of the data sources and relationships between them, which enables users to understand and interpret data in a more meaningful way. In Power BI Desktop, the data model is created as you create data source connections, and optionally use the Power Query Editor to manipulate, transform, and merge data from different sources. What you’re left with is a representation of entities (let’s call them tables) and the relationships between them, much like a database diagram in SQL Server – although with Power BI, what I call a “table” may or may not be a table from your SQL Server database – for us most commonly, a table from the Data Repository.
Here are some reasons why data models are important in Microsoft Power BI:
- With a well-designed data model, users can easily explore data and discover insights without having to write complex SQL queries or analyze raw data.
- By organizing data in a data model, Power BI can efficiently process and compress large volumes of data, which improves query performance and reduces load times.
- An effective data model ensures data consistency and accuracy by defining clear relationships between data tables and fields. This makes it easier to ensure that data is consistent and accurate across different reports and dashboards.
- Power BI allows users to create their own reports and dashboards, and a well-designed data model makes it easier for them to do so. Users can create their own visualizations and explore data in a way that meets their needs.
- A well-designed data model can accommodate new data sources and changes in business requirements, making it easier to scale the solution as the organization grows.
The data model is critical to the success of any Power BI project, as it enables users to analyze and visualize data in a way that meets their needs.
Do I have to use a star schema for a data model?
The star schema is a specific type of data model that organizes data into a central fact table with related dimension tables. This approach simplifies queries and improves performance by reducing the number of tables that need to be joined. Here's an example of a surgery data model we've built based on MEDITECH Data Repository in Expanse:
Our main fact table is fSurgeryCases (in blue), with supporting fact tables (in yellow) for patient account information, surgery events, and surgery blocks. The other tables are dimensions, as you might guess, built from MEDITECH dictionary data as well as standard dimensions for dates and times. This approach, built on a custom datamart, allows for efficient storage of fact-table data, and normalized relationships between all the tables.
Star schemas however, may not always be the best option for every data modeling scenario. Other modeling approaches you can use in Power BI include:
- A rowset-based model from a SQL query or stored procedure
- A snowflake schema
Using a rowset-based model to return all the data in a single dataset in Microsoft Power BI is a simple approach, very well suited to "one report at a time" development. Each report is customized per the user request, and has a corresponding SQL query or stored procedured behind it. Using a stored procedure can simplify the process of retrieving data by allowing you to encapsulate complex logic into a single query. This can be especially useful when working with large, complex datasets or when you need to apply complex data transformations. A stored procedure can help improve performance by reducing the number of round trips between the database server and the Power BI service. By returning all the data in a single call, you can reduce the overhead associated with multiple calls. The model depicted below is based on a stored procedure that returns lab test results for the fact table, as well as "friendly" data columns (from MT dictionaries) for things like test names, location names, etc. It also includes a single dimension table, once again a standard date dimension.
However, there are also some potential downsides to using a stored procedure in Power BI. For example, if the stored procedure returns a large amount of data, it may take longer to load into Power BI and could result in performance issues. Additionally, if the stored procedure includes complex logic or data transformations, it may be more difficult to debug or troubleshoot issues.
A snowflake schema is similar to the star schema, but the dimension tables are further normalized into multiple related tables. This can provide more flexibility in some scenarios but may be more complex to manage. This approach is closer to building a full-fledged, relational datamart based on Data Repository data, but is also the most complex to manage and maintain.
Ultimately, the data modeling approach you choose will depend on the specific needs of your organization and your data sources. It's important to carefully consider your options and choose the approach that best meets your requirements.
-- Ian Proffer
Microsoft Certified Power BI Data Analyst Associate