As report writers and analysts, we sometimes have to ask: how do I go about optimizing data retrieval from the MEDITECH Data Repository? When is it appropriate to construct a data mart and what should I consider? Before we explore the answer, let's first, define a few terms (thanks to Technopedia and Wikipedia):
Data warehouse: a system used for reporting and data analysis; also a collection of corporate information and data derived from operational systems and external data sources
Data repository: a destination designated for partitioned data storage, often for specific types of data
Data mart: a subset of the data warehouse that is usually oriented to a specific business line or team; data marts are small slices of the data warehouse.
MEDITECH’s Data Repository, in one sense, is like a traditional data warehouse, especially if you consider all of discrete applications (ABS, BAR, etc.) separately. Here we have one (sometimes two) databases in which all the data from our EHR live.
Very often I’m asked about the most efficient way to retrieve data when using the DR: Does it make sense to run reports against what at many hospitals is now considered a “production” system? My first answer is “yes,” since the DR is purpose built for reporting. After all, you’re not slowing down MEDITECH when you’re running reports out of DR. That being said, sometimes you have complex reports with large amounts of data, or multiple subsets of data assembled in a series of steps, either one of which can take a long time to produce, depending on the overall size of your DR and other reports or applications running against it. In cases like these, sometimes building your own data mart is a good solution for organizing and presenting reports tailor made for a specific need.
A good example is a report we’ve done for a number of clients: a Pharmacy Downtime Medication Administration Record report, or Downtime eMAR. Like the standard MAR report in MEDITECH, this report uses data for all current inpatients, including their current medications, recent and scheduled meds, as well as other details about the patient like allergies, demographics, etc. There are several reasons this report lends itself to being produced out of a data mart, rather than directly from a live database:
- Frequency – since the data needs to be as close to real time as possible, it’s produced on either a 15 or 30 minute cycle, depending on the overall hospital size, DR database size and other performance-affecting considerations.
- Data complexity – while the report is of course patient-based, the underlying level of detail per patient quickly multiplies: per prescription status, prescription, drug administration, scheduled administration, and potentially any further detail each hospital chooses to document.
- Presentation method – the report itself is presented in Adobe .pdf format, with one document produced per inpatient location. We use SQL Server Reporting Services tools to design and publish the report, and depending on each client’s preferences, can present anywhere from 5-12 sub-reports associated with each record. The act of physically rendering and saving each file can take several minutes, depending on the number of locations and number of patients. At large facilities, it’s not unusual to have hundreds of pages, cumulatively, across all locations and patients.
When thinking about using a data mart, keep in mind it’s really a two-step process: getting data in, then getting data out. More formally put, step 1 is about populating or refreshing your data mart, and step 2 is about data retrieval for presentation.
Whether you produce reports directly from a live database, or from your own data mart, step 1 is basically the same in each case: you need to have efficient and accurate T-SQL code (ideally a stored procedure) that produces the right information. If you use a data mart, this step will differ in that instead of simply displaying the data, you’ll insert it into the table or tables that define your data mart.
And so here’s the first thing to decide if you want to use a data mart: can you easily support and maintain your own table schema? Do your data requirements change frequently, or remain fairly constant? If your users make frequent revisions, for example, this will mean adding or changing columns, or possibly new tables for your data mart. If, on the other hand, the dataset definition remains constant, you’ll have little or no maintenance.
Efficient data retrieval (step 2) is where data marts really shine. While the performance of step 1 (initially querying the DR and retrieving your dataset) will be the same whether or not you use a data mart, once you have the data you need in your own table or tables, you can hand it off to your presentation application very quickly.
Using our Downtime eMAR as an example, our data mart (recall that it’s just a subset of PHA data, based on current inpatients) is built to optimize the design of the Report Services (SSRS) report. Each category of data gets its own table: patient prescriptions and medications, med administrations, patient allergies, scheduled administrations for the next 24 hours, etc. By creating a table for each category, when the SSRS report renders, it can retrieve what it needs for each sub-report very quickly. This approach is further aided by being able to index our tables as we see fit for fast data retrieval.
For very large sites, when the Downtime eMAR reports are produced, they may take up to 10-15 minutes for a complete set. Since the report uses our data mart tables to produce the data, rather than the DR Pharmacy tables directly, we’re not causing contention or locking on them.
The decision to use a data mart or not depends on a variety of factors. But in certain cases, the performance gains that can be had in producing the report can often outweigh the extra maintenance of your own data mart schema and logic.