Nonclustered Indexes in Data Repository

When and how to add performance-boosting indexes to DR

TL/DR

Nonclustered indexes are optional structures that can dramatically improve query performance in Data Repository when applied to the right columns.

Best candidates are highly unique columns like VisitID, PatientID, and datetime fields used in WHERE clauses or joins
Check existing indexes before creating new ones, and maintain all indexes with regular defragmentation
Advanced techniques include adding included columns for Index Only Scans and filtered indexes for frequently queried status values

A practical guide to creating nonclustered indexes in Data Repository, including when they help, what to index, and advanced strategies.

Indexes in a SQL Server database are structures that speed up data access from tables and views. Two index types exist: clustered and nonclustered. Since each table permits only one clustered index via its primary key, developers and analysts leverage nonclustered indexes as optional performance enhancers.

When Nonclustered Indexes Help

Nonclustered indexes excel when column values show high uniqueness. AdmissionDateTime offers over 31.5 million possible values within a single year, making it an excellent candidate. Conversely, Gender fields with merely three values (M, F, U) demonstrate poor index candidates, as adding an index here would provide very little benefit.

Index Considerations

Creating indexes modifies table design. Two critical practices to keep in mind:

Regular Maintenance: Tables (and both clustered and nonclustered indexes) need to be maintained (defragmented) on a regular basis.
Avoid Redundancy: Check existing indexes before creation to prevent duplication, as other vendors or IT staff may also create indexes.

When Do You Need a Nonclustered Index?

The fundamental answer: slower-than-acceptable data retrieval. Consider:

User Impact: Reports rendering slowly create problems; overnight compilations may not.
Dataset Size: Two-minute wait times vary in acceptability based on query scope.
Execution Plans: Use Management Studio's graphical analysis to identify problematic table scans.
Efficiency Purpose: Creating indexes preemptively requires careful evaluation.

Nonclustered Index Recommendations

Five primary guidelines:

Primary Key Inclusion: Primary key columns automatically appear in nonclustered indexes, so exclude them from new index specifications or Include columns.
VisitID Joins: Whenever you join on VisitID, where VisitID is not part of the primary key, create a nonclustered index on this column. Examples include AbstractData, BarVisits, LabSpecimens, MicSpecimens, PhaRx, OeOrders/OmOrd_Main, and SchAppointments/CwsAppt_Main.
PatientID Joins: Similarly, index PatientID when it's not a primary key component, though this pattern is less common.
Datetime Columns in WHERE Clauses: Columns like AdmissionDateTime, ServiceDateTime, DischargeDateTime, CollectionDateTime, and AdministrationDateTime warrant indexing when used as stored procedure parameters or hard-coded filters.
Highly Discrete Non-Datetime Columns: Fields with numerous distinct values in WHERE clauses merit testing, such as AdmVisits.Status.

Advanced Index Strategy

Complex scenarios require specialized techniques.

Included Columns: Nonclustered indexes support optional non-key columns as "included" values. An index with included non-key columns can significantly improve query performance because the query optimizer can locate all the column values within the index. This enables "Index Only Scan" operations avoiding table access.

Filtered Indexes: These boost performance in very specific circumstances where a value is used in a predicate frequently, but that value is only a small amount of the total values for that table. For example, with AdmVisitEvents or RegEvent_Events, by adding a filter on the Status column where it only matches a specific value, we limit the scope of the index, making it more efficient.

Summary

There is no one size fits all solution for these advanced index examples. Balance improved query performance and the overhead and maintenance indexes require. Dramatic runtime improvements, say from five minutes to 30 seconds, justify index creation. Marginal gains requiring extensive testing may not warrant implementation overhead.

Healthcare AI Data Foundation

Nonclustered Indexes in Data Repository

TL/DR

A practical guide to creating nonclustered indexes in Data Repository, including when they help, what to index, and advanced strategies.

When Nonclustered Indexes Help

Index Considerations

When Do You Need a Nonclustered Index?

Nonclustered Index Recommendations

Advanced Index Strategy

Summary

Healthcare AI Data Foundation

Other Articles

Healthcare AI Data Foundation

2027 IPF PPS Final Rule

2027 IPPS Final Rule

2027 OPPS Proposed Rule

Optimize Your MIPS Cost Category Score

Understanding the MIPS Complex Patient Bonus