FHIR Bulk Data Access Tutorial

The FHIR Bulk Data Access specification, also known as Flat FHIR, gives healthcare organizations the ability to provide external clients with large amounts of data for a population of patients formatted as FHIR resources. This functionality is available starting in the August 2021 version of Epic.

The bulk data API is powerful, and because it uses the FHIR standard, you might find it easier to export and process data with it if you are already familiar with that standard. However, there are important factors to consider before deciding to use bulk data. As always, carefully consider your use case to determine whether bulk data is the best solution.

Bulk data runs on an organization's operational database, so it is important to consider performance. Bulk data exports large amounts of data for large groups of patients, and a request takes longer to complete as the data set or patient population grows. Responses are not instantaneous, so use cases for bulk data should not rely on immediate responses. Later in this tutorial, we will cover how to set up requests to help minimize this wait time.

Also, bulk data requests are not incremental. The API collects all data for the requested patients and resources before it starts returning any data. You cannot retrieve any results until all results are ready.

As with any data exchange, form should follow function. While having all of this data formatted as FHIR resources might be exciting or sound easier to work with, the data won't be useful if bulk data's technical capabilities don't align with your use case. Consider your exchange paradigm and workflow needs first, then see if bulk data meets those requirements. If it does - great! If it doesn't, explore other options to interoperate with Epic on open.epic. Below are examples of use cases that fit bulk data and some that don't.

Use cases that fit bulk data:

  • A one-time load of data in preparation for continuous data exchange using other methods
  • Monthly loads of a targeted set of data (for example, _type=Patient,Condition)
  • Weekly export of a dynamic group of patients (for example, all patients discharged in the last week with a certain diagnosis)
  • Weekly loads of small patient populations (fewer than one hundred), such as for registry submissions

Use cases that don't fit bulk data:

  • Data synchronization with data warehouses or other databases
  • Periodic loads of large amounts of clinical data
  • Incremental data loads
  • Data exports for groups of over one thousand patients

Epic has several recommendations that can help you have the best experience with bulk data APIs.

  • Limit group sizes to around a thousand patients or fewer.
  • Use the _type parameter whenever possible to improve response time and minimize storage requirements.
  • For groups of one hundred or fewer patients, check the request status every ten minutes, or use exponential backoff. For groups of over one hundred patients, check the status every thirty minutes, or use exponential backoff. More information on exponential backoff can be found in the bulk data export specification.
  • After retrieving all of the resource file content, use the FHIR Bulk Data Delete Request API to allow the server to clean up the data from your request.

The implementation guide for Bulk Data Access is on HL7's website. Starting in the August 2021 version, Epic supports the 1.0.1 version of the specification for R4 FHIR resources. We have also incorporated some features from the 1.1.0 version to provide additional functionality. Epic supports only the Group Export operation. We do not support _since or other bulk data operations at this time.

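Now that you have determined that bulk data is an appropriate solution for your use case, let’s go through how to use the APIs. The typical flow of the bulk data APIs is to kick off the export, check the status of the request until it completes, retrieve the resource files, and then delete the request.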

External clients must be authorized to use the bulk data APIs: Bulk Data Kick-off, Bulk Data Status Request, Bulk Data File Request, and Bulk Data Delete Request. Additionally, they must be authorized for the R4 search API for each resource they want to request, for example, AllergyIntolerance.Search (R4). The healthcare organization you're integrating with also needs to authorize your client to access the specific groups of patients. Work with that organization to enable a group that is appropriate for your use case.

This tutorial assumes you are passing a form of authorization covered in one of our authentication guides. It is generally assumed that in production environments FHIR APIs will use OAuth 2.0; however, it's important to note FHIR APIs do support HTTP Basic Authentication.

To begin, you need the base URL of the organization you want to integrate with, as previously described in the FHIR Tutorial. Let's say this is our base URL:

https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/R4

Bulk data uses the Group resource for the export operation. Contact the organization you are integrating with to discuss what group of patients to use for your integration and to get the FHIR ID for that group.

Use the Group FHIR ID to call the group export operation:

https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/R4/Group/eIscQb2HmqkT.aPxBKDR1mIj3721CpVk1suC7rlu3yX83/$export

The request above returns results for a default set of R4 resources in the ndjson format. The default set includes the resources in the U.S. Core Data for Interoperability (USCDI) data classes, resources from the patient compartment of the bulk data access specification, and additional supporting resources outside of the patient compartment. Starting in the November 2021 version of Epic (and in the August 2021 version with a special update), Provenance is also in the default set of resources.

Epic also supports the following parameters for the group export operation:

  • _type
  • includeAssociatedData
  • _typeFilter (starting in the November 2023 version of Epic)

The _type parameter accepts a comma-delimited list of FHIR resource types. When used, bulk data returns only the resource types specified in the parameter. Epic recommends using this parameter whenever possible because limiting the scope of the request to only the resources you need decreases both response times and the amount of data stored.

The _type parameter is also the only way to retrieve Binary resources. Binary files can become very large in this workflow, so Binary resources are not returned by default. Certain very large Binary files cannot be retrieved by bulk data operations at all. If a Binary file is not returned, you can still get its ID from the resource that it's associated with, for example, from the DocumentReference.content.attachment.url element, and perform a separate Binary.Read request for the content. Starting in the November 2021 version of Epic and in August 2021 by special update, when a Binary file cannot be returned, an OperationOutcome with its resource type and FHIR ID is included in the response file.

For resources where Epic doesn't support a search by Patient ID (for example, Medication), also include a resource that supports search by Patient ID and references the resource of interest (for example, MedicationRequest). For example, you could use the _type parameter to limit the request to include only Patient, MedicationRequest, and Medication resources as follows:

https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/R4/Group/eIscQb2HmqkT.aPxBKDR1mIj3721CpVk1suC7rlu3yX83/$export?_type=patient,medicationrequest,medication

The includeAssociatedData parameter can be set to "LatestProvenanceResources" to include the Provenance resource associated with each resource instance included in the bulk data files.

https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/R4/Group/eIscQb2HmqkT.aPxBKDR1mIj3721CpVk1suC7rlu3yX83/$export?includeAssociatedData=LatestProvenanceResources
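As a rough illustration of how the kick-off URLs above are assembled, the Python sketch below builds one from a base URL, a Group FHIR ID, and the optional _type and includeAssociatedData parameters. The function name build_export_url and its arguments are hypothetical helpers for this tutorial, not part of any Epic library.

```python
from urllib.parse import quote


def build_export_url(base_url, group_id, types=None, include_associated_data=None):
    """Return a Group $export kick-off URL (hypothetical helper)."""
    # The group export operation lives under Group/<FHIR ID>/$export.
    url = f"{base_url.rstrip('/')}/Group/{quote(group_id, safe='')}/$export"
    params = []
    if types:
        # _type is a comma-delimited list of FHIR resource types.
        params.append("_type=" + ",".join(types))
    if include_associated_data:
        params.append("includeAssociatedData=" + include_associated_data)
    return url + ("?" + "&".join(params) if params else "")
```

Calling build_export_url with only the base URL and group ID produces the default export URL; passing types=["Patient", "MedicationRequest", "Medication"] appends the corresponding _type parameter.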

The _typeFilter parameter accepts a comma-delimited list of FHIR resource search queries. This is used to filter the results of the bulk data export so you can retrieve only the data you need. To use _typeFilter for a specific resource type, that resource must be included in the _type parameter. You cannot use _typeFilter without also using _type. Query strings for multiple resources can be included in _typeFilter.

Query strings in the _typeFilter parameter use the same search parameters as the R4 search APIs, and are formatted in much the same way. The general format is “_typeFilter=Resource%3Fparameter%3Dvalue\,value%26parameter%3Dvalue”. The standard query string reserved characters (question mark, equals sign, and ampersand) must be URL encoded within _typeFilter, and commas that separate values within a query string must be backslash escaped. Unescaped commas separate one query string from the next.

Here is an example of a simple query string in _typeFilter. This would limit the MedicationRequest resources returned in the bulk data export to those with a category of “inpatient.”

https://vendorservices.epic.com/interconnect-amcurprd-oauth/api/FHIR/R4/Group/eIscQb2HmqkT.aPxBKDR1mIj3721CpVk1suC7rlu3yX83/$export?_type=MedicationRequest&_typeFilter=MedicationRequest%3Fcategory%3Dinpatient

This is an example of a more complex use of _typeFilter. This would limit the Observation resources returned to vital signs from 2023 and laboratory results from 2022 onwards, and the Condition resources returned to active problems on the problem list.

https://vendorservices.epic.com/interconnect-amcurprd-oauth/api/FHIR/R4/Group/eIscQb2HmqkT.aPxBKDR1mIj3721CpVk1suC7rlu3yX83/$export?_type=Observation,Condition&_typeFilter=Observation%3Fcategory%3Dvital-signs%26date%3D2023,Observation%3Fcategory%3Dlaboratory%26date%3Dge2022,Condition%3Fcategory%3Dproblem-list-item%26clinical-status%3Factive

When you use _typeFilter, we recommend following these best practices:

  1. Avoid using multiple query strings that would have overlapping results. For example, do not include “Observation%3Fcategory%3Dlaboratory” and “Observation%3Fcategory%3Dlaboratory\,vital-signs” in the same request.
  2. When possible, simplify _typeFilter values into a single query string. For example, rather than “_typeFilter=MedicationRequest%3Fstatus%3Dactive,MedicationRequest%3Fstatus%3Don-hold”, use “_typeFilter=MedicationRequest%3Fstatus%3Dactive\,on-hold”.
  3. Test your query strings with the respective resources in a search request to ensure they return the results you expect before using the query strings in _typeFilter.

Note that the following are not supported by _typeFilter:

  • Searching by patient, subject, or _id
  • Query strings with search result parameters, such as _count, _include, or _revinclude
  • Query strings for the Patient resource
  • Query strings for resources that don't contain patient information
  • Applying the search query strings to resources included by reference

When you have compiled all components of the request URL, you can submit the GET request to kick off the bulk data workflow. Note that by default, a client can request a specific group of patients only once in a twenty-four-hour period. If you need to request bulk data more frequently, work with the Epic organization you're integrating with to configure an appropriate request window.

Your request must include the following headers:

  • Accept: application/fhir+json
  • Prefer: respond-async

After kicking off the bulk data request, use the status API to track the progress of the request. Note that the same client and user that made the kick-off request must make the status request. In the response headers of the kick-off request, the value of the “Content-Location” header is the status URL for this bulk data request. Each bulk data request has a unique identifier used to request the status of a specific bulk data request.

https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/BulkRequest/B0F84FB8D37411EB92726C04221B350C
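The kick-off step can be sketched in Python with the standard library as follows. This is an illustration under assumptions, not a definitive client: the helper names are hypothetical, and it assumes you already hold an OAuth 2.0 access token obtained through one of the authentication guides.

```python
import urllib.request


def build_kickoff_request(export_url, access_token):
    """Build (but do not send) the GET request that kicks off an export."""
    return urllib.request.Request(
        export_url,
        headers={
            # Both headers are required for the kick-off request.
            "Accept": "application/fhir+json",
            "Prefer": "respond-async",
            # Assumption: a bearer token obtained separately via OAuth 2.0.
            "Authorization": f"Bearer {access_token}",
        },
        method="GET",
    )


def kick_off(request):
    """Send the kick-off request and return the status URL."""
    with urllib.request.urlopen(request) as response:
        # The status URL is the value of the Content-Location header.
        return response.headers["Content-Location"]
```

Keeping request construction separate from sending makes it possible to inspect the headers before anything goes over the network.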

If the bulk data request has not finished processing, the response body is empty. An approximate measure of progress is available in the "X-Progress" response header. The value of this header is "Searched X of Y patients" where "Y" is the number of patients in the group and "X" is the number of patients processed so far. This is only an approximate measure of progress because there is additional processing required after all patients have been searched, so it's possible for the header to be set to "Searched 100 of 100 patients" with no response body returned.
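If you want to log or display progress, the X-Progress value can be parsed with a small helper like the Python sketch below. The function name parse_progress is hypothetical, and the pattern assumes the header text follows the "Searched X of Y patients" form exactly.

```python
import re


def parse_progress(header_value):
    """Return (searched, total) from an X-Progress value, or None if it does not match."""
    match = re.fullmatch(r"Searched (\d+) of (\d+) patients", header_value.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Because additional processing happens after all patients are searched, a result where both numbers are equal does not by itself mean the request is complete.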

Epic recommends checking the request status every ten minutes for groups with a hundred or fewer patients and every thirty minutes for groups over a hundred, or using exponential backoff as described in the bulk data export specification.
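One way to implement exponential backoff is sketched below in Python. It is a generic illustration, not the algorithm from the specification: the function names, the doubling factor, and the attempt limit are assumptions, and check_status stands for a hypothetical callable that returns None while the status response body is empty and the parsed manifest once the request completes.

```python
import time


def backoff_delays(initial_seconds, max_seconds, factor=2.0):
    """Yield wait times that grow by `factor` until capped at `max_seconds`."""
    delay = initial_seconds
    while True:
        yield min(delay, max_seconds)
        delay = min(delay * factor, max_seconds)


def poll_until_complete(check_status, delays, sleep=time.sleep, max_attempts=50):
    """Call `check_status` until it returns a manifest, sleeping between attempts."""
    for _, delay in zip(range(max_attempts), delays):
        manifest = check_status()
        if manifest is not None:
            return manifest
        sleep(delay)
    raise TimeoutError("bulk data request did not complete in time")
```

For example, poll_until_complete(check_status, backoff_delays(600, 1800)) would start at ten minutes and cap the wait at thirty minutes. Those values are illustrative, so choose intervals that suit your group size.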

After the request is completed, the status API returns the URLs for the resource files. For more information on the structure of the response, refer to the API details.

{
    "transactionTime": "2021-06-23T16:39:52Z",
    "request": "https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/R4/Group/eIscQb2HmqkT.aPxBKDR1mIj3721CpVk1suC7rlu3yX83/$export?_type=patient,encounter,condition",
    "requiresAccessToken": "true",
    "output": [
        {
            "type": "Patient",
            "url": "https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/BulkRequest/9ED3042CD44111EB84F2D2068206269D/e19upATM-PTKGZuHsy04IUQ3"
        },
        {
            "type": "Condition",
            "url": "https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/BulkRequest/9ED3042CD44111EB84F2D2068206269D/e8D8N1JwB3qCiHJkmeV.98w3"
        }
    ],
    "error": [
        {
            "type": "OperationOutcome",
            "url": "https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/BulkRequest/9ED3042CD44111EB84F2D2068206269D/eKbGgVw9pUn6xIiB8kZ756w3"
        }
    ]
}

Bulk data generates a separate file for each resource type. The "output" elements in the status API response list each resource type and the corresponding file request URL. If a single resource type has a very large number of results, the data is split into multiple files, all linked in the response. Each file contains at most three thousand resource instances, but the actual number in a given file varies with the size of those resource instances. To view a file, use the file request API. The file URLs for this API are found in the file manifest from the status API. Note that the same client and user that made the kick-off request must make the file request.

https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/BulkRequest/9ED3042CD44111EB84F2D2068206269D/e19upATM-PTKGZuHsy04IUQ3
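Because one resource type can be split across several files, it can help to group the manifest's file URLs by type before downloading. The Python sketch below shows one way to do that; files_by_type is a hypothetical helper that expects the status response already parsed into a dictionary.

```python
from collections import defaultdict


def files_by_type(manifest):
    """Map each resource type to the list of file URLs in the "output" array."""
    grouped = defaultdict(list)
    for entry in manifest.get("output", []):
        # A resource type appears more than once when its data is split across files.
        grouped[entry["type"]].append(entry["url"])
    return dict(grouped)
```

Each URL in the result can then be requested with the file request API by the same client and user that kicked off the export.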

The format of the bulk data files is ndjson (newline-delimited JSON). Each line of an ndjson file is a complete JSON object, here one FHIR resource instance, so line breaks are significant. Resource instances are included in the bulk data file through a search by Patient ID or through a reference in a previously gathered resource instance. The files are not separated by patient, so results for all patients are included in each resource file.
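Reading an ndjson file amounts to parsing one JSON object per line. A minimal Python sketch, assuming the file content has already been downloaded as text (parse_ndjson is a hypothetical helper name):

```python
import json


def parse_ndjson(text):
    """Return one parsed resource per non-empty line of an ndjson document."""
    # Split on "\n" only; str.splitlines() also breaks on characters that
    # may legally appear inside JSON strings.
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

For very large files, you may prefer to read and process the response one line at a time rather than loading the whole file into memory.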

In addition to the resource files, if there are any request-level errors, the "error" element in the status API response includes the URL for the OperationOutcome file. This file includes only request-level errors. Bulk data does not return resource-level errors.

| Error Code | Error Text | Notes |
| --- | --- | --- |
| 59130 | "Some information was not or may not have been returned due to business rules, consent or privacy rules, or access permission constraints." | This error is logged when a resource is requested that the client is not authorized for. OperationOutcome.issue.diagnostics lists the resource. |
| 59100 | "Content invalid against the specification or a profile." | This error is logged when a parameter is included in the request that Epic doesn't support in bulk data. OperationOutcome.issue.diagnostics lists the parameter. |
| 59136 | "The resource or profile is not supported." | This error is logged when a resource is requested that Epic doesn't support in bulk data. OperationOutcome.issue.diagnostics lists the resource. |
| 59176 | "Error Converting FHIR Resource to ndjson. Resource: <resource type> FHIR ID: <resource FHIR ID>" | This error is logged when a resource instance cannot be included in the resource file because of its size. The resource instance can be requested separately using the resource's read interaction. This error is logged starting in the November 2021 version of Epic and in August 2021 by special update. |
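To review request-level errors, you could pull the issues out of the OperationOutcome file with something like the Python sketch below. It relies only on standard FHIR OperationOutcome fields (issue.severity, issue.details.text, and issue.diagnostics). Whether the error text appears in details.text is an assumption, and collect_issues is a hypothetical helper name.

```python
import json


def collect_issues(ndjson_text):
    """Return a summary of every issue in an ndjson file of OperationOutcome resources."""
    issues = []
    for line in ndjson_text.split("\n"):
        if not line.strip():
            continue
        resource = json.loads(line)
        if resource.get("resourceType") != "OperationOutcome":
            continue
        for issue in resource.get("issue", []):
            issues.append({
                "severity": issue.get("severity"),
                # Assumption: the human-readable error text is in details.text.
                "text": issue.get("details", {}).get("text"),
                # diagnostics lists the affected resource or parameter.
                "diagnostics": issue.get("diagnostics"),
            })
    return issues
```

Any field the server does not populate comes back as None, so check the values before relying on them.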

If you no longer want to run a request after starting it, or no longer need the data stored, the bulk data delete API is available. This API uses the same URL as the status API, but uses the DELETE HTTP method rather than GET.

DELETE https://fhir.epic.com/interconnect-fhir-oauth/api/FHIR/BulkRequest/B0F84FB8D37411EB92726C04221B350C

If you delete a request before it is completed (the status API hasn’t returned any content), the request does not finish processing and is marked as deleted in the receiving system. The deleted request does not count toward the client's request limit for that group and time period.

If you delete a request after it is completed (the status API returns file URLs), the request remains marked as completed in the system, but the data generated by the request is deleted. Deleting the generated data after you've retrieved it can help prevent excessive storage requirements for the server. In this case, the request does count towards the client's request limit. Epic recommends running the delete API after you finish retrieving the data files.
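The delete call differs from the status call only in its HTTP method, as the following Python sketch illustrates. The helper name is hypothetical, and the bearer token is again assumed to come from your chosen authentication flow.

```python
import urllib.request


def build_delete_request(status_url, access_token):
    """Build (but do not send) the DELETE request for a bulk data request."""
    return urllib.request.Request(
        status_url,
        # Assumption: a bearer token obtained separately via OAuth 2.0.
        headers={"Authorization": f"Bearer {access_token}"},
        # Same URL as the status API, but with the DELETE method.
        method="DELETE",
    )
```

Passing the resulting object to urllib.request.urlopen sends the request. Do this once you have finished retrieving every file listed in the manifest.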