
Prefer use of resourceUuid index over resourceType when selecting from resourceentity table #2739

Merged
merged 9 commits into from
Dec 18, 2024

@LZRS (Collaborator) commented Nov 26, 2024

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Fixes #[issue number]

Description
For queries that select from the `resourceentity` table and filter on `resourceentity.resourceUuid`, prefer the `resourceentity.resourceUuid` index over the `resourceentity.resourceType` index. Both columns are indexed, but `resourceentity.resourceUuid` is unique, so each lookup through it matches at most one row and is faster.
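A minimal sketch of the query-shape change, written as a hypothetical Kotlin builder. `TokenFilter` and `buildSearchQuery` are illustrative names, not the SDK's API, and the sketch uses `?` placeholders where the thread's examples inline literals. It assumes the outer `a.resourceType = ?` predicate can be dropped whenever at least one `resourceUuid` subquery is present, because each subquery is already scoped to the resource type, and kept when there are no such filters.

```kotlin
// Hypothetical sketch: NOT the Android FHIR SDK's actual query builder.
// One token filter becomes one subquery on TokenIndexEntity.
data class TokenFilter(val indexName: String, val indexValue: String)

fun buildSearchQuery(resourceType: String, filters: List<TokenFilter>): Pair<String, List<String>> {
    val conditions = mutableListOf<String>()
    val args = mutableListOf<String>()
    if (filters.isEmpty()) {
        // No resourceUuid filter: the resourceType predicate is the only constraint.
        conditions += "a.resourceType = ?"
        args += resourceType
    }
    for (filter in filters) {
        // Each subquery is already restricted to resourceType, so the outer
        // a.resourceType predicate would be redundant and is left out.
        conditions +=
            "a.resourceUuid IN (SELECT resourceUuid FROM TokenIndexEntity " +
                "WHERE resourceType = ? AND index_name = ? AND index_value = ?)"
        args += listOf(resourceType, filter.indexName, filter.indexValue)
    }
    val sql = "SELECT a.resourceUuid, a.serializedResource FROM ResourceEntity a WHERE " +
        conditions.joinToString(" AND ")
    return sql to args
}

fun main() {
    val (sql, args) = buildSearchQuery(
        "Task",
        listOf(TokenFilter("identifier", "routine_screening"), TokenFilter("status", "ready")),
    )
    println(sql)
    println(args) // [Task, identifier, routine_screening, Task, status, ready]
}
```

With the filters present, SQLite no longer has an `a.resourceType` term to match against `index_ResourceEntity_resourceType_resourceId`, which leaves the unique `resourceUuid` index as the natural access path.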

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Type
Choose one: (Bug fix | Feature | Documentation | Testing | Code health | Builds | Releases | Other)

Screenshots (if applicable)

Checklist

  • I have read and acknowledged the Code of conduct.
  • I have read the Contributing page.
  • I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
  • I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
  • I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

@LZRS changed the title from "Prioritize use of resourceUuid index over resourceType when selecting from resourceentity table" to "Prefer use of resourceUuid index over resourceType when selecting from resourceentity table" Nov 26, 2024
@LZRS force-pushed the remove-resourcetype branch from c043494 to b338fcd November 26, 2024 17:57
dubdabasoduba and others added 3 commits November 29, 2024 11:28
After switching from the multi-column resourceType index to the resourceUuid index, the order of results in include/revInclude
is no longer predictable: resourceUuids are randomly generated and stored in the db as blobs, so results
are ordered by the byte representation of the resourceUuid.
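To illustrate the ordering effect the commit message describes: if each resourceUuid is stored as a 16-byte big-endian blob (an assumption about the SDK's type converter), rows come back in unsigned byte order, because SQLite compares BLOB values with memcmp(). For random UUIDs that order is unrelated to save order, so tests over include/revInclude results are safer comparing them order-independently. `toBlob` and `compareBlobs` are hypothetical helpers.

```kotlin
import java.nio.ByteBuffer
import java.util.UUID

// Assumption: UUID stored as 16 bytes, most-significant 64 bits first (big-endian).
// The SDK's actual converter may differ.
fun UUID.toBlob(): ByteArray =
    ByteBuffer.allocate(16).putLong(mostSignificantBits).putLong(leastSignificantBits).array()

// memcmp()-style comparison: unsigned, byte by byte, shorter blob first on a tie.
fun compareBlobs(a: ByteArray, b: ByteArray): Int {
    for (i in 0 until minOf(a.size, b.size)) {
        val cmp = (a[i].toInt() and 0xFF).compareTo(b[i].toInt() and 0xFF)
        if (cmp != 0) return cmp
    }
    return a.size.compareTo(b.size)
}

fun main() {
    // Resources saved in this order...
    val savedOrder = List(5) { UUID.randomUUID() }
    // ...may be returned in blob (byte) order, which differs from run to run.
    val byteOrder = savedOrder.sortedWith(Comparator<UUID> { x, y -> compareBlobs(x.toBlob(), y.toBlob()) })
    println("saved: $savedOrder")
    println("bytes: $byteOrder")
    // An order-independent assertion stays stable regardless of the UUIDs drawn.
    check(byteOrder.toSet() == savedOrder.toSet())
}
```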
@LZRS marked this pull request as ready for review December 5, 2024 14:14
@LZRS requested a review from a team as a code owner December 5, 2024 14:14
@LZRS requested a review from aditya-07 December 5, 2024 14:14
@jingtang10 (Collaborator) left a comment

Can you please add some stats from testing?

@aditya-07 can you please add comments?

@LZRS (Collaborator, Author) commented Dec 16, 2024

Given the search

fhirEngine.search<Task> {
    filter(Task.IDENTIFIER, { value = of("routine_screening") })
    filter(Task.STATUS, { value = of("ready") })
}

Previously, this generated the query

SELECT a.resourceUuid, a.serializedResource
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'identifier'
                           AND index_value = 'routine_screening')
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND index_value = 'ready')

with the query plan

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
|--LIST SUBQUERY 1
|  `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
`--LIST SUBQUERY 2
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)

On a database with 125206 resources, 109654 of them Tasks, the query takes around

Run Time: real 2.247 user 0.104013 sys 0.346853

With the changes in this PR, the generated query becomes

SELECT a.resourceUuid, a.serializedResource
FROM ResourceEntity a
WHERE a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'identifier'
                           AND index_value = 'routine_screening')
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND index_value = 'ready');

resulting in the query plan

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceUuid (resourceUuid=?)
|--LIST SUBQUERY 1
|  `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_value_resourceUuid (resourceType=? AND index_name=? AND index_value=?)
`--LIST SUBQUERY 2
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_value_resourceUuid (resourceType=? AND index_name=? AND index_value=?)

On the same database (125206 resources, 109654 of them Tasks), the query takes around

Run Time: real 0.003 user 0.000442 sys 0.000643

Note: the database returned 0 rows because it contained no Task with the identifier routine_screening.
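The `Run Time: real … user … sys …` lines match the output format of the sqlite3 shell's `.timer on` command: `real` is elapsed wall-clock time, while `user` and `sys` are CPU time spent in user space and in the kernel. In the first measurement, real (2.247 s) is about five times user + sys (about 0.45 s), which suggests the old query spent most of its time waiting rather than computing, plausibly on page reads from storage; the thread does not measure this directly. The Kotlin sketch below shows the same distinction on the JVM, with `Thread.sleep` standing in for I/O wait. `Timing` and `timed` are hypothetical helpers, and per-thread CPU time is only reported when the JVM's `ThreadMXBean` supports it.

```kotlin
import java.lang.management.ManagementFactory

// Hypothetical helper mirroring sqlite3's `.timer on` fields:
// wall-clock ("real") time plus the calling thread's user and system CPU time.
data class Timing(val realMs: Long, val userMs: Long?, val sysMs: Long?)

fun <T> timed(block: () -> T): Pair<T, Timing> {
    val bean = ManagementFactory.getThreadMXBean()
    // Per-thread CPU timing is optional; report null when it is unavailable.
    val cpuOk = bean.isCurrentThreadCpuTimeSupported && bean.isThreadCpuTimeEnabled
    val wall0 = System.nanoTime()
    val user0 = if (cpuOk) bean.currentThreadUserTime else 0L
    val cpu0 = if (cpuOk) bean.currentThreadCpuTime else 0L // user + system, in ns
    val result = block()
    val realMs = (System.nanoTime() - wall0) / 1_000_000
    if (!cpuOk) return result to Timing(realMs, null, null)
    val userNs = bean.currentThreadUserTime - user0
    val cpuNs = bean.currentThreadCpuTime - cpu0
    // Clamp at zero: the user and total CPU counters may have different resolutions.
    val sysNs = maxOf(0L, cpuNs - userNs)
    return result to Timing(realMs, userNs / 1_000_000, sysNs / 1_000_000)
}

fun main() {
    // Waiting (like blocking on disk) advances real time but burns almost no CPU.
    val t = timed { Thread.sleep(200) }.second
    println("real=${t.realMs}ms user=${t.userMs}ms sys=${t.sysMs}ms")
}
```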

@LZRS (Collaborator, Author) commented Dec 16, 2024

On a database with 166293 resources, 137517 of them Encounters, the query

SELECT a.resourceUuid, a.serializedResource
FROM ResourceEntity a
WHERE a.resourceType = 'Encounter'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Encounter'
                           AND index_name = '_id'
                           AND ((index_value = 'cf5f865f-602b-4c8c-9ece-acd4761f6eca' OR
                                 index_value = '16fa13cb-a977-44ab-9db5-ccc62a38a414') OR
                                (index_value = 'b228ff4d-ee37-4752-ba96-1d198adb9951' OR
                                 index_value = '89138401-c9b5-4b57-9871-edac67a6fbde')));

takes

Run Time: real 3.826 user 0.163887 sys 0.493112

while the equivalent query that prefers the resourceUuid index

SELECT a.resourceUuid, a.serializedResource
FROM ResourceEntity a
WHERE a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Encounter'
                           AND index_name = '_id'
                           AND ((index_value = 'cf5f865f-602b-4c8c-9ece-acd4761f6eca' OR
                                 index_value = '16fa13cb-a977-44ab-9db5-ccc62a38a414') OR
                                (index_value = 'b228ff4d-ee37-4752-ba96-1d198adb9951' OR
                                 index_value = '89138401-c9b5-4b57-9871-edac67a6fbde')));

takes

Run Time: real 0.004 user 0.000646 sys 0.000949

@LZRS requested a review from jingtang10 December 17, 2024 08:43
@aditya-07 merged commit 736578f into google:master Dec 18, 2024
6 checks passed
@MJ1998 (Collaborator) commented Jan 23, 2025

This change looks good based on the performance metrics. But I am also surprised that the earlier query was taking 2-3 seconds. What are the real, user, and sys times, and how do they differ?
I am guessing that the db mostly has Task and Encounter resources, and hence the filter on resourceType does not help at all (in fact, it degrades performance).
So I am curious about the distribution of resourceTypes in the databases used to compare query performance. Do you mind sharing it?
I am also thinking that a composite index (resourceType, resourceUuid) would be even better in this case, alongside the unique index on resourceUuid. WDYT?
@LZRS
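The thread does not report the full resourceType distribution, but the counts in the two stats comments already show how little `a.resourceType = ?` narrows the search in those databases, which is consistent with the guess above. A quick arithmetic check (`DbSample` and `share` are illustrative names; the numbers are the ones reported in this thread):

```kotlin
import java.util.Locale

// Counts reported earlier in this thread; the breakdown of the other resources is unknown.
data class DbSample(val resourceType: String, val totalResources: Int, val rowsOfType: Int)

// Fraction of ResourceEntity rows that the resourceType predicate alone would keep.
fun share(sample: DbSample): Double = sample.rowsOfType.toDouble() / sample.totalResources

fun main() {
    val samples = listOf(
        DbSample("Task", 125206, 109654),
        DbSample("Encounter", 166293, 137517),
    )
    for (s in samples) {
        // prints "Task: 87.6% of rows", then "Encounter: 82.7% of rows"
        println("%s: %.1f%% of rows".format(Locale.ROOT, s.resourceType, 100 * share(s)))
    }
}
```

In both databases, a search driven by the resourceType index still walks most of ResourceEntity, while each lookup through the unique resourceUuid index touches a single row.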

Labels: None yet
Projects: None yet

5 participants