Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accession field in drs uri overrides port; breaks uri validation #344

Open
bwalsh opened this issue Feb 23, 2021 · 2 comments
Open

accession field in drs uri overrides port; breaks uri validation #344

bwalsh opened this issue Feb 23, 2021 · 2 comments
Labels
Compliance Issue contains something that could be covered by a test in compliance suite

Comments

@bwalsh
Copy link
Member

bwalsh commented Feb 23, 2021

All: I may be missing something, but assuming we want the drs identifier to be a uri, do we need to re-think the over-ride of the port field?

As it stands, I don't think it is plug compatible with [http, ftp, s3, gs,....]. This is problematic for typed systems that validate url properties (ie fhir)

From: Alexander Vantol <[email protected]>
Sent: Friday, February 19, 2021 6:28:17 PM
To: Alessandro Culotti; Brian Walsh; Sean Burke; Fan Wang; Kyle Ellrott; Brian O'Connor
Subject: [EXTERNAL] Re: AnVIL: Invalid DRS urls stored in gen3.theanvil.io
 
Hey all,

Didn't catch up on the whole chain but want to follow up on DRS URI mentioned, as the format provided is valid per our understanding of the GA4GH DRS specification: https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/docs/#_compact_identifier_based_drs_uris

drs://[provider_code/]namespace:accession ... Both the provider_code and namespace disallow spaces or punctuation, only lowercase alphanumerical characters, underscores and dots are allowed (e.g. [A-Za-z0-9._]).

We worked pretty closely with GA4GH DRS group (and Broad and other collaborators) to ensure that our "prefix" feature of GUIDs was within the bounds of a valid DRS URI for all our commons in the NIH Interoperability context. Note that the accession is also allowed to have alphanumeric and dots and slashes (though there's some nuance around percent-encoding when hitting a GET endpoint in DRS). 

Thanks,
Alex


On Fri, Feb 19, 2021 at 12:14 PM Alessandro Culotti <[email protected]> wrote:
FYI

---------- Forwarded message ---------
From: Brian Walsh <[email protected]>
Date: Wed, Feb 17, 2021 at 7:18 PM
Subject: AnVIL: Invalid DRS urls stored in gen3.theanvil.io
To: Alessandro Culotti <[email protected]>
Cc: Sean Burke <[email protected]>, Fan Wang <[email protected]>, Kyle Ellrott <[email protected]>, [email protected] <[email protected]>


Alessandro:

 

There appears to be invalid urls stored in gen3.anvil.

 

Specifically,  given the url "drs://dg.ANV0:dg.ANV0/1c772afd-7404-47fc-8636-92d49e314f00", the `port` is non-numeric.

This causes many validators/parsers to fail.  See RFC 3986 below.

 

I believe simply eliminating the “:bg.ANV0” port suffix might be the easiest fix.

@dglazer
Copy link
Member

dglazer commented Aug 9, 2021

I believe part of what's happening here is mixing up URI's and URL's. Strings like "drs://..." are URIs, and are not meant to be parsed or resolved as if they were URLs. So "There appears to be invalid urls stored in gen3.anvil" can't be right, since those strings aren't URLs.

@ianfore
Copy link

ianfore commented Aug 10, 2021

Yes the distinction between URI, URL and URN often get mixed up, and it is indeed a URI.

That doesn't deal with the issue though. I would restate it as the following being an invalid DRS URI.
drs://dg.ANV0:dg.ANV0/1c772afd-7404-47fc-8636-92d49e314f00

drs://dg.ANV0:1c772afd-7404-47fc-8636-92d49e314f00 should be sufficient.

This issue is really a duplicate of #340.

@ianfore ianfore added the Compliance Issue contains something that could be covered by a test in compliance suite label Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compliance Issue contains something that could be covered by a test in compliance suite
Projects
None yet
Development

No branches or pull requests

3 participants