Leverage JPQL search expressions in the configuration #27

Open
RKrahl opened this issue Jan 23, 2023 · 1 comment
Labels
idea Idea that might need some more reflection before considering it a regular feature request

Comments

@RKrahl
Member

RKrahl commented Jan 23, 2023

The current configuration is too complicated and inefficient. It might be simplified if we could leverage JPQL search expressions in the config files.

Due to the design of icat.oaipmh, we first need to configure which properties of which ICAT objects to consider for an object to be disseminated over OAI-PMH. From these, an internal XML representation of the objects is created. In a second step, this internal representation is transformed using XSLT.

Just to compile all the ICAT entity objects needed for the metadata of a data publication, the following configuration lines are required:

# Identifiers for the configuration of metadata to be retrieved from ICAT
data.configurations = datapub

# Relevant data objects and properties for each data configuration
data.datapub.mainObject = DataPublication

data.datapub.stringProperties = pid title description subject
data.datapub.numericProperties = id
data.datapub.dateProperties = publicationDate
data.datapub.subPropertyLists = users dates relatedItems fundingReferences content

data.datapub.users.stringProperties = orderKey fullName givenName familyName contributorType email
data.datapub.users.subPropertyLists = user affiliations
data.datapub.users.user.stringProperties = orcidId
data.datapub.users.affiliations.stringProperties = name pid fullReference

data.datapub.dates.stringProperties = dateType date

data.datapub.relatedItems.stringProperties = identifier relationType fullReference relatedItemType title

data.datapub.fundingReferences.subPropertyLists = funding
data.datapub.fundingReferences.funding.stringProperties = funderIdentifier funderName awardNumber awardTitle

data.datapub.content.subPropertyLists = dataCollectionDatasets
data.datapub.content.dataCollectionDatasets.subPropertyLists = dataset
data.datapub.content.dataCollectionDatasets.dataset.numericProperties = fileSize
data.datapub.content.dataCollectionDatasets.dataset.subPropertyLists = datafiles
data.datapub.content.dataCollectionDatasets.dataset.datafiles.subPropertyLists = datafileFormat
data.datapub.content.dataCollectionDatasets.dataset.datafiles.datafileFormat.stringProperties = type

This seems to be too clumsy.
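To make the structure of these keys explicit, here is a minimal sketch of how the flat lines could be read into a nested tree. It assumes a reading of the key layout (`data.<config>.mainObject`, `data.<config>[.<relation>...].<kind>Properties`, and `subPropertyLists` naming the relations to descend into); the function names are hypothetical and this is not icat.oaipmh's actual parser.

```python
# Hypothetical parser for the run.properties layout above (not icat.oaipmh's code).
# Every node of the tree needs its own set of configuration lines.
ATTRIBUTE_KINDS = ("stringProperties", "numericProperties", "dateProperties")


def read_properties(lines):
    """Parse simple 'key = value' lines into a dict of whitespace-separated value lists."""
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.split()
    return props


def build_node(props, prefix):
    """Collect the attributes of one node and recurse into its sub property lists."""
    node = {"attributes": [], "relations": {}}
    for kind in ATTRIBUTE_KINDS:
        node["attributes"].extend(props.get(f"{prefix}.{kind}", []))
    for relation in props.get(f"{prefix}.subPropertyLists", []):
        node["relations"][relation] = build_node(props, f"{prefix}.{relation}")
    return node


def parse_config(lines):
    """Return {config name: (main entity, attribute/relation tree)}."""
    props = read_properties(lines)
    configs = {}
    for name in props.get("data.configurations", []):
        main_object = props[f"data.{name}.mainObject"][0]
        configs[name] = (main_object, build_node(props, f"data.{name}"))
    return configs
```

The tree is what the configuration ultimately encodes; the many dotted keys are only a verbose way of spelling it out.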

Roughly the same could be achieved with a single JPQL search expression:

SELECT dp FROM DataPublication dp INCLUDE dp.content AS dc, dc.dataCollectionDatafiles AS dcdf, dcdf.datafile AS df1, df1.datafileFormat, dc.dataCollectionDatasets AS dcds, dcds.dataset AS ds, ds.datafiles AS df2, df2.datafileFormat, dp.dates, dp.fundingReferences AS dpfun, dpfun.funding, dp.relatedItems, dp.users AS dpu, dpu.affiliations, dpu.user
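The relation part of the configuration maps onto such an INCLUDE clause quite mechanically. The following sketch builds the clause from a nested dict of relations. The generated alias names (`a1`, `a2`, ...) are a hypothetical choice, and the clause shape simply mirrors the expression above; it has not been checked against an ICAT server.

```python
def build_query(entity, alias, relations):
    """Build 'SELECT alias FROM entity alias INCLUDE ...' from a nested relation dict.

    relations maps a relation name to a dict of its own sub-relations ({} for a leaf).
    Relations with children get a generated alias (hypothetical naming a1, a2, ...).
    """
    includes = []
    counter = 0

    def walk(parent, rels):
        nonlocal counter
        for name, children in rels.items():
            path = f"{parent}.{name}"
            if children:
                counter += 1
                child = f"a{counter}"
                # The alias is declared before any include that uses it.
                includes.append(f"{path} AS {child}")
                walk(child, children)
            else:
                includes.append(path)

    walk(alias, relations)
    query = f"SELECT {alias} FROM {entity} {alias}"
    if includes:
        query += " INCLUDE " + ", ".join(includes)
    return query
```

For example, `build_query("DataPublication", "dp", {"dates": {}, "users": {"affiliations": {}, "user": {}}})` yields `SELECT dp FROM DataPublication dp INCLUDE dp.dates, dp.users AS a1, a1.affiliations, a1.user`.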

Furthermore, the internal XML representation roughly corresponds one to one to the ICAT schema. This means that if we want to include the experimental techniques being used in an investigation, we need to include all datasets from that investigation in the internal representation, which might look something like:

<metadata>
  <datasets>
    <instance>
      <datasetTechniques>
        <instance>
          <technique>
            <name>neutron diffraction</name>
            <pid>PaNET:PaNET01217</pid>
          </technique>
        </instance>
      </datasetTechniques>
    </instance>
    <instance>
      <!-- ... -->
    </instance>
    <instance>
      <!-- ... -->
    </instance>
    <!-- ... -->
  </datasets>
  <!-- ... -->
</metadata>

Note that there may be hundreds of datasets in one investigation. Often they all have the same technique, but that is not guaranteed. The distinct techniques must then be extracted from that using XSLT, which is also somewhat involved.
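For comparison, here is what the XSLT step has to accomplish, written as a short Python sketch over the internal XML shown above. It walks every `technique` element below `datasets` and keeps each `(name, pid)` pair once, in first-seen order. The function name is hypothetical; icat.oaipmh itself does this in XSLT, not Python.

```python
import xml.etree.ElementTree as ET


def distinct_techniques(xml_text):
    """Return the distinct (name, pid) pairs of all techniques, in first-seen order."""
    root = ET.fromstring(xml_text)  # root is the <metadata> element
    seen = set()
    result = []
    path = "datasets/instance/datasetTechniques/instance/technique"
    for tech in root.iterfind(path):
        key = (tech.findtext("name"), tech.findtext("pid"))
        if key not in seen:  # hundreds of datasets often repeat the same technique
            seen.add(key)
            result.append(key)
    return result
```

Even in this compact form, the full per-dataset structure still has to be retrieved and serialized before the duplicates can be discarded.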

In principle, we could select the list of distinct techniques related to an investigation using one simple JPQL search statement like:

SELECT DISTINCT(t) FROM Technique t JOIN t.datasetTechniques AS dst JOIN dst.dataset AS ds JOIN ds.investigation AS i WHERE i.id = %d

(where %d would need to be replaced with the internal id of that investigation.)
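A minimal sketch of that substitution, assuming printf-style templates as written above. Because the value is pasted into the query text, the hypothetical helper below insists on a genuine integer: Python's `%d` would silently truncate a float, and accepting arbitrary strings would allow query injection.

```python
TECHNIQUE_QUERY = (
    "SELECT DISTINCT(t) FROM Technique t "
    "JOIN t.datasetTechniques AS dst JOIN dst.dataset AS ds "
    "JOIN ds.investigation AS i WHERE i.id = %d"
)


def technique_query(investigation_id):
    """Fill in the investigation id, rejecting anything that is not a plain int."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(investigation_id, bool) or not isinstance(investigation_id, int):
        raise TypeError("investigation id must be an int")
    return TECHNIQUE_QUERY % investigation_id
```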

So if we could compile the internal XML representation from a couple of JPQL searches configured in the config file, things might become significantly simpler.
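One possible shape for this, sketched under several assumptions: each data configuration would name a few query templates (a hypothetical `queries` mapping from XML element name to JPQL template with one `%d`), and `search` stands in for whatever client actually runs the query against ICAT, returning flat rows as dicts. Nested relations are deliberately left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET


def compile_metadata(main_id, queries, search):
    """Build <metadata> with one section per configured query.

    queries: hypothetical mapping of element name -> JPQL template with one %d.
    search:  stand-in callable taking a query string and returning a list of dicts.
    """
    root = ET.Element("metadata")
    for name, template in queries.items():
        section = ET.SubElement(root, name)
        # Note: a template containing a literal '%' (e.g. in LIKE) would need '%%'.
        for row in search(template % main_id):
            instance = ET.SubElement(section, "instance")
            for attr, value in row.items():
                ET.SubElement(instance, attr).text = str(value)
    return root
```

With the technique query from above, the `techniques` section would contain each technique only once, so the XSLT no longer has to deduplicate anything.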

@RKrahl RKrahl added the idea Idea that might need some more reflection before considering it a regular feature request label Jan 23, 2023
@RKrahl
Member Author

RKrahl commented Feb 13, 2023

I just noticed yet another benefit of this approach: at the moment, it is not possible to disseminate only a subset of the objects for a data configuration. In run.properties, one must specify for each data configuration the ICAT entity type that will be the main source of information when retrieving metadata from ICAT. The icat.oaipmh component then unconditionally disseminates all objects of that type; there is no way to add a condition that filters them.
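With a JPQL-based configuration, such a filter would simply be a WHERE clause on the main query. A minimal sketch, assuming a hypothetical `data.<config>.condition` key next to the existing `mainObject` key and a fixed alias `o`:

```python
def dissemination_query(props, config):
    """Build the main query for a data configuration, with an optional filter.

    props:  dict of run.properties values as plain strings.
    config: data configuration name, e.g. "datapub".
    The data.<config>.condition key is hypothetical; it does not exist today.
    """
    entity = props[f"data.{config}.mainObject"]
    condition = props.get(f"data.{config}.condition")
    query = f"SELECT o FROM {entity} o"
    if condition:
        query += f" WHERE {condition}"  # only matching objects get disseminated
    return query
```

For example, a condition `o.pid IS NOT NULL` would restrict dissemination to data publications that already have a persistent identifier.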

@RKrahl RKrahl added this to the 3.0.0 milestone Aug 11, 2023