GNIP 93: Allowing changes to the dataset attributes table via REST API #10873

Open
1 of 5 tasks
mwallschlaeger opened this issue Apr 3, 2023 · 7 comments

Comments

@mwallschlaeger
Member

mwallschlaeger commented Apr 3, 2023

GNIP 93 - Allowing changes to the dataset attributes table via REST API

Overview

I guess this might be too small for a GNIP, but it might require a discussion by the PSC.

Currently, changes to the attribute table of a dataset are only possible via the Advanced Metadata Editor form. To make GeoNode more RESTful, it would be nice to have the possibility to change attributes via the REST API.

Proposed By

@mwallschlaeger : Marcel Wallschlaeger

Assigned to Release

This proposal is for GeoNode 4.1.

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Motivation

The main motivation for us is the requirement to upload data into our GeoNode instance using an external service that communicates with GeoNode only through the REST API. Therefore, setting or changing the attributes of a dataset in this workflow requires attribute manipulation via the REST interface.

Proposal

Currently the Dataset REST endpoint has a read_only field for the attributes. I would leave this construct as it is and add another endpoint below the Dataset, something like:

router.register(r"datasets/(?P<pk>\d+)/attributes", views.DatasetAttributesViewSet, "attributes")

This endpoint would handle changes to the attributes of a dataset. It would allow changing "description", "attribute_label", "visible", "featureinfo_type" and "display_order".
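
A minimal sketch of the field restriction described above, in plain Python without any Django or DRF dependency. The names `EDITABLE_FIELDS` and `validate_attribute_patch` are hypothetical and not part of GeoNode; the sketch only illustrates that everything outside the listed fields (for example the attribute name and type) would stay read-only.

```python
# Hypothetical helper, not GeoNode code: restrict an update payload for one
# attribute to the fields this proposal wants to make editable.
EDITABLE_FIELDS = frozenset(
    {"description", "attribute_label", "visible", "featureinfo_type", "display_order"}
)


def validate_attribute_patch(payload: dict) -> dict:
    """Return a copy of the payload, or raise ValueError on any other field."""
    rejected = sorted(set(payload) - EDITABLE_FIELDS)
    if rejected:
        raise ValueError(f"read-only or unknown fields: {', '.join(rejected)}")
    return dict(payload)
```

Calling `validate_attribute_patch({"attribute_label": "Population"})` returns the payload unchanged, while a payload containing a field outside the list raises `ValueError`.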

Backwards Compatibility

I don't think this affects backwards compatibility.

Future evolution

For our specific use case, we further want to add fields to the Attribute model which might be interesting for GeoNode as a whole. These would be "unit" and "keywords" for a column, so that in a later implementation users could find datasets that include a specific type of attribute. This would make it easier to build dashboards and stories that compare the same units and keyword tags across different datasets.

Feedback

Update this section with relevant feedback, if any.

Voting

Project Steering Committee:

  • Alessio Fabiani:
  • Francesco Bartoli:
  • Giovanni Allegri:
  • Simone Dalmasso:
  • Toni Schoenbuchner:
  • Florian Hoedt:

Links

@giohappy giohappy changed the title GNIP #93 - Allowing changes to the dataset attributes table via REST API GNIP - 93: Allowing changes to the dataset attributes table via REST API Jul 4, 2023
@giohappy giohappy changed the title GNIP - 93: Allowing changes to the dataset attributes table via REST API GNIP 93: Allowing changes to the dataset attributes table via REST API Jul 4, 2023
@gannebamm
Contributor

I see the need for these capabilities. We are researching how an attribute value update would be possible via DRF and REST. @kilichenko-pixida will post some of his findings soon.

@kilichenko-pixida
Contributor

I see the need for these capabilities. We are researching how an attribute value update would be possible via DRF and REST. @kilichenko-pixida will post some of his findings soon.

As mentioned, I am working on this in this issue. As of now, I am not adding a separate endpoint for the attributes; I think I found a way to update them through PATCH api/v2/dataset/<dataset_id>.

I didn't figure out what kind of JSON payload might just work out of the box with the DRF serializers for this type of nested field, so instead of passing it as {"attribute_set": [{...}, ...]} (which would cause an exception), it would have to be passed as {"data": {"attribute_set": [{...}, ...]}}. This is later read and parsed in the update method of the DatasetSerializer class (overriding the default from its parent class), and the attributes are updated directly from there as well.
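
As a rough client-side illustration of the wrapping, the sketch below builds (but does not send) such a PATCH request using only the standard library. The helper name `build_attribute_patch`, the host `geonode.example.org`, and the keys inside each attribute dict (`pk`, `description`) are assumptions made for the example; authentication is left out entirely.

```python
import json
import urllib.request


def build_attribute_patch(base_url: str, dataset_id: int, attribute_set: list) -> urllib.request.Request:
    """Build a PATCH request whose attribute_set is wrapped in a "data" field."""
    body = json.dumps({"data": {"attribute_set": attribute_set}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v2/dataset/{dataset_id}",  # path as written in this comment
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values: the attribute keys shown here are assumptions.
request = build_attribute_patch(
    "https://geonode.example.org",
    42,
    [{"pk": 7, "description": "Population count"}],
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` together with whatever credentials the instance requires.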

I am also preparing a geonodectl command which would allow passing a JSON file like {"attribute_set": [{...}, ...]} to update the attributes.

@kilichenko-pixida
Contributor

@gannebamm @ridoo @mwallschlaeger

Here is a summary of the relevant changes that already happened. I have been working on this issue and it resulted in two already merged PRs: one to geonode to add the attribute processing ability and another to geonodectl to make patching from JSON files possible, as mentioned in this issue. According to testing in the dev environment, patching of attributes is now possible, but the current implementation might not be a good solution for direct use of the API as described in this GNIP.

The reason is that currently the attribute_set values need to be passed wrapped in a "data" field, which allows bypassing the internal DRF validation and accessing them from the update method within the DatasetSerializer class. E.g. {"data": {"attribute_set": [{...}, ...]}}.

This "wrapping" is done automatically when patching attributes from geonodectl, but that makes patching directly through the API not obvious (though now at least possible). Details on why it was done this way can be found in the discussion on the first PR to geonode. TL;DR: we don't know how to do it, but it might turn out to be trivial.

@mattiagiupponi
Contributor

Hi @gannebamm @kilichenko-pixida @ridoo @mwallschlaeger

I was checking the issue and I have a couple of notes about the implementation.

attributes as a model

The field attributes serve a purpose and come from Geoserver. Therefore, we must handle them with care. For example, we retrieve the attribute name and type from Geoserver through a function:

def set_attributes_from_geoserver(layer, overwrite=False):
    """
    Retrieve layer attribute names & types from Geoserver,
    then store in GeoNode database using Attribute model
    """
    attribute_map = []
    if getattr(layer, "remote_service") and layer.remote_service:
        server_url = layer.remote_service.service_url
        if layer.remote_service.operations.get("GetCapabilities", None) and layer.remote_service.operations.get(
            "GetCapabilities"
        ).get("methods"):
            for _method in layer.remote_service.operations.get("GetCapabilities").get("methods"):
                if _method.get("type", "").upper() == "GET":
                    server_url = _method.get("url", server_url)
                    break
    else:
        server_url = ogc_server_settings.LOCATION
    if layer.subtype in ["tileStore", "remote"] and layer.remote_service.ptype == "gxp_arcrestsource":
        dft_url = f"{server_url}{(layer.alternate or layer.typename)}?f=json"
        try:
            # The code below will fail if http_client cannot be imported
            req, body = http_client.get(dft_url, user=_user)
            body = json.loads(body)
            attribute_map = [
                [n["name"], _esri_types[n["type"]]] for n in body["fields"] if n.get("name") and n.get("type")
            ]
        except Exception:
            tb = traceback.format_exc()
            logger.debug(tb)
            attribute_map = []
    elif layer.subtype in {"vector", "tileStore", "remote", "wmsStore", "vector_time"}:
        typename = layer.alternate if layer.alternate else layer.typename
        dft_url_path = re.sub(r"\/wms\/?$", "/", server_url)
        dft_query = urlencode(
            {"service": "wfs", "version": "1.0.0", "request": "DescribeFeatureType", "typename": typename}
        )
        dft_url = urljoin(dft_url_path, f"ows?{dft_query}")
        try:
            # The code below will fail if http_client cannot be imported or WFS not supported
            req, body = http_client.get(dft_url, user=_user)
            doc = dlxml.fromstring(body.encode())
            xsd = "{http://www.w3.org/2001/XMLSchema}"
            path = f".//{xsd}extension/{xsd}sequence/{xsd}element"
            attribute_map = [
                [n.attrib["name"], n.attrib["type"]]
                for n in doc.findall(path)
                if n.attrib.get("name") and n.attrib.get("type")
            ]
        except Exception:
            tb = traceback.format_exc()
            logger.debug(tb)
            attribute_map = []
            # Try WMS instead
            dft_url = (
                server_url
                + "?"
                + urlencode(
                    {
                        "service": "wms",
                        "version": "1.0.0",
                        "request": "GetFeatureInfo",
                        "bbox": ",".join([str(x) for x in layer.bbox]),
                        "LAYERS": layer.alternate,
                        "QUERY_LAYERS": typename,
                        "feature_count": 1,
                        "width": 1,
                        "height": 1,
                        "srs": "EPSG:4326",
                        "info_format": "text/html",
                        "x": 1,
                        "y": 1,
                    }
                )
            )
            try:
                req, body = http_client.get(dft_url, user=_user)
                soup = BeautifulSoup(body, features="lxml")
                for field in soup.findAll("th"):
                    if field.string is None:
                        field_name = field.contents[0].string
                    else:
                        field_name = field.string
                    attribute_map.append([field_name, "xsd:string"])
            except Exception:
                tb = traceback.format_exc()
                logger.debug(tb)
                attribute_map = []
    elif layer.subtype in ["raster"]:
        typename = layer.alternate if layer.alternate else layer.typename
        dc_url = f"{server_url}wcs?{urlencode({'service': 'wcs', 'version': '1.1.0', 'request': 'DescribeCoverage', 'identifiers': typename})}"
        try:
            req, body = http_client.get(dc_url, user=_user)
            doc = dlxml.fromstring(body.encode())
            wcs = "{http://www.opengis.net/wcs/1.1.1}"
            path = f".//{wcs}Axis/{wcs}AvailableKeys/{wcs}Key"
            attribute_map = [[n.text, "raster"] for n in doc.findall(path)]
        except Exception:
            tb = traceback.format_exc()
            logger.debug(tb)
            attribute_map = []
    # Get attribute statistics & package for call to really_set_attributes()
    attribute_stats = defaultdict(dict)
    # Add new layer attributes if they don't already exist
    for attribute in attribute_map:
        field, ftype = attribute
        if field is not None:
            if Attribute.objects.filter(dataset=layer, attribute=field).exists():
                continue
            elif is_dataset_attribute_aggregable(layer.subtype, field, ftype):
                logger.debug("Generating layer attribute statistics")
                result = get_attribute_statistics(layer.alternate or layer.typename, field)
            else:
                result = None
            attribute_stats[layer.name][field] = result
    set_attributes(layer, attribute_map, overwrite=overwrite, attribute_stats=attribute_stats)

and later set with this function:

def set_attributes(layer, attribute_map, overwrite=False, attribute_stats=None):
    """*layer*: a geonode.layers.models.Dataset instance
    *attribute_map*: a list of 2-lists specifying attribute names and types,
        example: [ ['id', 'Integer'], ... ]
    *overwrite*: replace existing attributes with new values if name/type matches.
    *attribute_stats*: dictionary of return values from get_attribute_statistics(),
        of the form to get values by referencing attribute_stats[<dataset_name>][<field_name>].
    """
    # we need 3 more items; description, attribute_label, and display_order
    attribute_map_dict = {
        "field": 0,
        "ftype": 1,
        "description": 2,
        "label": 3,
        "display_order": 4,
    }
    for attribute in attribute_map:
        if len(attribute) == 2:
            attribute.extend((None, None, 0))
    attributes = layer.attribute_set.all()
    # Delete existing attributes if they no longer exist in an updated layer
    for la in attributes:
        lafound = False
        for attribute in attribute_map:
            field, ftype, description, label, display_order = attribute
            if field == la.attribute:
                lafound = True
                # store description and attribute_label in attribute_map
                attribute[attribute_map_dict["description"]] = la.description
                attribute[attribute_map_dict["label"]] = la.attribute_label
                attribute[attribute_map_dict["display_order"]] = la.display_order
        if overwrite or not lafound:
            logger.debug("Going to delete [%s] for [%s]", la.attribute, layer.name)
            la.delete()
    # Add new layer attributes if they doesn't exist already
    if attribute_map:
        iter = len(Attribute.objects.filter(dataset=layer)) + 1
        for attribute in attribute_map:
            field, ftype, description, label, display_order = attribute
            if field:
                _gs_attrs = Attribute.objects.filter(dataset=layer, attribute=field)
                if _gs_attrs.count() == 1:
                    la = _gs_attrs.get()
                else:
                    if _gs_attrs.exists():
                        _gs_attrs.delete()
                    la = Attribute.objects.create(dataset=layer, attribute=field)
                    la.visible = ftype.find("gml:") != 0
                la.attribute_type = ftype
                la.description = description
                la.attribute_label = label
                la.display_order = iter
                iter += 1
                if not attribute_stats or layer.name not in attribute_stats or field not in attribute_stats[layer.name]:
                    result = None
                else:
                    result = attribute_stats[layer.name][field]
                if result:
                    logger.debug("Generating layer attribute statistics")
                    la.count = result["Count"]
                    la.min = result["Min"]
                    la.max = result["Max"]
                    la.average = result["Average"]
                    la.median = result["Median"]
                    la.stddev = result["StandardDeviation"]
                    la.sum = result["Sum"]
                    la.unique_values = result["unique_values"]
                    la.last_stats_updated = datetime.datetime.now(timezone.get_current_timezone())
                try:
                    la.save()
                except Exception as e:
                    logger.exception(e)
    else:
        logger.debug("No attributes found")

In the end, they must always be consistent with Geoserver and the original dataset.
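
To make the consistency constraint concrete, here is a simplified, database-free sketch of the merge step that set_attributes performs when overwrite is False: names and types always come from the server, while the user-edited values of attributes that still exist are carried over. The function name `merge_attribute_map` and the dict-based `existing` argument are assumptions for illustration; statistics and the renumbering of display_order are left out.

```python
def merge_attribute_map(attribute_map, existing):
    """Extend each [name, type] pair and carry over stored user-edited values.

    *attribute_map*: list of [name, type] lists as reported by the server.
    *existing*: dict mapping attribute name -> dict with the stored
        "description", "attribute_label" and "display_order" values.
    Returns (merged_map, stale_names); stale names are no longer on the server.
    """
    merged = []
    for name, ftype in attribute_map:
        stored = existing.get(name, {})
        merged.append(
            [
                name,
                ftype,
                stored.get("description"),
                stored.get("attribute_label"),
                stored.get("display_order", 0),
            ]
        )
    server_names = {name for name, _ in attribute_map}
    stale = sorted(name for name in existing if name not in server_names)
    return merged, stale
```

An attribute that disappeared from the server ends up in the stale list, which corresponds to the rows that set_attributes deletes.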

API implementation

Sometimes working with dynamic rest is a headache, I agree. The field always expects a list of IDs rather than a payload to update the values.

The reason is that currently, the attribute_set values need to be passed wrapped in a "data" field which allows us to bypass internal DRF validation and access it from the update method within the DatasetSerializer class. E.g. {"data": {"attribute_set": [{...}, ...]}}.

This seems like a workaround, which might not be the best approach in case we need to handle another field in the same way in the future.

We define a specific action and extend_schema in the viewset, allowing the user to add a schema to the API without creating a new URL.

For example, the extra_metadata:

@extend_schema(
    methods=["get", "put", "delete", "post"], description="Get/Update/Delete/Add extra metadata for resource"
)
@action(
    detail=True,
    methods=["get", "put", "delete", "post"],
    permission_classes=[IsOwnerOrAdmin, UserHasPerms(perms_dict={"default": {"POST": ["base.add_resourcebase"]}})],
    url_path=r"extra_metadata",  # noqa
    url_name="extra-metadata",
)
def extra_metadata(self, request, pk, *args, **kwargs):
    _obj = get_object_or_404(ResourceBase, pk=pk)
    if request.method == "GET":
        # get list of available metadata
        queryset = _obj.metadata.all()
        _filters = [{f"metadata__{key}": value} for key, value in request.query_params.items()]
        if _filters:
            queryset = queryset.filter(**_filters[0])
        return Response(ExtraMetadataSerializer().to_representation(queryset))
    if not request.method == "DELETE":
        try:
            extra_metadata = validate_extra_metadata(request.data, _obj)
        except Exception as e:
            return Response(status=500, data=e.args[0])
    if request.method == "PUT":
        """
        update specific metadata. The ID of the metadata is required to perform the update
        [
            {
                "id": 1,
                "name": "foo_name",
                "slug": "foo_sug",
                "help_text": "object",
                "field_type": "int",
                "value": "object",
                "category": "object"
            }
        ]
        """
        for _m in extra_metadata:
            _id = _m.pop("id")
            ResourceBase.objects.filter(id=_obj.id).first().metadata.filter(id=_id).update(metadata=_m)
        logger.info("metadata updated for the selected resource")
        _obj.refresh_from_db()
        return Response(ExtraMetadataSerializer().to_representation(_obj.metadata.all()))
    elif request.method == "DELETE":
        # delete single metadata
        """
        Expect a payload with the IDs of the metadata that should be deleted. Payload be like:
        [4, 3]
        """
        ResourceBase.objects.filter(id=_obj.id).first().metadata.filter(id__in=request.data).delete()
        _obj.refresh_from_db()
        return Response(ExtraMetadataSerializer().to_representation(_obj.metadata.all()))
    elif request.method == "POST":
        # add new metadata
        """
        [
            {
                "name": "foo_name",
                "slug": "foo_sug",
                "help_text": "object",
                "field_type": "int",
                "value": "object",
                "category": "object"
            }
        ]
        """
        for _m in extra_metadata:
            new_m = ExtraMetadata.objects.create(resource=_obj, metadata=_m)
            new_m.save()
            _obj.metadata.add(new_m)
        _obj.refresh_from_db()
        return Response(ExtraMetadataSerializer().to_representation(_obj.metadata.all()), status=201)

def _get_request_params(self, request, encode=False):
    try:
        return (
            QueryDict(request.body, mutable=True, encoding="UTF-8")
            if encode
            else QueryDict(request.body, mutable=True)
        )
    except Exception as e:
        """
        The request with the barer token access to the request.data during the token verification
        so in this case if the request.body cannot not access, we just re-access to the
        request.data to get the params needed
        """
        logger.debug(e)
        return request.data

In this function, all the code required to add, delete, update, and handle the extra_metadata attribute for the resource is defined. The corresponding endpoint will be /api/v2/resources/{pk}/extra_metadata.

I would suggest a similar approach for handling the attributes. This would have several benefits:

  • It will limit the scope of the API to a specific usage.
  • It will keep the serializer simple by avoiding any additional logic related to a particular field update.
  • It will make maintenance easier.
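
As a framework-free illustration of how such a dedicated action could behave for attributes, the sketch below dispatches on the HTTP method in the same style as extra_metadata. It is not DRF code and not an existing GeoNode endpoint: `handle_attributes`, the in-memory `attributes` dict, and the `pk` key in the payload are all hypothetical stand-ins for the viewset action, the database, and the real payload format.

```python
def handle_attributes(method, attributes, payload=None):
    """Dispatch on the HTTP method, mimicking a single action-style endpoint.

    *attributes*: dict mapping attribute pk -> dict of stored values.
    *payload*: for PUT, a list of dicts that each carry a "pk" key.
    Returns a (status_code, body) tuple.
    """
    if method == "GET":
        return 200, [{"pk": pk, **values} for pk, values in sorted(attributes.items())]
    if method == "PUT":
        items = payload or []
        # Validate everything first so a bad item leaves the store untouched.
        for item in items:
            if item.get("pk") not in attributes:
                return 404, {"detail": f"unknown attribute pk: {item.get('pk')}"}
        for item in items:
            changes = {key: value for key, value in item.items() if key != "pk"}
            attributes[item["pk"]].update(changes)
        return handle_attributes("GET", attributes)
    return 405, {"detail": f"method {method} not allowed"}
```

Only GET and PUT are handled here because attributes are created and deleted from the Geoserver side; whether a real endpoint should accept other methods is left open.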

@ridoo
Contributor

ridoo commented Jan 25, 2024

@mattiagiupponi thanks for your feedback. I am getting more and more into the rest_framework stuff. There is so much "magic" underneath: once you have cases which diverge from the "normal" (tm) ones, you need a much deeper understanding of how it all works.

In this specific case, I would be curious how you plan to do edits via the REST API when dropping the legacy metadata editor templates. If you plan to re-use the API v2 (which I assume you will) we should revise it with regard to such changes. To my current understanding, the API v2 is too limited to accept all necessary metadata changes of a dataset, right?

@mattiagiupponi
Contributor

In this specific case, I would be curious how you plan to do edits via the REST API when dropping the legacy metadata editor templates. If you plan to re-use the API v2 (which I assume you will) we should revise it with regard to such changes. To my current understanding, the API v2 is too limited to accept all necessary metadata changes of a dataset, right?

We are still evaluating how to change it, but some API changes will probably be required.

@giohappy
Contributor

giohappy commented Jan 25, 2024

@ridoo we're speaking of dataset attributes, not metadata in general here, right?

