To keep things more clearly modular, I am moving the prototype code that generates a Hugo static site from the W3ACT data out of ukwa-manage and into here. The current implementation has been ported over (to be checked in), but there are also some gaps.
Collections and sub-collections: Only top-level collections are marked as published or not. Collections really need separating so that top-level and sub-collections are handled properly.
Similarly, Targets need separating into Archived Web Sites and Archived Web Pages, i.e. when a Target is a specific resource within another Target, it should be 'demoted' to being an Archived Web Page that belongs to a Collection.
The infrastructure for cross-referencing these things also needs consideration.
Ideally, this process should facilitate moving to mastering the data in GitHub/NetlifyCMS.
Support wct_at_oid for Collections and wct_id for Targets, at least as aliases.
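A minimal sketch of how the legacy identifiers could be turned into entries for Hugo's `aliases` front matter list. The URL layout (`/wct/collections/...`, `/wct/targets/...`) and the function names are purely hypothetical; only the field names wct_at_oid and wct_id come from the list above.

```python
from typing import List, Optional


def collection_aliases(wct_at_oid: Optional[int]) -> List[str]:
    """Alias paths for a Collection; empty when there is no legacy ID."""
    if wct_at_oid is None:
        return []
    # Hypothetical path layout, not an existing URL scheme.
    return [f"/wct/collections/{wct_at_oid}/"]


def target_aliases(wct_id: Optional[int]) -> List[str]:
    """Alias paths for a Target; empty when there is no legacy ID."""
    if wct_id is None:
        return []
    # Hypothetical path layout, not an existing URL scheme.
    return [f"/wct/targets/{wct_id}/"]
```

The returned list would be written to the `aliases` key of each page's front matter, so that Hugo generates a redirect from the legacy path to the page's main URL.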
For cross-references, Hugo provides quite sophisticated functionality for Related Content that allows us to perform these lookups. The IDs have to be coerced to strings, but it works fine. The main 'gotcha' was that the default threshold (80) was too high, so even exact matches didn't get picked out. Not clear how the scoring works! The downside is that we have to use the same threshold for everything, so if we attempted to use the Related Content feature for purposes other than direct references, we might get too many (poor) matches that need to be cut down.
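As a rough illustration of the string coercion, here is a sketch of how the generator might write front matter for a Target. It emits JSON front matter (which Hugo accepts) to avoid a YAML dependency. The field names `target_id` and `collection_ids` and the function name are assumptions, not what b40406f actually uses; whichever field is used would also need to be listed as an index in the site's Related Content configuration.

```python
import json
from typing import Iterable, Union

Id = Union[int, str]


def target_front_matter(title: str, target_id: Id, collection_ids: Iterable[Id]) -> str:
    """Render JSON front matter with every ID coerced to a string."""
    data = {
        "title": title,
        # Coerce numeric IDs to strings for the Related Content lookups.
        "target_id": str(target_id),
        "collection_ids": [str(c) for c in collection_ids],
    }
    return json.dumps(data, indent=2) + "\n"
```

For example, `target_front_matter("GOV.UK", 42, [7, 19])` yields front matter in which `collection_ids` is `["7", "19"]` rather than a list of integers.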
Well, b40406f implements the basic proof-of-concept for ukwa-site.
Not clear how best to handle identifiers. We have a LOT of records (19,516 pages, mostly host-level Targets) and looking records up is easier if the main ID is also the filename (we can support things like WCT-IDs via aliases).
Totally opaque filenames are very cumbersome to work with manually, but semantic names can be brittle over time. However, the primary URL for a record should be stable in general, so we could arrange the web site records by e.g. host (no www) or domain and creation date:
targets/gov.uk-2019-04-12.md
or perhaps just host and a version number (if needed):
targets/gov.uk-1.md
Using the host like this would help users find records. I am planning to collapse Target records down to hosts, so there would normally be just one file per host. Individual highlighted URLs within a Collection would be handled separately (although this also needs some more thought).
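The two naming schemes above could be derived from a Target's primary URL roughly as follows. This is only a sketch: the function names are hypothetical, and it uses the host with any leading "www." removed rather than attempting to work out the registered domain.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlsplit


def host_key(url: str) -> str:
    """Return the lower-cased host of a URL with any leading 'www.' removed."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def target_filename(url: str,
                    created: Optional[date] = None,
                    version: Optional[int] = None) -> str:
    """Build a record filename from the host plus a date or version suffix."""
    key = host_key(url)
    if not key:
        raise ValueError(f"no host found in URL: {url!r}")
    if created is not None:
        # Host plus creation date, e.g. targets/gov.uk-2019-04-12.md
        return f"targets/{key}-{created.isoformat()}.md"
    if version is not None:
        # Host plus version number, e.g. targets/gov.uk-1.md
        return f"targets/{key}-{version}.md"
    return f"targets/{key}.md"
```

With this, `target_filename("https://www.gov.uk/", created=date(2019, 4, 12))` gives `targets/gov.uk-2019-04-12.md`, and passing `version=1` instead gives `targets/gov.uk-1.md`.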
Note that NetlifyCMS also currently doesn't support content in sub-folders, but may do soon. Keeping everything in one folder is likely not very performant, but may be acceptable.