Documenting current consensus on possible technical solutions

dF edited this page Nov 16, 2020 · 8 revisions

Preamble

John's and Miloš's diagrams serve as the two far points that need to be brought together before spec writing can start.

The situation is really a triangle, with Lemon<>Lex-0 being more fine-grained, and Miloš and Lemon being more prescriptive in allowed relationships, which is better for industry interoperability.

Particular issues

Subsensing

Multiword expressions

Should not be modelled as subsenses; they should be their own headwords. A cut is needed between example sentences and idioms that constitute their own entries.

Homographs

Should not be modelled as subsenses either; a referencing method at the headword/entry level should be developed, possibly similar to a Wikipedia disambiguation page.

Discussion

DMLex could be homograph-agnostic; this information is addressed automatically via the identity/identification. A cut is needed between homonymy and polysemy.

Different POS

Business logic checks

Checking for circularity of definitions (?) and of sub-sensing

Checking for contradictions in the data model
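
A circularity check of this kind could be sketched as cycle detection over a directed graph. The sketch below assumes that sub-sensing (or "definition refers to") relations have already been flattened into a mapping from an identifier to the identifiers it points to; the function name `find_cycle` and the identifiers `s1`, `s1.1` are hypothetical and not part of any agreed model.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of ids, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so a loop has closed
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical data: sense s1 has subsense s1.1, which wrongly lists s1 again.
circular = {"s1": ["s1.1"], "s1.1": ["s1"]}
print(find_cycle(circular))  # ['s1', 's1.1', 's1']
```

Returning the offending path rather than a bare boolean would let a validator report which items form the loop.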

Clarification on the role of link

A link exists at the Entry level. It has a source selector and a target selector, plus an attribute for the type of relationship.
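
As a rough illustration of this shape, the sketch below models a link as a small record held by an entry. The class and field names, the selector syntax (`entry-id#sense-id`) and the relationship value `synonym` are all assumptions made for the example; the notes do not fix any of them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Link:
    source: str             # selector for the item the link starts from
    target: str             # selector for the item the link points to
    relationship_type: str  # type of relationship; vocabulary not fixed here


@dataclass
class Entry:
    id: str
    headword: str
    # Links are held by the entry, not by individual senses.
    links: List[Link] = field(default_factory=list)


# Hypothetical usage: a sense of "big" linked to a sense of "large".
entry = Entry(id="big", headword="big")
entry.links.append(
    Link(source="big#sense-1", target="large#sense-1", relationship_type="synonym")
)
```

Keeping links on the entry means senses do not need to carry their own cross-references; the selectors do the addressing.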

POS data model

Instead of free-form text, values driven by an authority should be exchanged.

lex:NMTOKEN

lex: will be the prefix reserved for LEXIDMA TC DMLex

The general structure of the POS information should be:

[authority]:NMTOKEN

possible authority prefix examples:

[universal part of speech tagging]

upos:

clarin:

lemon:
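
A value of the form `[authority]:NMTOKEN` could be checked along the following lines. This is a minimal sketch under several assumptions: the four prefixes from these notes are treated as a closed set, the token pattern is an ASCII-only approximation of the XML NMTOKEN production, and the colon is excluded from the token part so that the prefix split stays unambiguous (XML itself allows a colon in an NMTOKEN). The names `parse_pos` and `PosValue` are hypothetical.

```python
import re
from typing import NamedTuple

# ASCII-only approximation of XML name characters, without ":".
_TOKEN = re.compile(r"[A-Za-z0-9._\-]+")

# Prefixes mentioned in the notes; treating them as closed is an assumption.
KNOWN_AUTHORITIES = {"lex", "upos", "clarin", "lemon"}


class PosValue(NamedTuple):
    authority: str
    token: str


def parse_pos(value: str) -> PosValue:
    """Split '[authority]:NMTOKEN' and validate both halves."""
    authority, sep, token = value.partition(":")
    if not sep:
        raise ValueError(f"missing authority prefix: {value!r}")
    if authority not in KNOWN_AUTHORITIES:
        raise ValueError(f"unknown authority: {authority!r}")
    if not _TOKEN.fullmatch(token):
        raise ValueError(f"not a valid token: {token!r}")
    return PosValue(authority, token)


pos = parse_pos("upos:NOUN")
print(pos.authority, pos.token)
```

Free-form values such as `noun` are rejected because they carry no authority prefix, which is the point of moving away from free text.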

How to relate (inflectional) forms to POS?

Morphology object with possible value pairs

@POS="[authority]:NMTOKEN"

@formsLink="[formsURI]"

example: @relationshipType="imperative"
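
The three attributes above could be carried by a small object like the one sketched here. Only the attribute names `POS`, `formsLink` and `relationshipType` come from the notes; the class name, the optionality of the last two attributes, the `to_attributes` helper and the example URI are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Morphology:
    pos: str                                 # @POS, "[authority]:NMTOKEN"
    forms_link: Optional[str] = None         # @formsLink, a URI to the forms
    relationship_type: Optional[str] = None  # @relationshipType

    def to_attributes(self) -> Dict[str, str]:
        """Return the attribute/value pairs, omitting unset ones."""
        pairs = {
            "POS": self.pos,
            "formsLink": self.forms_link,
            "relationshipType": self.relationship_type,
        }
        return {name: value for name, value in pairs.items() if value is not None}


# Hypothetical example: an imperative form attached to a verb.
morph = Morphology(
    pos="upos:VERB",
    forms_link="https://example.org/forms/go",  # placeholder URI
    relationship_type="imperative",
)
print(morph.to_attributes())
```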