List of additional Joycean compounds #51

droher · 2019-09-28T21:22:32Z

I wrote a hacky algorithm to find likely Joycean compounds. It excludes any words already tagged as compounds in the XML, as well as any words inside of a foreign language tag. There are plenty of false positives, but it does a pretty good job at sending likely ones to the top of the list:
compound_guesses.txt

I'd be happy to put up a PR to add a bunch of these to the XML, but I wanted to check before I did to see if you'd be interested/if that was the best way to go about it. Thanks!
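A minimal sketch of the exclusion step described above, in Python with the standard-library ElementTree. The original script isn't posted in this thread, so this is not droher's code. It assumes already-tagged compounds are <distinct type="compound"> or <distinct type="nonstandard-compound">, and that foreign-language passages are wrapped in <foreign>. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed markup: compounds are <distinct> elements with one of these types,
# and foreign-language passages are wrapped in <foreign>.
COMPOUND_TYPES = {"compound", "nonstandard-compound"}
# Simplifying assumption: a "word" is a run of ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]+")


def local_name(tag):
    """Drop a '{namespace}' prefix so TEI-namespaced and bare tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def is_excluded(elem):
    """True for <foreign> elements and for already-tagged compounds."""
    name = local_name(elem.tag)
    if name == "foreign":
        return True
    return name == "distinct" and elem.get("type") in COMPOUND_TYPES


def words_in(text):
    return [w.lower() for w in WORD_RE.findall(text or "")]


def candidate_words(elem):
    """Yield lowercased words under elem, skipping excluded subtrees.

    Text after an excluded element's closing tag (its tail) belongs to the
    parent, so it is still collected.
    """
    if is_excluded(elem):
        return
    yield from words_in(elem.text)
    for child in elem:
        yield from candidate_words(child)
        yield from words_in(child.tail)


sample = ET.fromstring(
    '<p>He saw <distinct type="compound">seaside</distinct> and '
    '<foreign xml:lang="fr">bonjour</foreign> then a buttocksmothered wench</p>'
)
print(list(candidate_words(sample)))
# ['he', 'saw', 'and', 'then', 'a', 'buttocksmothered', 'wench']
```

Handling tails in the parent's loop is what keeps the words right after a skipped tag from being lost.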

JonathanReeve · 2019-09-29T04:59:41Z

That's great! This is really cool. How do you imagine encoding it? A quick guess of mine might be:

leave a note in the teiHeader somewhere about how you did the inference of the word components
abuse the <reg> tag to show the regularized form (i.e. two words reconstructed from their compound)
wrap it in <choice> to show multiple possibilities

Something like this:

<distinct type="nonstandard-compound">
  buttocksmothered
  <choice>
    <reg>buttocks mothered</reg>
    <reg>buttock smothered</reg>
  </choice>
</distinct>
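If this <choice>/<reg> shape is adopted, the markup for each word could be generated rather than typed by hand. A minimal sketch with Python's standard-library ElementTree; encode_compound is a hypothetical helper, and its default type is simply the one used in the example above.

```python
import xml.etree.ElementTree as ET


def encode_compound(word, readings, kind="nonstandard-compound"):
    """Build a <distinct> element for `word` with one <reg> per reading.

    Mirrors the proposed shape: the surface form as text, then a <choice>
    holding each regularized (space-separated) reading.
    """
    distinct = ET.Element("distinct", type=kind)
    distinct.text = word
    choice = ET.SubElement(distinct, "choice")
    for reading in readings:
        ET.SubElement(choice, "reg").text = reading
    return distinct


el = encode_compound("buttocksmothered", ["buttocks mothered", "buttock smothered"])
print(ET.tostring(el, encoding="unicode"))
# <distinct type="nonstandard-compound">buttocksmothered<choice><reg>buttocks mothered</reg><reg>buttock smothered</reg></choice></distinct>
```

A word with only one sensible reading would just get a <choice> with a single <reg>.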

droher · 2019-09-29T12:36:43Z

This is the first TEI doc I've ever used, so I would defer to you on the right encoding. If we go the reg route above, does that mean that the text inside of those tags would be picked up as text properties by XML parsers? Would there be a way to include them as attributes of an element instead?
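To make the concern concrete: a naive extraction that concatenates every text node (here Python's ElementTree itertext()) does pick up the <reg> readings and runs them into the surface word. This is only an illustration with the markup proposed above, not a claim about any particular downstream tool.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<p>a <distinct type="nonstandard-compound">buttocksmothered'
    "<choice><reg>buttocks mothered</reg><reg>buttock smothered</reg></choice>"
    "</distinct> wench</p>"
)
# itertext() yields every text and tail string in document order,
# including the contents of <reg>.
naive = "".join(doc.itertext())
print(naive)
# a buttocksmotheredbuttocks motheredbuttock smothered wench
```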

Also I think the example you chose is one of the very few that would have two different interpretations (and even there, the second one is the clear primary meaning), but it's not any extra work to add multiple sensible choices where they exist.

A few more interpretive questions:

How would you recommend handling cases that are ambiguous as to whether they're compounds or Joycean blends, e.g. musemathics?
What about cases with unusual affixes, e.g. remarkablest or incoordinately? Are these best thought of as compounds?
I saw your earlier discussion on when to use standard vs nonstandard compound tagging -- I don't have access to the OED, but I could use the same logic with Wiktionary if that worked for you.

JonathanReeve · 2019-09-29T16:20:59Z

So far, we've just maintained a list of these tags that shouldn't be rendered as text during a transformation, but we could make that more explicit in the markup by adding an attribute like rend="none", which I might put on <choice>.

There's a way to indicate which of two choices is the primary one, maybe using certainty, but unless you're feeling extra ambitious I wouldn't worry about this for now.

For lack of a more specific term ("nonstandard adverbial construction"?), type="Joycean" sounds about right to me for cases like incoordinately. Also for musemathics, for different reasons. Wiktionary would work here. @sk3853, do you have any ideas about how to handle these, based on your experience with categorizing distinct words?

Looking forward to seeing the PR.

droher · 2019-09-30T10:51:10Z

Great, aiming to get the PR up this coming weekend.

The concern I have with both the tag list and the rend="none" options is that they're not as obvious to users (like me) who are expecting the text properties of the XML to just contain the text of Ulysses. If there are already examples like this in the XML, then adding one more wouldn't be a problem - maybe the solution is just calling those non-rendering tags out more explicitly in the doc?

sk3853 · 2019-10-01T06:22:29Z

Hi guys, with regard to Joyceans: I'm curious which variables your algorithm uses to distinguish Joycean compounds from nonstandard compounds, since that was my biggest difficulty when going through this manually.
I think it would be most efficient and consistent to tag these words as Joycean if you're confident that he coined the terms. Once I finish up the rest of the project, I'm going to run through it again and confirm that my Joycean words aren't just nonstandard compounds and vice versa. Perhaps it would be worthwhile to consider changing those tags so there isn't as much overlap.
I don't think it would be a bad idea to include multiple interpretations of words like "buttocksmothered," but there are so many words to interpret that I think that kind of project would make more sense further down the road.

droher · 2019-10-01T11:09:01Z

Hi @sk3853, could you give an example of "Joycean compounds from nonstandard-compounds"? I thought the distinction was between standard compounds (the word exists with a hyphen in the OED) and nonstandard ones.

The algorithm isn't doing any distinguishing like that yet. It first finds the set of words in Ulysses that are not in a list of English words, then, within each of those words, finds the ways it can be split into two substrings that are both in the list. Then I sort the results by the geometric mean of the lengths of the original word and the two substrings.
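A rough reconstruction of that ranking from the description alone (not the actual script): out-of-vocabulary words are split at every position, splits whose halves are both in the word list are kept, and each is scored by the geometric mean of the three lengths. The min_len parameter, which stops single-letter halves, is an assumption not mentioned above, and the tiny vocabulary in the usage lines is a toy.

```python
def split_candidates(word, vocab, min_len=2):
    """Return every (left, right) split of `word` where both halves are in vocab.

    min_len (hypothetical) is the shortest half allowed.
    """
    return [
        (word[:i], word[i:])
        for i in range(min_len, len(word) - min_len + 1)
        if word[:i] in vocab and word[i:] in vocab
    ]


def rank_compounds(words, vocab, min_len=2):
    """Score out-of-vocabulary words, highest first.

    Score = geometric mean of len(word), len(left), len(right).
    """
    scored = []
    for word in set(words):
        if word in vocab:
            continue  # only words missing from the English word list
        for left, right in split_candidates(word, vocab, min_len):
            score = (len(word) * len(left) * len(right)) ** (1 / 3)
            scored.append((score, word, left, right))
    scored.sort(reverse=True)
    return scored


vocab = {"buttock", "buttocks", "mothered", "smothered", "news", "boards"}
for score, word, left, right in rank_compounds(["buttocksmothered", "newsboards", "news"], vocab):
    print(f"{score:.2f} {word} = {left} + {right}")
# 10.08 buttocksmothered = buttocks + mothered
# 10.03 buttocksmothered = buttock + smothered
# 6.21 newsboards = news + boards
```

For a fixed word length, the product of the two half-lengths is largest when the split is balanced, so the geometric mean tends to push short accidental matches (a two-letter word glued to a long one) down the list.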

Before I put the PR up, I'm going to cross-reference the list against Wiktionary to distinguish between standard and non-standard compounds, and also go through each word manually to weed out false positives.
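The standard vs. nonstandard check could be as simple as a lookup of the hyphenated form. In this sketch, hyphenated_forms is a hypothetical stand-in for whatever OED or Wiktionary lookup ends up being used (no real API is called), and only the compound and nonstandard-compound types are decided automatically; Joycean and archaism still need a human reader.

```python
def classify_split(left, right, hyphenated_forms):
    """Suggest a <distinct> type for a two-part split.

    `hyphenated_forms` is assumed to be a set of lowercase dictionary
    headwords. An attested hyphenated form counts as a standard compound;
    anything else is provisionally a nonstandard compound.
    """
    if f"{left}-{right}".lower() in hyphenated_forms:
        return "compound"
    return "nonstandard-compound"


toy_dictionary = {"sea-green"}  # toy data, not a real word list
print(classify_split("sea", "green", toy_dictionary))    # compound
print(classify_split("news", "boards", toy_dictionary))  # nonstandard-compound
```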

JonathanReeve · 2019-10-01T15:31:12Z

> If there are already examples like this in the XML, then adding one more wouldn't be a problem - maybe the solution is just calling those non-rendering tags out more explicitly in the doc?

Good idea. There are already quite a few tags like this, and they render all kinds of artifacts, like latitudes and longitudes for <place> tags. In some cases I have XSLT that hides them, but there really should be a list of these somewhere, or some other logic for hiding them.
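One way a consumer could combine both ideas, sketched in Python rather than XSLT: skip any element whose tag is on a list of non-rendering tags, or that carries rend="none". NON_RENDERING is a hypothetical placeholder for the list the project actually maintains; whatever tag holds the <place> coordinates would go in the same set.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the maintained list of non-rendering tags.
NON_RENDERING = frozenset({"choice"})


def local_name(tag):
    """Drop a '{namespace}' prefix so TEI-namespaced and bare tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def reading_text(elem, skip=NON_RENDERING):
    """Concatenate text in document order, like itertext(), but drop the
    contents of any element whose tag is in `skip` or that has rend="none".

    The tail after a skipped element belongs to its parent and is kept.
    """
    if local_name(elem.tag) in skip or elem.get("rend") == "none":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(reading_text(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)


doc = ET.fromstring(
    '<p>a <distinct type="nonstandard-compound">buttocksmothered'
    '<choice rend="none"><reg>buttocks mothered</reg>'
    "<reg>buttock smothered</reg></choice></distinct> wench</p>"
)
print(reading_text(doc))              # a buttocksmothered wench
print(reading_text(doc, skip=set()))  # a buttocksmothered wench (rend="none" still applies)
```

The attribute version has the advantage droher raises: the rule travels with the markup, so a reader of the XML doesn't need to know about a separate list.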

sk3853 · 2019-10-07T15:47:04Z

Hi, sorry: when I wrote that, I meant that I had difficulty distinguishing Joycean compound words like buttocksmothered (<distinct type="Joycean">) from words that are maybe better categorized as nonstandard compounds, like newsboards (<distinct type="nonstandard-compound">). Newsboards wouldn't be a Joycean, since it's a basic combination of two words, but it's also not in the OED as news-boards. The compound category was an easy one to figure out, since the OED suggested the hyphenated word whenever I entered one that was unhyphenated in Joyce.

Your additions are welcome! Just know that the 4 distinct types we have going are:

compound
nonstandard-compound
Joycean
archaism

Sorry for the slow replies, my schedule has been crazy recently.

workshub · 2021-10-22T10:12:38Z

A user started working on this issue via WorksHub.

workshub · 2021-11-26T14:18:37Z

@imrobintomar started working on this issue via WorksHub.

JonathanReeve · 2021-11-27T19:38:50Z

Hi, @imrobintomar! Glad to see that you've started work on this issue. Let me know if you have any questions along the way!

workshub · 2021-12-07T00:49:06Z

A user started working on this issue via WorksHub.

workshub · 2021-12-20T11:54:41Z

A user started working on this issue via WorksHub.

JonathanReeve · 2021-12-20T23:08:36Z

@imrobintomar, could you say what you had in mind for this issue? And do you have any questions?

workshub · 2022-01-30T20:53:25Z

A user started working on this issue via WorksHub.

workshub · 2022-02-02T21:35:22Z

@Avrnikh-iziki started working on this issue via WorksHub.

JonathanReeve · 2022-02-02T22:14:11Z

@imrobintomar, have you started work on this issue? I don't see anything in your GitHub account about this yet. Please let me know ASAP.

JonathanReeve · 2022-02-02T22:14:38Z

Hi @Avrnikh-iziki! Could you tell me what you had in mind for this issue?

workshub · 2022-02-04T18:33:37Z

A user started working on this issue via WorksHub.

workshub · 2022-02-24T07:13:04Z

A user started working on this issue via WorksHub.

workshub · 2022-03-02T16:36:07Z

A user started working on this issue via WorksHub.

workshub · 2022-03-03T10:02:07Z

A user started working on this issue via WorksHub.

workshub · 2022-03-23T03:36:58Z

A user started working on this issue via WorksHub.

workshub · 2022-03-30T00:06:15Z

A user started working on this issue via WorksHub.

workshub · 2022-04-02T19:45:59Z

A user started working on this issue via WorksHub.

workshub · 2022-04-04T06:28:06Z

A user started working on this issue via WorksHub.

workshub · 2022-06-21T03:31:51Z

A user started working on this issue via WorksHub.

workshub · 2022-07-19T07:01:47Z

A user started working on this issue via WorksHub.

workshub · 2022-08-12T01:54:47Z

A user started working on this issue via WorksHub.

workshub · 2022-10-28T11:09:20Z

@Brucedevnairobi started working on this issue via WorksHub.

workshub · 2022-12-24T06:03:59Z

A user started working on this issue via WorksHub.

workshub · 2023-03-31T13:58:47Z

@draconid719 started working on this issue via WorksHub.

workshub · 2023-05-10T16:14:51Z

@Natalia-Mikhieieva started working on this issue via WorksHub.

JonathanReeve · 2023-05-11T02:50:52Z

@Natalia-Mikhieieva, what did you have in mind for this issue?
