List of additional Joycean compounds #51

Open
droher opened this issue Sep 28, 2019 · 34 comments

@droher

droher commented Sep 28, 2019

I wrote a hacky algorithm to find likely Joycean compounds. It excludes any words already tagged as compounds in the XML, as well as any words inside of a foreign language tag. There are plenty of false positives, but it does a pretty good job at sending likely ones to the top of the list:
compound_guesses.txt

I'd be happy to put up a PR to add a bunch of these to the XML, but I wanted to check before I did to see if you'd be interested/if that was the best way to go about it. Thanks!

@JonathanReeve
Member

That's great! This is really cool. How do you imagine encoding it? A quick guess of mine might be something like this:

<distinct type="nonstandard-compound">
buttocksmothered
<choice>
  <reg>buttocks mothered</reg>
  <reg>buttock smothered</reg>
</choice>
</distinct>

@droher
Author

droher commented Sep 29, 2019

This is the first TEI doc I've ever used, so I would defer to you on the right encoding. If we go the reg route above, does that mean that the text inside of those tags would be picked up as text properties by XML parsers? Would there be a way to include them as attributes of an element instead?
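To make the question concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It shows that a naive text extraction does pick up the contents of `<reg>`, and contrasts that with an attribute-based variant. The `reg` attribute on `<distinct>` is purely hypothetical and used only for illustration; whether the TEI schema would permit it is not checked here.

```python
import xml.etree.ElementTree as ET

# The encoding proposed above, written on one line to avoid stray whitespace.
snippet = (
    '<distinct type="nonstandard-compound">buttocksmothered'
    "<choice><reg>buttocks mothered</reg><reg>buttock smothered</reg></choice>"
    "</distinct>"
)
elem = ET.fromstring(snippet)

# itertext() walks every descendant, so the <reg> readings end up in the text.
naive_text = "".join(elem.itertext())

# Hypothetical alternative: carry the reading in an attribute (not a known TEI attribute).
attr_snippet = (
    '<distinct type="nonstandard-compound" reg="buttock smothered">'
    "buttocksmothered</distinct>"
)
attr_elem = ET.fromstring(attr_snippet)
attr_text = "".join(attr_elem.itertext())

print(repr(naive_text))
print(repr(attr_text), attr_elem.get("reg"))
```

With the element-based encoding, anyone extracting plain text has to know to skip `<reg>`; with an attribute, the text content stays identical to the novel's text.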

Also I think the example you chose is one of the very few that would have two different interpretations (and even there, the second one is the clear primary meaning), but it's not any extra work to add multiple sensible choices where they exist.

A few more interpretive questions:

  • How would you recommend handling cases that are ambiguous as to whether they're compounds or Joycean blends, e.g. musemathics?
  • What about cases that are unusual affixes, e.g. remarkablest or incoordinately? Are these best thought of as compounds?
  • I saw your earlier discussion on when to use standard vs nonstandard compound tagging -- I don't have access to the OED, but I could use the same logic with Wiktionary if that worked for you.

@JonathanReeve
Member

So far, we've just maintained a list of these tags that shouldn't be rendered as text during a transformation, but we could make that more explicit in the markup by adding a property like rend="none", which I might put in <choice>.

There's a way to indicate which of two choices is the primary one, maybe using certainty, but unless you're feeling extra ambitious I wouldn't worry about this for now.

For lack of a more specific term ("nonstandard adverbial construction"?), type="Joycean" sounds about right to me for cases like incoordinately. Also for musemathics, for different reasons. Wiktionary would work here. @sk3853, do you have any ideas about how to handle these, based on your experience with categorizing distinct words?

Looking forward to seeing the PR.

@droher
Author

droher commented Sep 30, 2019

Great, aiming to get the PR up this coming weekend.

The concern I have with both the tag list and the rend="none" option is that they're not obvious to users (like me) who expect the text properties of the XML to contain only the text of Ulysses. If there are already examples like this in the XML, then adding one more wouldn't be a problem - maybe the solution is just calling those non-rendering tags out more explicitly in the doc?

@sk3853
Contributor

sk3853 commented Oct 1, 2019

Hi guys, with regard to Joyceans: I'm curious what variables your algorithm used to distinguish Joycean compounds from nonstandard-compounds, which was my biggest difficulty when going through this manually.
I think that it would be most efficient and consistent to tag these words as example, if you're confident that he coined the terms. Once I finish up the rest of the project I'm going to run through it again and confirm that my Joycean words aren't just nonstandard compounds and vice-versa. Perhaps it would be worthwhile to consider changing those tags so there isn't as much overlap.
I don't think it would be a bad idea to include multiple interpretations of words like "buttocksmothered," but there are so many word choices to interpret that I think that kind of project would make more sense further down the road.

@droher
Author

droher commented Oct 1, 2019

Hi @sk3853, could you give an example of "Joycean compounds from nonstandard-compounds"? I thought the distinction was between standard compounds (word exists with a hyphen in the OED) and nonstandard.

The algorithm isn't doing any distinguishing like that yet. It first finds the set of words in Ulysses that are not in a list of English words, and then, within each of those words, finds splits where both substrings of the pair are in the list. Then I sort the results by the geometric mean of the lengths of the original word and each word of the substring pair.
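The steps described above can be sketched roughly as follows. This is a reconstruction from the description, not the actual script: the function names, the `min_part_len` cutoff, the tiny word list, and the choice to sort in descending order are all assumptions made for illustration.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    return prod(values) ** (1 / len(values))


def compound_guesses(text_words, english_words, min_part_len=3):
    """Rank out-of-dictionary words that split into two dictionary words.

    min_part_len is an assumed cutoff to avoid trivial one- or two-letter parts.
    """
    guesses = []
    for word in sorted({w.lower() for w in text_words}):
        if word in english_words:
            continue  # already a known word, so not a candidate
        for i in range(min_part_len, len(word) - min_part_len + 1):
            left, right = word[:i], word[i:]
            if left in english_words and right in english_words:
                score = geometric_mean([len(word), len(left), len(right)])
                guesses.append((score, word, left, right))
    # Assumed ordering: highest score first, so longer candidates rise to the top.
    return sorted(guesses, reverse=True)


# Toy data standing in for a real English word list and the novel's tokens.
english = {"the", "sea", "snot", "green", "buttock", "buttocks", "smothered", "mothered"}
tokens = ["the", "snotgreen", "sea", "buttocksmothered"]

ranked = compound_guesses(tokens, english)
for score, word, left, right in ranked:
    print(f"{word}: {left} + {right}")
```

Note that a word with more than one valid split, such as buttocksmothered, appears once per split, so ambiguous readings surface on their own.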

Before I put the PR up, I'm going to cross-reference the list against Wiktionary to distinguish between standard and non-standard compounds, and also go through each word manually to weed out false positives.

@JonathanReeve
Member

> If there are already examples like this in the XML, then adding one more wouldn't be a problem - maybe the solution is just calling those non-rendering tags out more explicitly in the doc?

Good idea. There are already quite a few of these styles of tags, and they render all kinds of artifacts, like latitudes and longitudes for <place> tags. In some cases, I have XSLT that hides them, but there really should be a list of these somewhere, or some other kind of logic to hide them.
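As one possible shape for that "list plus logic", here is a minimal Python sketch that extracts only the rendered text by skipping a set of annotation-only tags. The tag set, the function name, and the sample markup are hypothetical. The sketch also assumes un-namespaced tag names, whereas a real TEI file would normally carry namespace-qualified tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of tags whose contents are annotation, not text of the novel.
NON_RENDERING_TAGS = {"reg", "geo"}


def rendered_text(elem, skip=NON_RENDERING_TAGS):
    """Return the text of elem, omitting the contents of any tag in skip."""
    if elem.tag in skip:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(rendered_text(child, skip))
        # The tail follows the child's closing tag and belongs to the parent's text,
        # so it is kept even when the child itself is skipped.
        parts.append(child.tail or "")
    return "".join(parts)


# Made-up sample markup, not taken from the repository.
sample = ET.fromstring(
    "<p>In <place>Dublin<geo>53.35 -6.26</geo></place>, "
    "<distinct>buttocksmothered<choice><reg>buttock smothered</reg></choice></distinct>.</p>"
)
print(rendered_text(sample))
```

Keeping the tag set in one place would also give the documentation a single list to point to.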

@sk3853
Contributor

sk3853 commented Oct 7, 2019 via email

@workshub

workshub bot commented Oct 22, 2021

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Nov 26, 2021

@imrobintomar started working on this issue via WorksHub.

@JonathanReeve
Member

Hi, @imrobintomar! Glad to see that you've started work on this issue. Let me know if you have any questions along the way!

@workshub

workshub bot commented Dec 7, 2021

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Dec 20, 2021

A user started working on this issue via WorksHub.

@JonathanReeve
Member

@imrobintomar, could you say what you had in mind for this issue? And do you have any questions?

@workshub

workshub bot commented Jan 30, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Feb 2, 2022

@Avrnikh-iziki started working on this issue via WorksHub.

@JonathanReeve
Member

@imrobintomar, have you started work on this issue? I don't see anything in your GitHub account about this yet. Please let me know ASAP.

@JonathanReeve
Member

Hi @Avrnikh-iziki! Could you tell me what you had in mind for this issue?

@workshub

workshub bot commented Feb 4, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Feb 24, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Mar 2, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Mar 3, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Mar 23, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Mar 30, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Apr 2, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Apr 4, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Jun 21, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Jul 19, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Aug 12, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Oct 28, 2022

@Brucedevnairobi started working on this issue via WorksHub.

@workshub

workshub bot commented Dec 24, 2022

A user started working on this issue via WorksHub.

@workshub

workshub bot commented Mar 31, 2023

@draconid719 started working on this issue via WorksHub.

@workshub

workshub bot commented May 10, 2023

@Natalia-Mikhieieva started working on this issue via WorksHub.

@JonathanReeve
Member

@Natalia-Mikhieieva, what did you have in mind for this issue?
