List of additional Joycean compounds #51
That's great! This is really cool. How do you imagine encoding it? A quick guess of mine might be:
Something like this:

```xml
<distinct type="nonstandard-compound">
  buttocksmothered
  <choice>
    <reg>buttocks mothered</reg>
    <reg>buttock smothered</reg>
  </choice>
</distinct>
```
This is the first TEI document I've ever used, so I'd defer to you on the right encoding. If we go the … Also, I think the example you chose is one of the very few that would have two different interpretations (and even there, the second one is clearly the primary meaning), but it's no extra work to add multiple sensible choices where they exist. A few more interpretive questions:
- So far, we've just maintained a list of these tags that shouldn't be rendered as text during a transformation, but we could make that more explicit in the markup by adding a property like …
- There's a way to indicate which of two choices is the primary one, maybe using …
- For lack of a more specific term ("nonstandard adverbial construction"?), …

Looking forward to seeing the PR.
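The tag-list approach mentioned here (certain elements are simply not rendered during a transformation) could look roughly like the Python sketch below. It is a hypothetical illustration, not the project's actual transformation: the `NON_RENDERED_TAGS` set and the `render_text` helper are assumptions, and only `<choice>` is skipped so that the regularized readings in the example encoding stay out of the reading text.

```python
import xml.etree.ElementTree as ET

# Hypothetical: elements whose content should never appear in the reading text.
NON_RENDERED_TAGS = {"choice"}


def local_name(tag):
    """Strip a '{namespace}' prefix so TEI-namespaced and bare tags compare equal."""
    return tag.split("}")[-1]


def render_text(el, skip=NON_RENDERED_TAGS):
    """Concatenate the text under `el`, dropping any subtree whose tag is in `skip`."""
    parts = [el.text or ""]
    for child in el:
        if local_name(child.tag) not in skip:
            parts.append(render_text(child, skip))
        # A child's tail is text that follows it inside `el`, so keep it even when the child is skipped.
        parts.append(child.tail or "")
    return "".join(parts)


# Made-up sample built around the encoding proposed in this thread.
sample = (
    '<p>The word <distinct type="nonstandard-compound">buttocksmothered'
    "<choice><reg>buttocks mothered</reg><reg>buttock smothered</reg></choice>"
    "</distinct> is rendered once.</p>"
)
print(render_text(ET.fromstring(sample)))
```

Making the rule explicit in the markup instead (for example with an attribute on the element) would let a renderer check each element rather than consult a separate list, at the cost of touching more of the XML.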
Great, I'm aiming to get the PR up this coming weekend. The concern I have around both the tag list and the …
Hi guys, with regard to Joyceans: I'm curious which variables your algorithm uses to distinguish Joycean compounds from nonstandard compounds, since that was my biggest difficulty when going through this manually.
Hi @sk3853, could you give an example of "Joycean compounds from nonstandard-compounds"? I thought the distinction was between standard compounds (the word exists with a hyphen in the OED) and nonstandard ones. The algorithm isn't doing any distinguishing like that yet. It first finds the set of words in Ulysses that are not in a list of English words, and then, within each of those words, finds places where it splits into two substrings that *are* both in the list. Then I sort the list by the geometric mean of the lengths of the original word and each word of the substring pair. Before I put the PR up, I'm going to cross-reference the list against Wiktionary to distinguish between standard and non-standard compounds, and also go through each word manually to weed out false positives.
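A rough Python sketch of the search described in this comment, assuming a plain list of English words is available. The function name, the minimum part length of 2, the letters-only tokenizer, and sorting highest score first are assumptions made for illustration; the Wiktionary cross-check and the manual pass are not modeled.

```python
import re


def find_compound_candidates(text, english_words, min_part_len=2):
    """Return (score, word, left, right) for out-of-vocabulary words that split into two known words."""
    vocab = {w.lower() for w in english_words}
    # Assumption: tokens are runs of ASCII letters, so hyphenated forms are split at the hyphen.
    tokens = {t.lower() for t in re.findall(r"[A-Za-z]+", text)}
    unknown = tokens - vocab
    candidates = []
    for word in unknown:
        # Try every split point that leaves at least `min_part_len` letters on each side.
        for i in range(min_part_len, len(word) - min_part_len + 1):
            left, right = word[:i], word[i:]
            if left in vocab and right in vocab:
                # Geometric mean of three lengths: the whole word and both parts.
                score = (len(word) * len(left) * len(right)) ** (1 / 3)
                candidates.append((score, word, left, right))
    # Assumption: higher scores (longer words, more balanced splits) go to the top.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates


vocab = ["the", "buttocks", "buttock", "mothered", "smothered", "news", "boards"]
for score, word, left, right in find_compound_candidates("The buttocksmothered newsboards", vocab):
    print(f"{score:.2f}  {word} = {left} + {right}")
```

Because the score multiplies the two part lengths, an even split of a long word ranks above a lopsided one, which is one plausible reason this ordering pushes likely compounds toward the top. It says nothing about which split is the intended reading.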
Good idea. There are already quite a few tags of this kind, and they render all kinds of artifacts, like latitudes and longitudes for …
Hi,
Sorry, when I wrote that I meant that I had difficulty distinguishing Joycean compound words (buttsmothered; distinct type="Joycean") from words that are perhaps better categorized as nonstandard compounds (newsboards; distinct type="nonstandard-compound"). Newsboards wouldn't be a Joycean, since it's a basic combination of two words, but it's also not in the OED as news-boards. The compound category was easy to figure out, since the OED suggested the hyphenated word whenever I entered one that was unhyphenated in Joyce. Your additions are welcome! Just know that the 4 distinct types we have going are:
-compound
-nonstandard-compound
-Joycean
-archaism.
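Since only these four values are in use, a small check could catch typos in the `type` attribute before a PR lands. The Python sketch below is an illustration under assumptions: it treats the values as case-sensitive exactly as written in this comment, matches `<distinct>` with or without the TEI namespace, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The four <distinct> types listed in this thread (assumed case-sensitive).
ALLOWED_DISTINCT_TYPES = {"compound", "nonstandard-compound", "Joycean", "archaism"}


def find_unknown_distinct_types(xml_text):
    """Return (type, text) pairs for <distinct> elements whose @type is missing or not allowed."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter():
        # Compare local names so both <distinct> and a TEI-namespaced distinct match.
        if el.tag.split("}")[-1] != "distinct":
            continue
        dtype = el.get("type")
        if dtype not in ALLOWED_DISTINCT_TYPES:
            problems.append((dtype, "".join(el.itertext()).strip()))
    return problems


# Made-up sample: two valid types, one miscapitalized, one missing.
sample = (
    "<p>"
    '<distinct type="Joycean">buttocksmothered</distinct> '
    '<distinct type="nonstandard-compound">newsboards</distinct> '
    '<distinct type="joycean">typo</distinct> '
    "<distinct>untyped</distinct>"
    "</p>"
)
print(find_unknown_distinct_types(sample))
```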
Sorry for the slow replies; my schedule has been crazy recently.
On Tue, Oct. 1, 2019, at 7:09 a.m., David Roher wrote:
A user started working on this issue via WorksHub.
@imrobintomar started working on this issue via WorksHub.
Hi, @imrobintomar! Glad to see that you've started work on this issue. Let me know if you have any questions along the way!
@imrobintomar, could you say what you had in mind for this issue? And do you have any questions?
@Avrnikh-iziki started working on this issue via WorksHub.
@imrobintomar, have you started work on this issue? I don't see anything in your GitHub account about this yet. Please let me know ASAP.
Hi @Avrnikh-iziki! Could you tell me what you had in mind for this issue?
@Brucedevnairobi started working on this issue via WorksHub.
@draconid719 started working on this issue via WorksHub.
@Natalia-Mikhieieva started working on this issue via WorksHub.
@Natalia-Mikhieieva, what did you have in mind for this issue?
I wrote a hacky algorithm to find likely Joycean compounds. It excludes any words already tagged as compounds in the XML, as well as any words inside a foreign-language tag. There are plenty of false positives, but it does a pretty good job of sending likely ones to the top of the list:
compound_guesses.txt
I'd be happy to put up a PR to add a bunch of these to the XML, but I wanted to check first whether you'd be interested and whether that's the best way to go about it. Thanks!
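One way to apply the two exclusions mentioned in this post is to walk the XML and collect words only from text that sits outside an excluded element. The Python sketch below is a guess at that step, not the script behind compound_guesses.txt: it assumes compounds are marked with `<distinct>` elements whose `type` ends in "compound" and that foreign-language passages use TEI's `<foreign>` element. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

WORD_RE = re.compile(r"[A-Za-z]+")


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.split("}")[-1]


def is_excluded(el):
    """Assumption: skip <foreign> passages and <distinct> elements typed as (nonstandard-)compounds."""
    name = local_name(el.tag)
    if name == "foreign":
        return True
    return name == "distinct" and (el.get("type") or "").endswith("compound")


def untagged_words(el):
    """Collect words from `el` and its descendants, skipping excluded subtrees."""
    words = []
    if not is_excluded(el):
        words += WORD_RE.findall(el.text or "")
        for child in el:
            words += untagged_words(child)
    # The tail follows `el` in its parent's text, so it is scanned even when `el` is excluded.
    words += WORD_RE.findall(el.tail or "")
    return words


# Made-up sample with one tagged compound and one foreign-language passage.
sample = (
    '<p>Mr Bloom read the <distinct type="compound">seaside</distinct> '
    'newsboards: <foreign xml:lang="fr">bonjour</foreign> again</p>'
)
print(untagged_words(ET.fromstring(sample)))
```

The surviving words are the ones a dictionary-based search would then examine; words inside the excluded elements never reach it.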