Disambiguate <emph> to <foreign> for foreign languages #2

JonathanReeve · 2017-01-16T19:21:17Z

A sub-task of #1.

JonathanReeve · 2017-01-16T19:27:48Z

Here are some interesting edge-cases:

<lb n="070468"/><said>--</said><emph>Thanky vous</emph>, Lenehan said, helping himself.</p>

There is no pre-existing language ID for this mix of French and English. But we might be able to define one in the TEI header like this:

<langUsage>
 <language ident="franglais">Franglais, a mixture of French and English</language>
</langUsage>

Then we can mark it up like this:

<lb n="070468"/><said>--</said><foreign xml:lang="franglais">Thanky vous</foreign>, Lenehan said, helping himself.</p>

I've documented this in CONTRIBUTING.md. Here's another case:

<lb n="080623"/>here. <emph>Lacaus esant tara tara.</emph> Great chorus that. <emph>Taree tara.</emph> Must be washed

Gifford suggests that "Lacaus esant" is Bloom remembering the line "La cause è santa" from a French opera that was probably performed in Italian, since Italian performances of the opera were common in the early 20th century. But the French equivalent, "la cause est sainte", might be rendered the same way in Bloom's mind. I'll leave this as Italian for now, trusting that Gifford knows something I don't.

Here's another:

<lb n="100849"/><said>--</said><emph>Se el yilo nebrakada femininum! Amor me solo! Sanktus! Amen.</emph></p>

Gifford writes: "Se reads Sel in the German version of the 'Eighth and Ninth Books of Moses' thus, if the phrase were Sel el yilo, it could be regarded as a phonetic reproduction of the Spanish Cielillo, 'Little Heaven'; and 'nebrakada' could be Spanish-Arabic for 'blessed'. The whole charm would then read: '[My] little heaven of blessed femininity, love only me. Holy! Amen.'"

So I'll define a new language, Spanish-Arabic, in the header.

This one is puzzling:

<p rend="non-indent"><lb n="110043"/><emph>Naminedamine.</emph> Preacher is he.</p>

I'll label this as Latin for now.

JonathanReeve · 2017-01-17T00:56:11Z

This is almost finished with ca3f3bd, but I'll leave this issue open for now, awaiting any discussion about the edge-cases described above.

yellwork · 2017-01-17T20:32:43Z

From the Wikibooks annotations to Ulysses (line references adapted to our edition):

Naminedamine (Latin) Bloom’s corruption of the liturgical phrase In nomine Domini (In the name of the Lord).[1] During Paddy Dignam’s Burial Service, Bloom had concocted the term Dominenamine (6.595). Later, in this episode, he will be inspired to concoct a similar nonsense word by a line in The Croppy Boy: 11.1036. Another variant, nominedomine, occurs at 11.1244.

[1] Gifford (1988) 294.
Thornton (1968) 241.

Interestingly enough, while both corpusnomine at 11.1036 and nominedomine at 11.1244 were tagged in <emph> (and Jonathan now has them, rightly, as <foreign xml:lang="la">), ‘Dominenamine’ at 6.595 is not italicised in ‘Hades’ and so was never <emph> distinguished in our markup.

This and the discussion above raise two questions:
(1) Do we want to further distinguish garbled, humorous or inaccurate instances of <foreign>? Are there appropriate attributes for these nuances? (Or do we just @type everything?)
(2) Do we want to encode as <foreign> instances of non-English usage that are not typographically distinguished in the Gabler critical text? Use of @rend might get around the non-italics. Of course, these will be more difficult to track down in the corpus than the <emph>-tagged words and phrases! Maybe this is where external sources / previous scholarship will come into its own?

JonathanReeve · 2017-01-18T18:50:42Z

This is really interesting. I think you're right--we could mark up "Dominenamine" with <foreign>, but add a @rend attribute to indicate that it's not italicized, since italics will be the default rendering for <foreign> tags. That way, it can be easily distinguished.

Although we could define a new language for, say, corrupted Latin, that might be a slippery slope, since there is also corrupted (phonetically spelled) Irish, and other nonstandard forms. So I think it's a good idea to add a @type. Then it would look like this:

<foreign xml:lang="la" type="corrupted" rend="none">Dominenamine</foreign>

Of course, we could also go with something like "Bloomean" instead of "corrupted"! So long as we keep track of the @types we're using in CONTRIBUTING.md, I'm sure it'll be fine.

yellwork · 2017-01-18T19:34:32Z

OK, let’s go with <foreign> and @rend="none" for such cases. As I said, it’ll be a much slower job to track down these untagged <foreign>s in the episodes. ‘Proteus’, for example, has:

<lb n="030176"/>devil's name? Paysayenn. P. C. N., you know: <foreign xml:lang="fr">physiques, chimiques et

But we could easily add <foreign xml:lang="fr" rend="none"> on ‘Paysayenn’. On ‘P. C. N.’ too? Or has Stephen switched to an English pronunciation of the letters to clarify matters?
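
A minimal sketch of that first option, marking only ‘Paysayenn’ and leaving ‘P. C. N.’ untagged until the question about it is settled. The remainder of the line is omitted here:

```xml
<!-- Sketch only: rend="none" records that the word is not italicised in the source. -->
<lb n="030176"/>devil's name? <foreign xml:lang="fr" rend="none">Paysayenn</foreign>. P. C. N., you know:
```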

I think the 'corrupted' value on @type is smart. Am I right that it’ll apply to both the marked (and so initially <emph>) and unmarked instances of <foreign> in the corpus? If we’re finding these instances of ‘home-made’ or otherwise distinctive <foreign> all over the episodes, we might want to disambiguate further. For now, I think @type="corrupted" also answers two of the queries/edge-cases in your original post to this thread:

<lb n="080623"/>here. <emph>Lacaus esant tara tara.</emph> Great chorus that. <emph>Taree tara.</emph> Must be washed

<lb n="100849"/><said>--</said><emph>Se el yilo nebrakada femininum! Amor me solo! Sanktus! Amen.</emph></p>
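
For illustration, the first of these might end up looking like the sketch below. It assumes the Italian reading from earlier in the thread is kept and that @type="corrupted" is applied to italicised passages as well. The second <emph> is left alone because its language has not been discussed:

```xml
<!-- Sketch only: xml:lang="it" follows the Gifford reading; no @rend is given
     because the passage is italicised in the source. -->
<lb n="080623"/>here. <foreign xml:lang="it" type="corrupted">Lacaus esant tara tara.</foreign> Great chorus that. <emph>Taree tara.</emph> Must be washed
```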

JonathanReeve · 2017-01-24T22:13:58Z

I had a good chat with Hugh Cayless on the Digital Humanities Slack, and he has a few suggestions for improving the language encoding:

As the new RelaxNG validation indicates, franglais apparently isn't valid as a language code, but something like fr-t-c0-en might be. That's French with a mixture of English thrown in. There's a discussion about these "hybrid" locales here.
Our grc language code should actually be grc-Latn, he points out, since the text in question uses the Latin rather than the Greek alphabet.

I'll go ahead and make these changes now.
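
As a rough sketch of what those changes might look like in the header: the placement of <langUsage> inside <profileDesc> is standard TEI, but the ident values below are simply the ones floated in this comment, and whether they pass validation has not been confirmed here.

```xml
<!-- Sketch only: ident values are the proposals from this thread,
     not tags checked against the schema. -->
<profileDesc>
  <langUsage>
    <language ident="fr-t-c0-en">French with a mixture of English</language>
    <language ident="grc-Latn">Ancient Greek written in the Latin alphabet</language>
  </langUsage>
</profileDesc>
```

The Lenehan line would then carry xml:lang="fr-t-c0-en" on its <foreign> element in place of "franglais".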

yellwork · 2017-01-26T17:45:22Z

Intriguing, Jonathan. More work but more nuance in the markup. I suppose you can tweak your tools to group French and French-ish when you need to and to separate them when that’s preferable?

It makes me think too that Bloom’s

<p><foreign xml:lang="es"><lb n="150216"/>Bueñas noches, señorita Blanca. Que calle es esta?</foreign></p></sp>

is probably Spanglish (es-t-h0-en?) as his “Bueñas” is not accurate. Or is this a separate nuance altogether? Or just @type="corrupted" again?
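
Side by side, the two options for this line would look roughly like this. Both are sketches: the hybrid tag is the tentative one suggested above and has not been validated, and only one of the two would actually be used.

```xml
<!-- Option A (sketch): keep plain Spanish and flag the inaccuracy with @type. -->
<p><foreign xml:lang="es" type="corrupted"><lb n="150216"/>Bueñas noches, señorita Blanca. Que calle es esta?</foreign></p>

<!-- Option B (sketch): use the tentative hybrid tag instead of @type. -->
<p><foreign xml:lang="es-t-h0-en"><lb n="150216"/>Bueñas noches, señorita Blanca. Que calle es esta?</foreign></p>
```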

Where would something like Stephen’s “demiurgos” at U 3.18 sit? Right now, we have it encoded as one of our lingering <emph>s: <emph>demiurgos</emph>, but the word is, seemingly, a Latinization of the Greek “δημιουργός”. This is hardly grc-Latn then. Can we distinguish Latin translated from the Greek using the new language codes?

Or is it just English?
OED gives for Demiurge, n.

Etymology: modern < Greek δημιουργός (Latinized dēmiūrgus), lit. public or skilled worker, < δήμιος of the people, public + -εργος -working, worker: compare French demiurge. The Greek and Latin forms demiurgos, -urgus /diːmɪ-/, /dɛmɪɜːɡəs/ were in earlier use. (So in 16th cent. French demiourgon, Rabelais.)

Is “demiurgos” then just an English obsoletism?!

JonathanReeve mentioned this issue Jan 16, 2017

Disambiguate <emph> tags #1

Open

4 tasks

JonathanReeve added the enhancement label Jan 17, 2017

JonathanReeve mentioned this issue Jan 17, 2017

Parse Joycean lexica for new instances of <foreign> #4

Open

yellwork mentioned this issue Jan 18, 2017

Disambiguating <emph> into multiple taggings #7

Open

JonathanReeve added a commit that referenced this issue Jan 18, 2017

correct mistake as noticed in discussion of #2

d152102

JonathanReeve mentioned this issue Feb 15, 2017

Validate with RelaxNG #22

Open

yellwork added a commit that referenced this issue Aug 3, 2017

Emph disambig per #2

fbc0f8e

JonathanReeve mentioned this issue Feb 17, 2021

mark up languages open-editions/corpus-eliot-middlemarch-tei#1

Open
