Disambiguate <emph> to <foreign> for foreign languages #2

Open
JonathanReeve opened this issue Jan 16, 2017 · 7 comments

@JonathanReeve
Member

A sub-task of #1.

@JonathanReeve
Member Author

JonathanReeve commented Jan 16, 2017

Here are some interesting edge-cases:

<lb n="070468"/><said>--</said><emph>Thanky vous</emph>, Lenehan said, helping himself.</p>

There is no pre-existing language ID for this mix of French and English. But we might be able to define one in the TEI header like this:

<langUsage>
 <language ident="franglais">Franglais, a mixture of French and English</language>
</langUsage>

Then we can mark it up like this:

<lb n="070468"/><said>--</said><foreign xml:lang="franglais">Thanky vous</foreign>, Lenehan said, helping himself.</p>
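
A check like the following could catch language identifiers that are used on <foreign> but never declared in the header. This is only a minimal Python sketch: the function name `undeclared_langs` and the sample document are hypothetical, and elements are matched by local name so that it works with or without the TEI namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]


def undeclared_langs(tei_xml):
    """Return xml:lang values used on <foreign> but missing from <langUsage>."""
    root = ET.fromstring(tei_xml)
    declared = {el.get("ident") for el in root.iter() if local(el.tag) == "language"}
    used = {el.get(XML_LANG) for el in root.iter() if local(el.tag) == "foreign"}
    # Drop None, which appears when a <foreign> has no xml:lang at all.
    return sorted(used - declared - {None})


# Hypothetical, heavily abbreviated sample document.
SAMPLE = """<TEI>
  <teiHeader><langUsage>
    <language ident="franglais">Franglais, a mixture of French and English</language>
  </langUsage></teiHeader>
  <text><p><foreign xml:lang="franglais">Thanky vous</foreign>, Lenehan said.
  <foreign xml:lang="la">Naminedamine.</foreign></p></text>
</TEI>"""

print(undeclared_langs(SAMPLE))  # ['la']
```

In the sample, `la` is reported because only `franglais` is declared in <langUsage>.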

I've documented this in CONTRIBUTING.md. Here's another case:

<lb n="080623"/>here. <emph>Lacaus esant tara tara.</emph> Great chorus that. <emph>Taree tara.</emph> Must be washed

Gifford suggests that "Lacaus esant" is Bloom remembering the line "La cause è santa" from a French opera probably performed in Italian, since Italian performances of the opera were common in the early 20th century. But the French for that, "la cause est sainte", might be rendered the same way in Bloom's mind. I'll leave this as Italian for now, trusting that Gifford knows something I don't.

Here's another:

<lb n="100849"/><said>--</said><emph>Se el yilo nebrakada femininum! Amor me solo! Sanktus! Amen.</emph></p>

Gifford writes: "Se reads Sel in the German version of the 'Eighth and Ninth Books of Moses' thus, if the phrase were Sel el yilo, it could be regarded as a phonetic reproduction of the Spanish Cielillo, 'Little Heaven'; and 'nebrakada' could be Spanish-Arabic for 'blessed'. The whole charm would then read: '[My] little heaven of blessed femininity, love only me. Holy! Amen.'"

So I'll define a new language, Spanish-Arabic, in the header.

This one is puzzling:

<p rend="non-indent"><lb n="110043"/><emph>Naminedamine.</emph> Preacher is he.</p>

I'll label this as Latin for now.

@JonathanReeve
Member Author

JonathanReeve commented Jan 17, 2017

This is almost finished with ca3f3bd, but I'll leave this issue open for now, awaiting any discussion about the edge-cases described above.

@yellwork
Collaborator

From the Wikibooks annotations to Ulysses (line references adapted to our edition):

Naminedamine (Latin) Bloom’s corruption of the liturgical phrase In nomine Domini (In the name of the Lord).[1] During Paddy Dignam’s Burial Service, Bloom had concocted the term Dominenamine (6.595). Later, in this episode, he will be inspired to concoct a similar nonsense word by a line in The Croppy Boy: 11.1036. Another variant, nominedomine, occurs at 11.1244.

[1] Gifford (1988) 294.
Thornton (1968) 241.

Interestingly enough, while both corpusnomine at 11.1036 and nominedomine at 11.1244 were tagged in <emph> (and Jonathan now has them, rightly, as <foreign xml:lang="la">), ‘Dominenamine’ at 6.595 is not italicised in ‘Hades’ and so was never <emph> distinguished in our markup.

This and the discussion above raise two questions:
(1) Do we want to further distinguish garbled, humorous or inaccurate instances of <foreign>? Are there appropriate attributes for these nuances? (Or do we just @type everything?)
(2) Do we want to encode as <foreign> instances of non-English usage that are not typographically distinguished in the Gabler critical text? Use of @rend might get around the non-italics. Of course, these will be more difficult to track down in the corpus than the <emph>-tagged words and phrases! Maybe this is where external sources / previous scholarship will come into its own?

@JonathanReeve
Member Author

This is really interesting. I think you're right--we could mark up "Dominenamine" with <foreign>, but add a @rend attribute to indicate that it's not italicized, since that will be the default behavior for <foreign> tags. That way, it can be easily distinguished.

Although we could define a new language for, say, corrupted Latin, that might be a slippery slope, since there is also corrupted (phonetically spelled) Irish, and other nonstandard forms. So I think it's a good idea to add a @type. Then it would look like this:

<foreign xml:lang="la" type="corrupted" rend="none">Dominenamine</foreign>

Of course, we could also go with something like "Bloomean" instead of "corrupted"! So long as we keep track of the @types we're using in CONTRIBUTING.md, I'm sure it'll be fine.
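
To keep CONTRIBUTING.md in sync with the markup, the @type values actually in use could be tallied with a small script. The following Python sketch is illustrative only: `foreign_type_counts` and the sample fragment are hypothetical, and it assumes elements without a namespace (a real TEI file would need the TEI namespace handled).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def foreign_type_counts(xml_text):
    """Count <foreign> elements by their @type value; '(none)' means no @type."""
    root = ET.fromstring(xml_text)
    return Counter(el.get("type", "(none)") for el in root.iter("foreign"))


# Hypothetical fragment; the encodings here are for illustration only.
SAMPLE = """<body>
  <p><foreign xml:lang="la" type="corrupted" rend="none">Dominenamine</foreign></p>
  <p><foreign xml:lang="la" type="corrupted">Naminedamine.</foreign></p>
  <p><foreign xml:lang="fr">physiques, chimiques</foreign></p>
</body>"""

counts = foreign_type_counts(SAMPLE)
for type_value, n in sorted(counts.items()):
    print(type_value, n)
```

Any value that shows up in the tally but not in CONTRIBUTING.md would be a candidate for documenting or normalizing.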

@yellwork
Collaborator

yellwork commented Jan 18, 2017

OK, let’s go with <foreign> and @rend="none" for such cases. As I said, it’ll be a much slower job to track down these untagged <foreign>s in the episodes. ‘Proteus’, for example, has:

<lb n="030176"/>devil's name? Paysayenn. P. C. N., you know: <foreign xml:lang="fr">physiques, chimiques et

But we could easily add <foreign xml:lang="fr" rend="none"> on ‘Paysayenn’. On ‘P. C. N.’ too? Or has Stephen switched to an English pronunciation of the letters to clarify matters?

I think the 'corrupted' value on @type is smart. Am I right that it’ll apply to both the marked (and so initially <emph>) and unmarked instances of <foreign> in the corpus? If we’re finding these instances of ‘home-made’ or otherwise distinctive <foreign> all over the episodes, we might want to disambiguate further. For now, I think @type="corrupted" also answers two of the queries/edge-cases in your original post to this thread:

<lb n="080623"/>here. <emph>Lacaus esant tara tara.</emph> Great chorus that. <emph>Taree tara.</emph> Must be washed
<lb n="100849"/><said>--</said><emph>Se el yilo nebrakada femininum! Amor me solo! Sanktus! Amen.</emph></p>

@JonathanReeve
Member Author

I had a good chat with Hugh Cayless on the Digital Humanities Slack, and he has a few suggestions for improving the language encoding:

  1. As the new RelaxNG validation indicates, franglais apparently isn't a valid language tag, but something like fr-t-c0-en might be. That is French with a mixture of English thrown in. There's a discussion about these "hybrid" locales here.

  2. Our grc language code should actually be grc-Latn, he points out, since it uses the Latin instead of Greek alphabet.

I'll go ahead and make these changes now.

@yellwork
Collaborator

Intriguing, Jonathan. More work but more nuance in the markup. I suppose you can tweak your tools to group French and French-ish when they need to and to separate them when that’s preferable?
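
One way such grouping might work: the first subtag of a language tag is its primary language, so "French and French-ish" can be grouped on that and separated on the full tag. This is a rough Python sketch, not the project's actual tooling; `primary_language` and `group_by_primary` are hypothetical names, and private-use tags beginning with `x-` are not handled.

```python
def primary_language(tag):
    """Return the first subtag of a language tag, lowercased."""
    return tag.split("-", 1)[0].lower()


def group_by_primary(tags):
    """Map each primary language to the full tags that share it."""
    groups = {}
    for tag in tags:
        groups.setdefault(primary_language(tag), []).append(tag)
    return groups


# Tags mentioned in this thread.
groups = group_by_primary(["fr", "fr-t-c0-en", "grc-Latn", "la", "es"])
print(groups["fr"])  # ['fr', 'fr-t-c0-en']
```

Looking up `groups["fr"]` gives every French-based tag, while the individual strings in the list still keep plain French and the hybrid apart.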

It makes me think too that Bloom’s

<p><foreign xml:lang="es"><lb n="150216"/>Bueñas noches, señorita Blanca. Que calle es esta?</foreign></p></sp>

is probably Spanglish (es-t-h0-en?) as his “Bueñas” is not accurate. Or is this a separate nuance altogether? Or just @type="corrupted" again?

Where would something like Stephen’s “demiurgos” at U 3.18 sit? Right now, we have it encoded as one of our lingering <emph>s: <emph>demiurgos</emph>, but the word is, seemingly, a Latinization of the Greek “δημιουργός”. This is hardly grc-latn then. Can we distinguish Latin translated from the Greek using the new language codes?

Or is it just English?
OED gives for Demiurge, n.

Etymology: modern < Greek δημιουργός (Latinized dēmiūrgus), lit. public or skilled worker, < δήμιος of the people, public + -εργος -working, worker: compare French demiurge. The Greek and Latin forms demiurgos, -urgus /diːmɪ-/, /dɛmɪɜːɡəs/ were in earlier use. (So in 16th cent. French demiourgon, Rabelais.)

Is “demiurgos” then just an English obsoletism?!
