Retagging Joyce’s dialogue #19

yellwork · 2017-02-08T14:41:04Z

This is a to-do issue to pick out the various tasks discussed in #9:

Convert all double-hyphen dialogue dashes to the quotation dash or horizontal bar.
Shift the </said> tags in <said>―</said> structures to the end of character speech. Add all intermedial <said> tagging.
Proof the </said> tagging for every episode. How? We will visualize all of the episodes in a browser and colour just the </said> tagged dialogue.
Episodes remaining: 1. “Telemachus” 2. “Nestor” 3. “Proteus” 4. “Calypso” 5. “Lotus Eaters” 6. “Hades” 7. “Aeolus” 8. “Lestrygonians” 9. “Scylla and Charybdis” 10. “Wandering Rocks” 11. “Sirens” 12. “Cyclops” 13. “Nausicaa” 14. “Oxen of the Sun” 15. “Circe” 16. “Eumaeus” 17. “Ithaca” 18. “Penelope”
Disambiguate the appropriate <emph> to <said> tagging.
[there might be a few other stragglers]
Add @who attribution for every instance of <said> (or in “Circe” <sp>). Use character names for the values.
Switch @who values to @xml:id.
Compile a <listPerson> dossier of speakers.

c-forster · 2017-02-08T20:34:40Z

Having a to-do list for this seems wise. FYI: The following ack one-liner will extract names from the who attribute of said tags.

ack -o "(?<=<said who=\")[\w\'\. ]*" *.xml

This will compile a sorted list of all the names across the corpus:

ack -ho "(?<=<said who=\")[\w\'\. ]*" *.xml | sort | uniq

I was using it as a sanity check to catch misspellings when I marked up "Telemachus."
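For anyone without ack installed, the same sanity check could be sketched in Python. This is only a rough equivalent of the one-liners above: it reuses the same pattern, so it also assumes that who is the first attribute on the said tag. The function name `count_speakers` and the sample string are made up for illustration and are not part of the repository.

```python
import re
from collections import Counter

# Same idea as the ack pattern: capture whatever follows `<said who="`.
WHO_PATTERN = re.compile(r'(?<=<said who=")[\w\'. ]*')

def count_speakers(xml_text):
    """Return a Counter of the who values found in <said who="..."> tags."""
    return Counter(WHO_PATTERN.findall(xml_text))

# Hypothetical sample with one deliberately misspelled name.
sample = (
    '<said who="Stephen">―...</said>\n'
    '<said who="Stephen">―...</said>\n'
    '<said who="Stephan">―...</said>\n'
)

counts = count_speakers(sample)
# Names that occur only once are good candidates for typos.
for name, n in sorted(counts.items()):
    print(n, name)
```

Printing the count next to each name makes one-off spellings stand out, which a plain sort | uniq listing does not.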

Could we also assign episodes for dialogue markup, or let people claim them, on this or another issue? I am going to tackle another episode as soon as I can, and want to avoid duplicating labor.

yellwork · 2017-02-09T09:18:41Z

Good idea. Can we formally assign them or do we just call dibs here?

After you started <said> tagging, Chris, I snagged a lot of the low-hanging fruit (the less chatty, shorter episodes). Claiming the longer ones now makes sense because they’re likely to take a considerable bit of time to mark up.

Those ack commands will come in very handy once we start figuring out the speaking parts.

yellwork · 2017-02-09T12:15:29Z

Going to do the @who attribution on “Proteus” now.

yellwork · 2017-02-09T12:51:42Z

Going to tackle @who on “Aeolus” now.

JonathanReeve · 2017-02-15T23:36:48Z

@c-forster, that ack hack is great. I use ag, "The Silver Searcher," myself, and was able to get it to work the same way using:

ag --nofilename -o "(?<=<said who=\")[\w\'\. ]*" *.xml | sort | uniq

I'll put this into a makefile so that we can run these sorts of things easily.

yellwork · 2017-11-02T16:33:51Z

I’m simplifying this. A ⟨listPerson⟩ for the entire novel would be incredible, but … too much work for now. So I’m going to switch all @who values to character initials and put the key in the separate plaintext file persons.txt.

JonathanReeve · 2017-11-03T14:49:19Z

Sounds good. I'm not seeing the key in persons.txt yet, though. Anyway, once it's there, if it's in some kind of regular format, like comma- or tab-separated, it'll be easy to make a list of these keys to add to the header.

yellwork · 2017-11-03T14:54:53Z

I’m doing it all offline while I go through all eighteen episodes. I’ll merge them all into the repository once done.

My local persons.txt looks like this:

db [tab]Davy Byrne
dbc [tab]Davy Byrne's curate
dbm [tab]D.B. Murphy
dd [tab]Dan Dawson
did [tab]Dilly Dedalus

That could be the basis for a <listPerson> – information I’d love to see added but too much for us right now (I feel).
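As a rough illustration of how a tab-separated key could later be turned into a skeleton <listPerson>, here is a minimal Python sketch. It assumes each line of persons.txt is a key, a real tab character, and a name, with any stray spaces around either field stripped. The function names `parse_persons` and `build_list_person` are hypothetical, and the choice of <person> with a <persName> child is just one plausible TEI shape, not a decision made in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the xml: prefix, needed to set xml:id.
XML_NS = "http://www.w3.org/XML/1998/namespace"

def parse_persons(text):
    """Parse 'key<TAB>name' lines into a list of (key, name) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, sep, name = line.partition("\t")
        if not sep:
            raise ValueError(f"no tab separator in line: {line!r}")
        pairs.append((key.strip(), name.strip()))
    return pairs

def build_list_person(pairs):
    """Build a <listPerson> element with one <person> per (key, name)."""
    list_person = ET.Element("listPerson")
    for key, name in pairs:
        person = ET.SubElement(list_person, "person")
        person.set(f"{{{XML_NS}}}id", key)
        ET.SubElement(person, "persName").text = name
    return list_person

# First three entries from the persons.txt excerpt above.
sample = "db \tDavy Byrne\ndbc \tDavy Byrne's curate\ndbm \tD.B. Murphy\n"
element = build_list_person(parse_persons(sample))
print(ET.tostring(element, encoding="unicode"))
```

Because the initials become xml:id values, the existing who values would only need a leading # to point at them.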

JonathanReeve · 2017-11-03T15:53:06Z

Awesome, sounds great.

yellwork · 2017-11-03T15:56:07Z

Some content that was marooned in the closed #9 was your suggestion, Jonathan, for unclear @who values. Something like:

<lb n="060004"/><said xml:id="060004-a" who="Cunningham">―Come on, Simon.
<certainty target="#060004-a" match="@who" locus="value" assertedValue="Power" degree="0.5">
    <desc>It's unclear here whether it's Cunningham or Power speaking.</desc>
</certainty> 
</said>

I’m going to go ahead and use this encoding whenever an unclear speaker is limited to a handful of candidates. Unless you’ve another idea?

JonathanReeve · 2017-11-05T16:47:41Z

Sounds great. Let's do it. I'll make a note of this in our conventions list, too.

yellwork · 2017-11-05T17:17:35Z

How do we attribute dialogue in an exchange between several people? There’s a spot like this in “Hades” where no speakers are given for several lines of dialogue:

<lb n="060114"/><said who="lb">―I met M'Coy this morning,</said> Mr Bloom said. <said who="lb">He said he'd try to come.</said></p>
<p><lb n="060115"/>The carriage halted short.
<lb n="060116"/><said who="unclear">―What's wrong?</said>
<lb n="060117"/><said who="unclear">―We're stopped.</said>
<lb n="060118"/><said who="unclear">―Where are we?</said></p>
<p><lb n="060119"/>Mr Bloom put his head out of the window.
<lb n="060120"/><said who="lb">―The grand canal,</said> he said.</p>

The unclears can only be Cunningham, Power or Simon Dedalus (with Bloom, perhaps, chiming in at U 6.117). How best would that be encoded?

JonathanReeve · 2017-11-08T15:24:39Z

I read the TEI docs on <certainty> again but this is the best I could think of:

<lb n="060114"/><said who="lb">―I met M'Coy this morning,</said> Mr Bloom said. <said who="lb">He said he'd try to come.</said></p>
<p><lb n="060115"/>The carriage halted short.
<lb n="060116"/><said who="unclear">―What's wrong?
<certainty match="@who" locus="value" assertedValue="Power" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Cunningham" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Simon Dedalus" degree="0.33" /> 
</said>
<lb n="060117"/><said who="unclear">―We're stopped.
<certainty match="@who" locus="value" assertedValue="Power" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Cunningham" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Simon Dedalus" degree="0.33" /> 
</said>
<lb n="060118"/><said who="unclear">―Where are we?
<certainty match="@who" locus="value" assertedValue="Power" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Cunningham" degree="0.33" />
<certainty match="@who" locus="value" assertedValue="Simon Dedalus" degree="0.33" /> 
</said></p>
<p><lb n="060119"/>Mr Bloom put his head out of the window.
<lb n="060120"/><said who="lb">―The grand canal,</said> he said.</p>

...which is super kludgey and not very DRY. Ideally we could do target="#060116 #060117 #060118" on a single <certainty> set, and avoid all this repetition, but it doesn't look like XML can handle multiple attribute values.

@tcatapano, any ideas?
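One way to take the tedium out of the repetition, without changing the encoding itself, might be to generate the <certainty> children with a script. The Python sketch below is only an illustration: the helper name `add_equal_certainty` is hypothetical, and it assumes every candidate is equally likely, so the degree is simply one divided by the number of candidates, rounded to two decimals.

```python
import xml.etree.ElementTree as ET

def add_equal_certainty(said, candidates):
    """Append one <certainty> child per candidate speaker, all with equal degree."""
    degree = f"{1 / len(candidates):.2f}"
    for name in candidates:
        ET.SubElement(said, "certainty", {
            "match": "@who",
            "locus": "value",
            "assertedValue": name,
            "degree": degree,
        })
    return said

# One of the unattributed lines from the excerpt above.
said = ET.fromstring('<said who="unclear">―What\'s wrong?</said>')
add_equal_certainty(said, ["Power", "Cunningham", "Simon Dedalus"])
print(ET.tostring(said, encoding="unicode"))
```

As far as I can tell from the TEI Guidelines, target is defined as a whitespace-separated list of pointers, so a single <certainty> set pointing at several xml:id values may be possible after all. That would be worth checking against the schema before relying on it.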

JonathanReeve added a commit that referenced this issue Feb 15, 2017

add makefile for validation and for listing characters, per #19

308c811

yellwork added a commit that referenced this issue Feb 16, 2017

@who tagging, per #19, #9

835afc6

Caught/tweaked a few <foreign> as well.

yellwork added a commit that referenced this issue Feb 16, 2017

<said> tagging per #19

fb33ded

yellwork added a commit that referenced this issue Feb 16, 2017

tagged @who per #19

e0ce689

yellwork added a commit that referenced this issue Feb 22, 2017

Div type=section (#11) and who tagging (#19)

8d89208

Also <emph> to <quote> or <name>

yellwork added a commit that referenced this issue Mar 2, 2017

Dialogue tagging (said/who) #19

2311a51

yellwork added a commit that referenced this issue Mar 5, 2017

Proof <said> per #19

25427bc

yellwork added a commit that referenced this issue Nov 2, 2017

Simplify @who values per #9 #19.

118cb2d

yellwork added a commit that referenced this issue Nov 4, 2017

Who values per #19

36d0c72

persons.txt contains a list of all speakers in the novel.

yellwork added a commit that referenced this issue Nov 4, 2017

Per #31 and #19.

b54e818

Several lgs nested in quote or said/quote. Added speaker ambiguity at U 01.671.

yellwork added a commit that referenced this issue Nov 4, 2017

Per #19 and #10.

e2dc136

yellwork added a commit that referenced this issue Nov 4, 2017

Per #19.

f851826

Has some lingering unclear speakers (see U 6.116–118 and 6.139, 6.215, 6.384).

yellwork added a commit that referenced this issue Nov 4, 2017

Per #19.

fc9d752

yellwork added a commit that referenced this issue Nov 4, 2017

Per #19.

1d3595a

There are some lingering unclears in these episodes.

yellwork added a commit that referenced this issue Nov 4, 2017

Per #19. [incomplete]

57ac1e5

yellwork added a commit that referenced this issue Nov 4, 2017

Per #19 (incomplete)

5929178

yellwork mentioned this issue Nov 5, 2017

hide <certainty> content in XSL #35

Open

yellwork added a commit that referenced this issue Nov 17, 2017

Said tagging #19

2da73e7

yellwork added a commit that referenced this issue Nov 17, 2017

Emph per #1 and #19.

f89fe66
