Not an issue but a request #26
Hello,
A MUCH more recent version of the repo is here:
https://github.com/hlapin/mishnah-data
It would help if you could tell me what output you need (and how much). I
have various scripts that might generate the data.
Thanks HL
Hayim Lapin
Professor of History
Robert H. Smith Professor of Jewish Studies
University of Maryland
Jewish Studies: 4141 Susquehanna Hall, College Park, MD 20742 | 301 405 4975
History: 2115 Francis Scott Key Hall, College Park, MD 20742 | 301 405 4296
|
Or the format in mishnah-data/txt would do (better still would be some scripts that can extract this from the TEI, so that I can extract all the witnesses):
|
What I tried so far:
Still having problems with subtags of the text like correction, line breaks, damage, addition etc.
|
Would TEIGarage <https://teigarage.tei-c.org/#> be helpful in this case?
KEN M. PENNER (he/him)
PROFESSOR & CHAIR, RELIGIOUS STUDIES
St. Francis Xavier University
Antigonish, Nova Scotia · Canada
t 902 867 2265 · c 902 870 0697
www.stfx.ca
I acknowledge that StFX is located in Mi’kma’ki, the ancestral and unceded territory of the Mi’kmaq People.
On Tuesday, January 23, 2024 at 6:22 AM, johnlockejrr wrote:
What I tried so far:
from lxml import etree

TEI = '{http://www.tei-c.org/ns/1.0}'
XML_ID = '{http://www.w3.org/XML/1998/namespace}id'
nsmap = {'tei': 'http://www.tei-c.org/ns/1.0'}

parser = etree.XMLParser()
tree = etree.parse('S00483.xml', parser)
parma = tree.getroot()

# TEI elements that occur inside the text (listed for reference; not used below).
to_find = {'abbr', 'add', 'addSpan', 'am', 'anchor', 'c', 'cb', 'choice', 'damage', 'damageSpan', 'del', 'expan', 'fw', 'gap', 'label', 'lb', 'metamark', 'milestone', 'note', 'orig', 'pb', 'pc', 'ptr', 'reg', 'space', 'surplus', 'unclear', 'w'}

for div in parma.findall('.//tei:div', namespaces=nsmap):
    for ab in div.findall('.//tei:ab', namespaces=nsmap):
        verse_id = ab.attrib[XML_ID]
        verse = []
        for tag in ab.iter():
            # Only leaf elements are read; text and tails around child elements are skipped.
            if len(tag) == 0 and tag.text is not None:
                # Leave out the content of <label> and <am>.
                if tag.tag not in (TEI + 'label', TEI + 'am'):
                    verse.append(tag.text.strip())
        print(f"{verse_id} {' '.join(verse)}")
Still having problems with subtags of the text like correction, line breaks, damage, addition etc.
S00483.1.1.1.1 מאמתי קורין את שמע בערבים משעה שהכהנים נכנסין לאכל בתרומתן עד סוף האשמורת הראשנה דברי רבי אליעזר וחכמין אומרין עד חצות רבן גמליאל אומר עד שיעלה עמוד השחר ׳ מעשה שבאו בניו מבית המשתה אמרו לו לא קרינו את שמע אמר להם אם לא עלה עמוד השחר מותרין אתם לקרות ׳ ולא זו בלבד אלא כל שאמרו חכמים עד חצות ׳ מצותן עד שיעלה עמוד השחר ׳ הקטר חלבים ואיברין ואכילת פסחים מצותן עד שיעלה עמוד השחר ׳ וכל הנאכלין ליום אחד מצותן עד שיעלה עמוד השחר אם כן למה אמרו חכמים עד חצות אלא להרחיק את האדם מן העבירה
S00483.1.1.1.2 ׳ מאמתי קורין את שמע בשחרים משיכירו בין תכלת ללבן רבי אליעזר אומר בין תכלת לכרתן וגומרה עד הנץ החמה ׳ ורבי יהושע ׳ אומר עד שלש שעות שכן דרך בני מלכים לעמוד בשלש שעות הקורא מיכן והלך לא הפסיד כאדם שהוא קורא בתורה
S00483.1.1.1.3 ׳ בית שמי אומרין בערב כל אדם יטו ויקרו ובבקר יעמודו ׳ שנאמר ובשכבך ובקומך ו ובית הלל ׳ אומרים כל אדם קורין כדרכן ׳ שנאמר ובלכתך בדרך אם כן למה נאמר בשכבך ובקומך ׳ אלא בשעה שדרך בני אדם שוכבין ובשעה שדרך בני אדם עומדין ׳ אמר ׳ רבי טרפון אני הייתי בא בדרך והטיתי לקרות כדברי בית שמי וסיכנתי בעצמי מפני הלסטים אמרו לו כדיי הייתה לחוב בעצמך שעברתה על דברי בית הלל
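The trouble with corrections, line breaks and additions most likely comes from TEI mixed content: an element's `text` holds only what precedes its first child, and whatever follows a child sits in that child's `tail`. Below is a minimal sketch of a recursive walk that keeps both. It uses the standard-library `xml.etree.ElementTree` so that it runs without extra packages; lxml exposes the same `text` and `tail` attributes, though comment nodes would need separate handling there. The bracket markers, the `SKIP` and `SPACE` sets, and the sample XML are all assumptions made for illustration, not the project's conventions.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical plain-text markers for text-critical elements (not the project's convention).
MARKERS = {"del": ("[-", "-]"), "add": ("{+", "+}"), "unclear": ("(?", "?)")}
# Elements whose content is dropped; "note" is an added assumption.
SKIP = {"label", "am", "note"}
# Empty break elements that should become a single space.
SPACE = {"lb", "cb", "pb"}


def flatten(elem):
    """Return the text of elem, walking mixed content in document order."""
    name = elem.tag[len(TEI):] if elem.tag.startswith(TEI) else elem.tag
    if name in SKIP:
        return ""
    if name in SPACE:
        return " "
    open_mark, close_mark = MARKERS.get(name, ("", ""))
    parts = [open_mark, elem.text or ""]
    for child in elem:
        parts.append(flatten(child))
        # The tail is the text that follows the child inside this element.
        parts.append(child.tail or "")
    parts.append(close_mark)
    return "".join(parts)


def ab_to_text(ab):
    """Flatten one <ab> and collapse runs of whitespace."""
    return " ".join(flatten(ab).split())


# Invented sample, not taken from S00483.xml.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0" xml:id="X.1">'
    'one <del>two</del><add>three</add> four<lb/><w>five</w>'
    '</ab>'
)
ab = ET.fromstring(SAMPLE)
print(ab_to_text(ab))  # one [-two-]{+three+} four five
```

Because the parent appends each child's tail itself, text that follows a skipped element such as `<label>` is still kept; only the element's own content is dropped.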
|
Unfortunately not: it extracts the text, but all the text-critical markup (deletions, additions, etc.) vanishes. |
We seem to be crossing replies. Did you see my message with attachments
over email?
|
I have failed to divert us to email...
Sorry, I did not pay enough attention to the output you pointed to, but
here is a preliminary dump of Kaufmann.
out.txt <https://github.com/umd-mith/mishnah/files/14026284/out.txt>
I can send the XSLT script that produced it too, but GitHub does not allow me
to upload files of that type.
|
Would you be so kind as to send the script? Can you zip or rar it so that
the mail will allow it? Thank you so much!
|
Here you go. One thing I noticed is that you will need to insert a space
at the line breaks (<lb/>). I can make these and other simple updates if
you need me to.
*Full disclosure*: I have never actually run Saxon/XSLT on the command
line, but only either in an IDE or in a webapp.
toPlainText.zip
<https://github.com/umd-mith/mishnah/files/14028041/toPlainText.zip>
|
This is great, thank you so much!
|
Just tested it and the output is great.
It worked with Transform.exe from SaxonHE9-9-1-8N under Windows. It should
also work on Linux, but I don't have a license for it yet; still waiting.
Pretty simple:
Transform.exe -s:S00483.xml -xsl:toPlainText.xsl -o:S00483.txt
Thank you so much! You made my day brighter :)
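Since the goal is to extract all the witnesses, the same command can be generated once per file. The following is a minimal sketch, assuming the witnesses sit in one folder as `*.xml` files and that `Transform.exe` and `toPlainText.xsl` can be found from the working directory. The helper names `build_command` and `transform_all` are made up for this example.

```python
from pathlib import Path
import subprocess


def build_command(source, stylesheet="toPlainText.xsl", transform=("Transform.exe",)):
    """Build the Saxon command line quoted above for one witness file."""
    source = Path(source)
    output = source.with_suffix(".txt")  # S00483.xml -> S00483.txt
    return [*transform, f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]


def transform_all(folder, run=False):
    """Build (and optionally run) one command per XML witness in folder."""
    commands = [build_command(p) for p in sorted(Path(folder).glob("*.xml"))]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if Saxon exits with an error
    return commands


print(build_command("S00483.xml"))
```

With `run=False` nothing is executed, so the generated commands can be inspected first. On Linux, passing something like `transform=("java", "-cp", "saxon9he.jar", "net.sf.saxon.Transform")` may work with the Java build of Saxon-HE, which is the open-source edition; the jar name here is an assumption and depends on the download.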
|
I'd love to hear what this is all in service of!
|
For now it is a personal project: a database of rabbinic (tannaitic) works.
If all goes well, I'll make it public so other people can enjoy my work. I'll
keep in touch if you don't mind.
|
John,
I can see why you’d prefer this text-critical edition of the text, but I also wanted to make sure you are aware of https://github.com/Sefaria/Sefaria-Export/tree/master/txt/Mishnah
Cheers,
Ken
|
Yes, I'm aware of that, but they have only a few witnesses for the Mishnah. I am
trying to get as many as possible to have a skeleton, and then work on the MSS.
|
If I understand what you are after, then, you may be reinventing the wheel.
My project is part of a larger project to generate automatic transcription
and alignment of variants, with a lot of work already done.
Happy to talk more, but as you say this is not an "issue." Can we *PLEASE*
move to email rather than gh?
|
Sure, mail will be better. Yes, I think I may be reinventing the wheel; I only found out about your project a week ago, when trying to find a transcription of De Rossi 138. |
Can you send me an email to my gmail: johnlockejrr?
I want to ask you about some things and don't want to prolong the
discussion here.
Anyway, I presume you use kraken or eScriptorium for automated
transcription of Hebrew texts. Do you have any good recognition and
segmentation models you can share? Thank you!
|
Is this you?
|
Yes, johnlockejrr [ at ] gmail.com |
Sorry to file this as an issue, because it is not one, but I didn't know how else to reach you.
Do you have the Mishnah texts transcribed in a raw format (plain text, JSON, etc.)? I have a hard time extracting them from the TEI format. I can do it with Python, but I lose the deletion marks, additions and so on. Or could you kindly provide a script to do that, in Python or anything else?
Thank you so much!