From 80d33ca7cc82f96ac47600bbb7cb499ef6fe4c69 Mon Sep 17 00:00:00 2001 From: Arianna Masciolini Date: Wed, 25 Sep 2024 18:21:11 +0200 Subject: [PATCH] attempt to stund website --- README.md | 185 +++---------------------------------- _config.yaml | 29 ++++++ _includes/custom-head.html | 8 ++ _includes/footer.html | 38 ++++++++ _includes/image.html | 6 ++ android-chrome-192x192.png | Bin 0 -> 1529 bytes android-chrome-512x512.png | Bin 0 -> 1195 bytes apple-touch-icon.png | Bin 0 -> 1577 bytes browserconfig.xml | 9 ++ favicon-16x16.png | Bin 0 -> 1280 bytes favicon-32x32.png | Bin 0 -> 1404 bytes favicon.ico | Bin 0 -> 15086 bytes installation.md | 32 +++++++ mstile-150x150.png | Bin 0 -> 1110 bytes pub.md | 8 ++ safari-pinned-tab.svg | 19 ++++ site.webmanifest | 19 ++++ tutorial.md | 137 +++++++++++++++++++++++++++ 18 files changed, 317 insertions(+), 173 deletions(-) create mode 100644 _config.yaml create mode 100644 _includes/custom-head.html create mode 100644 _includes/footer.html create mode 100644 _includes/image.html create mode 100644 android-chrome-192x192.png create mode 100644 android-chrome-512x512.png create mode 100644 apple-touch-icon.png create mode 100644 browserconfig.xml create mode 100644 favicon-16x16.png create mode 100644 favicon-32x32.png create mode 100644 favicon.ico create mode 100644 installation.md create mode 100644 mstile-150x150.png create mode 100644 pub.md create mode 100644 safari-pinned-tab.svg create mode 100644 site.webmanifest create mode 100644 tutorial.md diff --git a/README.md b/README.md index 2c1d6c4..2cc9824 100644 --- a/README.md +++ b/README.md @@ -1,182 +1,23 @@ -# STUnD -A GUI Search Tool for (bilingual) parallel Universal Dependencies treebanks that runs in your browser. - +--- +title: STUnD +layout: base --- -## Usage -STUnD is available at [demo.spraakbanken.gu.se/stund](https://demo.spraakbanken.gu.se/stund). - -### Specifying input files -![start screen](img/start.png) - -The "Browse..." 
buttons are used to specify one or two (parallel) input files, which have to be in strict [CoNNL-U format](https://universaldependencies.org/format.html). - -If you only specify the input file(s), leaving the other fields blank, clicking "search" will run a default query that returns the full treebank: - -![null search](img/null_search.png) - -### Running a query -Queries are specified in first text input field: - -![query for presens perfekt](img/presens_perfekt.png) - -(note that double clicking on it will show the query history). - -#### Monolingual queries -The example query in the picture is - -```haskell -TREE_ (FEATS_ "VerbForm=Sup") [AND [LEMMA "ha", FEATS_ "Tense=Pres"]] -``` - -This is a _simple_ or _monolingual_ query, looking for present perfect constructions in the Swedish treebank. -It reads as - -> Look for (sub)trees (`TREE_`) where the root is a supinum (`(FEATS_ "VerbForm=Sup")`) and one of its direct dependents is the present of the verb "ha" (`AND [LEMMA "ha", FEATS_ "Tense=Pres"]`). - -Now only the subtrees matching the query (often full trees in this case) are highlighted in bold (cf. last row). - -With some knowledge of Swedish, this particular query can be rewritten more concisely as - -```haskell -TREE_ (FEATS_ "VerbForm=Sup") [FORM "har"] -``` - -It is then very easy to modify the query for other structurally similar tenses: - -- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "hade"]` (pluperfect) -- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "ha"]` (perfect infinitive) - -Compared to the original "present perfect" query, these make it easier to see how queries work: first, the program tries to align the two treebanks to identify semantically equivalent subtrees; then the query is run on the left (Swedish) treebank and matching subtrees are returned alongside their English counterpart. -For this reason, it can find correspondences such as "att ha sagt"-"as saying" (row 4), even though the English construction is not at all similar to the Swedish one. 
-Unlike query matching, the alignment step is not guaranteed to find all correspondences: in the picture below, you can see that sometimes a match is found in the Swedish treebank but nothing is highlighted in the corresponding English sentence (this is the case, for instance, in the last few rows). Some other times, a correspondence is found but is incorrect. - -![perfect infinitive](img/perf_inf.png) - -Of course, monolingual queries can also be run on single treebanks: - -![monolingual query on single treebank](img/single.png) - -#### Parallel queries -Queries can also be _parallel_ or _bilingual_. For instance, we can use the following pattern to serach for sentences where a Swedish present perfect corresponds to a passive present tense in English: - -```haskell -TREE_ (FEATS_ "VerbForm={Sup->Part}") [AND [LEMMA "{ha->be}", FEATS_ "Tense=Pres"]] -``` - -This produces the following results: - -![bilingual query](img/bilingual_query.png) - -Note that the second hit here is a false positive, due to the fact that "are" in the clause "there are already..." is also a direct dependent of the main lexical verb "dropped". -This is unfortunate, but difficult to avoid given how conjuncts are treated in UD. - -The basic query language ("UD patterns") is described [here](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md), while the extended version for parallel (bilingual) queries (`{X -> Y}` syntax) is documented [here](https://github.com/harisont/L2-UD#l1-l2-patterns). - -### Adding a replacement rule -The last input field can be used to specify a _replacement rule_ to be applied to all matching subtrees in both languages. -This can help highlighting the relevant parts of each query result and manipulate them. - -Understanding replacement rules, which are described [alongside the basic query language](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md), can be slightly more challenging. 
- -As a first example, - -```haskell -PRUNE (UPOS "VERB") 0 -``` - -decreases the depth of trees rooted in a verb to 0, eliminating all dependents: - -![drastic pruning](img/replacement1.png) - -The more complex pattern - -```haskell -CHANGES [FILTER_SUBTREES TRUE (OR [DEPREL_ "aux", DEPREL_ "cop"]), PRUNE TRUE 1] -``` - -uses dependency labels to isolate verb constructions of maximum depth 1, thus producing, in conjunction with the first query, the following output: - -![replacement rule](img/replacement2.png) - -### CoNNL-U and tree mode -So far, we have seen how to use STUnD in plain text mode. -Switching to CoNNL-U mode allows inspecting the CoNNL-U (sub)trees __corresponding to bold text in the default text mode__: - -![CoNNL-U mode](img/conllu.png) - -Tree mode renders them as SVG trees: - -![Tree mode](img/tree.png) - -### Saving the search results -Query results can be saved as plain text/TSV, CoNNL-U and HTML-embedded SVG trees. -The output format depends on the mode (in the example below, for instance, results would be saved as graphical trees). - -![download links for the search results](img/download.png) - -If two treebanks are being compared, results can be saved as two separate files, one per treebank, or as a single "parallel" file: - -- in text mode, the T1 and T2 files are simple text (one sentence per line), while the parallel file is tab-separated. This makes it easy to import search results in any spreadsheet program -- in CoNNL-U mode, the output is always a new treebank, treebank that can, for instance, be used as input for more refined queries in StUnD, or simply imported into [a CoNNL-U viewer](https://universaldependencies.org/conllu_viewer.html) for further inspection.[^1] In the parallel file, sentences from the two input treebanks are alternated -- similarly, parallel files in tree mode consist in alternating T1-T2 trees - -### Other use cases -So far, we have shown how to use STUnD on multilingual treebanks. 
-Many of the tool's functionalities, however, are also relevant in other scenarios, such as comparing learner sentences with corrections: - -![example of L1-L2 query on VALICO](img/it_gender.png) - -In the image above, you can see STUnD in action on the [VALICO treebank of L2 Italian](https://github.com/UniversalDependencies/UD_Italian-Valico), looking for feminine nouns incorrectly inflected as masculine. - -By checking "highlight discrepancies", in addition, the tool will highlight all sentence pairs matching the query that present any difference: - -![highlight discrepancies](img/papers_please.png) - -In the case of a learner corpus, "discrepant" means "erroneous", but highlighting discrepancies can also be useful in other settings, such as when comparing different analyses of the same text to resolve disagreement in a linguistic annotation project. -This functionality is, however, still very rudimentary. -In the future, the plan is to refine it to only highlight sentences where the discrepancy occurs in the subtree matching the query. - -## What's happening under the hood? -STUnD is basically a GUI for [L2-UD `match`](https://github.com/harisont/L2-UD#querying-parallel-l1-l2-treebanks), which allows running parallel queries on UD treebanks by combining [UD-based subtree alignment](https://github.com/harisont/concept-alignment) with [UD tree pattern matching](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md). - -While L2-UD (where "L2" stands for "second language") was originally developed for comparing learner texts and corrections, the current STUnD prototype uses a slightly modified, "bilingual" or "2L" version of L2-UD, optimized for comparisons between different languages. In the future, it will be possible for the user to choose what version to use, meaning that the tool will be usable on both types of treebanks. - -STUnD is now a web application composed of a Haskell server and a JavaScript frontend. 
-The code for its first locally-run prototype GUI is still available (but not actively maintained) on the [`threepenny`](https://github.com/harisont/STUnD/tree/threepenny) branch. - -## Installation -If you want to compile and run STUnD directly on your Linux/Windows computer, you can use [the Haskell Tool Stack](https://docs.haskellstack.org/en/stable/) or build and run it inside a [Docker](https://www.docker.com/) container. - -### Compile using Stack -If you do have Stack, you can run - -``` -stack build -``` - -or - -``` -stack install -``` - -which also installs the executable, should be all that is needed enough. On Windows, however, you might encounter problems related to curl, STUnD's only external dependency. If that's the case, see [this](win.md). +A Search Tool for (parallel) Universal Dependencies treebanks that runs in your browser. -Afterwards you can run STUnD using either `stack run` or run the installed executable directly. +![STUnD GUI](img/replacement2.png) -### Run using Docker -If you want to use Docker containers, the simplest way is to use `docker compose`. To build and run the image you can simply type: +- [live demo](https://demo.spraakbanken.gu.se/stund) +- [tutorial](tutorial.md) +- [installation](installation.md) -``` -docker compose up stund-gui -``` +While STUnD can also be used on single dependency treebanks, its most unique feature is that it allows running parallel queries on sentence-aligned UD treebanks by combining [UD-based subtree alignment](https://github.com/harisont/concept-alignment) with [UD tree pattern matching](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md). -This will take a while for the first time because the image has to be built. Afterwards, running the container can be started directly with the same command. +STUnD is a Haskell+JavaScript web application built by Herbert Lange and Arianna Masciolini based on an initial prototype by Arianna Masciolini. 
## Citing -If you use this tool, you are welcome to cite +If you use this tool in your research, you are welcome to cite ``` @inproceedings{masciolini2024stund, @@ -188,6 +29,4 @@ If you use this tool, you are welcome to cite } ``` -A more extensive and up-to-date publication in English is currently in preparation. - -[^1]: technical note: this works because all extracted subtrees are adjusted so that they have a root node and valid (sequential) IDs. +A more extensive and up-to-date publication in English is currently in preparation. \ No newline at end of file diff --git a/_config.yaml b/_config.yaml new file mode 100644 index 0000000..0add056 --- /dev/null +++ b/_config.yaml @@ -0,0 +1,29 @@ +title: STUnD +author: + name: Arianna Masciolini + email: arianna.masciolini@gmail.com + +description: > + A Search Tool for (parallel) Universal Dependencies treebanks that runs in your browser. + +header_pages: + - tutorial.md + - installation.md + - pub.md + +show_excerpts: false + +remote_theme: jekyll/minima + +minima: + skin: dark + date_format: "%-d %B %Y" + social_links: + - { platform: github, user_url: https://github.com/harisont } + - { platform: stackoverflow, user_url: https://stackoverflow.com/users/7729724/harisont } + - { platform: instagram, user_url: https://www.instagram.com/unottica/ } + - { platform: youtube, user_url: https://www.youtube.com/c/ImparalHaskellemettilodaparte } + - { platform: rss, user_url: https://harisont.github.io/feed.xml} + +plugins: + - jekyll-feed \ No newline at end of file diff --git a/_includes/custom-head.html b/_includes/custom-head.html new file mode 100644 index 0000000..784b1c4 --- /dev/null +++ b/_includes/custom-head.html @@ -0,0 +1,8 @@ + + + + + + + + \ No newline at end of file diff --git a/_includes/footer.html b/_includes/footer.html new file mode 100644 index 0000000..f6b7d1b --- /dev/null +++ b/_includes/footer.html @@ -0,0 +1,38 @@ + diff --git a/_includes/image.html b/_includes/image.html new file mode 100644 
index 0000000..913d74b --- /dev/null +++ b/_includes/image.html @@ -0,0 +1,6 @@ +
+ {{ include.description }} + +
{{ include.description }}
+
\ No newline at end of file diff --git a/android-chrome-192x192.png b/android-chrome-192x192.png new file mode 100644 index 0000000000000000000000000000000000000000..3ccdc44f5b13652b3fdbe0fcb3bc00531e1d1c9b GIT binary patch literal 1529 [binary data omitted]
diff --git a/apple-touch-icon.png b/apple-touch-icon.png new file mode 100644 index 0000000000000000000000000000000000000000..da2a30fa7d9c903d7d470f35f8a940ebbb959684 GIT binary patch literal 1577 [binary data omitted]
+ + + + + #da532c + + +
diff --git a/favicon-16x16.png b/favicon-16x16.png new file mode 100644 index 0000000000000000000000000000000000000000..e470934cb9b66da7a450d7d4fcc259562af0d399 GIT binary patch literal 1280 [binary data omitted]
+ + + +Created by potrace 1.14, 
written by Peter Selinger 2001-2017 + + + + + + + diff --git a/site.webmanifest b/site.webmanifest new file mode 100644 index 0000000..b20abb7 --- /dev/null +++ b/site.webmanifest @@ -0,0 +1,19 @@ +{ + "name": "", + "short_name": "", + "icons": [ + { + "src": "/android-chrome-192x192.png", + "sizes": "192x192", + "type": "image/png" + }, + { + "src": "/android-chrome-512x512.png", + "sizes": "512x512", + "type": "image/png" + } + ], + "theme_color": "#ffffff", + "background_color": "#ffffff", + "display": "standalone" +} diff --git a/tutorial.md b/tutorial.md new file mode 100644 index 0000000..180dc85 --- /dev/null +++ b/tutorial.md @@ -0,0 +1,137 @@ +--- +title: Getting started with STUnD +layout: base +--- + +## Specifying input files +![start screen](img/start.png) + +The "Browse..." buttons are used to specify one or two (parallel) input files, which have to be in strict [CoNLL-U format](https://universaldependencies.org/format.html). + +If you only specify the input file(s), leaving the other fields blank, clicking "search" will run a default query that returns the full treebank: + +![null search](img/null_search.png) + +## Running a query +Queries are specified in the first text input field: + +![query for presens perfekt](img/presens_perfekt.png) + +(note that double-clicking on it will show the query history). + +### Monolingual queries +The example query in the picture is + +```haskell +TREE_ (FEATS_ "VerbForm=Sup") [AND [LEMMA "ha", FEATS_ "Tense=Pres"]] +``` + +This is a _simple_ or _monolingual_ query, looking for present perfect constructions in the Swedish treebank. +It reads as + +> Look for (sub)trees (`TREE_`) where the root is a supine (`(FEATS_ "VerbForm=Sup")`) and one of its direct dependents is the present of the verb "ha" (`AND [LEMMA "ha", FEATS_ "Tense=Pres"]`). + +Now only the subtrees matching the query (often full trees in this case) are highlighted in bold (cf. last row). 
+ +With some knowledge of Swedish, this particular query can be rewritten more concisely as + +```haskell +TREE_ (FEATS_ "VerbForm=Sup") [FORM "har"] +``` + +It is then very easy to modify the query for other structurally similar tenses: + +- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "hade"]` (pluperfect) +- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "ha"]` (perfect infinitive) + +Compared to the original "present perfect" query, these make it easier to see how queries work: first, the program tries to align the two treebanks to identify semantically equivalent subtrees; then the query is run on the left (Swedish) treebank and matching subtrees are returned alongside their English counterparts. +For this reason, it can find correspondences such as "att ha sagt"-"as saying" (row 4), even though the English construction is not at all similar to the Swedish one. +Unlike query matching, the alignment step is not guaranteed to find all correspondences: in the picture below, you can see that sometimes a match is found in the Swedish treebank but nothing is highlighted in the corresponding English sentence (this is the case, for instance, in the last few rows). At other times, a correspondence is found but is incorrect. + +![perfect infinitive](img/perf_inf.png) + +Of course, monolingual queries can also be run on single treebanks: + +![monolingual query on single treebank](img/single.png) + +### Parallel queries +Queries can also be _parallel_ or _bilingual_. For instance, we can use the following pattern to search for sentences where a Swedish present perfect corresponds to a passive present tense in English: + +```haskell +TREE_ (FEATS_ "VerbForm={Sup->Part}") [AND [LEMMA "{ha->be}", FEATS_ "Tense=Pres"]] +``` + +This produces the following results: + +![bilingual query](img/bilingual_query.png) + +Note that the second hit here is a false positive, due to the fact that "are" in the clause "there are already..." is also a direct dependent of the main lexical verb "dropped". 
+This is unfortunate, but difficult to avoid given how conjuncts are treated in UD. + +The basic query language ("UD patterns") is described [here](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md), while the extended version for parallel (bilingual) queries (`{X -> Y}` syntax) is documented [here](https://github.com/harisont/L2-UD#l1-l2-patterns). + +## Adding a replacement rule +The last input field can be used to specify a _replacement rule_ to be applied to all matching subtrees in both languages. +This can help highlight the relevant parts of each query result and manipulate them. + +Understanding replacement rules, which are described [alongside the basic query language](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md), can be slightly more challenging. + +As a first example, + +```haskell +PRUNE (UPOS "VERB") 0 +``` + +decreases the depth of trees rooted in a verb to 0, eliminating all dependents: + +![drastic pruning](img/replacement1.png) + +The more complex pattern + +```haskell +CHANGES [FILTER_SUBTREES TRUE (OR [DEPREL_ "aux", DEPREL_ "cop"]), PRUNE TRUE 1] +``` + +uses dependency labels to isolate verb constructions of maximum depth 1, thus producing, in conjunction with the first query, the following output: + +![replacement rule](img/replacement2.png) + +## CoNLL-U and tree mode +So far, we have seen how to use STUnD in plain text mode. +Switching to CoNLL-U mode allows inspecting the CoNLL-U (sub)trees __corresponding to bold text in the default text mode__: + +![CoNLL-U mode](img/conllu.png) + +Tree mode renders them as SVG trees: + +![Tree mode](img/tree.png) + +## Saving the search results +Query results can be saved as plain text/TSV, CoNLL-U and HTML-embedded SVG trees. +The output format depends on the mode (in the example below, for instance, results would be saved as graphical trees). 
+ +![download links for the search results](img/download.png) + +If two treebanks are being compared, results can be saved as two separate files, one per treebank, or as a single "parallel" file: + +- in text mode, the T1 and T2 files are simple text (one sentence per line), while the parallel file is tab-separated. This makes it easy to import search results into any spreadsheet program +- in CoNLL-U mode, the output is always a new treebank that can, for instance, be used as input for more refined queries in STUnD, or simply imported into [a CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) for further inspection.[^1] In the parallel file, sentences from the two input treebanks are alternated +- similarly, parallel files in tree mode consist of alternating T1-T2 trees + +## Other use cases +So far, we have shown how to use STUnD on multilingual treebanks. +Many of the tool's functionalities, however, are also relevant in other scenarios, such as comparing learner sentences with corrections: + +![example of L1-L2 query on VALICO](img/it_gender.png) + +In the image above, you can see STUnD in action on the [VALICO treebank of L2 Italian](https://github.com/UniversalDependencies/UD_Italian-Valico), looking for feminine nouns incorrectly inflected as masculine. + +By checking "highlight discrepancies", in addition, the tool will highlight all sentence pairs matching the query that present any difference: + +![highlight discrepancies](img/papers_please.png) + +In the case of a learner corpus, "discrepant" means "erroneous", but highlighting discrepancies can also be useful in other settings, such as when comparing different analyses of the same text to resolve disagreement in a linguistic annotation project. +This functionality is, however, still very rudimentary. +In the future, the plan is to refine it to only highlight sentences where the discrepancy occurs in the subtree matching the query. 
+ +[^1]: technical note: this works because all extracted subtrees are adjusted so that they have a root node and valid (sequential) IDs.
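
The adjustment mentioned in the technical note can be pictured with a small sketch. This is not STUnD's actual code (the tool is written in Haskell); it is a hypothetical Python helper, `renumber_subtree`, assuming plain 10-column CoNLL-U token lines with no comments, multiword tokens or empty nodes. It renumbers the tokens from 1, remaps the heads, and turns any token whose head lies outside the extracted subtree into the root.

```python
def renumber_subtree(lines):
    """Renumber an extracted CoNLL-U subtree (hypothetical helper, not STUnD's API).

    `lines` are tab-separated, 10-column CoNLL-U token lines.
    IDs become 1..n, HEAD values are remapped accordingly, and a token
    whose head is not part of the subtree becomes the root (HEAD 0, DEPREL root).
    """
    rows = [line.split("\t") for line in lines]
    # Map each old ID to its new sequential ID.
    new_id = {row[0]: str(i) for i, row in enumerate(rows, start=1)}
    for row in rows:
        old_head = row[6]
        row[0] = new_id[row[0]]
        if old_head in new_id:
            row[6] = new_id[old_head]
        else:
            # The head was left out of the subtree: this token is the new root.
            row[6], row[7] = "0", "root"
    return ["\t".join(row) for row in rows]


# Illustrative subtree ("har sagt") cut out of a longer sentence;
# the annotation is simplified for the example.
subtree = [
    "3\thar\tha\tAUX\t_\tTense=Pres\t4\taux\t_\t_",
    "4\tsagt\tsäga\tVERB\t_\tVerbForm=Sup\t2\tccomp\t_\t_",
]
renumbered = renumber_subtree(subtree)
print("\n".join(renumbered))
```

After this step "sagt" is the root of a two-token tree and "har" points to it, which is the kind of shape a CoNLL-U viewer or a further query would expect.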