From d6c9306df26c666b393b1c9ebe674f87f67c19ac Mon Sep 17 00:00:00 2001
From: Socorro Dominguez
Date: Wed, 14 Aug 2024 07:42:20 -0700
Subject: [PATCH] fixing vignette for API possible problems

---
 vignettes/neotoma2-package.Rmd | 175 +++++++++++++++++++--------------
 1 file changed, 102 insertions(+), 73 deletions(-)

diff --git a/vignettes/neotoma2-package.Rmd b/vignettes/neotoma2-package.Rmd
index 16f47d3..65deb5b 100644
--- a/vignettes/neotoma2-package.Rmd
+++ b/vignettes/neotoma2-package.Rmd
@@ -30,13 +30,13 @@ The [Neotoma Paleoecology Database](https://www.neotomadb.org) is a domain-speci
 ### Resources
 
 * [Neotoma Homepage](https://www.neotomadb.org)
-  * The Neotoma homepage, with links to contacts, news and other tools and resources.
+* The Neotoma homepage, with links to contacts, news and other tools and resources.
 * [Neotoma Database Manual](https://open.neotomadb.org/manual/)
-  * Documentation for the database itself, with examples of SQL queries and descriptions of Neotoma tables.
+* Documentation for the database itself, with examples of SQL queries and descriptions of Neotoma tables.
 * [Neotoma (JSON) API](https://api.neotomadb.org/)
-  * A tool to obtain data in JSON format directly through calls to Neotoma.
+* A tool to obtain data in JSON format directly through calls to Neotoma.
 * [Neotoma GitHub Organization](https://github.com/NeotomaDB)
-  * Open code repositories to see how folks are using Neotoma and what kinds of projects we're working on.
+* Open code repositories to see how folks are using Neotoma and what kinds of projects we're working on.
 * Workshops and Code Examples ([see the section below](#workshops-and-code-examples))
 
 ## Neotoma Data Structure
@@ -186,7 +186,7 @@ We can also use `set_sites()` as a tool to update the metadata associated with a
 ```r
 # Update a value within an existing `sites` object:
 longer_alex[[3]] <- set_site(longer_alex[[3]],
-  altitude = 3000)
+altitude = 3000)
 longer_alex
 ```
 
@@ -247,8 +247,13 @@ brazil <- '{"type": "Polygon",
 # functionality of the `sf` package.
 brazil_sf <- geojsonsf::geojson_sf(brazil)
 
-brazil_datasets <- get_datasets(loc = brazil_sf)
-brazil_datasets
+
+brazil_datasets <- tryCatch({
+  get_datasets(loc = brazil_sf)
+}, error = function(e) {
+  message("Failed to retrieve datasets for Brazil: ", e$message)
+  NULL
+})
 ```
 
 Now we have an object called `brazil_datasets` that contains `r length(brazil_datasets)`.
@@ -256,7 +261,11 @@ Now we have an object called `brazil_datasets` that contains `r length(brazil_da
 You can plot these findings!
 
 ```{r leafletBrazil}
-plotLeaflet(brazil_datasets)
+if (!is.null(brazil_datasets)) {
+  plotLeaflet(brazil_datasets)
+} else {
+  cat("Datasets could not be retrieved due to an API error. Please try again later.")
+}
 ```
 
 ## Filtering Records
@@ -264,17 +273,28 @@ plotLeaflet(brazil_datasets)
 Sometimes we take a large number of records, do some analysis, and then choose to select a subset. For example, we may want to select all sites in a region, and then subset those by dataset type. If we want to look at only the geochronological datasets from Brazil, we can start with the set of records returned from our `get_datasets()` query, and then use the `filter` function in `neotoma2` to select only those datasets that are geochronologic:
 
 ```{r filterBrazil}
-brazil_dates <- neotoma2::filter(brazil_datasets,
-  datasettype == "geochronologic")
-# or:
+if (!is.null(brazil_datasets)) {
+  brazil_dates <- neotoma2::filter(brazil_datasets,
+    datasettype == "geochronologic")
+} else {
+  cat("Datasets could not be filtered due to a previous API error. Please try again later.")
+}
 
-brazil_dates <- brazil_datasets %>%
-  neotoma2::filter(datasettype == "geochronologic")
+# or:
+if (!is.null(brazil_datasets)) {
+  brazil_dates <- brazil_datasets %>%
+    neotoma2::filter(datasettype == "geochronologic")
+} else {
+  cat("Datasets could not be filtered due to a previous API error. Please try again later.")
+}
 
 # With boolean operators:
-
-brazil_space <- brazil_datasets %>% neotoma2::filter(lat > -18 & lat < -16)
+if (!is.null(brazil_datasets)) {
+  brazil_space <- brazil_datasets %>% neotoma2::filter(lat > -18 & lat < -16)
+} else {
+  cat("Datasets could not be filtered due to a previous API error. Please try again later.")
+}
 ```
 
 The `filter()` function takes as the first argument, a datasets object, followed by the criteria we want to use to filter. Current supported criteria includes:
@@ -308,17 +328,26 @@ brazil <- '{"type": "Polygon",
 # functionality of the `sf` package.
 brazil_sf <- geojsonsf::geojson_sf(brazil)
 
-brazil_records <- get_datasets(loc = brazil_sf) %>%
-  neotoma2::filter(datasettype == "pollen" & age_range_young <= 1000 & age_range_old >= 10000) %>%
-  get_downloads(verbose = FALSE)
-
-count_by_site <- samples(brazil_records) %>%
-  dplyr::filter(elementtype == "pollen" & units == "NISP") %>%
-  group_by(siteid, variablename) %>%
-  summarise(n = n()) %>%
-  group_by(variablename) %>%
-  summarise(n = n()) %>%
-  arrange(desc(n))
+brazil_records <- tryCatch({
+  get_datasets(loc = brazil_sf) %>%
+    neotoma2::filter(datasettype == "pollen" & age_range_young <= 1000 & age_range_old >= 10000) %>%
+    get_downloads(verbose = FALSE)
+}, error = function(e) {
+  message("Failed to retrieve records for Brazil: ", e$message)
+  NULL
+})
+
+if (!is.null(brazil_records)) {
+  count_by_site <- samples(brazil_records) %>%
+    dplyr::filter(elementtype == "pollen" & units == "NISP") %>%
+    group_by(siteid, variablename) %>%
+    summarise(n = n()) %>%
+    group_by(variablename) %>%
+    summarise(n = n()) %>%
+    arrange(desc(n))
+} else {
+  cat("Records could not be retrieved due to an API error. Please try again later.")
+}
 ```
 
 In this code chunk we define the bounding polygon for our sites, filter by time and dataset type, and then return the full records for those sites. We get a `sites` object with dataset and sample information (because we used `get_downloads()`). We execute the `samples()` function to extract all the samples from the `sites` objects, and then filter the resulting `data.frame` to pull only pollen (a pollen dataset may contain spores and other elements that are not, strictly speaking, pollen) that are counted using the number of identified specimens (or NISP). We then `group_by()` the unique site identifiers (`siteid`) and the taxa (`variablename`) to get a count of the number of times each taxon appears in each site. We then want to `summarize()` to a higher level, just trying to understand how many sites each taxon appears in. After that we `arrange()` so that the records show the most common taxa first in the resulting variable `count_by_site`.
@@ -335,14 +364,14 @@ The most simple case is a search for a publication based on one or more publicat
 
 We can use a single publication ID or multiple IDs. In either case the API returns the publication(s) and creates a new `publications` object (which consists of multiple individual `publication`s).
 
-```{r pubsbyid}
+```{r pubsbyid, eval=FALSE}
 one <- get_publications(12)
 two <- get_publications(c(12, 14))
 ```
 
 From there we can then then subset and extract elements from the list using the standard `[[` format. For example:
 
-```{r showSinglePub}
+```{r showSinglePub, eval=FALSE}
 two[[2]]
 ```
 
@@ -362,7 +391,7 @@ We can also use search elements to search for publications. The `get_publicatio
 * `limit`
 * `offset`
 
-```{r fulltestPubSearch}
+```{r fulltestPubSearch, eval=FALSE}
 michPubs <- get_publications(search = "Michigan", limit = 2)
 ```
 
@@ -370,7 +399,7 @@ This results in a set of `r length(michPubs)` publications from Neotoma, equal t
 
 Text matching in Neotoma is approximate, meaning it is a measure of the overall similarity between the search string and the set of article titles. This means that using a nonsense string may still return results results:
 
-```{r nonsenseSearch}
+```{r nonsenseSearch, eval=FALSE}
 noise <- get_publications(search = "Canada Banada Nanada", limit = 5)
 ```
 
@@ -378,7 +407,7 @@ This returns a result set of length `r length(noise)`.
 
 This returns the (Neotoma) ID, the citation and the publication DOI (if that is stored in Neotoma). We can get the first publication using the standard `[[` nomenclature:
 
-```{r getSecondPub}
+```{r getSecondPub, eval=FALSE}
 two[[1]]
 ```
 
@@ -386,7 +415,7 @@ The output will look similar to the output for `two` above, however you will see
 
 We can select an array of `publication` objects using the `[[` method, either as a sequence (`1:10`, or as a numeric vector (`c(1, 2, 3)`)):
 
-```{r subsetPubs}
+```{r subsetPubs, eval=FALSE}
 # Select publications with Neotoma Publication IDs 1 - 10.
 pubArray <- get_publications(1:10)
 # Select the first five publications:
@@ -398,58 +427,58 @@ subPub
 
 Just as we can use the `set_sites()` function to set new site information, we can also create new publication information using `set_publications()`. With `set_publications()` you can enter as much or as little of the article metadata as you'd like, but it's designed (in part) to use the CrossRef API to return information from a DOI.
 
-```{r setNewPub}
+```{r setNewPub, eval=FALSE}
 new_pub <- set_publications(
-  articletitle = "Myrtle Lake: a late- and post-glacial pollen diagram from northern Minnesota",
-  journal = "Canadian Journal of Botany",
-  volume = 46)
+articletitle = "Myrtle Lake: a late- and post-glacial pollen diagram from northern Minnesota",
+journal = "Canadian Journal of Botany",
+volume = 46)
 ```
 
 A `publication` has a large number of slots that can be defined. These may be left blank, they may be set directly after the publication is defined:
 
-```{r setPubValue}
+```{r setPubValue, eval=FALSE}
 new_pub@pages <- "1397-1410"
 ```
 
 ## Workshops and Code Examples
 
 * 2022 International AL/IPA Meeting; Bariloche, Argentina
-  * [English Language Simple Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/simple_workflow.html)
-    * Topics: Simple search, climate gradients, stratigraphic plotting
-    * Spatial Domain: South America
-    * Dataset Types: Diatoms
-  * [Spanish Language Simple Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/simple_workflow_ES.html)
-    * Topics: Simple search, climate gradients, stratigraphic plotting
-    * Spatial Domain: South America
-    * Dataset Types: Diatoms
-  * [English Language Complex Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/complex_workflow.html)
-    * Topics: Chronology building, Bchron
-    * Spatial Domain: South America
-    * Dataset Types: Diatoms
-  * [Spanish Language Complex Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/complex_workflow_ES.html)
-    * Topics: Chronology building, Bchron
-    * Spatial Domain: South America
-    * Dataset Types: Diatoms
+* [English Language Simple Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/simple_workflow.html)
+* Topics: Simple search, climate gradients, stratigraphic plotting
+* Spatial Domain: South America
+* Dataset Types: Diatoms
+* [Spanish Language Simple Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/simple_workflow_ES.html)
+* Topics: Simple search, climate gradients, stratigraphic plotting
+* Spatial Domain: South America
+* Dataset Types: Diatoms
+* [English Language Complex Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/complex_workflow.html)
+* Topics: Chronology building, Bchron
+* Spatial Domain: South America
+* Dataset Types: Diatoms
+* [Spanish Language Complex Workflow](https://open.neotomadb.org/Workshops/IAL_IPA-November2022/complex_workflow_ES.html)
+* Topics: Chronology building, Bchron
+* Spatial Domain: South America
+* Dataset Types: Diatoms
 * 2022 European Pollen Database Meeting; Prague, Czech Republic
-  * [English Language Simple Workflow](https://open.neotomadb.org/Workshops/EPD-May2022/simple_workflow.html)
-    * Topics: Simple search, climate gradients, stratigraphic plotting, taxonomic harmonization
-    * Spatial Domain: Europe/Czech Republic
-    * Dataset Types: Pollen
-  * [English Language Complex Workflow](https://open.neotomadb.org/Workshops/EPD-May2022/complex_workflow.html)
-    * Topics: Chronology building, Bchron
-    * Spatial Domain: Europe/Czech Republic
-    * Dataset Types: Pollen
+* [English Language Simple Workflow](https://open.neotomadb.org/Workshops/EPD-May2022/simple_workflow.html)
+* Topics: Simple search, climate gradients, stratigraphic plotting, taxonomic harmonization
+* Spatial Domain: Europe/Czech Republic
+* Dataset Types: Pollen
+* [English Language Complex Workflow](https://open.neotomadb.org/Workshops/EPD-May2022/complex_workflow.html)
+* Topics: Chronology building, Bchron
+* Spatial Domain: Europe/Czech Republic
+* Dataset Types: Pollen
 * 2022 American Quaternary Association Meeting
-  * [English Language Simple Workflow](https://open.neotomadb.org/Workshops/AMQUA-June2022/simple_workflow.html)
-    * Topics: Simple search, climate gradients, stratigraphic plotting
-    * Spatial Domain: North America
-    * Dataset Types: Pollen
-  * [English Language Complex Workflow](https://open.neotomadb.org/Workshops/AMQUA-June2022/complex_workflow.html)
-    * Topics: Chronologies
-    * Spatial Domain: North America
-    * Dataset Types: Pollen
+* [English Language Simple Workflow](https://open.neotomadb.org/Workshops/AMQUA-June2022/simple_workflow.html)
+* Topics: Simple search, climate gradients, stratigraphic plotting
+* Spatial Domain: North America
+* Dataset Types: Pollen
+* [English Language Complex Workflow](https://open.neotomadb.org/Workshops/AMQUA-June2022/complex_workflow.html)
+* Topics: Chronologies
+* Spatial Domain: North America
+* Dataset Types: Pollen
 * Neotoma-charcoal Workshop, Göttingen, Germany. Authors: Petr Kuneš & Thomas Giesecke
-  * [English Language Workflow](https://rpubs.com/petrkunes/neotoma-charcoal)
-    * Topics: Simple Search, PCA, DCA, Charcoal/Pollen Correlation
-    * Spatial Domain: Global/Czech Republic
-    * Dataset Types: Pollen, Charcoal
+* [English Language Workflow](https://rpubs.com/petrkunes/neotoma-charcoal)
+* Topics: Simple Search, PCA, DCA, Charcoal/Pollen Correlation
+* Spatial Domain: Global/Czech Republic
+* Dataset Types: Pollen, Charcoal