function estimate_oc #38

NPJuncal · 2024-01-15T10:24:14Z

Estimation of organic Carbon is done by means of linear regressions on log(organic carbon) ~ log(organic matter). It gives back a organic carbon value for each organic matter value provided. If there is a organic carbon value for that sample it return the same value, else, generates a model for that site, else, model for specie, else, model for Ecosystem. If a model can not be created due to the low number of samples (<10) it uses the equations in Howard et al. 2014 to estimate the organic carbon.

outputs:
The initial tibble or data.frame with three new columns:

one column with estimated organic carbon values (eOC) in %
the standard error of the prediction (eOC_se)
the type of model used for estimation (origin)
In addition, a plot with the relationship between organic matter and estimated organic carbon

Works with example data but there is at least one bug.

tasks:

-the function does not work if the oc column is empty. The function estimate several models and then check p values and r2 to see if they are good to use. If there are not good models utilize published equations to estimate oc. But if the oc column is empty it can not estimate any model and it breaks. In this case it should go directly to to the published equations but it does not.

-change equation to estimate oc in salt marsh data if models can not be built. Now it comes from Craft et al. 1991. New equation in Maxwell et al. (2023). https://doi.org/10.1038/s41597-023-02633-x

-add a new output: df with the parameters of the models estimated

-write more test (only a few to check that the function checks type of data provided (numeric vs not numeric), but not about the funtionality of the function itself

-check documentation

enhancement 1: the function divide the df by ecossytem, dominant species and station. Due to this, it is compulsory to provide three columns with ecosystem, species and station. The user may not have diferent species or stations. Allow user to chose if it want this groups or not. Ecosystem column need to be there anyway just in case the models are no good and need to use published functions.

…xtra columns with the estimated oc and the model used to estimate each sample, to a list with tho elements. The data frame that was the former output and a aditional object with the parameters of the models. #38

NPJuncal · 2024-01-17T11:24:21Z

@costavale
I have tried the function with a database that had the oc column empty and it work perfectly using the literature functions... could you send me the dataframe you try to use to check what the problem was?

NPJuncal · 2024-01-17T11:25:12Z

Craft et al 1991 function already changed for Maxwell et al 2023

NPJuncal · 2024-01-19T11:06:48Z

updated tasks:

-check function with @costavale data. The function does work if the oc column is empty. This was not the problem.

-Change 10 sample limit for a warning (see issue #44 ). Right now the function does not fit models if there are less than 10 samples. We can allow it and give a warning. Not all models will be used (models with low r2 or p value are deleted). Need to save which models had less than 10 samples and check with final database if any of this model were used and provide the names of the cores in the warning.

-Check if the om values used to estimate oc are within the range of the samples used to fit the models (see issue #44 ). Possible approach: add the minimum and maximum input om value of each model in a df and then check if the values used to estimate the oc in each core are between them.

-write more test (only a few to check that the function checks type of data provided (numeric vs not numeric), but not about the funtionality of the function itself

-check documentation

enhancement 1: the function divide the df by ecossytem, dominant species and station. Due to this, it is compulsory to provide three columns with ecosystem, species and station. The user may not have diferent species or stations. Allow user to chose if it want this groups or not. Ecosystem column need to be there anyway just in case the models are no good and need to use published functions.

… to 3 #38 Also added a warning if the model used had less than 10 samples and if the oc estimated is below or abouve the initial oc range to estimate the mode. Second and third task in issue #38. Seeing the warning is dificult because the warning for mangrove data with less than 5% of om prints for every sample insted of ones. I will try to fix this or add it to the issue tasks.

NPJuncal · 2024-02-23T11:53:24Z

updated tasks:

-check function with @costavale data. The function does work if the oc column is empty. This was not the problem.

-write more test (only a few to check that the function checks type of data provided (numeric vs not numeric), but not about the funtionality of the function itself

-check documentation

enhancement 1: the function divide the df by ecossytem, dominant species and station. Due to this, it is compulsory to provide three columns with ecosystem, species and station. The user may not have diferent species or stations. Allow user to chose if it want this groups or not. Ecosystem column need to be there anyway just in case the models are no good and need to use published functions.

NPJuncal · 2024-02-29T14:43:30Z

I have solved three bugs I found while using this function with new data:

fit_models would not fit site models if there was only one specie per station
choose model function should not work if site_models or species_models were null
it the input dataset has a column name as one of the temporal columns created by any of the functions they would break

I have already fixed all three. Maybe those were the one @costavale found.

costavale · 2024-02-29T15:36:42Z

Hi all, I am going to check these updates and test again the functions with that dataset.
Regarding the rest, how did we split the work? I am sorry I missed some of the previous updates

Julenasti · 2024-03-01T12:23:00Z

Yes sorry, I'm a little lost right now. Nerea, if you think I can help you with a specific job let me know. I think I'll be more helpful if the issues you assign me are code related - if you have any

NPJuncal · 2024-03-19T11:57:49Z

This function is huge and writing test that would test specific steps is a bit messi. I have decided to write test for each of the small functions within it.
For example:
test_that("fit_ecosystem_model function works as expected", {

what do you think @Pakillo ? Maybe I should put all test in the same script insted of opening a new script for each function?

NPJuncal · 2024-07-29T14:27:52Z

The function crashes
Furthermore, change the Kauffman equation for new equation

choose model function tested if the species of that sample was included in the species model, but didnt give and alternative if not. is fixed, now it skip the species model and select ecosystem model :) #38

NPJuncal mentioned this issue Jan 19, 2024

Minium number of samples to estimate om-oc models #44

Closed

NPJuncal pushed a commit that referenced this issue Mar 19, 2024

test for funtion fit_ecosystem_models, withing estimate_oc #38

8d29c84

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

function estimate_oc #38

function estimate_oc #38

NPJuncal commented Jan 15, 2024 •

edited

Loading

NPJuncal commented Jan 17, 2024

NPJuncal commented Jan 17, 2024

NPJuncal commented Jan 19, 2024

NPJuncal commented Feb 23, 2024

NPJuncal commented Feb 29, 2024

costavale commented Feb 29, 2024

Julenasti commented Mar 1, 2024

NPJuncal commented Mar 19, 2024 •

edited

Loading

NPJuncal commented Jul 29, 2024

function estimate_oc #38

function estimate_oc #38

Comments

NPJuncal commented Jan 15, 2024 • edited Loading

NPJuncal commented Jan 17, 2024

NPJuncal commented Jan 17, 2024

NPJuncal commented Jan 19, 2024

NPJuncal commented Feb 23, 2024

NPJuncal commented Feb 29, 2024

costavale commented Feb 29, 2024

Julenasti commented Mar 1, 2024

NPJuncal commented Mar 19, 2024 • edited Loading

NPJuncal commented Jul 29, 2024

NPJuncal commented Jan 15, 2024 •

edited

Loading

NPJuncal commented Mar 19, 2024 •

edited

Loading