-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for calmar on number of positive value #324
base: master
Are you sure you want to change the base?
Conversation
CHANGELOG.md
Outdated
|
||
# 3.0.1 [#322](https://github.com/openfisca/openfisca-survey-manager/pull/322) | ||
* New feature | ||
- Introduce the possibility to use calmar on variables with a suffix "_number", with target the weighted sum of the number of positive value of the variable. Only the variable without "_number" should be in the tax benefit system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't catch this change.
I would use another suffix like _positive
.
But the best solution is to use an expression like 'X > 0'. And check the type of the result of the variable or the expression. If it is a boolean result than target on count otherwise on mass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is for instance to calibrate on the number of wages. Do you mean we could use ' > 0' as suffix, or something else ? The problem I was trying to tackle is to get back the original variable to use on it the adaptative_calculate_variable
when calibrating on a number of entities with a positive value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are at least two problems.
- the
_number
suffix is IMHO inappropriate (at least use_positive_count
or_strictly_positive_count
) - dealing only with strictly positive value could be easily generalized, so it would be nice to have it
You can use any expression like 'wage > 0' as a key for the margin and do the computation (just replace wage with self.simulationadaptative_calculate_variable(wage, period = period)
and evaluate. You can then use any suffix you like if it is linked to the expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I understand better. I try to implement it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen your changes. This is better. But can't we deal with more general expressions ?
If you are in a hurry, we can leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of expressions do you have in mind ? Here we can already deal with inequalities and arithmetic changes, the only constraint being that it starts with a name of a variable (we can do something like : 'var ** 2 * (var>0)'
for example). I will be happy to generalize it, but I don't see in which direction this could be useful. Maybe using several variables at once, but in this case it is more natural to create a new variable in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine you have a lot of targets you want to satisfy with as many expressions for example conditionnaly on satisfying some boolean expressions (male average salary, women averay salary, age dependency, etc).
It may be very tiedous to create new variables.
And you can use numexpr
to help you with that.
But again, if you do not need it now, it is okay to leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to generalize by allowing for several variables of the same entity in the expression. The code doesn't allow text content outside of openfisca variable names (no sum
for instance), but I think it is already quite better (one can calibrate on (sex == 1) * salary
and (sex == 1) * (salary > 0)
, even if not directly on np.mean((sex == 1) * salary)
or (sex == 1) * salary * weights/ sum(weights * (salary == 1))
. Thanks for your advice ! I will still search a bit but I'm not sure I will be able to separate correctly the openfisca variable from code words and to keep a good level of error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that OK for you @benjello, or is it preferable to allow for text outside variables (for instance by searching explicitely if the string is in the list of the tax and benefit system variables) ? I think it might be error-prone (and make the simulation longer), but if they are some use-case it might be worth it. Thanks again for the improvement you propose !
New features