Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for calmar on number of positive value #324

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
6 changes: 5 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
# Changelog
### 3.0.2 [#324](https://github.com/openfisca/openfisca-survey-manager/pull/324)

# 3.0.1 [#322](https://github.com/openfisca/openfisca-survey-manager/pull/322)
* New feature
- Introduce the possibility to use calmar on variables with a suffix "_number", with target the weighted sum of the number of positive value of the variable. Only the variable without "_number" should be in the tax benefit system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't catch this change.
I would use another suffix like _positive.
But the best solution is to use an expression like 'X > 0'. And check the type of the result of the variable or the expression. If it is a boolean result than target on count otherwise on mass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is for instance to calibrate on the number of wages. Do you mean we could use ' > 0' as suffix, or something else ? The problem I was trying to tackle is to get back the original variable to use on it the adaptative_calculate_variable when calibrating on a number of entities with a positive value.

Copy link
Member

@benjello benjello Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are at least two problems.

  • the _number suffix is IMHO inappropriate (at least use _positive_count or _strictly_positive_count)
  • dealing only with strictly positive value could be easily generalized, so it would be nice to have it

You can use any expression like 'wage > 0' as a key for the margin and do the computation (just replace wage with self.simulationadaptative_calculate_variable(wage, period = period) and evaluate. You can then use any suffix you like if it is linked to the expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I understand better. I try to implement it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen your changes. This is better. But can't we deal with more general expressions ?
If you are in a hurry, we can leave it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of expressions do you have in mind ? Here we can already deal with inequalities and arithmetic changes, the only constraint being that it starts with a name of a variable (we can do something like : 'var ** 2 * (var>0)' for example). I will be happy to generalize it, but I don't see in which direction this could be useful. Maybe using several variables at once, but in this case it is more natural to create a new variable in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine you have a lot of targets you want to satisfy with as many expressions for example conditionnaly on satisfying some boolean expressions (male average salary, women averay salary, age dependency, etc).
It may be very tiedous to create new variables.
And you can use numexpr to help you with that.
But again, if you do not need it now, it is okay to leave it as is.

Copy link
Contributor Author

@sylvainipp sylvainipp Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to generalize by allowing for several variables of the same entity in the expression. The code doesn't allow text content outside of openfisca variable names (no sum for instance), but I think it is already quite better (one can calibrate on (sex == 1) * salary and (sex == 1) * (salary > 0), even if not directly on np.mean((sex == 1) * salary) or (sex == 1) * salary * weights/ sum(weights * (salary == 1)). Thanks for your advice ! I will still search a bit but I'm not sure I will be able to separate correctly the openfisca variable from code words and to keep a good level of error handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that OK for you @benjello, or is it preferable to allow for text outside variables (for instance by searching explicitely if the string is in the list of the tax and benefit system variables) ? I think it might be error-prone (and make the simulation longer), but if they are some use-case it might be worth it. Thanks again for the improvement you propose !


### 3.0.1 [#322](https://github.com/openfisca/openfisca-survey-manager/pull/322)

* Technical changes
- Fix build.
Expand Down
25 changes: 17 additions & 8 deletions openfisca_survey_manager/calibration.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,9 @@ def __init__(self, simulation, target_margins, period, target_entity_count = Non
variable_instance_by_variable_name = simulation.tax_benefit_system.variables
entities = set(
variable_instance_by_variable_name[variable].entity.key
for variable in margin_variables
for variable in margin_variables if not variable.endswith("_number")
)
assert all([(len(variable.split("_number")) < 3) for variable in margin_variables]), "invalid name of a variable for calibration, with '_number' not only in last position"
if entity is not None:
entities.add(entity)
self.entities = list(entities)
Expand Down Expand Up @@ -131,8 +132,13 @@ def _build_calmar_data(self) -> dict:
data[self.target_entity][self._initial_weight_name] = self.initial_weight * self.filter_by
period = self.period
for variable in self.margins_by_variable:
assert variable in self.simulation.tax_benefit_system.variables
data[self.simulation.tax_benefit_system.variables[variable].entity.key][variable] = self.simulation.adaptative_calculate_variable(variable, period = period)
assert variable in self.simulation.tax_benefit_system.variables or variable.split("_number")[0] in self.simulation.tax_benefit_system.variables
if variable.endswith("_number"):
assert self.simulation.tax_benefit_system.variables[variable.split("_number")[0]].value_type in [int, float], "variable with suffix 'number' should be float-like so we can separate the positive values"
value = (self.simulation.adaptative_calculate_variable(variable.split("_number")[0], period = period) > 0).astype(int)
else:
value = self.simulation.adaptative_calculate_variable(variable, period = period)
data[self.simulation.tax_benefit_system.variables[variable].entity.key][variable] = value

if len(self.entities) == 2:
for entity in self.entities:
Expand Down Expand Up @@ -209,8 +215,8 @@ def set_target_margin(self, variable, target):
"""
simulation = self.simulation
period = self.period
assert variable in simulation.tax_benefit_system.variables
variable_instance = simulation.tax_benefit_system.variables[variable]
assert variable in simulation.tax_benefit_system.variables or variable.split("_number")[0] in simulation.tax_benefit_system.variables
variable_instance = simulation.tax_benefit_system.variables[variable.split("_number")[0]]

filter_by = self.filter_by
target_by_category = None
Expand Down Expand Up @@ -275,10 +281,13 @@ def _update_margins(self):
filter_by = self.filter_by
initial_weight = self.initial_weight

value = simulation.adaptative_calculate_variable(variable, period = period)
if variable.endswith("_number"):
value = (simulation.adaptative_calculate_variable(variable.split("_number")[0], period = period) < 0).astype(int)
else:
value = simulation.adaptative_calculate_variable(variable, period = period)
weight_variable = simulation.weight_variable_by_entity[target_entity]

if len(self.entities) == 2 and simulation.tax_benefit_system.variables[variable].entity.key != self.target_entity:
if len(self.entities) == 2 and simulation.tax_benefit_system.variables[variable.split("_number")[0]].entity.key != self.target_entity:
value_df = pd.DataFrame(value)
id_variable = self.parameters["id_variable_link"]
value_df[id_variable] = simulation.adaptative_calculate_variable(id_variable, period = period)
Expand All @@ -297,7 +306,7 @@ def _update_margins(self):
('initial', initial_weight),
]

variable_instance = simulation.tax_benefit_system.get_variable(variable)
variable_instance = simulation.tax_benefit_system.get_variable(variable.split("_number")[0])
assert variable_instance is not None
if variable_instance.value_type in [bool, Enum]:
margin_items.append(('category', value))
Expand Down
3 changes: 2 additions & 1 deletion openfisca_survey_manager/tests/test_calmar.py
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,8 @@ def create_margins(entities = 1):
1: 30,
2: 50,
},
'Z': 140.0
'Z': 140.0,
'Z_number': 80,
}
if entities == 2:
margins_by_variable['C'] = 85
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[project]
name = "OpenFisca-Survey-Manager"
version = "3.0.1"
version = "3.0.2"
description = "A tool for managing survey/administrative data and import them in OpenFisca"
readme = "README.md"
keywords = ["microsimulation", "tax", "benefit", "rac", "rules-as-code", "survey", "data"]
Expand Down