HDD_BETA sign (+ vs -) is inconsistent across serialized model types #524

Open
ssuffian opened this issue Dec 19, 2024 · 0 comments

@ssuffian
Contributor

Report installed package versions

eemeter==4.0.7
pandas==2.0.3
scipy==1.10.1
numpy==1.24.4

(This is happening in the test suite of the current repo as well)

Describe the bug

When fitting data that contains heating degree days, the model serializes hdd_beta as negative for a c_hdd_tidd model but as positive for an hdd_tidd_cdd model. This is inconsistent with eemeter 2.0, where it was always serialized as positive, and it is not internally consistent either.

import numpy as np
import pandas as pd
import eemeter
from datetime import UTC  # requires Python 3.11+

np.random.seed(1)  # seed NumPy's generator, which draws the temperatures below
hpp = 65  # heating balance point
df = pd.DataFrame(index=pd.date_range('2021-01-01','2021-12-31',freq='D', tz=UTC))
df['temperature'] = np.random.normal(hpp, 5, len(df)) # random temperatures centered on hpp
df['observed'] = 100 - (df['temperature']).clip(0,hpp) # usage decreasing from 100 at 0 degrees -> 35 at hpp

df2 = df.copy()
cpp = 70
df2.loc[df2['temperature']>cpp,'observed'] = df['temperature'] - 35 # adds increasing usage from 35 at cpp to 65 at 100

df represents a heating only model, while df2 is the same data but also with cooling load.
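
To make the expected sign concrete, here is a minimal NumPy-only sketch of the heating-only data (it does not use eemeter). It assumes the convention usage = intercept + hdd_beta * max(hdd_bp - T, 0); the variable names are illustrative and not taken from the library.

```python
import numpy as np

rng = np.random.default_rng(1)
hdd_bp = 65.0
temperature = rng.normal(hdd_bp, 5, 365)            # same shape of data as df
observed = 100 - np.clip(temperature, 0, hdd_bp)    # same formula as df['observed']

# Convention 1: regress usage on heating degree days, max(hdd_bp - T, 0).
hdd = np.clip(hdd_bp - temperature, 0, None)
slope_hdd, intercept_hdd = np.polyfit(hdd, observed, 1)

# Convention 2: regress usage on raw temperature, heating days only.
heating = temperature < hdd_bp
slope_temp, _ = np.polyfit(temperature[heating], observed[heating], 1)

print(slope_hdd, intercept_hdd, slope_temp)
```

Under these assumptions the same data yields a slope close to +1 against heating degree days and close to -1 against temperature, so the two serialized signs may simply reflect two different conventions for the same fit.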

daily_model4_hdd_only = eemeter.eemeter.DailyModel()
baseline_data = eemeter.eemeter.DailyBaselineData(df, is_electricity_data=False)
daily_model4_hdd_only.fit(baseline_data)
print(daily_model4_hdd_only.model['fw-su_sh_wi'].named_coeffs)

This outputs the following; note the negative sign on hdd_beta:

model_type=<ModelType.HDD_TIDD: 'hdd_tidd'> intercept=35.0 hdd_bp=64.99999332257757 hdd_beta=-0.999999372231808 hdd_k=None cdd_bp=None cdd_beta=None cdd_k=None

When we run using df2, which is the same heating data but with a cooling load too (which means eemeter uses an HDD_TIDD_CDD model instead):

daily_model4_hdd_cdd = eemeter.eemeter.DailyModel()
baseline_data = eemeter.eemeter.DailyBaselineData(df2, is_electricity_data=False)
daily_model4_hdd_cdd.fit(baseline_data)
print(daily_model4_hdd_cdd.model['fw-su_sh_wi'].named_coeffs)

Note that now hdd_beta is positive:

model_type=<ModelType.HDD_TIDD_CDD: 'hdd_tidd_cdd'> intercept=35.0 hdd_bp=65.00025894768191 hdd_beta=0.9999511395722401 hdd_k=None cdd_bp=69.99873552598811 cdd_beta=0.9996143428191359 cdd_k=None

This was not the case in eemeter 2.0:

design_matrix_daily = eemeter3.create_caltrack_daily_design_matrix(
    df.rename(columns={'observed':'value'})['value'].to_frame(),
    df['temperature'].rename('baseline_temperature_f').resample('h').ffill(),
)
daily_model3_hdd_only = eemeter3.fit_caltrack_usage_per_day_model(
    design_matrix_daily,
)
print(daily_model3_hdd_only.model.model_params)

Returned the following for a heating-only model:

{'intercept': np.float64(34.999999999999964), 'beta_hdd': np.float64(1.0000000000000029), 'heating_balance_point': 65}

The same call for a heating+cooling model:

design_matrix_daily = eemeter3.create_caltrack_daily_design_matrix(
    df2.rename(columns={'observed':'value'})['value'].to_frame(),
    df2['temperature'].rename('baseline_temperature_f').resample('h').ffill(),
)
daily_model3_hdd_cdd = eemeter3.fit_caltrack_usage_per_day_model(
    design_matrix_daily,
)
print(daily_model3_hdd_cdd.model.model_params)

Returned the following for a heating and cooling model:

{'intercept': np.float64(34.99999999999997), 'beta_cdd': np.float64(0.9999999999999987), 'beta_hdd': np.float64(0.9999999999999947), 'cooling_balance_point': 70, 'heating_balance_point': 65}
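
For a side-by-side comparison of the two generations of output, the legacy keys can be renamed to the newer coefficient names. This is a small sketch only: `LEGACY_TO_NEW` and `rename_legacy_params` are hypothetical helpers, and the key pairing (for example `beta_hdd` to `hdd_beta`) is an assumption based on the names in the outputs above, not an eemeter API.

```python
# Hypothetical mapping from legacy parameter names to the newer names.
LEGACY_TO_NEW = {
    "intercept": "intercept",
    "beta_hdd": "hdd_beta",
    "heating_balance_point": "hdd_bp",
    "beta_cdd": "cdd_beta",
    "cooling_balance_point": "cdd_bp",
}


def rename_legacy_params(params):
    """Return a copy of a legacy params dict keyed by the newer names."""
    return {LEGACY_TO_NEW[key]: float(value) for key, value in params.items()}


# Values copied from the heating-only legacy output above.
legacy_hdd_only = {
    "intercept": 34.999999999999964,
    "beta_hdd": 1.0000000000000029,
    "heating_balance_point": 65,
}
renamed = rename_legacy_params(legacy_hdd_only)
print(renamed)
```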

Expected behavior

The heating-degree-days slope in the serialized model should have the same sign (always positive) for both c_hdd_tidd and hdd_tidd_cdd models, which also matches what it was in CalTrack 2.0 (always positive).
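
The expectation could be captured as a simple check over serialized coefficients. The sketch below uses plain dicts with values copied (rounded) from the two outputs reported above; `hdd_beta_is_positive` is a hypothetical helper, not part of eemeter.

```python
def hdd_beta_is_positive(coeffs):
    """True when there is no heating term or the heating slope is > 0."""
    beta = coeffs.get("hdd_beta")
    return beta is None or beta > 0


# Rounded values from the reported outputs.
hdd_only = {"model_type": "hdd_tidd", "hdd_beta": -0.999999}
hdd_cdd = {"model_type": "hdd_tidd_cdd", "hdd_beta": 0.999951}

# Collect the model types that violate the "always positive" expectation.
failing = [c["model_type"] for c in (hdd_only, hdd_cdd) if not hdd_beta_is_positive(c)]
print(failing)
```

A check like this, run over every model type in the test suite, would flag the inconsistency regardless of which base model produced the coefficients.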

Additional context

I believe this is because the following line exists in the hdd_tidd_cdd model code: https://github.com/openeemeter/eemeter/blob/master/eemeter/eemeter/models/daily/base_models/hdd_tidd_cdd.py#L326 but no equivalent line exists in the c_hdd_tidd code. However, when I tried simply adding it, many tests broke.
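
One possible reason a one-sided change breaks tests is that a sign flip on serialization needs a matching flip wherever the serialized value is read back. The following is a sketch under stated assumptions: it assumes the heating-only model keeps a negative slope internally, and `Coeffs`, `to_serialized`, and `from_serialized` are hypothetical stand-ins, not eemeter's classes or functions. What the linked line actually does is not verified here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Coeffs:
    """Hypothetical stand-in for a fitted model's coefficients."""
    intercept: float
    hdd_bp: Optional[float] = None
    hdd_beta: Optional[float] = None


def _flip_hdd_beta(coeffs):
    # Negate the heating slope if the model has one; otherwise leave as-is.
    if coeffs.hdd_beta is None:
        return coeffs
    return replace(coeffs, hdd_beta=-coeffs.hdd_beta)


def to_serialized(internal):
    """Assumed internal convention (negative slope) -> serialized (positive)."""
    return _flip_hdd_beta(internal)


def from_serialized(serialized):
    """Inverse of to_serialized, so a round trip restores the internal value."""
    return _flip_hdd_beta(serialized)


internal = Coeffs(intercept=35.0, hdd_bp=65.0, hdd_beta=-1.0)
serialized = to_serialized(internal)
restored = from_serialized(serialized)
print(serialized, restored == internal)
```

If only the outgoing direction is changed, stored fixtures and any deserialization path would still expect the old sign, which may be consistent with the widespread test failures described above.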
