K-Fold issue #18

Open
brunopk opened this issue Apr 17, 2018 · 2 comments

Comments

Collaborator

brunopk commented Apr 17, 2018

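    # Rows already used as a test fold in earlier iterations
    union_ti = pd.DataFrame()
    # Target size of each test fold
    n = round(len(ds.pandas_df) / k)
    errors = [0] * k

    for i in range(k):
        # Rows not yet used in any test fold
        diff_df_union_ti = ds.pandas_df.loc[~ds.pandas_df.index.isin(union_ti.index), :]

        # Sample the next test fold from the unused rows
        test_df = diff_df_union_ti.sample(n=min(n, len(diff_df_union_ti)))
        union_ti = pd.concat([union_ti, test_df])

        # Training rows: the unused rows minus this test fold
        train_df = diff_df_union_ti.loc[~diff_df_union_ti.index.isin(test_df.index), :]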

This is the step-by-step reasoning I worked through for the code:

  • at the start of the first iteration: union_ti is empty
  • at the end of the first iteration:
    • diff_df_union_ti is the whole dataset
    • test_df is a randomly sampled portion (fine)
    • union_ti = test_df
    • train_df is everything in diff_df_union_ti that is not in test_df (fine)
  • at the end of the second iteration:
    • union_ti is the union of the new test_df with the test_df from the previous step
    • test_df is a randomly sampled portion of diff_df_union_ti => it could pick the same elements as the test_df from the previous step (wrong)
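
One way to check the trace above is to run the same loop on a toy frame and compare the index labels of each fold. This is only a minimal sketch: `sample_folds`, `toy_df`, and the `seed` parameter are hypothetical names introduced for illustration, and a 12-row frame stands in for `ds.pandas_df`.

```python
import pandas as pd


def sample_folds(df, k, seed=0):
    """Re-run the sampling loop from the snippet on df.

    Returns one (test index labels, train index labels) pair per fold.
    `seed` is an assumption added here only to make the run repeatable.
    """
    union_ti = pd.DataFrame()
    n = round(len(df) / k)
    folds = []
    for i in range(k):
        # Rows not yet used in any test fold
        diff_df_union_ti = df.loc[~df.index.isin(union_ti.index), :]
        test_df = diff_df_union_ti.sample(
            n=min(n, len(diff_df_union_ti)), random_state=seed + i
        )
        union_ti = pd.concat([union_ti, test_df])
        train_df = diff_df_union_ti.loc[~diff_df_union_ti.index.isin(test_df.index), :]
        folds.append((set(test_df.index), set(train_df.index)))
    return folds


# Toy data standing in for ds.pandas_df (hypothetical)
toy_df = pd.DataFrame({"x": range(12)})
folds = sample_folds(toy_df, k=3)

for i, (test_idx, train_idx) in enumerate(folds):
    # Labels that appeared in any earlier test fold
    earlier = set().union(*(t for t, _ in folds[:i]))
    print(
        f"fold {i}: test={len(test_idx)} train={len(train_idx)} "
        f"overlap_with_earlier_tests={len(test_idx & earlier)}"
    )
```

On this toy frame, and assuming a unique index, the overlap count stays at 0 for every fold, since diff_df_union_ti already filters out everything in union_ti before sampling. The train sizes come out as 8, 4, and 0, because train_df is also taken from the not-yet-used rows rather than from the whole dataset.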
Collaborator

constanzadieci commented Apr 17, 2018 via email

Collaborator Author

brunopk commented Apr 17, 2018

Now I get it, with the fix it works fine 👌 thanks
