Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: data frame with lazy relation AltrepDataFrameRelation #960

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

Antonov548
Copy link
Contributor

@Antonov548 Antonov548 commented Jan 6, 2025

Closes #949

Copy link
Collaborator

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the test output to be unchanged? Please also add rel_filter2() .

R/relational.R Outdated Show resolved Hide resolved
src/relational.cpp Outdated Show resolved Hide resolved
projections.push_back(std::move(dexpr));
}

duckdb::rel_extptr_t rel = cpp11::as_cpp<cpp11::decay_t<duckdb::rel_extptr_t>>(rapi_rel_from_df(con, df, false));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rapi_() functions are external, this coupling is a bit too tight. Perhaps extract an internal function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just temporarily

src/relational.cpp Outdated Show resolved Hide resolved
R/relational.R Outdated Show resolved Hide resolved
R/relational.R Outdated Show resolved Hide resolved
@@ -162,6 +162,28 @@ using namespace cpp11;
return make_external_prot<RelationWrapper>("duckdb_relation", prot, res);
}

[[cpp11::register]] SEXP rapi_rel_filter2(data_frame df, duckdb::conn_eptr_t con, list exprs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these to a separate .cpp file and keep them in the same order as here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still valid. Can you please follow up?

@krlmlr
Copy link
Collaborator

krlmlr commented Jan 7, 2025

I like the idea of rel_from_any_df(), please proceed.

}

const vector<ColumnDefinition> &AltrepDataFrameRelation::Columns() {
return columns;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Can this still be parent->...() ?

}
duckdb::rel_extptr_t rel = cpp11::as_cpp<cpp11::decay_t<duckdb::rel_extptr_t>>(rapi_rel_from_any_df(con, df, true));

auto filter = make_shared_ptr<FilterRelation>(rel->rel, std::move(filter_expr));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be wrapped in an ALTREP data frame at this stage.

src/relational.cpp Outdated Show resolved Hide resolved
@Antonov548
Copy link
Contributor Author

@krlmlr seems it works. Maybe it also have to checked with more examples/tests

@krlmlr
Copy link
Collaborator

krlmlr commented Jan 10, 2025

Current state:

drv <- duckdb::duckdb()
con <- DBI::dbConnect(drv)
df1 <- tibble::tibble(a = 1)

"mutate"
#> [1] "mutate"
df2 <- duckdb:::rel_project2(
  df1,
  con,
  list(
    {
      tmp_expr <- duckdb:::expr_reference("a")
      duckdb:::expr_set_alias(tmp_expr, "a")
      tmp_expr
    },
    {
      tmp_expr <- duckdb:::expr_constant(2)
      duckdb:::expr_set_alias(tmp_expr, "b")
      tmp_expr
    }
  )
)
"filter"
#> [1] "filter"
df3 <- duckdb:::rel_filter2(
  df2,
  con,
  list(
    duckdb:::expr_comparison(
      "==",
      list(
        duckdb:::expr_reference("b"),
        duckdb:::expr_constant(2)
      )
    )
  )
)

duckdb:::rel_from_altrep_df(df3)
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Filter [(b = 2.0)]
#>   Projection [a as a, 2.0 as b]
#>     r_dataframe_scan(0x1349a2ba8)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)

# This materializes the data frame
df2
#>   a b
#> 1 1 2

# Expecting this to use a data frame scan only, without a projection
rel2 <- duckdb:::rel_from_altrep_df(df2)
rel2
#> Error in `print()`:
#> ! RebuildRelationException

# Expecting this to use filter only, with a data frame scan based on rel2
rel3 <- duckdb:::rel_from_altrep_df(df3)
rel3
#> Error in `print()`:
#> ! RebuildRelationException

Created on 2025-01-10 with reprex v2.1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create and use adapter subclass of Relational that forwards to a parent object or to a data frame scan
2 participants