[Long term] Look into Supersonic query API #11
Link expired?
@velvia Still expired.
Lol. Still expired...
I finally found a live link, though I'm not sure how much longer it will stay up either. Download the PDF while you can.
So, Supersonic is C++. There is also Apache Drill, which is written in Java rather than C++.
In the short term, I think the best bet is to experiment with Spark's Catalyst optimizer to get columnar, or at least vector-wise, execution. Here is a video: http://blog.madhukaraphatak.com/anatomy-of-spark-dataframe-api/ Some thoughts:
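To make the row-at-a-time vs. vector-wise distinction concrete, here is a minimal, hypothetical sketch (plain Python, not Spark or Supersonic code) of one query, roughly `SELECT SUM(price) WHERE qty > 10`, evaluated two ways: a Volcano-style iterator that pulls one row at a time, and a vectorized loop that walks each column in fixed-size batches using a selection vector. The table layout, batch size, and function names are all illustrative assumptions.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical columnar table: one Python list per column.
Columns = Dict[str, List[int]]

def row_iterator(cols: Columns) -> Iterator[Tuple[int, int]]:
    """Volcano-style scan: yields one (qty, price) row per next() call."""
    for qty, price in zip(cols["qty"], cols["price"]):
        yield qty, price

def sum_row_at_a_time(cols: Columns, min_qty: int) -> int:
    total = 0
    for qty, price in row_iterator(cols):  # filter + aggregate, one row at a time
        if qty > min_qty:
            total += price
    return total

def sum_vectorized(cols: Columns, min_qty: int, batch_size: int = 1024) -> int:
    """Processes columns in batches: the filter builds a selection vector of
    matching row offsets, and the aggregate only touches those offsets."""
    qty_col, price_col = cols["qty"], cols["price"]
    total = 0
    for start in range(0, len(qty_col), batch_size):
        qty_batch = qty_col[start:start + batch_size]
        price_batch = price_col[start:start + batch_size]
        # Filter primitive: one tight loop over a single column.
        selection = [i for i, q in enumerate(qty_batch) if q > min_qty]
        # Aggregate primitive: a second tight loop driven by the selection vector.
        total += sum(price_batch[i] for i in selection)
    return total

table: Columns = {"qty": [5, 12, 30, 8, 11], "price": [100, 200, 300, 400, 500]}
print(sum_row_at_a_time(table, 10))             # 1000
print(sum_vectorized(table, 10, batch_size=2))  # 1000
```

In Python both versions run at similar speed; the point is the shape. In a compiled engine, the per-batch primitives amortize per-row dispatch overhead and keep each loop over one contiguous column, which is what makes columnar, vectorized execution cache- and SIMD-friendly.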
More notes on where in the Spark codebase to look for the SQL optimizer stages (Spark 1.5.x):
Custom execution strategies can be inserted (see …). Changing the optimizer steps might require a custom optimizer and a custom SQLContext/QueryExecution class.
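Catalyst optimizes a plan by repeatedly applying batches of tree-rewrite rules until the plan stops changing. Below is a minimal, hypothetical Python model of that idea (not Catalyst's actual API): plan nodes are small immutable dataclasses, a rule is a function from node to node, and an executor applies the rules bottom-up until a fixed point or an iteration cap. The node types and the `combine_filters` rule are illustrative assumptions, similar in spirit to Catalyst's own `CombineFilters` rule; a real custom rule would be written in Scala against Spark's `LogicalPlan`.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    predicates: Tuple[str, ...]  # conjunction of predicate strings
    child: object

Rule = Callable[[object], object]

def transform_up(plan, rule: Rule):
    """Apply `rule` to the children first, then to the node itself."""
    if isinstance(plan, Filter):
        plan = replace(plan, child=transform_up(plan.child, rule))
    return rule(plan)

def combine_filters(plan):
    """Filter(p1, Filter(p2, child)) -> Filter(p2 AND p1, child)."""
    if isinstance(plan, Filter) and isinstance(plan.child, Filter):
        inner = plan.child
        return Filter(inner.predicates + plan.predicates, inner.child)
    return plan

def run_to_fixed_point(plan, rules: List[Rule], max_iterations: int = 100):
    """Apply every rule in order, repeating until nothing changes."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = transform_up(new_plan, rule)
        if new_plan == plan:  # fixed point: no rule changed the plan
            return new_plan
        plan = new_plan
    return plan

plan = Filter(("a > 1",), Filter(("b < 5",), Filter(("c = 2",), Scan("t"))))
print(run_to_fixed_point(plan, [combine_filters]))
# Filter(predicates=('c = 2', 'b < 5', 'a > 1'), child=Scan(table='t'))
```

Swapping in a different optimizer step in this model just means passing a different rule list, which is roughly what a custom optimizer class would have to provide.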
A current Spark ticket for pushing aggregations down into data sources: https://issues.apache.org/jira/browse/SPARK-12449 See Santiago's comment right above mine for links showing how Druid, Magellan, HBase, and other projects modify Spark Catalyst plans to get aggregation done on the server side.
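The idea behind that ticket and the plan rewrites it links to is to replace an aggregate over a scan with a scan that asks the storage layer to aggregate, so only the small aggregated result crosses the wire. Here is a minimal, hypothetical sketch of such a rewrite (plain Python, not Spark's DataSource API): the `DataSource` class, its `supports_sum` flag, and the `SumBy`/`PushedSumScan` node names are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DataSource:
    """Hypothetical storage server that can optionally aggregate by itself."""
    rows: List[Dict[str, int]]
    supports_sum: bool

    def scan(self) -> List[Dict[str, int]]:
        return list(self.rows)  # ships every row to the query engine

    def server_side_sum(self, group_col: str, sum_col: str) -> Dict[int, int]:
        totals: Dict[int, int] = {}
        for row in self.rows:
            totals[row[group_col]] = totals.get(row[group_col], 0) + row[sum_col]
        return totals

@dataclass
class Scan:
    source: DataSource

@dataclass
class SumBy:  # logical aggregate: SELECT group_col, SUM(sum_col) GROUP BY group_col
    group_col: str
    sum_col: str
    child: Scan

@dataclass
class PushedSumScan:  # aggregate fused into the scan, executed by the source
    group_col: str
    sum_col: str
    source: DataSource

def push_down_aggregation(plan):
    """Rewrite SumBy(Scan(src)) into PushedSumScan when the source can aggregate."""
    if isinstance(plan, SumBy) and plan.child.source.supports_sum:
        return PushedSumScan(plan.group_col, plan.sum_col, plan.child.source)
    return plan

def execute(plan) -> Dict[int, int]:
    if isinstance(plan, PushedSumScan):
        return plan.source.server_side_sum(plan.group_col, plan.sum_col)
    if isinstance(plan, SumBy):  # fallback: pull all rows, aggregate in the engine
        totals: Dict[int, int] = {}
        for row in plan.child.source.scan():
            key = row[plan.group_col]
            totals[key] = totals.get(key, 0) + row[plan.sum_col]
        return totals
    raise ValueError(f"unsupported plan node: {plan!r}")

rows = [{"k": 1, "v": 10}, {"k": 2, "v": 5}, {"k": 1, "v": 7}]
plan = SumBy("k", "v", Scan(DataSource(rows, supports_sum=True)))
optimized = push_down_aggregation(plan)
print(type(optimized).__name__, execute(optimized))  # PushedSumScan {1: 17, 2: 5}
```

The rewrite must be semantics-preserving: the pushed-down and engine-side plans return the same result, and sources that cannot aggregate simply keep the original plan.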
https://slack-files.com/files-pri-safe/T03BMF0R2-F0A3LCQ3C/api-presentation_1_.pdf?c=1441299236-4641d956f1354dd200dd184c1f1fc76fc59b9d2c