This section documents cookbooks on how to use the platform:
- Airflow
- Trino (formerly PrestoSQL)
  - Trino, Spark and Delta Lake (Spark 2.4.7 & Delta Lake 0.6.1) - `1.11.0`
  - Trino, Spark and Delta Lake (Spark 3.0.1 & Delta Lake 0.7.0) - `1.11.0`
  - Querying S3 data (MinIO) using Trino - `1.11.0`
  - Querying Azure Data Lake Storage Gen2 data (ADLS) from Trino - `1.15.0`
  - Querying data in PostgreSQL from Trino - `1.11.0`
  - Querying data in Kafka from Trino (formerly PrestoSQL) - `1.14.0`
  - Querying HDFS data using Trino - `1.11.0`
  - Trino Security - `1.16.0`
- MinIO
- MQTT
- Spark
  - Run Java Spark Application using `spark-submit`
  - Run Java Spark Application using Docker
  - Run Scala Spark Application using `spark-submit`
  - Run Scala Spark Application using Docker
  - Run Python Spark Application using `spark-submit`
  - Run Python Spark Application using Docker
  - Spark and Hive Metastore - `1.15.0`
  - Spark with internal S3 (using MinIO)
  - Spark with external S3
  - Spark with PostgreSQL - `1.15.0`
- Delta Lake Table Format
  - Spark with Delta Lake - `1.16.0`
- Iceberg Table Format
  - Spark with Iceberg - `1.16.0`
- Hadoop HDFS
- Livy
- Apache NiFi
- StreamSets Data Collector
- StreamSets DataOps Platform
- StreamSets Transformer
- Kafka
  - Simulated Multi-DC Setup on one machine - `1.14.0`
  - Automate management of Kafka topics using Jikkou - `1.17.0`
  - Azure Event Hub as external Kafka - `1.16.0`
  - SASL/SCRAM Authentication with Zookeeper - `1.17.0`
  - SASL/SCRAM Authentication with KRaft - `1.17.0`
  - SASL/PLAIN Authentication - `1.17.0`
- Confluent Enterprise Platform
- ksqlDB
- Kafka Connect
- Apicurio Registry
- Oracle RDBMS
  - Using private (Trivadis) Oracle EE image - `1.13.0`
  - Using public Oracle XE image - `1.16.0`
- Neo4J
  - Working with Neo4J - `1.15.0`
  - Neo4J and yFiles graphs for Jupyter - `1.16.0`
- Tipboard
  - Working with Tipboard and Kafka - `1.14.0`
- Architecture Decision Records (ADR)
- Jupyter
  - Using Jupyter notebook with Spark and Avro - `1.16.0`
  - Using JupyterHub - `1.16.0`
- Dataverse
  - Dataverse with MinIO (S3) storage - `1.17.0`
- MLflow
  - Using MLflow from Jupyter - `1.16.0`
- Docker Logging
  - Collecting Docker Logs with Loki - `1.17.0`