Skip to content

Latest commit

 

History

History
149 lines (106 loc) · 3.27 KB

README.md

File metadata and controls

149 lines (106 loc) · 3.27 KB

Frameless-ext

This library contains additional syntax for Frameless.

Build Status Maven Central

Usage

Import the dependency:

libraryDependencies += "be.icteam" %% "frameless_ext" % "2.0.0"

Enable the additional syntax with the following import statement:

import be.icteam.frameless.syntax._

And now you can create TypedColumns via a simple lambda on a TypeDataSet, eg:

val tds: TypedDataset[Event] = ???

// the compiler can infer the types ;)
val userColumn = tds.tc(_.user)
val dayColumn = tds.tc(_.day)

The available aggregation functions become more discoverable in your IDE as well:

val result: TypedDataset[(String, Long, Int)] = tds.
  .groupBy(e.tds(_.user))
  .agg(
    tds.tc(_.day).countDistinct,
    tds.tc(_.hour).max)

Here is the complete example:

case class Event(user: String, year: Int, month: Int, day: Int, hour: Int)

object Demo {

  import frameless._
  import frameless.syntax._
  import frameless.functions.aggregate._
  import be.icteam.frameless.syntax._
  import org.apache.log4j.{Level, LogManager}
  import org.apache.spark.sql.SparkSession

  def initSpark: SparkSession = {

    LogManager.getLogger("org").setLevel(Level.ERROR)

    SparkSession
      .builder()
      .appName("demo")
      .master("local[*]")
      .config("spark.ui.enabled", "false")
      .getOrCreate()
  }

  def main(args: Array[String]): Unit = {

    implicit val spark = initSpark

    import spark.implicits._

    val events = spark.createDataset(List(
      Event("tim", 2020, 9, 1, 7),
      Event("tim", 2020, 9, 1, 3),
      Event("tim", 2020, 9, 2, 5),
      Event("tim", 2020, 9, 2, 3),
      Event("tiebe", 2020, 9, 1, 2)
    ))

    val e = TypedDataset.create(events)

    val result: TypedDataset[(String, Long, Long, Int, Long)] = e
      .groupBy(e.tc(_.user))
      .agg(
        count[Event](),
        e.tc(_.day).countDistinct,
        e.tc(_.hour).max,
        e.tc(_.year).sum)

    val job = result.show(10, false)
    job.run()

  }
}

Development

Compile and test:

sbt +clean; +cleanFiles; +compile; +test

Install a snapshot in your local maven repository:

sbt +publishM2

Release

Set the following environment variables:

  • PGP_PASSPHRASE
  • PGP_SECRET
  • SONATYPE_USERNAME
  • SONATYPE_PASSWORD

Leveraging the ci-release plugin:

sbt ci-release

Find the most recent release:

git ls-remote --tags $REPO | \
  awk -F"/" '{print $3}' | \
  grep '^v[0-9]*\.[0-9]*\.[0-9]*' | \
  grep -v {} | \
  sort --version-sort | \
  tail -n1

Push a new tag to trigger a release via travis-ci:

v=v1.0.5
git tag -a $v  -m $v
git push origin $v

License

Code is provided under the Apache 2.0 license available at http://opensource.org/licenses/Apache-2.0, as well as in the LICENSE file. This is the same license used as Spark and Frameless.