Wednesday, May 29, 2019

Scala Spark SBT: building a fat jar

As of 2019/05/29, these are the steps to create a fat jar for your Spark project:

1). Create the file your_project_root/project/assembly.sbt with the following contents:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")

https://github.com/sbt/sbt-assembly#using-published-plugin

2). Add the following section to your_project_root/build.sbt. It tells sbt-assembly how to resolve files that appear in more than one dependency jar:
assemblyMergeStrategy in assembly := {
  case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "git.properties" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

Without this section, sbt assembly fails with errors like:
[error] (assembly) deduplicate: different file contents found in the following:
[error] /home/h0l01if/.ivy2/cache/org.apache.arrow/arrow-vector/jars/arrow-vector-0.8.0.jar:git.properties
[error] /home/h0l01if/.ivy2/cache/org.apache.arrow/arrow-format/jars/arrow-format-0.8.0.jar:git.properties
[error] /home/h0l01if/.ivy2/cache/org.apache.arrow/arrow-memory/jars/arrow-memory-0.8.0.jar:git.properties
[error] deduplicate: different file contents found in the following:
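
For context, here is a minimal sketch of the rest of a build.sbt that the merge strategy section could sit in. The project name, Scala version and Spark version are placeholders I made up for illustration, not values from a real project. Marking the Spark dependencies as "provided" keeps Spark itself out of the fat jar (spark-submit supplies it at runtime), which usually reduces the number of duplicate files the merge strategy has to deal with.

```scala
// build.sbt (sketch; name and versions are assumptions, adjust to your project)
name := "my-spark-app"
version := "0.1.0"
scalaVersion := "2.11.12"

// Assumed Spark version, use whatever your cluster runs
val sparkVersion = "2.3.3"

libraryDependencies ++= Seq(
  // "provided": needed to compile, but not packaged into the fat jar
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)

// ... the assemblyMergeStrategy section from step 2 goes here ...
```

One trade-off to be aware of: with "provided", sbt run no longer has Spark on its classpath by default, so this fits best when the jar is always launched through spark-submit.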

3). Run sbt assembly from your project root.
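
By default the jar ends up under target/scala-<scala version>/ and is named <name>-assembly-<version>.jar. If you want a fixed file name or a main class in the manifest, sbt-assembly 0.14.x has settings for both. A small sketch follows; the jar name and class name are hypothetical.

```scala
// build.sbt (optional settings; the values below are hypothetical examples)

// Fixed file name for the fat jar instead of <name>-assembly-<version>.jar
assemblyJarName in assembly := "my-spark-app.jar"

// Main class written into the jar manifest
mainClass in assembly := Some("com.example.Main")
```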