What is Kryo?

Kryo serialization is one of the fastest on-JVM serialization libraries, and it is certainly the most popular in the Spark world.

To get the most out of this algorithm you must register the few classes that will have to be serialized. This will help to decrease the size of serialized objects, as their serialization will be tailor-made optimized by Kryo.

Registering your classes with Kryo in Spark is done this way:

val sparkConf = new SparkConf()
...
sparkConf.registerKryoClasses(
   Array(
       // your classes here
   )
)

How to serialize unaccessible classes?

It all goes fine until you hit private classes, or arrays of classes. If you tried to serialize classes from the Joda library probably you know what I am talking about. However there are a few tricks that will help you serialize almost anything:

// scala basic classes

Class.forName("scala.collection.immutable.$colon$colon")

Class.forName("scala.collection.immutable.Nil$")

classOf[Array[Tuple2[_,_]]]

// arrays of primitive classes

classOf[Array[String]]

classOf[Array[Int]]

// monads of primitive classes

classOf[Option[String]]

classOf[Option[Boolean]]

// case objects

Class.forName("scala.None$")

// protected static final classes

ClassTag(Class.forName("org.roaringbitmap.RoaringArray$Element"))
   .wrap.runtimeClass

// public static class FakeFileStatus

Class.forName(
   "org.apache.spark.sql.sources.HadoopFsRelation$FakeFileStatus")

ClassTag(Class.forName(
   "org.apache.spark.sql.sources.HadoopFsRelation$FakeFileStatus"))
   .wrap.runtimeClass

// joda classes

classOf[org.joda.time.tz.DateTimeZoneBuilder]

classOf[org.joda.time.tz.FixedDateTimeZone]

Class.forName(
   "org.joda.time.tz.DateTimeZoneBuilder$PrecalculatedZone")

Class.forName("org.joda.time.tz.DateTimeZoneBuilder$DSTZone")

Class.forName("org.joda.time.tz.DateTimeZoneBuilder$Recurrence")

Class.forName("org.joda.time.tz.DateTimeZoneBuilder$OfYear")

Class.forName("org.joda.time.tz.CachedDateTimeZone$Info")

ClassTag(Class.forName(
   "org.joda.time.tz.CachedDateTimeZone$Info")).wrap.runtimeClass

Won’t you serialize anything?