Visualizing Data in Kotlin#

Like DataFrame and Multik, the Lets-Plot library can be imported with a magic command:

%use multik
%use dataframe
%use lets-plot

To create a Plot object, we call the function ggplot(), which accepts a Map<*, *> holding the data and a generic aesthetic mapping.

val x = mk.linspace<Double>(0.0, 2.0 * kotlin.math.PI, 100)
val y = mk.math.sin(x)

val points = mapOf<String, Any>(
    "x" to x.toList(),
    "y" to y.toList()
)

ggplot(points) { x = "x"; y = "y" } +
    geomLine() +
    ggtitle("Sin function on range 0..2𝛑") +
    ggsize(650, 250)

In Lets-Plot, the main difference from Python’s Matplotlib is that a plot is built in layers. Because Kotlin allows overloading the + operator, we create the figure with a chain of additions on top of the Plot object. In the example above, we created:

The plot, providing a series of points and the mapping of the map’s keys to the plot axes.
A layer for the line.
A layer for the title.
A layer for configuring the dimensions of the figure.

This approach seems a little more verbose than the Python equivalent:

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))

but the benefits of layers can be seen in more complicated examples.
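
The chained additions rely on Kotlin’s operator overloading. The following is a minimal sketch of the idea in plain Kotlin, not Lets-Plot’s actual implementation: PlotSpec, Feature, Geom, Title, Size and buildExample are hypothetical names invented for this illustration.

```kotlin
// Hypothetical types for illustration only; Lets-Plot's real classes differ.
sealed interface Feature
data class Geom(val kind: String) : Feature
data class Title(val text: String) : Feature
data class Size(val width: Int, val height: Int) : Feature

// An immutable plot description: data, axis mapping, and the features added so far.
data class PlotSpec(
    val data: Map<String, List<Double>>,
    val mapping: Map<String, String>,
    val features: List<Feature> = emptyList()
) {
    // `plot + feature` returns a new PlotSpec with the feature appended.
    operator fun plus(feature: Feature): PlotSpec = copy(features = features + feature)
}

fun buildExample(): PlotSpec {
    val data = mapOf("x" to listOf(0.0, 1.0, 2.0), "y" to listOf(0.0, 1.0, 4.0))
    return PlotSpec(data, mapOf("x" to "x", "y" to "y")) +
        Geom("line") +
        Title("Example") +
        Size(650, 250)
}

fun main() {
    val plot = buildExample()
    // Features are kept in the order they were added.
    plot.features.forEach(::println)
}
```

Because each addition returns a new object, a partially built plot can be stored in a variable and extended in different ways, which is the pattern used later with the variable p.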

Note

Matplotlib encourages the use of its object-oriented API, so the following code would be preferred:

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
plt.show()

See the Matplotlib blog for more information.

Lets-Plot Architecture#

As already mentioned, a plot is composed of one or more layers. Each layer is responsible for creating the objects painted on the “canvas”, and each one contains:

Data: the set of data specified for all layers, or one dataset per layer.
Aesthetic Mapping: describes how variables in the dataset are mapped to the visual properties of the layer.
Geometric Object: a geometric object that represents a particular type of chart.
Statistical Transformation: computes a statistical summary of the raw input data.
Positional Adjustment: the method used to compute the final coordinates of the geometry.

Geometric Objects#

Geometric objects are responsible for drawing in the plot. All the functions of the form geomXxx() create a new layer that draws the data. Every geom object has its own default parameters and behavior; see the documentation to understand what a given geom does and what it requires.

The geom package also contains some statXxx() methods that create a plot layer: sometimes it is more natural to use statXxx() instead of geomXxx() to add a new plot layer.

val rand = java.util.Random(123)
val dataset = mapOf(
    "pts" to List(100) { rand.nextGaussian() } + List(100) { 1.5 * rand.nextGaussian() },
    "cat" to List(100) { "A" } + List(100) { "B" }
)
val p = ggplot(dataset)

p + statDensity(alpha = 0.5) { x = "pts"; fill = "cat" }

`stat`#

stat can be passed as an argument to a geometric object to define its statistical transformation. The Stat object contains all the statistical transformations that can be applied to a dataset, and it is used like geomXxx(stat = Stat.identity).

We can apply a statistical transformation like bin, density, count, smooth and more.
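
To make the idea of a statistical transformation concrete, here is a plain-Kotlin sketch of what the count and bin transformations compute before anything is drawn. It does not call Lets-Plot: countStat and binStat are hypothetical helper names, and the equal-width binning rule is an assumption chosen for simplicity, so the library’s own binning may differ.

```kotlin
// count: number of occurrences of each distinct value.
fun <T> countStat(values: List<T>): Map<T, Int> =
    values.groupingBy { it }.eachCount()

// bin: split [min, max] into `bins` equal-width intervals and count the values in each.
fun binStat(values: List<Double>, bins: Int): List<Int> {
    require(bins > 0 && values.isNotEmpty()) { "need at least one bin and one value" }
    val min = values.minOf { it }
    val max = values.maxOf { it }
    val width = (max - min) / bins
    val counts = IntArray(bins)
    for (v in values) {
        // The maximum value is placed in the last bin rather than in a bin of its own.
        val index = if (width == 0.0) 0 else ((v - min) / width).toInt().coerceAtMost(bins - 1)
        counts[index]++
    }
    return counts.toList()
}

fun main() {
    println(countStat(listOf("A", "B", "A", "A")))
    println(binStat(listOf(0.0, 1.0, 2.0, 3.0, 4.0), 2))
}
```

A geom then draws the transformed values (for example, one bar per count) instead of the raw rows, whereas an identity transformation leaves the data unchanged.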

Position#

It’s possible to adjust the position of the data, which is especially useful when data points overlap.

We also introduce GGBunch, which lets us draw multiple plots in the same figure.

Consider this dataset and its corresponding bar plot:

val data = mapOf(
    "v" to List(100) { rand.nextInt(5) },
    "c" to List(100) { rand.nextInt(2) }
)

val p0 = ggplot(data) +
    geomBar(alpha = 0.8) { x = "v"; fill = asDiscrete("c") }
p0

We can now set the position adjustment to make overlapping data easier to read:

val p1 = ggplot(data) +
    geomBar(alpha = 0.8, position = positionDodge(0.5)) { x = "v"; fill = asDiscrete("c") }

val p2 = ggplot(data) +
    geomBar(alpha = 0.8, position = positionJitter(0.2)) { x = "v"; fill = asDiscrete("c") }

val p3 = ggplot(data) +
    geomBar(alpha = 0.8, position = positionStack()) { x = "v"; fill = asDiscrete("c") }

val p4 = ggplot(data) +
    geomBar(alpha = 0.5, position = positionNudge()) { x = "v"; fill = asDiscrete("c") }

val p5 = ggplot(data) +
    geomBar(alpha = 0.8, position = positionFill()) { x = "v"; fill = asDiscrete("c") }

GGBunch()
    .addPlot(p0 + ggtitle("Without Position"), 0, 0, 500, 250)
    .addPlot(p1 + ggtitle("Dodge"), 500, 0, 500, 250)
    .addPlot(p2 + ggtitle("Jitter"), 0, 250, 500, 250)
    .addPlot(p3 + ggtitle("Stack"), 500, 250, 500, 250)
    .addPlot(p4 + ggtitle("Nudge"), 0, 500, 500, 250)
    .addPlot(p5 + ggtitle("Fill"), 500, 500, 500, 250)
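
The effect of the stack, fill, and dodge adjustments on bars can be sketched in plain Kotlin. The code below does not use Lets-Plot: Bar, stack, fill, and dodge are hypothetical names, and the arithmetic is a simplified assumption about how such adjustments are commonly defined rather than the library’s exact algorithm.

```kotlin
// A bar drawn at horizontal position `x`, spanning `yMin..yMax` vertically.
data class Bar(val x: Double, val yMin: Double, val yMax: Double)

// stack: each group's bar starts where the previous one ended.
fun stack(x: Double, counts: List<Double>): List<Bar> {
    var base = 0.0
    return counts.map { c ->
        val bar = Bar(x, base, base + c)
        base += c
        bar
    }
}

// fill: like stack, but heights are rescaled so that the column totals 1.0.
fun fill(x: Double, counts: List<Double>): List<Bar> {
    val total = counts.sum()
    require(total > 0.0) { "counts must have a positive sum" }
    return stack(x, counts.map { it / total })
}

// dodge: bars keep their height but are shifted sideways, centered around `x`.
fun dodge(x: Double, counts: List<Double>, width: Double): List<Bar> {
    val n = counts.size
    val step = width / n
    return counts.mapIndexed { i, c -> Bar(x + (i - (n - 1) / 2.0) * step, 0.0, c) }
}

fun main() {
    val counts = listOf(3.0, 1.0) // made-up counts for two groups at x = 1.0
    println(stack(1.0, counts))
    println(fill(1.0, counts))
    println(dodge(1.0, counts, width = 0.5))
}
```

Stack and fill change only the vertical extent of each bar, while dodge changes only its horizontal position, which is why dodge keeps the individual counts directly comparable.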

Features#

The entire plot can be extended with additional feature layers. The features can be grouped into the following categories:

Scale: enables choosing a scale for each mapped variable, depending on its attributes. With scales, we can tweak things like the axis labels, legend keys, aesthetics (such as the fill color), and so on.
Coordinate System: determines how the x and y aesthetics combine to position elements in the plot (e.g. to override the default axes ratio we can use coordFixed(ratio = 2)).
Legend: we can customize the legend (e.g. the number of columns) by using the guide methods, or the guide argument inside a scale method. The location of the legend can be tweaked with the theme’s methods.
Sampling: we can pick samples of the dataset (sampling is applied after stat transformations). If the dataset exceeds a certain threshold, sampling is applied automatically; the samplingNone value disables any sampling for the given layer. See the sampling documentation for more.

Integration with Kotlin DataFrame#

As you might have already seen, DataFrame objects have the toMap() method, which makes plotting a dataframe a trivial task. Let’s see an example of how we can combine all the libraries we have seen so far to compute and show the log difference of two variables.

val df = DataFrame.readCSV("../resources/example-datasets/datasets/macrodata.csv")
df.head(5)

DataFrame: rowsCount = 5, columnsCount = 14

val df1 = df["cpi", "m1", "tbilrate", "unemp"]
df1.head(5)

DataFrame: rowsCount = 5, columnsCount = 4

We select the m1 and unemp variables and make a scatter plot with a regression line (geomSmooth()):

// Python equivalent: `np.log(df1).diff().dropna()`
val transData = df1.columns()
    .map { column ->
        // Copy each column into a one-dimensional Multik array of doubles
        val values = column.toList()
        mk.dnarray<Double, D1>(intArrayOf(values.size)) { values[it] as Double }
    }
    // Logarithm of every element
    .map { mk.math.log(it).toList() }
    // Difference between consecutive rows, keeping the original column names
    .mapIndexed { idx, logs -> logs.zipWithNext { a, b -> b - a }.toColumn(df1.columnNames()[idx]) }
    .toDataFrame().dropNA()

ggplot(transData["m1", "unemp"].toMap()) { x = "m1"; y = "unemp" } +
    geomPoint() + geomSmooth()

Achieving the same result with Matplotlib alone would not be as easy as in Kotlin, unless Seaborn is used.
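
The log-difference transformation used in this example, taking the logarithm of each value and then the difference between consecutive rows, can also be sketched with the Kotlin standard library alone. logDiff is a hypothetical helper name, and the sample values are made up for illustration rather than taken from macrodata.csv.

```kotlin
import kotlin.math.ln

// Log difference of a series: ln(v[i + 1]) - ln(v[i]) for each consecutive pair.
// The result has one element fewer than the input.
fun logDiff(values: List<Double>): List<Double> =
    values.map { ln(it) }.zipWithNext { previous, next -> next - previous }

fun main() {
    val series = listOf(100.0, 110.0, 121.0) // made-up values growing 10% per step
    println(logDiff(series)) // each entry is approximately ln(1.1)
}
```

For small changes the log difference is close to the relative change between rows, which is why it is a common way to compare the growth of variables measured on different scales.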

Conclusions#

In this chapter we have explored the Lets-Plot library, a very powerful tool for visualizing data. This library and ggplot-style APIs are used in many different programming languages, so a solid knowledge of it is portable to other platforms.

Its behavior is quite different from Python’s Matplotlib, but if you know what you can get with Matplotlib, you can easily find a solution using Lets-Plot!

This chapter wasn’t intended to cover all the capabilities of this library, but to give a basic understanding of how to build plots by stacking together a series of layers, how to customize every aspect of them, and how easy it is to plot data from a DataFrame. The documentation provided by the Lets-Plot team is very good, with plenty of examples for every geometric object, kind of plot, scale, and much more.