Data Science with Kotlin

Data science has evolved rapidly over the years, and many programming languages are now available for data analysis tasks. Python has become the go-to language for data scientists thanks to its extensive libraries and tools, such as NumPy and pandas. Kotlin, a language best known for Android development, has recently been gaining popularity in other domains, including data science.

Python for Data Science

Python has been the dominant language for data science for many years, and there are several reasons why it remains the preferred choice for many data scientists:

  • Python has a large and active community of developers.

  • Extensive libraries and tools, especially for data science.

  • Easy to learn and use.

  • Flexibility due to Python’s general-purpose nature.

The most widely used and best-known Python libraries for data science are:

Kotlin for Data Science

Kotlin is a relatively new language, but the ecosystem of libraries for data-related tasks created by its community is expanding rapidly. Although Kotlin is not yet widely adopted in the data science community, it offers several advantages:

  • Kotlin is a statically typed language, which helps prevent bugs (many errors are caught at compile time) and improves code quality and performance.

  • Concise and expressive syntax, which can improve code readability and maintainability.

  • Interoperability with Java, seamlessly integrating with existing Java code and libraries.

  • Kotlin’s support for functional programming techniques, such as immutability, higher-order functions, and lambdas, can be very useful for data science tasks as it allows for concise and efficient processing of large datasets.
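To make the points above more concrete, here is a minimal sketch that uses only the Kotlin standard library. The `Measurement` class and the `averageBySensor` function are hypothetical names invented for this illustration. They are not part of any library discussed in this document.

```kotlin
// Hypothetical record type, used only for this illustration.
// Both properties are read-only (val), so instances are immutable.
data class Measurement(val sensor: String, val value: Double)

// Average value per sensor, ignoring negative readings.
// filter, groupBy and mapValues are higher-order functions taking lambdas.
fun averageBySensor(data: List<Measurement>): Map<String, Double> =
    data.filter { it.value >= 0.0 }
        .groupBy { it.sensor }
        .mapValues { (_, readings) -> readings.map { it.value }.average() }

fun main() {
    val readings = listOf(
        Measurement("a", 1.0),
        Measurement("a", 3.0),
        Measurement("b", 10.0),
        Measurement("b", -1.0),
    )
    // Measurement("a", "oops") would not compile: the compiler rejects
    // a String where a Double is expected.
    println(averageBySensor(readings)) // prints {a=2.0, b=10.0}
}
```

The whole pipeline is a single expression over a read-only list, and the declared return type `Map<String, Double>` is checked by the compiler rather than discovered at run time.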

In this document, we will go through three libraries that should cover the Python tools mentioned above:

It’s important to note that Multik and DataFrame are very “young” libraries, so they are not yet as optimized or as well supported as Python’s data science libraries. The goal of this document is to show why and how Kotlin can be a viable alternative for data analysis tasks, taking advantage of core features such as static typing, functional programming techniques, and maintainability.


Working with Jupyter Notebook

Jupyter Notebooks are increasingly popular for data science projects. Data visualization is a key part of this work, and a notebook makes it easy to load, process, manipulate, and visualize data in one place.

Each notebook connects to a kernel, which compiles or interprets the code in the notebook's cells and runs it.

There are kernels for most Python versions, as well as kernels for R, Ruby, and many other languages.

Fortunately, the Kotlin Jupyter Kernel project provides a kernel that makes it possible to use Kotlin inside a Jupyter Notebook. It also adds support for libraries such as Kotlin DataFrame and Lets-Plot, so that DataFrames and plots are rendered properly.

In the repository linked to this page, the README.md contains brief instructions for setting up an environment in which Jupyter Notebooks can run Kotlin code through the Kotlin Jupyter Kernel.