FAQ

Why not X Y Z?

Libraries

Common topics: caret, tidyverse, data.table

caret is in maintenance mode and no longer gets new features. Its successor is tidymodels, and it is a great successor. At the moment there is no equivalent in any other language, which makes R stand out. It is a huge disappointment to still see caret-based books at the top of the rankings in the Big Book of R.
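For readers coming from caret, here is a minimal sketch of fitting a linear model the tidymodels way. It assumes the parsnip package (one of the individual tidymodels packages) is installed. The formula and the use of the built-in mtcars data are illustrative choices, not something this book prescribes.

```r
library(parsnip)

# Specify the model type and the computational engine separately
lm_spec <- linear_reg() |>
  set_engine("lm")

# Fit on a built-in dataset (illustrative formula)
lm_fit <- lm_spec |>
  fit(mpg ~ wt + hp, data = mtcars)

# Predictions come back as a tibble with a `.pred` column
preds <- predict(lm_fit, new_data = head(mtcars))

# Rough caret equivalent, for comparison:
# caret::train(mpg ~ wt + hp, data = mtcars, method = "lm")
```

Separating the model specification from the engine is what lets the same code switch to another supported engine without being restructured.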

tidyverse served us well for years, but nowadays big data is the norm, and dplyr and the like are simply not up to modern workloads. tidytable, which runs on data.table's C backend, is recommended instead, and you can load the individual libraries it does not cover, for example ggplot2, lubridate, or stringr. Loading individual libraries is better for production anyway.
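A minimal sketch of that setup, assuming the tidytable and stringr packages are installed: tidytable provides the dplyr-style verbs, and stringr is loaded on its own for string handling instead of pulling in the whole tidyverse. The iris query itself is only an illustration.

```r
# Load individual packages rather than the whole tidyverse
library(tidytable)  # dplyr/tidyr-style verbs backed by data.table
library(stringr)    # string helpers, loaded on their own

result <- iris |>
  filter(Sepal.Length > 5) |>
  mutate(species = str_to_upper(as.character(Species))) |>
  summarize(mean_petal = mean(Petal.Length), .by = species)

result
```

The verbs and the `.by` grouping argument read the same as their dplyr counterparts, so existing dplyr code usually needs few changes.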

data.table: tidytable uses data.table as its backend, adds a tidyverse-style interface on top, and shows negligible overhead in benchmarks. data.table code, on the other hand, is not very human-readable. Without repeating much of the coding philosophy here, feel free to go over sections 18.2.3 and 18.2.4 of the original R4DS as well as the tidy tools manifesto.
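To illustrate the readability point, here is the same query written both ways: the average mpg of four-cylinder cars per number of gears. This is a sketch assuming both data.table and tidytable are installed; the query on mtcars is an arbitrary example.

```r
library(data.table)
library(tidytable)

# data.table: DT[i, j, by] packs filtering, aggregation and grouping into one call
dt_result <- as.data.table(mtcars)[cyl == 4, .(avg_mpg = mean(mpg)), by = gear]

# tidytable: the same query as a pipeline of named verbs
tt_result <- mtcars |>
  filter(cyl == 4) |>
  summarize(avg_mpg = mean(mpg), .by = gear)
```

Both produce the same numbers, since tidytable translates its verbs into data.table operations; the difference is how easily the intent can be read back.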

Languages

Common topics: other open source competitors e.g. Python, Julia

The phrase ‘Python is the second best at everything’ floats around the Internet, but it rings less and less true as time goes on. Python's performance has also come under heavy criticism in programming communities, resulting in more Cython/C-backed solutions (see versions after 3.9 and the promised benchmarks). However, this also brought a lot of friction during the transition. Meanwhile, the performance improvements have not been holistic, which has left the community somewhat disappointed.

Julia’s performance was great. We should be grateful that it made performance benchmarks and the efficiency of programming languages, libraries, and packages a priority concern. However, once that trend and the competition picked up and languages and libraries generally improved, Julia unfortunately fell back into irrelevance. It had a friendly and passionate community, though (e.g. PyData vs. JuliaCon).

What this book is not

There are an infinite number of formulas describing the laws of the universe, and no one can discover and memorise every single one of them. By the same logic, since covering every model, method, approach, and technique would not be possible, ‘CI for R’ will not cover every topic in detail.

As such, this book's philosophy for distributing information is ‘as concise as possible, while making as much available as possible’. For those interested in diving further, the links and quizzes provided will help you explore those worlds.

Documentation

This book won’t go into how to document. However, if you are interested in an R documentation philosophy, see [HERE].

Best practices

‘Best practices’ do not exist.

Agile / lean

Agile is terrible. The first section of the Wiki article says…

Ex-Google/Facebook/etc. on Quora say…

Given the popularity of Toyota’s lean manufacturing framework, it makes sense that the industry that came after cars, software, wanted something similar. So everyone and their dog started creating their own framework, but agile won the marketing (or propaganda) battle.

However, tools like kanban (which agile took)

If you’re really interested in a system, framework, or overarching philosophy, I would recommend looking into the Toyota Production System and its lean manufacturing framework.

Productionising code

After finishing a coding course or textbook, one of the most important skills to learn is probably how to productionise code so that it stays reproducible across several variable channels: different environments (machines, environment variables, versions, etc.), input platforms (SQL dialects, Linux/Windows/etc.), and output platforms.
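One small habit that helps with the environment channel is to read machine-specific settings from environment variables instead of hard-coding them, and to record the session alongside the outputs. The sketch below uses only base R; the helper `get_setting()` and the variable names `APP_DB_HOST` and `APP_OUTPUT_DIR` are hypothetical, chosen for illustration.

```r
# Hypothetical helper: read a setting from an environment variable and fall
# back to a default, so the same script runs unchanged on any machine
get_setting <- function(name, default) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) default else value
}

# Hypothetical variable names; set them per environment (laptop, CI, server)
db_host <- get_setting("APP_DB_HOST", "localhost")
out_dir <- get_setting("APP_OUTPUT_DIR", tempdir())

# Save the R version and attached package versions next to the outputs
writeLines(capture.output(sessionInfo()),
           file.path(out_dir, "session_info.txt"))
```

On a CI runner or in a container, the same script then picks up its configuration from the environment, and the saved session info records which versions produced the results.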

Some experience with Docker and a CI/CD tool (e.g. Jenkins or one of its alternatives) definitely helps.