8 September 2015

R: towards the future of statistics


Data & Statistics

Microsoft's recent acquisition of Revolution Analytics, a commercial provider of software and services for R, confirms the trend of recent years: R is becoming the most widely used language in the world for statistics and predictive analysis.

This trend is also confirmed by the TIOBE index, a measure of the popularity of programming languages. R, driven especially by its use in the field of big data, has enjoyed a lightning rise: last December it reached 12th place in this ranking, up from 38th place in December 2013.

Furthermore, the list of companies that trust R for their various data analysis needs grows from year to year. In 2014 it included Facebook, Google, Twitter, The New York Times and even banking organisations such as ANZ Bank and Credit Suisse.

This trend is explained primarily by the fact that R, because it is free and built on an Open Source philosophy, has many responsive and motivated contributors. These contributors constantly enrich the R universe with “packages”, developed as new needs, new statistical methodologies and new tools arise, including for data management.
In particular, R has kept pace with current trends in the data world: it can be used, for example, to retrieve and analyse tweets, to create web interfaces and web applications for tracking statistics, or even to perform analysis directly in the cloud.

R is constantly changing and has evolved into an essential tool for any statistician, data analyst or even “data scientist”. It is therefore fair to say that R is already positioned as a reference for tomorrow, and a bet worth making at that. It has not finished surprising us!

[The opinion of the editor]
I regularly use R for “classic” statistical analysis (regression, correspondence analysis, clustering, …) and I take advantage of the many existing packages to avoid having to rewrite everything myself. However, when a statistical analysis function does not give me the result I want (for example, a custom chart), the fact that all of the R code is freely available allows me to extend the existing function, modifying just a few elements to get a result fully in line with my expectations.

Likewise, I use R directly as a programming language to create ad hoc statistical analysis functions for specific needs, for example the implementation of a CBC trade-off analysis method.

Another strength of R, from my point of view, is its large community, which is available to help through various channels (mailing lists, forums, blogs, …). I take an active part in it as a moderator of the R forum on Developpez.com, the largest French-speaking international community of IT professionals.

To sum up, I would say that the main strengths of R are that it is free (which is no small thing), its active community, its dual use as a statistical tool AND a programming language and, lastly, the speed at which it develops in order to adapt constantly to the changing environment of statistics and data.

-> And you, what do you think? Are you still refactR, or do you see R as an asset to be developed quickly?