This Is the One Topic Every Statistician Should Master

"What are important topics in statistics that every data scientist must know?" originally appeared on Quora, the knowledge-sharing network where compelling questions are answered by people with unique insights.

Answer by Hadley Wickham, chief scientist, RStudio, on Quora:

If I were to pick one topic in statistics that every data scientist should know about, I'd say linear models. They unify many common statistical tests (t-tests, ANOVA, ANCOVA), and have many useful extensions (mixed models, generalized linear models, lasso and ridge regression). They are the modeling tool that I reliably start with.
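To make the "unify" claim concrete, here is a minimal sketch showing that a classic two-sample t-test (equal variances assumed) gives the same t-statistic as the slope of a linear model with a 0/1 group indicator. The data are simulated, and the function and variable names are made up for illustration; nothing here comes from the original answer.

```python
import numpy as np

def pooled_t_statistic(a, b):
    """Classic two-sample t-statistic with a pooled variance estimate."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var * (1 / na + 1 / nb))

def slope_t_statistic(a, b):
    """t-statistic of the group coefficient in the model y = b0 + b1 * group."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    y = np.concatenate([a, b])
    group = np.concatenate([np.zeros(len(a)), np.ones(len(b))])  # 0 = a, 1 = b
    X = np.column_stack([np.ones_like(group), group])            # intercept + indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)                 # ordinary least squares
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                        # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 2.0, size=30)
group_b = rng.normal(11.0, 2.0, size=25)

t_classic = pooled_t_statistic(group_a, group_b)
t_linear = slope_t_statistic(group_a, group_b)
print(t_classic, t_linear)  # the two values agree up to floating-point error
```

The fitted slope is simply the difference in group means, and its standard error reduces to the pooled-variance formula, which is why the two routes coincide. ANOVA and ANCOVA extend the same idea with more indicator columns and continuous covariates.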

I'm typically not in the business of building models, but I use linear models all the time when trying to create visualizations that dig into what's going on. If you see a strong pattern in a plot, it's a good idea to make that pattern explicit with a model. You can then look at the residuals to see the subtler trends that remain. That's particularly useful when the initial graphic is dominated by a known and uninteresting pattern. I explore this idea in depth in the "Model building" chapter of R for Data Science.
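As a rough sketch of that workflow, the snippet below fits a straight line to simulated data whose dominant pattern is a linear trend, then inspects the residuals, where a smaller periodic signal becomes visible. The data, the sine-shaped "subtle" signal, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Simulated data (illustrative assumption): a strong linear trend,
# a much weaker periodic signal, and a little noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 20.0, 200)
subtle = 0.5 * np.sin(x)
y = 3.0 * x + 2.0 + subtle + rng.normal(0.0, 0.1, size=x.size)

# Make the dominant pattern explicit with a linear model.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

# The residuals are what the model leaves unexplained.
residuals = y - fitted

# In the raw data the trend dominates; in the residuals the periodic signal does.
corr_raw = np.corrcoef(y, x)[0, 1]
corr_resid = np.corrcoef(residuals, subtle)[0, 1]
print(f"correlation of y with x:              {corr_raw:.3f}")
print(f"correlation of residuals with signal: {corr_resid:.3f}")
```

In practice you would plot `residuals` against `x` rather than correlate them with a signal you already know; the correlation is used here only so the effect can be checked without a plotting library.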

Bear in mind George Box's maxim that "all models are wrong; some models are useful." I think an important part of a statistical mindset is to understand that uncovering Truth is extremely difficult, and even when possible, may be so complicated as to not be practically useful.

