Edward Tufte is a leading expert in data analysis and data visualization. His books are classics, required reading for anyone interested in how best to display quantitative information. I read them just after I left Apple in 2003 to become a college professor in Japan, and I’ve talked about Tufte in my own books and on this website going back to at least this post in 2005. I have not seen him speak recently, so I was happy to see this 50-minute presentation by Dr. Tufte, which took place at the Microsoft Machine Learning & Data Science Summit 2016 held this past September. Microsoft’s David Smith introduces Dr. Tufte at the 2:30 mark.
In his talk, Tufte warns against confirmation bias and against massaging the data to arrive at findings that are desirable or somehow in your interest. He paraphrases one of Daniel Patrick Moynihan’s famous lines: “Everyone is entitled to their own opinions, but they are not entitled to their own facts.” This reminded me of the old chestnut, often credited to Darrell Huff of How to Lie With Statistics (1954) but more commonly attributed to the economist Ronald Coase: “If you torture the data long enough, it will confess to anything.”
You want to generate your findings from the data, not from the analysis, Tufte says. To do this, he recommends specifying your analysis before you collect the data. “This is to avoid all the generating findings just by analysis, not out of the data.” Tufte stressed the importance of “…full pre-specification of the analysis so they can’t over search the data, so they can’t run a million models and publish one. I think this is the future of confirmatory data analysis.”
Exploration and replication
“We should be mucking around in our data to find out what’s going on. We can learn from it. We can run it through powerful exploratory things. We can run it like a map through millennial time and look it over and say, that looks interesting… and what this means, though, this kind of searching, is that you must have an honest replication of the results of the search. To go back on innocent data, maybe somebody 500 miles away does it. Maybe that’s better, independent replication of the search results to distinguish now between noise and signal.”
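To make the noise-versus-signal point concrete, here is a minimal sketch of my own (it is not from Tufte's talk). It assumes a made-up data set in which the outcome and all 200 candidate predictors are pure random noise; the function names, the seed, and the sample sizes are illustrative choices. The exploration step searches every predictor and keeps the strongest correlation. The replication step then tests only that one predictor on fresh, "innocent" data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def draw_dataset(rng, n_rows, n_predictors):
    """Return (outcome, predictors); every value is independent noise."""
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    predictors = [[rng.gauss(0, 1) for _ in range(n_rows)]
                  for _ in range(n_predictors)]
    return outcome, predictors


def explore_then_replicate(seed=0, n_rows=50, n_predictors=200):
    """Search many predictors, then re-test only the winner on new data."""
    rng = random.Random(seed)

    # Exploration: look at every predictor and keep the strongest |r|.
    outcome, predictors = draw_dataset(rng, n_rows, n_predictors)
    scores = [abs(pearson(p, outcome)) for p in predictors]
    best = max(range(n_predictors), key=scores.__getitem__)

    # Replication: fresh data, and only the predictor chosen above.
    new_outcome, new_predictors = draw_dataset(rng, n_rows, n_predictors)
    replicated = abs(pearson(new_predictors[best], new_outcome))
    return best, scores[best], replicated


if __name__ == "__main__":
    best, explored, replicated = explore_then_replicate()
    print(f"predictor {best}: |r| = {explored:.2f} in exploration, "
          f"{replicated:.2f} in replication")
```

Because nothing in this data is real, any strong correlation found in the search is an artifact of looking in 200 places at once. The winning predictor has no reason to hold up on new data, which is why an honest replication is what separates noise from signal.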
“It is impossible for any normal human being to stare at a spreadsheet and look for contradictions, and problems.” This is where data forensics comes in. Tufte recommends the Quartz Guide to Bad Data on GitHub.
Scattering the eye and mind, producing vague anxiety & clutter
At the end of his talk, Tufte said something very wise. It was as simple as can be, yet it was one of the most important things he said. After talking about the need for us to learn about the entire process of data and analysis, and to go out in the field and watch directly how original measurements are made, Tufte said this:
“In doing creative work do not start your day with addictive time-vampires such as The New York Times, email, and Twitter. All scatter the eye, and mind, produce diverting vague anxiety, clutter short-term memory. Instead, begin with your work. Many creative workers have independently discovered this principle.” I completely agree with this.
And finally, there was this bit of wisdom concerning data analysis and thinking in general. The most powerful question you can ask of yourself and of others is: “How do I know that? How do you know that? How do they know that?”
These books by Dr. Tufte, especially The Visual Display of Quantitative Information, Visual Explanations, and Envisioning Information, are ones you want on your bookshelf. They are beautifully designed and well made, and over the years I have come back to them often. Of course, the examples are dated, but they hold up well. The principles have not changed, and you can easily apply the concepts to modern problems. And they are just beautiful, smart books.