As John mentioned in his last post, we have been quite interested in the recent study by Fernández-Delgado et al., “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” (the “DWN study” for short), which evaluated 179 popular implementations of common classification algorithms over 121 data sets, mostly from the UCI Machine Learning Repository. For fun, we decided to do a follow-up study, using their data and several classifier implementations from scikit-learn, the Python machine learning library. We were interested not just in classifier accuracy, but also in seeing whether there is a “geometry” of classifiers: which classifiers produce prediction patterns that look similar to each other, and which produce predictions that are quite different? To examine these questions, we put together a Shiny app to interactively explore how the relative behavior of classifiers changes for different types of data sets.
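To give a flavor of what “prediction patterns that look similar” means in practice, here is a minimal sketch (not our actual study pipeline) of one way to probe that geometry with scikit-learn: fit a handful of classifiers on the same data and measure how often each pair disagrees on held-out points. The data set and the particular models are stand-ins chosen for illustration.

```python
# A minimal sketch of comparing classifier "geometry": fit several
# scikit-learn models on the same data and compute pairwise disagreement
# rates on a held-out test set. The data set here (breast cancer) is a
# stand-in for the UCI sets used in the DWN study.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm_rbf": SVC(),
    "knn": KNeighborsClassifier(),
}

# Collect each classifier's predictions on the same test points.
preds = {name: m.fit(X_train, y_train).predict(X_test)
         for name, m in models.items()}

# Pairwise disagreement rate: the fraction of test points on which two
# classifiers predict different labels. Low values suggest the two
# classifiers behave similarly on this data set.
names = list(preds)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        disagree = np.mean(preds[a] != preds[b])
        print(f"{a:>13} vs {b:<13} disagreement = {disagree:.3f}")
```

Treating these disagreement rates as distances gives one crude notion of a classifier geometry: classifiers that rarely disagree sit “close together,” while those that disagree often sit far apart.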