Project: Topic Modeling to Discover Shopper Missions, Dunnhumby Dataset (Slides, Paper)
Alex Klibisz, 10/27/2016
I recently collaborated on a project with PhD student Josh Fagan. We used graphical topic modeling techniques to discover and analyze latent shopping missions from the Dunnhumby "The Complete Journey" dataset. The project was part of the Fall 2016 Advanced Topics in Data Mining course at the University of Tennessee, taught by Dr. Wenjun Zhou.
Our findings and techniques are summarized in the abstract:
The Dunnhumby dataset contains product-level transactions over two years from customers in 2,500 households; the specific retail context (e.g. grocery store, department store, convenience store) is not disclosed. All transactions can be linked to a specific household, and household demographics such as income or number of children are available. We explore this dataset using the Latent Dirichlet Allocation (LDA) and Author-Topic (AT) graphical models with the goal of understanding the latent missions of shoppers. In our analysis, both models yield distinct yet reasonable categorizations of products into topics. We find that model performance improves after employing a low-entropy naming scheme. The model results are useful for answering several questions about the customers. Through further statistical analysis, we are able to determine which demographic classifications differ most from the entire population.
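To make the setup concrete, here is a minimal sketch of how transaction rows might be turned into the "documents" a topic model consumes. It is an illustration under stated assumptions, not the pipeline from the paper: it assumes each basket is treated as one document and each product name as one word, and that the household plays the role of the "author" in the Author-Topic model. The abstract does not confirm either choice. The row layout, the function names, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy rows: (household_id, basket_id, product_name).
# Invented for illustration; these are not values from the Dunnhumby dataset.
transactions = [
    ("hh1", "b1", "milk"),
    ("hh1", "b1", "cereal"),
    ("hh1", "b2", "diapers"),
    ("hh2", "b3", "milk"),
    ("hh2", "b3", "milk"),
    ("hh2", "b3", "bread"),
]


def baskets_to_documents(rows):
    """Group product names by basket so each basket becomes one document.

    Also records which household each basket belongs to (assumed here to
    be the 'author' of that document).
    """
    docs = defaultdict(list)
    authors = {}
    for household, basket, product in rows:
        docs[basket].append(product)
        authors[basket] = household
    return dict(docs), authors


def bag_of_words(docs):
    """Return a sorted vocabulary and per-document product counts."""
    vocab = sorted({p for products in docs.values() for p in products})
    counts = {basket: Counter(products) for basket, products in docs.items()}
    return vocab, counts


docs, authors = baskets_to_documents(transactions)
vocab, counts = bag_of_words(docs)
```

The resulting per-basket counts are the bag-of-words input that an LDA implementation expects, and the basket-to-household mapping is the kind of extra information an Author-Topic implementation would need.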