Automobile Survey Clustering, Dendrograms, and Principal Component Analysis
- The objective of this exercise is to examine Auto Survey data containing reviews of different aspects of an automobile. The coding exercise is provided in the Python Jupyter notebook below.
- The data consists of answers to yes-or-no questions (coded 1 or 0) indicating whether the automobile satisfies each category (variable). The variables in this dataset are driving_properties, interior, technology, comfort, reliability, handling, power, consumption, sporty, safety, gender, and household.
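- A dendrogram is a tree diagram that visualizes the result of hierarchical clustering: each merge of two clusters is drawn as a branch whose height reflects the distance between them. More information can be found here for a general overview of hierarchical clustering and here for the Scikit-Learn dendrogram overview.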
- Principal Component Analysis (PCA) is a dimensionality-reduction technique, often used as a preprocessing step, that projects the data onto a small number of uncorrelated components chosen to retain as much of the variance in the dataset as possible. This reduces the dimensionality of the data and, in turn, the complexity of any model fitted to it. More information on PCA can be found here: IBM What is Principal Component Analysis.
- Tables and graphs can be found here: Automobile Survey Clustering, Dendrograms, and Principal Component Analysis Python Jupyter Notebook. Please note most of the notebook output has been cleared due to file size restrictions.
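
As a rough illustration of the two techniques described above, the sketch below runs hierarchical clustering and PCA on synthetic 0/1 data. It is not the notebook's code and does not use the real survey: the random data, the choice of the ten rating variables only, Ward linkage, the cut into three clusters, and the two principal components are all assumptions made for this example. It uses SciPy for the linkage and dendrogram and scikit-learn for PCA.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.decomposition import PCA

# Column names taken from the dataset description; gender and household
# are left out here (an assumption made for this sketch).
columns = ["driving_properties", "interior", "technology", "comfort",
           "reliability", "handling", "power", "consumption", "sporty", "safety"]

# Synthetic stand-in for the survey answers (NOT the real data):
# 60 hypothetical respondents, each answering 1 (yes) or 0 (no).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, len(columns))).astype(float)

# Agglomerative (bottom-up) hierarchical clustering.
# Ward linkage on Euclidean distances is an assumed choice.
Z = linkage(X, method="ward")

# Cut the tree so that at most 3 flat clusters remain (assumed number).
labels = fcluster(Z, t=3, criterion="maxclust")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)

# Project the ten variables onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
explained = pca.explained_variance_ratio_

print("cluster sizes:", np.bincount(labels)[1:])
print("explained variance ratio:", explained)
```

To draw the tree instead of only computing its layout, drop `no_plot=True` and call `dendrogram(Z)` inside a Matplotlib figure. With real survey data, `X` would be replaced by the corresponding columns of the loaded table.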