In this project, wine reviews are used to predict the wine variant by training on an imbalanced dataset with classification algorithms such as SVM, Naive Bayes and Random Forest. Neural network models (CNN, RNN and LSTM) and LLM models (DistilBERT and RoBERTa) were also applied, followed by error analysis using SHAP.
We have been provided with a wine reviews dataset with two columns, “review_text” and “wine_variant”, and the goal is to create a wine recommendation system using text classification.
- Target variable – ‘wine_variant’
- Categories – 8 Types - 'Pinot Noir', 'Sauvignon Blanc', 'Cabernet Sauvignon', 'Chardonnay', 'Syrah', 'Riesling', 'Merlot', 'Zinfandel'
- Train data – 10,000 observations, split with a 25% test set (2,500 observations). Stratified sampling was used to preserve the representation of the above-mentioned classes. An additional validation set of 5,000 observations was also used.
- Distribution – class shares reported as percentages
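The stratified 75/25 split described above can be sketched in plain Python (the project itself would typically use scikit-learn's `train_test_split(..., stratify=y)`; the toy labels below are purely illustrative):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.25, seed=42):
    """Split indices per class so each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = int(round(len(idxs) * test_frac))
        test.extend(idxs[:cut])
        train.extend(idxs[cut:])
    return sorted(train), sorted(test)

# Toy imbalanced labels: one majority and one minority class.
labels = ["Pinot Noir"] * 8 + ["Riesling"] * 4
train, test = stratified_split(labels, test_frac=0.25)
# Each class contributes ~25% of its samples to the test set,
# so the minority class is not accidentally squeezed out.
```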
- TF-IDF vectorization
- Latent Semantic Analysis
- Sentence Transformer (all-mpnet-base-v2)
- torchtext.vocab
- Linear and Non-linear SVM
- SGD Classifier
- Multinomial Naive Bayes
- Random Forest Classifier
- CNN
- LSTM
- DistilBERT
- RoBERTa
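For reference, the TF-IDF weighting that feeds the classical models above can be sketched in plain Python. This is a simplified variant, tf-idf = tf · log(N/df); the project would in practice use scikit-learn's `TfidfVectorizer`, which additionally applies idf smoothing and L2 normalisation:

```python
import math
from collections import Counter

def tfidf(corpus):
    """Per-document term weights: term frequency * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

reviews = [
    "bold red with dark berries",
    "light white with citrus",
    "bold red with oak",
]
w = tfidf(reviews)
# "with" appears in every review, so its idf (and hence tf-idf weight) is 0;
# rarer, more discriminative words like "berries" get higher weights.
```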
From the above results, the four best classifiers, listed in descending order of macro-averaged F1 score on the validation set, are:
- RoBERTa (0.80)
- DistilBERT (0.79)
- TFIDF Vectorization + Linear SVC (with hyperparameter tuning) (0.78)
- CNN (0.77)

We can conclude the following from the above analysis:
- Given the size of the training set, the transfer-learning models (RoBERTa and DistilBERT) provide much better results, as seen in the table above.
- Given the class imbalance in the dataset, the best way to group the categories is on the basis of domain knowledge, as stated above. Grouping by taste and flavour is more appropriate when building a wine recommendation system than grouping by the distribution of the target variable. This led to a significant improvement, raising classification accuracy from the low 70s to almost 80%.
- Although our model has shown a significant improvement over the baseline SVC model, the macro F1 score does not exceed 0.80 even after experimenting with multiple models. This is a clear indication that more training data is needed to improve classification performance.
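The macro-averaged F1 used to rank the models above weights every class equally regardless of its support, which is why it is the appropriate headline metric for this imbalanced dataset. A minimal plain-Python version (equivalent to scikit-learn's `f1_score(..., average="macro")` when every class appears in `y_true`):

```python
def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving rare classes equal weight."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example: the missed minority class ("Riesling") drags the macro score
# down to ~0.43 even though plain accuracy is 75%.
y_true = ["Merlot", "Merlot", "Merlot", "Riesling"]
y_pred = ["Merlot", "Merlot", "Merlot", "Merlot"]
```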
We have used the RoBERTa model for performing error analysis using SHAP. We took a sample of 30 mis-predicted observations from the provided test set of 500 observations for this analysis. We will look at a few samples in this report; for a more detailed analysis, please refer to the code.
While words like “light” and “oak” incline the results towards “Medium to Full-bodied Reds”, the final outcome seems to be influenced by the use of “powerful”, “refrain” and “berries”.
In this example, we see that the use of words like “TONS” and “more fruit” pushed the classifier to predict “Bold Red”.
In the given scenario, the word “medium” clearly influences the result.
The use of the word “champagne”, which is a “Full-bodied white”, has steered the prediction accordingly. From the above analysis we see that the errors are primarily related to domain knowledge. However, the reviews also contain text that is redundant and does not contribute to classifying the taste or quality of the wine, as seen below. Hence, a recommendation would be to carefully curate the samples used to train the wine-recommendation model in order to obtain more accurate results.
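The curation recommendation above could start from a simple heuristic filter that drops reviews with no taste- or body-related content. The sketch below is one such heuristic; the keyword list is purely illustrative and a real one would be built with wine-domain knowledge:

```python
# Hypothetical vocabulary of taste/body terms; a real list would be curated
# with domain expertise, not hard-coded like this.
TASTE_TERMS = {"oak", "berries", "citrus", "tannin", "light", "medium",
               "full-bodied", "bold", "crisp", "fruit"}

def is_informative(review):
    """Keep a review only if it mentions at least one taste-related term."""
    tokens = set(review.lower().split())
    return bool(tokens & TASTE_TERMS)

reviews = [
    "Powerful tannin and dark berries, medium finish.",
    "Bought this for my cousin's wedding, shipping was fast!",
]
curated = [r for r in reviews if is_informative(r)]
# Only the first review survives; the second says nothing about the wine itself.
```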
For more details, please refer to the Project Report.