Final Team Project - Essie Market Research Report

Team members: Yunjia Ma, Hang Zou, Meiyi Wang, Jingyi Wu, Grace Wang

Loading Data

Missing Values

17 rows in the dataset have no summary and 13 rows have no reviewText. Because the review text is what we analyze, we remove the rows that have no reviewText.
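A minimal pandas sketch of this cleaning step, assuming the reviews sit in a DataFrame with `summary` and `reviewText` columns. The toy rows are made up for illustration and are not the report's data.

```python
import pandas as pd

# Toy stand-in for the review data; only the column names mirror the report.
df = pd.DataFrame({
    "asin": ["A1", "A1", "A2", "A3"],
    "summary": ["Great color", None, "Chipped fast", "Nice"],
    "reviewText": ["Love it", "Lasts a week", None, "Good shine"],
})

# Count missing values per column before cleaning.
print(df[["summary", "reviewText"]].isna().sum())

# Keep only rows that have review text; a missing summary is tolerated.
df = df.dropna(subset=["reviewText"]).reset_index(drop=True)
print(len(df))
```

Dropping on `subset=["reviewText"]` rather than on all columns keeps reviews whose only gap is the summary.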

The data frame now contains 35,845 product ASIN entries, but not every ASIN has a corresponding title, so we reviewed these ASINs further and found their titles.

Duplicate Values

Number of Unique Products in the Luxury Beauty Category = 1581

We found duplicate reviews: in some cases one person reviewed the same product multiple times with identical text. Because these duplicates could bias our analysis, we remove them next.
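A sketch of the de-duplication, assuming a duplicate means the same reviewer, the same product and the same text. The reviewer column name `reviewerID` is an assumption here (the report does not name it), and the rows are toy data.

```python
import pandas as pd

# Toy data; `reviewerID` is an assumed column name for the reviewer identifier.
df = pd.DataFrame({
    "reviewerID": ["U1", "U1", "U2", "U1"],
    "asin": ["A1", "A1", "A1", "A2"],
    "reviewText": ["Love it", "Love it", "Love it", "Love it"],
})

# A duplicate is the same reviewer posting the same text on the same product.
key = ["reviewerID", "asin", "reviewText"]
n_dupes = df.duplicated(subset=key).sum()

# Keep the first copy of each duplicated review.
df = df.drop_duplicates(subset=key, keep="first").reset_index(drop=True)
print(n_dupes, len(df))
```

Including `reviewerID` in the key matters: identical short texts such as "Love it" from different reviewers are legitimate, separate reviews.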

Explore the Dataset

Top 20 Reviewed Products

The most-reviewed products are B003OGV7UO and B004N2S2JM, with 694 reviews each.

These two products come from the same brand, CND, and differ only in color.
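A sketch of how a product count and a "top 20" table can be produced with pandas, assuming one row per review keyed by `asin`. The data is a toy example, not the report's.

```python
import pandas as pd

# Toy review rows: one row per review, keyed by product ASIN.
df = pd.DataFrame({"asin": ["B1", "B2", "B1", "B3", "B2", "B1"]})

# Number of distinct products in the data.
n_products = df["asin"].nunique()

# Reviews per product, most-reviewed first; head(20) gives a "top 20" table.
top = df["asin"].value_counts().head(20)
print(n_products)
print(top)
```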

[Image: screenshot taken 2022-04-23, 10:01:41 PM]

Overall Ratings Distribution

The majority of reviews gave a high rating (4 stars or above).
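A sketch of the rating-distribution check, assuming the star rating is stored in a column named `overall` (an assumed name; the report does not state it). The ratings are made up.

```python
import pandas as pd

# Toy star ratings; `overall` is the assumed name of the 1-5 rating column.
df = pd.DataFrame({"overall": [5, 5, 4, 3, 5, 1, 4, 2]})

# Count of reviews at each star level, in rating order.
dist = df["overall"].value_counts().sort_index()

# Share of reviews rated 4 stars or higher.
high_share = (df["overall"] >= 4).mean()
print(dist)
print(high_share)
```

This skew toward high ratings is also why the train/test split below benefits from stratification.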

Exploratory Analysis

Data Processing

Insert a pos_neg Column for Sentiment Modeling
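The report does not state the rule behind `pos_neg`. The sketch below assumes 4 to 5 stars count as positive (1) and 1 to 3 stars as negative (0), matching the ">= 4" cut used for the rating distribution. Another common choice is to drop 3-star reviews as neutral. The `overall` column name and the rows are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "reviewText": ["Love it", "Meh", "Peeled in a day", "Gorgeous shade"],
    "overall": [5, 3, 1, 4],
})

# Assumed rule: 4-5 stars -> positive (1), 1-3 stars -> negative (0).
df["pos_neg"] = (df["overall"] >= 4).astype(int)
print(df)
```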

Train/Test Split
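A sketch of the split with scikit-learn's `train_test_split`. The 25% test size, the seed and the use of `stratify` are illustrative choices, not settings taken from the report. Stratifying keeps the positive/negative ratio the same in both splits, which helps when most reviews are positive.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labelled reviews, deliberately imbalanced (15 positive, 5 negative).
df = pd.DataFrame({
    "reviewText": [f"review {i}" for i in range(20)],
    "pos_neg": [1] * 15 + [0] * 5,
})

# Hold out 25% for testing, preserving the class ratio in each split.
X_train, X_test, y_train, y_test = train_test_split(
    df["reviewText"], df["pos_neg"],
    test_size=0.25, random_state=42, stratify=df["pos_neg"],
)
print(len(X_train), len(X_test))
```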

Logistic Regression - Winner

CountVectorizer

Performance
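A minimal sketch of a CountVectorizer plus logistic regression classifier and its evaluation in scikit-learn. The tiny corpus is invented. `max_iter=1000` and the metrics shown are illustrative choices, not the notebook's exact settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus; 1 = positive, 0 = negative.
train_text = [
    "love this color", "great shine lasts long", "beautiful polish love it",
    "chipped after one day", "terrible smell peeled fast", "streaky and chipped",
]
train_y = [1, 1, 1, 0, 0, 0]
test_text = ["love the shine", "peeled and chipped"]
test_y = [1, 0]

# Bag-of-words counts feeding a logistic regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_text, train_y)

# Evaluate on held-out text: overall accuracy plus per-class precision/recall.
pred = model.predict(test_text)
print(accuracy_score(test_y, pred))
print(classification_report(test_y, pred, zero_division=0))
```

Wrapping the vectorizer and model in one pipeline ensures the vocabulary is learned from the training text only, so no test words leak into the features.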

Multinomial Naive Bayes

CountVectorizer

TfidfVectorizer
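A sketch comparing the two feature sets under Multinomial Naive Bayes, assuming default vectorizer settings. The corpus is invented and mirrors the logistic regression example. Multinomial NB is defined for counts, but scikit-learn's `MultinomialNB` also accepts fractional TF-IDF values, which is why both variants can be fit the same way.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus; 1 = positive, 0 = negative.
train_text = [
    "love this color", "great shine lasts long", "beautiful polish love it",
    "chipped after one day", "terrible smell peeled fast", "streaky and chipped",
]
train_y = [1, 1, 1, 0, 0, 0]
test_text = ["love the shine", "peeled and chipped"]

# Fit the same classifier on raw counts and on TF-IDF weights.
results = {}
for name, vec in [("count", CountVectorizer()), ("tfidf", TfidfVectorizer())]:
    model = make_pipeline(vec, MultinomialNB())
    model.fit(train_text, train_y)
    results[name] = model.predict(test_text).tolist()

print(results)
```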

Clustering / Topic Modeling (NMF and LDA)

CountVectorizer & Tf-idf

Build Clustering Models (NMF & LDA)

Tf (NMF)

Tf (LDA)

Tfidf (NMF)

Tfidf (LDA)

Recommendation (unfinished)

All_Nails

Filter

Positive

Negative

Word2Vec

LDA

Positive

Negative

Essie

Filter

Positive

Negative

Word2Vec

LDA

Positive

Negative

OPI

Filter

Positive

Negative

Word2Vec

LDA

CND

Filter

Word2Vec

LDA