Data Science Case Study: Song Popularity Prediction System using Spotify API

Content Recommendation and Classification

2024

Machine Learning

Project: Predicting Song Popularity Using Data Analysis and Machine Learning

Vision:
To harness the power of data science to quantify creativity and predict song popularity, bridging the gap between emerging talent and audience discovery. This project aims to use data-driven insights to foster a supportive musical ecosystem by identifying tracks with the potential for success.

Objective:
To analyze song characteristics, address multicollinearity in features, and develop a machine learning model capable of predicting a song's popularity based on measurable metrics.

Methodology:

  1. Data and Feature Analysis:

    • Used a dataset from Statso to analyze key features such as duration, energy, loudness, and danceability.

    • Investigated correlations to detect multicollinearity and biases in the dataset.

  2. Feature Engineering:

    • Applied Principal Component Analysis (PCA) to resolve multicollinearity, merging strongly correlated features into composite features such as energy_loudness_pca.

    • Selected features with significant correlations to the target variable (popularity) while excluding irrelevant ones.

  3. Model Training and Evaluation:

    • Trained three models: Linear Regression, Random Forest, and XGBoost.

    • Evaluated models using error metrics (RMSE and R-squared) and residual analysis, identifying Random Forest as the best-performing model.

    • Conducted hyperparameter tuning for improved performance.

  4. Model Application:

    • Developed a demo app that classifies songs into categories (Not Popular, Moderately Popular, Highly Popular) based on song attributes.
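
Steps 1, 2, and 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic rather than the Statso dataset, the 0.8 correlation threshold and the 40/70 category cut-offs are hypothetical, and the helper names (correlated_pairs, first_principal_component, popularity_category) are invented for this example. PCA is computed directly with NumPy's SVD so that the sketch needs no other library.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.8):
    """List feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = np.corrcoef(X, rowvar=False)  # columns of X are features
    return [
        (names[i], names[j], float(corr[i, j]))
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(corr[i, j]) > threshold
    ]


def first_principal_component(X):
    """Project the standardized columns of X onto their first principal component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]


def popularity_category(score, low=40.0, high=70.0):
    """Bucket a predicted popularity score; `low` and `high` are hypothetical cut-offs."""
    if score < low:
        return "Not Popular"
    if score < high:
        return "Moderately Popular"
    return "Highly Popular"


# Synthetic stand-in data (assumption): loudness is constructed to track energy,
# so the two features are strongly correlated by design.
rng = np.random.default_rng(0)
energy = rng.uniform(0.0, 1.0, 500)
loudness = -60.0 + 55.0 * energy + rng.normal(0.0, 3.0, 500)
danceability = rng.uniform(0.0, 1.0, 500)
X = np.column_stack([energy, loudness, danceability])

# Step 1: flag multicollinear feature pairs.
print(correlated_pairs(X, ["energy", "loudness", "danceability"]))

# Step 2: replace the correlated pair with a single composite feature.
energy_loudness_pca = first_principal_component(X[:, :2])
print(energy_loudness_pca[:3])

# Step 4: map a predicted score to one of the three categories.
print(popularity_category(55.0))
```

The sign of a principal component is arbitrary, so the composite feature may come out positively or negatively aligned with energy; a tree-based model such as Random Forest is unaffected by that choice.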

Results:

  • The Random Forest model outperformed the baseline Linear Regression model on both evaluation metrics:

    • Root Mean Squared Error (RMSE): Reduced from 7.97 to 6.01, an improvement of approximately 24.6%, indicating better predictive accuracy.

    • R-squared (Explained Variance): Increased from 0.045 to 0.457, roughly a tenfold increase (about 916% in relative terms). The model explains far more of the variance in song popularity than the baseline, although more than half of that variance remains unexplained.

  • Despite promising results, biases in the dataset and the presence of outliers limited the model’s predictive accuracy.
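
The percentage figures above follow from the standard definitions of the two metrics. The short sketch below, using only the Python standard library, shows how RMSE and R-squared are computed and reproduces the relative changes from the reported values. The function names are illustrative and are not taken from the project's code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions account for."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Values reported in the Results section.
print(round(relative_change(7.97, 6.01), 1))    # RMSE: -24.6 (a reduction)
print(round(relative_change(0.045, 0.457), 1))  # R-squared: 915.6 (an increase)
```

Because the baseline R-squared is close to zero, its relative change looks very large; the absolute gain of about 0.41 is the more informative number.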

Conclusion:
This case study demonstrates the potential of machine learning in music analytics, enabling stakeholders such as labels, independent artists, and streaming platforms to make data-informed decisions. With less biased data, the model could better predict popularity trends and support the discovery of emerging talent, aligning data-driven insights with creative expression.

