This Django microservice uses the Spotify Web API to collect the large datasets needed to train classification models. It automates authentication and the retrieval of track and artist metadata, so data collection can run uninterrupted over an extended period.
- Clone the repository:
git clone https://github.com/Xx_Rolo_xX/spotler-api.git
- Change into the project directory:
cd spotler-api
- Create and activate a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate
- Install the required dependencies:
pip install -r requirements.txt
- Set up the database:
python manage.py migrate
- Run the microservice:
python manage.py runserver
To access the Spotify Web API, you need to provide a valid access token. This microservice handles authentication automatically and refreshes the access token before it expires, so data retrieval is not interrupted.
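The refresh-before-expiry behavior can be sketched as follows. This is a minimal illustration, not the service's actual code: `TokenManager`, `fetch_token`, and the 60-second safety margin are hypothetical names and values. The fetcher is assumed to return a dict with `access_token` and `expires_in` (in seconds), which are the fields present in Spotify's token response.

```python
import time
from typing import Callable, Dict, Optional


class TokenManager:
    """Caches an access token and renews it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Dict],
        margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token  # hypothetical: performs the real token request
        self._margin = margin            # assumed: renew this many seconds early
        self._clock = clock              # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, requesting a new one when the old one is near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + float(payload["expires_in"])
        return self._token
```

Every outgoing request would call `get()` first, so a long-running collection job never sends an expired token.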
The microservice exposes the following endpoints for data retrieval:
- /tracks: Retrieve basic information about a track, including its identifier, name, and artist(s) associated with the track.
- /artists: Retrieve basic information about an artist based on their identifier, including the artist's name and associated genres.
- /audio-features: Retrieve metadata about a track based on its identifier, including features like acousticness, danceability, energy, instrumentalness, key, loudness, liveness, mode, speechiness, tempo, time signature, and valence.
- /playlists: Retrieve the track identifiers from a specific user's playlist.
- /users/{user_id}/playlists: Retrieve playlists belonging to a specific user.
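As a rough illustration of calling one of these endpoints from a client, the sketch below targets `/users/{user_id}/playlists`, the only route whose parameter placement is spelled out above. The helper names are hypothetical, the base URL is the address Django's development server uses by default, and a JSON response body is an assumption.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # default address of `python manage.py runserver`


def user_playlists_url(user_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for the /users/{user_id}/playlists endpoint."""
    return f"{base_url.rstrip('/')}/users/{quote(user_id, safe='')}/playlists"


def fetch_user_playlists(user_id: str, base_url: str = BASE_URL):
    """GET the user's playlists; assumes the service answers with JSON."""
    with urlopen(user_playlists_url(user_id, base_url), timeout=10) as resp:
        return json.load(resp)
```

Whether the route expects a trailing slash depends on the project's URL configuration, so adjust `user_playlists_url` if the server responds with a redirect or a 404.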
The microservice uses a RESTful architecture to handle data collection. On receiving a GET request with a playlist identifier, it retrieves data for all tracks in the playlist and stores it in a SQLite relational database, following a schema for tracks, artists, and genres.
The advantages of this approach include eliminating entries for artists without associated genres and ensuring data integrity through database validations.
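One possible relational layout for tracks, artists, and genres is sketched below with the standard-library `sqlite3` module. The project presumably defines its schema through Django models, so every table, column, and function name here is an assumption made for illustration. The sketch also assumes that a track is skipped when none of its artists has a genre.

```python
import sqlite3

# Hypothetical schema: two many-to-many link tables join the three entities.
SCHEMA = """
CREATE TABLE artist (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE genre  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE track  (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE artist_genre (
    artist_id TEXT NOT NULL REFERENCES artist(id),
    genre_id  INTEGER NOT NULL REFERENCES genre(id),
    PRIMARY KEY (artist_id, genre_id)
);
CREATE TABLE track_artist (
    track_id  TEXT NOT NULL REFERENCES track(id),
    artist_id TEXT NOT NULL REFERENCES artist(id),
    PRIMARY KEY (track_id, artist_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database, enable foreign-key checks, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def store_track(conn, track_id, name, artists):
    """Store a track with its artists, given as (artist_id, artist_name, genres) tuples.

    Artists without genres are dropped; returns False if no artist remains.
    """
    kept = [a for a in artists if a[2]]
    if not kept:
        return False
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO track VALUES (?, ?)", (track_id, name))
        for artist_id, artist_name, genres in kept:
            conn.execute("INSERT OR IGNORE INTO artist VALUES (?, ?)", (artist_id, artist_name))
            conn.execute("INSERT OR IGNORE INTO track_artist VALUES (?, ?)", (track_id, artist_id))
            for genre in genres:
                conn.execute("INSERT OR IGNORE INTO genre (name) VALUES (?)", (genre,))
                genre_id = conn.execute(
                    "SELECT id FROM genre WHERE name = ?", (genre,)
                ).fetchone()[0]
                conn.execute("INSERT OR IGNORE INTO artist_genre VALUES (?, ?)", (artist_id, genre_id))
    return True
```

The primary keys and `INSERT OR IGNORE` keep repeated playlist imports from creating duplicate rows, and the foreign keys reject links to tracks, artists, or genres that do not exist.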
The analyzed tracks come from over 800 playlists belonging to a Spotify user's official account. These playlists were chosen for their diversity and because they are centralized under a single user, which yields a large and varied dataset while mitigating the biases of private users' playlists.
The collected dataset adheres to the following principles:
- Each track is associated with at least one artist.
- Each artist can be associated with any number of genres.
- Each track possesses a set of metadata, including features such as acousticness, danceability, energy, instrumentalness, key, loudness, liveness, mode, speechiness, tempo, time signature, and valence.
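The principles above can be expressed as a small record type. This is a sketch under stated assumptions: the class names and field types are hypothetical, and the feature names follow the list above, with "time signature" written as the identifier `time_signature`.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class AudioFeatures:
    """The twelve per-track features listed above (types are assumed)."""
    acousticness: float
    danceability: float
    energy: float
    instrumentalness: float
    key: int
    loudness: float
    liveness: float
    mode: int
    speechiness: float
    tempo: float
    time_signature: int
    valence: float

    @classmethod
    def from_dict(cls, data: Dict) -> "AudioFeatures":
        """Pick the listed features from a dict; raises KeyError if one is missing."""
        return cls(**{f.name: data[f.name] for f in fields(cls)})


@dataclass(frozen=True)
class TrackRecord:
    track_id: str
    name: str
    artist_ids: List[str]
    features: AudioFeatures

    def __post_init__(self) -> None:
        # Enforce the first principle: every track has at least one artist.
        if not self.artist_ids:
            raise ValueError("a track needs at least one artist")
```

Because `from_dict` ignores unknown keys, an API payload carrying extra fields can be passed in directly, while a payload missing any listed feature fails immediately.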
Contributions to this microservice are welcome. If you encounter any issues or have suggestions for improvement, please submit an issue or a pull request.
This project is licensed under the MIT License.