Link to deployed project: https://mihirphalke1-datathon-v2-app-byghe1.streamlit.app/ This project is made as a part of DataThon 2025 held at KJ Somaiya College of Engineering.
The Customer Churn Analytics Model is a comprehensive solution for subscription-based businesses to predict, analyze, and prevent customer churn. Built with Streamlit and powered by advanced machine learning algorithms, this application provides real-time insights and actionable recommendations to improve customer retention.
- Advanced K-means clustering with optimized cluster selection
- Interactive 2D and 3D PCA visualizations
- Detailed cluster characteristics analysis
- Feature importance visualization
- Downloadable cluster statistics and analysis
- High-accuracy Random Forest classification model
- Feature importance analysis
- Comprehensive performance metrics
- Interactive confusion matrix visualization
- Exportable prediction results
- AI-driven personalized retention recommendations
- Risk-based action prioritization
- Downloadable strategy recommendations
- Customer-specific intervention suggestions
- Usage pattern analysis
- Customer satisfaction metrics
- Support ticket analysis
- Pricing impact assessment
- Key risk indicators
- Frontend: Streamlit
- Data Processing: Pandas, NumPy
- Machine Learning: Scikit-learn, Lifelines
- Visualization: Plotly
- Statistical Analysis: SciPy
streamlit>=1.0.0
pandas>=1.3.0
numpy>=1.21.0
scikit-learn>=1.0.0
plotly>=5.3.0
lifelines>=0.26.0
- Clone the repository:
git clone https://github.com/yourusername/customer-churn-analytics.git
cd customer-churn-analytics
- Install dependencies:
pip install -r requirements.txt
- Run the application:
streamlit run app.py
The application expects a CSV file with the following key columns:
- Customer demographic information
- Usage metrics (ViewingHoursPerWeek, ContentDownloadsPerMonth)
- Payment information (MonthlyCharges, TotalCharges)
- Engagement metrics (UserRating, SupportTicketsPerMonth)
- Churn status
- Clustering Accuracy: Optimized using silhouette scores
- Churn Prediction: >85% accuracy (varies by dataset)
- Feature Importance: Identifies top churn indicators
- Risk Assessment: Real-time probability scoring
The clustering module segments customers based on behavior patterns, enabling targeted retention strategies for different customer groups. It uses:
- K-means clustering with automatic optimal cluster selection
- PCA for dimensionality reduction
- Interactive visualizations for cluster analysis
Our prediction model uses an ensemble approach to identify at-risk customers:
- Random Forest Classifier with GridSearchCV optimization
- Feature importance analysis
- Robust cross-validation
- Probability-based risk scoring
The system provides automated, personalized retention recommendations based on:
- Customer segment analysis
- Historical intervention success rates
- Risk level assessment
- Customer lifetime value