KrDevanshu06.
Python
Scikit-Learn
SVD
KNN
Pandas

Collaborative Filtering Movie Recommendation System

2025-08-10
Abstract

A data science project to improve content discovery through movie recommendations. By implementing Matrix Factorization (SVD) and K-Nearest Neighbors (KNN), we achieved a Precision@10 of 81.02% and an RMSE of 0.766.

Introduction

During my internship at Infotact Solutions, the challenge was to improve user engagement by suggesting relevant content. We chose a Collaborative Filtering approach to leverage user-item interaction data.

Mathematical Framework

We used Singular Value Decomposition (SVD) to decompose the user-item interaction matrix $A$ into latent factors.

$$A \approx U \Sigma V^T$$

Where:

  • $U$: User preferences matrix
  • $V^T$: Item attributes matrix
  • $\Sigma$: Weights of latent factors
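
As a worked illustration (not taken from the original project), keeping only the $k$ largest singular values gives a rank-$k$ approximation, and a predicted score for user $u$ and item $i$ is read off as a weighted sum over the latent factors. The symbols $k$, $\sigma_f$ (the $f$-th diagonal entry of $\Sigma$), and $\hat{a}_{ui}$ are introduced here as assumptions for the example.

```latex
A_k = U_k \Sigma_k V_k^T,
\qquad
\hat{a}_{ui} = \sum_{f=1}^{k} U_{uf} \, \sigma_f \, V_{if}
```

A smaller $k$ compresses the matrix more aggressively; a larger $k$ follows the observed ratings more closely.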

K-Nearest Neighbors (KNN)

For real-time "Similar Item" recommendations, we used KNN with Cosine Similarity:

$$\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|}$$
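
The formula above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's code: the function names, the toy item vectors, and the choice to treat an all-zero vector as having similarity 0 are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an unrated item is similar to nothing
    return dot / (norm_a * norm_b)


def nearest_items(target, item_vectors, k=2):
    """Return the k item ids most similar to `target`, best first."""
    scores = [
        (cosine_similarity(item_vectors[target], vec), item)
        for item, vec in item_vectors.items()
        if item != target
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scores[:k]]


# Hypothetical toy data: each vector holds one item's ratings
# from four users (0 means "not rated").
item_vectors = {
    "Movie A": [5, 4, 0, 1],
    "Movie B": [4, 5, 0, 1],
    "Movie C": [0, 1, 5, 4],
    "Movie D": [1, 0, 4, 5],
}
similar_to_a = nearest_items("Movie A", item_vectors, k=2)
```

On real data a library routine over a sparse matrix would be the practical choice; the brute-force loop here only makes the definition concrete.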

Implementation

The pipeline involved cleaning the MovieLens dataset, creating a sparse matrix, and training the models.
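
To make the "cleaning" and "sparse matrix" steps concrete, here is a standard-library sketch that turns rating triples into compressed sparse row (CSR) arrays. The cleaning rules (drop missing ratings, keep the last rating per user-item pair), the function name, and the sample triples are assumptions for illustration; the original pipeline's exact steps are not described in this write-up.

```python
def build_csr(ratings):
    """Build CSR arrays from (user_id, item_id, rating) triples.

    Returns (users, items, data, indices, indptr), where row r of the
    matrix holds data[indptr[r]:indptr[r + 1]] in columns
    indices[indptr[r]:indptr[r + 1]].
    """
    # Assumed cleaning: skip missing ratings, keep the last duplicate.
    latest = {}
    for user, item, rating in ratings:
        if rating is None:
            continue
        latest[(user, item)] = float(rating)

    users = sorted({user for user, _ in latest})
    items = sorted({item for _, item in latest})
    col_of = {item: j for j, item in enumerate(items)}

    # Group the surviving entries by user (one matrix row per user).
    rows = {user: [] for user in users}
    for (user, item), rating in latest.items():
        rows[user].append((col_of[item], rating))

    data, indices, indptr = [], [], [0]
    for user in users:
        for col, rating in sorted(rows[user]):
            indices.append(col)
            data.append(rating)
        indptr.append(len(data))
    return users, items, data, indices, indptr


# Hypothetical raw triples: one missing rating and one duplicate pair.
raw = [(1, 10, 4.0), (1, 20, None), (2, 10, 3.0), (2, 30, 5.0), (1, 10, 5.0)]
users, items, data, indices, indptr = build_csr(raw)
```

Only observed ratings are stored, which is what keeps a mostly empty user-item matrix manageable in memory.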

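    from surprise import SVD
    from surprise.model_selection import cross_validate

    # `data` is assumed to be a surprise Dataset built from the cleaned
    # MovieLens ratings (the loading code is not shown here).

    # Algorithm selection
    algo = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)

    # Training and evaluation with 5-fold cross-validation
    results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)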
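
Surprise's `SVD` class fits its latent factors by stochastic gradient descent on the observed ratings rather than by a classical decomposition. The sketch below is a simplified standard-library imitation of that idea, meant only to show what `n_factors`, `n_epochs`, the learning rate, and the regularization term control. The function names, toy ratings, and toy hyperparameters are assumptions, and this is not the library's implementation.

```python
import math
import random


def train_mf(ratings, n_factors=2, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + p_u . q_i by SGD; return a predict function."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    # Small random starting factors for every user and item.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Step along the error gradient, shrinking parameters by `reg`.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict


def rmse(predict, ratings):
    """Root mean square error of `predict` over (user, item, rating) triples."""
    squared = [(r - predict(u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy ratings: u1/u2 prefer items a/b, u3/u4 prefer c/d.
toy = [
    ("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 1),
    ("u2", "a", 4), ("u2", "b", 5), ("u2", "d", 1),
    ("u3", "c", 5), ("u3", "d", 4), ("u3", "a", 1),
    ("u4", "c", 4), ("u4", "d", 5), ("u4", "b", 2),
]
baseline = train_mf(toy, n_epochs=0)            # untrained: roughly the global mean
trained = train_mf(toy, n_epochs=200, lr=0.01)  # more epochs for the tiny toy set
```

Comparing `rmse(baseline, toy)` with `rmse(trained, toy)` shows the training error falling as the factors are learned. That is error on the training data only, which is why the snippet above uses cross-validation instead.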

Results

  • RMSE (Root Mean Square Error): 0.766 (Lower is better)
  • Precision@10: 81.02% (High relevance in top 10 results)
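
For readers unfamiliar with the two metrics, the following standard-library sketch shows how each is commonly computed. How "relevant" was defined for the reported Precision@10 is not stated in this write-up, so the relevant set below is a hypothetical input, and dividing by the number of items actually recommended (rather than always by `k`) is one common convention, assumed here.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = ranked_items[:k]
    if not top_k:
        return 0.0  # nothing recommended, so nothing was relevant
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / len(top_k)


# Hypothetical example values, not the project's data.
example_rmse = rmse([4.0, 3.0], [5.0, 3.0])
example_precision = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=4)
```

In practice Precision@k is computed per user and then averaged across users.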

Conclusion

The hybrid approach of using SVD for matrix completion and KNN for localized item similarity proved effective at this dataset scale, reaching an RMSE of 0.766 and a Precision@10 of 81.02%.


Devanshu Kumar Prasad

Data Associate & AI Engineer

Bridging the gap between data science and distributed systems. Winner of Summer Analytics Hackathon (IIT Guwahati).

© 2025 Devanshu Kumar Prasad. All rights reserved.
