A HYBRIDIZED RECOMMENDATION SYSTEM ON MOVIE DATA USING CONTENT-BASED AND COLLABORATIVE FILTERING

ABSTRACT
In recent times, the rapid growth of information available on the internet has produced large amounts of data and a growing number of online users. Recommendation systems have been employed to help users make informed and accurate decisions amid this abundance of information. In this research, we propose a hybrid recommender engine that combines content-based and collaborative filtering recommendations, and we explore how this combination can enhance prediction accuracy in existing collaborative filtering frameworks.

We investigate whether a recommendation system that combines content-based and collaborative filtering, implemented with the Mahout framework and built on Hadoop, improves recommendation accuracy and alleviates the scalability issues currently experienced when processing large volumes of data to recommend items to users.


We employed the feature augmentation hybrid technique, in which the output of the content-based recommendation is used as an input to collaborative filtering. The well-known MovieLens data was matched with the Internet Movie Database (IMDB) in order to extract user and item content features. The input files generated from the integration of both databases were converted to text files, which serve as input to the collaborative filtering framework in Mahout.
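The feature augmentation step described above can be illustrated with a minimal sketch, written in Python for brevity and not taken from the project's own code. The abstract does not state how content features were encoded for the collaborative filter, so the encoding used here (one pseudo-user per feature value, who rates every matching movie highly) is an assumption, as are the function names, the toy records, and the title-matching rule. The comma-separated `userID,itemID,preference` layout is the one Mahout's file-based data model reads.

```python
import csv
import io


def normalize_title(title):
    """Lower-case a title and collapse whitespace so two sources can be matched."""
    return " ".join(title.lower().split())


def augment_ratings(ratings, movielens_items, imdb_features,
                    pseudo_user_base=1_000_000, pseudo_rating=5.0):
    """Extend rating triples with one pseudo-user per content feature value.

    ratings:          list of (user_id, item_id, rating) triples
    movielens_items:  dict mapping item_id -> movie title
    imdb_features:    dict mapping normalized title -> feature value (e.g. country)

    The pseudo-user encoding is an assumption made for illustration only.
    """
    augmented = list(ratings)
    pseudo_ids = {}
    for item_id, title in sorted(movielens_items.items()):
        feature = imdb_features.get(normalize_title(title))
        if feature is None:
            continue  # no match in the second source, so nothing to add
        if feature not in pseudo_ids:
            pseudo_ids[feature] = pseudo_user_base + len(pseudo_ids)
        augmented.append((pseudo_ids[feature], item_id, pseudo_rating))
    return augmented, pseudo_ids


def to_text(rows):
    """Serialize triples as userID,itemID,preference lines."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Toy records invented for this example; they are not data from the study.
ratings = [(1, 10, 4.0), (2, 11, 3.0)]
movielens_items = {10: "Movie A (1995)", 11: "Movie B  (1995)", 12: "Movie C (1996)"}
imdb_features = {"movie a (1995)": "USA", "movie b (1995)": "USA"}

augmented, pseudo_ids = augment_ratings(ratings, movielens_items, imdb_features)
print(to_text(augmented), end="")
```

Under this assumed encoding, the collaborative filter sees content features as ordinary rating rows, so no change to the filtering algorithm itself is needed. Other encodings are equally plausible.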


Through various experiments, we determined the best parameter settings for the Mahout components in our model. We then evaluated the model by comparing its Root Mean Square Error (RMSE) against that of the state-of-the-art model.
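For reference, the Root Mean Square Error over n predictions is the square root of the mean squared difference between predicted and actual ratings; lower values indicate better accuracy. The sketch below, in Python, shows the computation. The rating values are invented for illustration and are not results from this study.

```python
import math


def rmse(predicted, actual):
    """Root Mean Square Error between two equally long rating sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings, used only to show how two models would be compared.
actual_ratings = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 4]
hybrid_predictions = [4, 3, 4, 3]

baseline_rmse = rmse(baseline_predictions, actual_ratings)
hybrid_rmse = rmse(hybrid_predictions, actual_ratings)
print(f"baseline RMSE = {baseline_rmse:.4f}, hybrid RMSE = {hybrid_rmse:.4f}")
```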


The proposed model showed significant improvement over the pure collaborative model. Our analysis demonstrated that the extracted user and item content features can, in some cases, lead to better prediction accuracy. More precisely, the user feature gender has no appreciable impact on our underlying model, while an item feature such as country is more beneficial than genre, contrary to findings in some other research work.


TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES

CHAPTER ONE
INTRODUCTION
1.1       BACKGROUND OF THE STUDY
1.2       PROBLEM STATEMENT
1.3       AIM AND OBJECTIVES
1.4       SIGNIFICANCE OF THE STUDY
1.6       SYNOPSIS

CHAPTER TWO
LITERATURE REVIEW
2.1       INFORMATION RETRIEVAL AND FILTERING
2.2       RECOMMENDER SYSTEM TYPES AND TECHNIQUES
2.2.1    ENTITIES IN RECOMMENDATION SYSTEMS
2.2.2    COLLABORATIVE FILTERING (CF)
2.2.3    CONTENT-BASED RECOMMENDATION (CBR)
2.2.3.1 THE STRENGTH AND WEAKNESS OF CONTENT-BASED RECOMMENDATION
2.2.4    HYBRID RECOMMENDATION AND APPROACH
2.2.4.1 POSSIBLE COMBINATION OF HYBRID RECOMMENDATION
2.3       APACHE MAHOUT
2.3.1    DEVELOPMENT OF A SIMPLE RECOMMENDER USING MAHOUT LIBRARY
2.4       HADOOP
2.5       RELATED WORK

CHAPTER THREE
RESEARCH METHODOLOGY
3.1       INTRODUCTION
3.2       METHODOLOGY
3.3       CONTENT BASED RECOMMENDATION
3.4       COLLABORATIVE FILTERING USING MAHOUT
3.5       RECAP

CHAPTER FOUR
IMPLEMENTATION, RESULTS, PRESENTATION AND DISCUSSION
4.1       OVERVIEW OF THE IMPLEMENTATION APPROACH
4.2       EXTRACTION OF IMDB DATA
4.2.1    SOFTWARE TOOLS
4.2.1.1 SQLObject
4.2.1.2 PSYCOPG
4.2.1.3 POSTGRESQL
4.3       EXTRACTION OF MOVIELENS DATA
4.3.1    MOVIELENS RATING INFORMATION
4.3.2    MOVIELENS ITEM INFORMATION
4.3.3    EXTRACTING MOVIELENS USER FEATURES
4.4       ITEM FEATURES EXTRACTION AND COMBINATION
4.5       IMPLEMENTATION OF RECOMMENDER ENGINE BY APACHE MAHOUT
4.5.1    CLOUDERA
4.5.2    APACHE MAVEN
4.6       MAHOUT RECOMMENDER COMPONENTS – PARAMETERS OPTIMIZATION
4.6.1    DATASET
4.6.2    SIMILARITY METRICS AND NEIGHBORHOOD CRITERIA
4.7       SYSTEM EVALUATION
4.7.1    PERFORMANCE MEASURE
4.7.2    USER CONTENT FEATURES
4.7.3    ITEM CONTENT FEATURES
4.7.4    COMPARING USER/ITEM CONTENT FEATURES

CHAPTER FIVE
SUMMARY AND CONCLUSIONS
5.1       SUMMARY
5.2       CONCLUSION
5.3       RECOMMENDATION AND FUTURE WORKS
REFERENCES
APPENDIX A: SOURCE CODE SNIPPET
APPENDIX B: RECOMMENDER ENGINE - JAVA PROGRAM
APPENDIX C: EXPERIMENTAL RESULT


CHAPTER ONE

INTRODUCTION

1.1 BACKGROUND OF THE STUDY


The rate at which information is growing on the internet has resulted in large amounts of data and an increase in online users. This explosion of data has flooded users with large volumes of information and poses a great challenge in terms of information overload. As a result, it has become very difficult for people to process such information manually and to find the right information. The sheer abundance of information often confuses users and hampers their ability to make informed and accurate decisions. Large internet companies like Amazon, Google, and Facebook have faced difficulty in managing this explosion of information. Recommendation systems have been employed to address this problem. Figure 1.1 shows how recommender engines have stepped in to rescue users from such confusion.

The vast increase in online data and users led to the rise of big data, and recommendation systems have received a great deal of attention in the big data world. Big data has improved the capacity to make recommendations on a large scale. It has also made recommendation systems more important to users, since they predict the right piece of information out of vast amounts of information. A recommendation system is a particular form of information filtering that exploits a user's past behavior, or the behavior of similar users, to generate a list of information items that is personally tailored to an end user's preferences.
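The idea of using the behavior of similar users can be illustrated with a minimal user-based sketch in Python. It is not the method implemented in this study: the choice of cosine similarity, the similarity-weighted average, the function names, and the toy ratings are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=2):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    # Highest predicted rating first; ties broken by item name.
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


# Toy ratings invented for this example.
toy_ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
print(recommend(toy_ratings, "alice"))
```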

At present, Recommendation Systems (RSs) are broadly used in e-commerce as information filtering tools that deliver personalized information by predicting users' preferences for particular items [1]. RSs attempt to suggest items (movies, music, books, news, web pages, etc.) that are most likely to interest users. Amazon, Netflix, and other such portals use RSs extensively to suggest content to their users. RSs aim to alleviate.....

Item Type: Project Material  |  Size: 65 pages  |  Chapters: 1-5