The Computer Society of India (CSI) held its 25th student convention at R.V. College of Engineering on 13 and 14 October 2011. I had the opportunity to take part in the convention and present our paper, “Map/Reduce Algorithm Performance Analysis in Computing Frequency of Tweets”, together with my co-author Nagashree.
The convention was a great opportunity for students from across the state to learn about the latest trends in Information Technology. It was also a wonderful platform for innovative young minds to share their ideas and innovations, and students from different parts of Karnataka took part and presented their papers.
Since Hadoop and Map/Reduce are my areas of interest, we decided to present a paper on “Map/Reduce Algorithm Performance Analysis” so that more students would get to know this emerging technology. We had just 10 minutes to present our paper, impress the judges and communicate our ideas to the fellow students attending the convention. It was our first ever paper presentation, so we were quite nervous, but it was a wonderful experience to present in front of the eminent professionals who judged the event. The day became even more memorable when we learned that our presentation had won 3rd place.
Here is the abstract of our presentation:
Abstract of Paper presentation
Title: Map/Reduce Algorithm Performance Analysis in Computing Frequency of Tweets
This paper proposes a method to extract tweets from Twitter and analyses the efficiency of the Map/Reduce algorithm on the Hadoop framework in achieving maximum performance.
Recent research in cloud computing has shown that implementing Map/Reduce influences not only performance but also the reliability of storage management.
For about a decade, distributed computing was considered harder to manage than expanding the memory of a single-node cluster, because inter-process communication (IPC) had to be used to coordinate the nodes. That communication code was tedious to write, and it could take longer to run than the computation itself. Apache Hadoop now offers a more scalable and reliable platform for distributed computing. In this paper we analyse how the Map/Reduce algorithm running on Hadoop significantly improves performance when handling a huge data set stored across the nodes of a multi-node cluster.
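To make the Map/Reduce model concrete, here is a minimal in-memory sketch in Python (our illustration, not the code from the paper). Each inner list stands in for the data block held on one node. The map, shuffle and reduce phases run one after another in a single process here, whereas Hadoop would run the map tasks in parallel on the nodes that hold the data and move the intermediate results between nodes itself. The function names and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterator


def run_mapreduce(
    partitions: list[list[str]],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    # Map phase: every record of every partition ("node") is mapped independently.
    mapped = [pair for part in partitions for record in part for pair in mapper(record)]
    # Shuffle phase: group all intermediate values by their key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(record: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in the record.
    for word in record.lower().split():
        yield word, 1


def sum_reducer(key: str, values: list[int]) -> int:
    # Add up all the 1s emitted for this word.
    return sum(values)


if __name__ == "__main__":
    # Two hypothetical "nodes", each holding a slice of the data set.
    node_a = ["anna hazare fast", "hadoop rocks"]
    node_b = ["anna hazare rally"]
    print(run_mapreduce([node_a, node_b], word_mapper, sum_reducer))
```

The shuffle step is what replaces hand-written IPC: the framework, not the programmer, routes every intermediate key to the reducer responsible for it.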
Aim of the study
Cloud computing is the future, and it will focus increasingly on distributed computing. Evaluating the features Hadoop offers for cloud computing requires a huge, unstructured data set. The present study set out to evaluate those features.
The main focus of the study was to analyse the performance of the Map/Reduce algorithm in computing the frequency of tweets.
A short Python script of about 6 to 10 lines was used to extract people's tweets, taking its input from the Twitter Search API. Tweets were extracted continuously for about one week, which resulted in a data set that grew to 50 MB.
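The extraction script itself is not reproduced here, and the Twitter Search API it called has changed since 2011, so the sketch below leaves out the fetch step. It assumes each search has already returned a list of tweet dictionaries with `id`, `created_at` and `text` fields (hypothetical field names). Because consecutive searches over a week return overlapping results, the sketch skips tweets whose `id` is already stored and appends the rest to a JSON-lines file, one tweet per line, so that each line becomes one input record for Hadoop.

```python
import json
from pathlib import Path
from typing import Iterable


def load_seen_ids(path: Path) -> set:
    """Return the ids of tweets already stored in the JSON-lines file."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as f:
        return {json.loads(line)["id"] for line in f if line.strip()}


def append_new_tweets(results: Iterable[dict], path: Path, seen_ids: set) -> int:
    """Append tweets not stored yet, one JSON object per line; return the count written.

    Assumes each tweet dict has hypothetical "id", "created_at" and "text" keys.
    """
    written = 0
    with path.open("a", encoding="utf-8") as out:
        for tweet in results:
            if tweet["id"] in seen_ids:
                continue  # overlapping search results: this tweet is already stored
            seen_ids.add(tweet["id"])
            record = {key: tweet[key] for key in ("id", "created_at", "text")}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Calling `load_seen_ids` once at start-up and `append_new_tweets` after every search keeps the file free of duplicates, so the size of the data set reflects distinct tweets only.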
The study was carried out in two parts. The first part was extracting tweets as described above; the second was implementing a customized Map/Reduce algorithm to compute the frequency of tweets on a particular keyword (say, “Anna Hazare”).
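The second part can be sketched as a Hadoop Streaming job, where the mapper and reducer are ordinary Python programs that read lines from standard input and write tab-separated key/value lines to standard output. This is a minimal sketch, not our original implementation. It assumes the input is a JSON-lines file of tweets with a `text` field, and the `KEYWORDS` list and the script name `keyword_count.py` are hypothetical. The mapper emits `keyword<TAB>1` for each keyword a tweet mentions (a case-insensitive substring match), and the reducer sums the counts, relying on Hadoop sorting the mapper output by key so that equal keys arrive together.

```python
import json
import sys
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical keywords to track; "Anna Hazare" was the example keyword in the study.
KEYWORDS = ["anna hazare", "hadoop"]


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit 'keyword<TAB>1' for every tracked keyword a tweet mentions."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        text = json.loads(line)["text"].lower()
        for keyword in KEYWORDS:
            if keyword in text:
                yield f"{keyword}\t1"


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Reducer: input is sorted by key, so all lines for one keyword are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for keyword, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{keyword}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out_line in step(sys.stdin):
        print(out_line)
```

The job can be checked locally with a shell pipeline that imitates Hadoop's sort step: `cat tweets.jsonl | python keyword_count.py map | sort | python keyword_count.py reduce`. On a cluster, the same two commands are passed to Hadoop Streaming as its `-mapper` and `-reducer` options.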
We found that this approach offers a more reliable way to analyse huge data sets than classic methods.
Here are the slides from our presentation: