Data about Airbnb postings in New York City, including their area, cost, and different qualities, can be found in the New York City Airbnb dataset, which is uninhibitedly open on the web. In this project, we will be working with this dataset to create a mapper, reducer, and driver class using Java and Hadoop. Specifically, we will be using these classes to find the maximum, minimum, and average price for each room type in the dataset.
The main objective of this project is to analyze the Airbnb dataset and provide insights into the prices of different room types in New York City. To achieve this, we will be using the Hadoop MapReduce framework to develop a solution that can handle large amounts of data efficiently. Our solution will use two mapper classes to process the input data, two reducer classes to aggregate the data, and a driver class to orchestrate the entire MapReduce job. The output of our solution will be a set of key-value pairs that represent the maximum, minimum, and average prices for each room type in the dataset.
The dataset we will use for this task is the New York City Airbnb Open Information, which is accessible on Kaggle at https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-information. Over 48,000 New York City Airbnb listings, along with information about their location, price, and room type, are included in this dataset.
Our solution will consist of two mapper classes, two reducer classes, and a driver class. Reading the input data and generating key-value pairs, where the key is the room type and the value is the price, will fall under the purview of the first mapper class, Mapper1. The subsequent mapper class (Mapper2) will get the result of Mapper1 and produce key-esteem matches where the key is the room type and the worth is a tuple containing the value, the count of costs, and the amount of costs.. The first reducer class (Reducer1) will receive the output of Mapper2 and calculate the maximum and minimum price for each room type. The second reducer class (Reducer2) will receive the output of Mapper2 and calculate the average price for each room type. Finally, the driver class will orchestrate the entire MapReduce job and output the results.
The Mapper1 class will read the input data using the TextInputFormat class and emit key-value pairs using the LongWritable and Text classes, where the key is the price and the value is the room type. The Mapper2 class will receive Mapper1's output and generate key-value pairs with the room type as the key and a tuple containing the price, count, and sum of prices as the value. The Reducer1 class will receive the output of Mapper2 and use the Text and DoubleWritable classes to calculate the maximum and minimum price for each room type. The Reducer2 class will receive the output of Mapper2 and use the Text and DoubleWritable classes to calculate the average price for each room type. The driver class will use the Job class to configure and submit the MapReduce job, and the TextOutputFormat class to output the results.
The arrangement comprises of two mapper classes (Mapper1 and Mapper2), two minimizer classes (Reducer1 and Reducer2), and a driver class. The fundamental motivation behind these classes is to handle the information, create middle key-esteem coordinates, and perform important calculations to deliver the last result.
Mapper1, the first mapper class, is in charge of reading the input data and creating key-value pairs with the room type as the key and the price as the value. This is accomplished utilizing the TextInputFormat class to peruse the info information and producing key-esteem matches utilizing the LongWritable and Text classes.
After receiving Mapper1's output, the second mapper class, Mapper2, generates key-value pairs in which the room type serves as the key and a tuple containing the price, cost count, and cost amount serves as the value. This intermediate step helps in performing further computations on the data.
The first reducer class, Reducer1, receives the output of Mapper2 and calculates the maximum and minimum price for each room type using the Text and DoubleWritable classes.
The second reducer class, Reducer2, receives the output of Mapper2 and calculates the average price for each room type using the Text and DoubleWritable classes.
Finally, the driver class orchestrates the entire MapReduce job, configures and submits it using the Job class, and outputs the results using the TextOutputFormat class.
Overall, the MapReduce paradigm provides an efficient and scalable way of processing large datasets by dividing the workload into smaller, independent tasks that can be executed in parallel across multiple machines in a cluster.
Our solution will output a set of key-value pairs that represent the maximum, minimum, and average prices for each room type in the dataset. We can evaluate the effectiveness of our solution by comparing the results with the expected values based on our knowledge of the dataset. We can also use visualization techniques to gain insights into the distribution of prices across different room types in New York City.
In conclusion, this project demonstrates the use of Hadoop MapReduce to analyze a large dataset and provide insights into the prices of different room types in New York City. Our solution uses two mapper classes, two reducer classes, and a driver class to efficiently process
Dean Wampler and Alexey Grishchenko. (2015). Hadoop Application Architectures. O'Reilly Media, Inc.
Tom White. (2015). Hadoop: The Definitive Guide. O'Reilly Media, Inc.
Java Documentation. (n.d.). Java SE Documentation. Retrieved from https://docs.oracle.com/en/java/javase/index.html
Hadoop Documentation. (n.d.). Apache Hadoop Documentation. Retrieved from https://hadoop.apache.org/docs/
(n.d.). New York City Airbnb Open Data. Retrieved from https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
Apache Hadoop. (n.d.). Hadoop MapReduce Tutorial. Retrieved from https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
Hadoop Tutorial. (n.d.). Hadoop Tutorial: Developing MapReduce Applications. Retrieved from https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
(2015). MapReduce Design Patterns. IBM Press.
Lars George. (2011). Hadoop: The Definitive Guide. Yahoo Press.
Apache Hadoop. (n.d.). Hadoop Streaming. Retrieved from https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html
If you have compiled and packaged the code into a JAR file named "ny_airbnb.jar", you can run the Hadoop job using the following commands:
$ hadoop fs -mkdir input
$ hadoop fs -put /AB_NYC_2019.csv input/
$ hadoop jar ny_airbnb.jar Driver input output1 output2
Here, "input" is the input directory containing the dataset, "output1" is the output directory for the first MapReduce job, and "output2" is the output directory for the second MapReduce job.
After the job completes, you can view the output of the second reducer using the following command:
$ hadoop fs -cat output2/part-r-00000
The output will be in the form of (room_type, min_price) tuples for each room type.
You Might Also Like;-
Great UK Universities for Computer Science and IT Degrees
1,212,718Orders
4.9/5Rating
5,063Experts
Turnitin Report
$10.00Proofreading and Editing
$9.00Per PageConsultation with Expert
$35.00Per HourLive Session 1-on-1
$40.00Per 30 min.Quality Check
$25.00Total
FreeGet
500 Words Free
on your assignment today
Request Callback
Doing your Assignment with our resources is simple, take Expert assistance to ensure HD Grades. Here you Go....
🚨Don't Leave Empty-Handed!🚨
Snag a Sweet 70% OFF on Your Assignments! 📚💡
Grab it while it's hot!🔥
Claim Your DiscountHurry, Offer Expires Soon 🚀🚀