Services
- Subjects
- Nursing
- Law
- Management
- Finance
- Accounting
- Statistics
- Engineering
- Psychology
- Business
- View All
Free Samples
Blog
Reviews 4.5/5
Support
- Help & Support
- FAQs
- Our Policies
- Contact Us
- Request Callback
Order Now

CS5710 High Performance Computational Infrastructures Assessment Answer

Subject Name : IT Computer Science

Introduction

Data about Airbnb postings in New York City, including their area, cost, and different qualities, can be found in the New York City Airbnb dataset, which is uninhibitedly open on the web. In this project, we will be working with this dataset to create a mapper, reducer, and driver class using Java and Hadoop. Specifically, we will be using these classes to find the maximum, minimum, and average price for each room type in the dataset.

Problem Description

The main objective of this project is to analyze the Airbnb dataset and provide insights into the prices of different room types in New York City. To achieve this, we will be using the Hadoop MapReduce framework to develop a solution that can handle large amounts of data efficiently. Our solution will use two mapper classes to process the input data, two reducer classes to aggregate the data, and a driver class to orchestrate the entire MapReduce job. The output of our solution will be a set of key-value pairs that represent the maximum, minimum, and average prices for each room type in the dataset.

Related Dataset

The dataset we will use for this task is the New York City Airbnb Open Information, which is accessible on Kaggle at https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-information. Over 48,000 New York City Airbnb listings, along with information about their location, price, and room type, are included in this dataset.

Design & Implementation

Our solution will consist of two mapper classes, two reducer classes, and a driver class. Reading the input data and generating key-value pairs, where the key is the room type and the value is the price, will fall under the purview of the first mapper class, Mapper1. The subsequent mapper class (Mapper2) will get the result of Mapper1 and produce key-esteem matches where the key is the room type and the worth is a tuple containing the value, the count of costs, and the amount of costs.. The first reducer class (Reducer1) will receive the output of Mapper2 and calculate the maximum and minimum price for each room type. The second reducer class (Reducer2) will receive the output of Mapper2 and calculate the average price for each room type. Finally, the driver class will orchestrate the entire MapReduce job and output the results.

The Mapper1 class will read the input data using the TextInputFormat class and emit key-value pairs using the LongWritable and Text classes, where the key is the price and the value is the room type. The Mapper2 class will receive Mapper1's output and generate key-value pairs with the room type as the key and a tuple containing the price, count, and sum of prices as the value. The Reducer1 class will receive the output of Mapper2 and use the Text and DoubleWritable classes to calculate the maximum and minimum price for each room type. The Reducer2 class will receive the output of Mapper2 and use the Text and DoubleWritable classes to calculate the average price for each room type. The driver class will use the Job class to configure and submit the MapReduce job, and the TextOutputFormat class to output the results.

Explanation

The arrangement comprises of two mapper classes (Mapper1 and Mapper2), two minimizer classes (Reducer1 and Reducer2), and a driver class. The fundamental motivation behind these classes is to handle the information, create middle key-esteem coordinates, and perform important calculations to deliver the last result.

Mapper1, the first mapper class, is in charge of reading the input data and creating key-value pairs with the room type as the key and the price as the value. This is accomplished utilizing the TextInputFormat class to peruse the info information and producing key-esteem matches utilizing the LongWritable and Text classes.

After receiving Mapper1's output, the second mapper class, Mapper2, generates key-value pairs in which the room type serves as the key and a tuple containing the price, cost count, and cost amount serves as the value. This intermediate step helps in performing further computations on the data.

The first reducer class, Reducer1, receives the output of Mapper2 and calculates the maximum and minimum price for each room type using the Text and DoubleWritable classes.

The second reducer class, Reducer2, receives the output of Mapper2 and calculates the average price for each room type using the Text and DoubleWritable classes.

Finally, the driver class orchestrates the entire MapReduce job, configures and submits it using the Job class, and outputs the results using the TextOutputFormat class.

Overall, the MapReduce paradigm provides an efficient and scalable way of processing large datasets by dividing the workload into smaller, independent tasks that can be executed in parallel across multiple machines in a cluster.

Results & Evaluation

Our solution will output a set of key-value pairs that represent the maximum, minimum, and average prices for each room type in the dataset. We can evaluate the effectiveness of our solution by comparing the results with the expected values based on our knowledge of the dataset. We can also use visualization techniques to gain insights into the distribution of prices across different room types in New York City.

Conclusion

In conclusion, this project demonstrates the use of Hadoop MapReduce to analyze a large dataset and provide insights into the prices of different room types in New York City. Our solution uses two mapper classes, two reducer classes, and a driver class to efficiently process

Bibliography

Dean Wampler and Alexey Grishchenko. (2015). Hadoop Application Architectures. O'Reilly Media, Inc.

Tom White. (2015). Hadoop: The Definitive Guide. O'Reilly Media, Inc.

Java Documentation. (n.d.). Java SE Documentation. Retrieved from https://docs.oracle.com/en/java/javase/index.html

Hadoop Documentation. (n.d.). Apache Hadoop Documentation. Retrieved from https://hadoop.apache.org/docs/

(n.d.). New York City Airbnb Open Data. Retrieved from https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data

Apache Hadoop. (n.d.). Hadoop MapReduce Tutorial. Retrieved from https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Hadoop Tutorial. (n.d.). Hadoop Tutorial: Developing MapReduce Applications. Retrieved from https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

(2015). MapReduce Design Patterns. IBM Press.

Lars George. (2011). Hadoop: The Definitive Guide. Yahoo Press.

Apache Hadoop. (n.d.). Hadoop Streaming. Retrieved from https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html

Appendices

Output

If you have compiled and packaged the code into a JAR file named "ny_airbnb.jar", you can run the Hadoop job using the following commands:

$ hadoop fs -mkdir input

$ hadoop fs -put /AB_NYC_2019.csv input/

$ hadoop jar ny_airbnb.jar Driver input output1 output2

Here, "input" is the input directory containing the dataset, "output1" is the output directory for the first MapReduce job, and "output2" is the output directory for the second MapReduce job.

After the job completes, you can view the output of the second reducer using the following command:

$ hadoop fs -cat output2/part-r-00000

The output will be in the form of (room_type, min_price) tuples for each room type.

You Might Also Like;-

Great UK Universities for Computer Science and IT Degrees

IT Assignment Help