
CISC 5950, Fall 2020 Big Data Programming

CISC 5950 — Lab 1

We have completed three tasks in class:

1. Set up a 3-node cluster with Hadoop Distributed File System and run examples.

2. On top of HDFS, set up the cluster with MapReduce programming framework.

3. Run examples of MapReduce programs.

Based on our examples, we will develop our own MapReduce program to analyze a simple log file. The following figure shows the structure of the log file.


Each line is a record of one visit and consists of the IP Address, Time, Type of HTTP Request, Requested File, HTTP Version, Status, etc.
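As a rough illustration of the record structure, the sketch below splits one line into the fields listed above. The figure with the real layout is not reproduced here, so the Apache Common Log Format, the sample line, and the names `LOG_PATTERN` and `parse_line` are all assumptions; the actual lab log may need a different pattern.

```python
import re

# Assumed layout (Apache Common Log Format); the actual lab log may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<version>[^"]+)" (?P<status>\d{3})'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Made-up sample record, used only to exercise the pattern.
sample = '10.0.0.1 - - [16/Oct/2020:00:15:32 -0400] "GET /index.html HTTP/1.1" 200 1024'
record = parse_line(sample)
```

Returning None for lines that do not match lets a mapper skip malformed records instead of failing the whole task.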

Example Programs

We have provided two examples that are related to this lab.

logstat: It counts the number of visits for each IP address in the log file.

logstat2: It counts the number of visits for each IP address in the same hour.

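As discussed in the lectures, the MapReduce programming framework separates the data from the operations, which run in two stages (Map and Reduce). Input records are streamed to the program: in Python they are read from sys.stdin (Hadoop Streaming), and in Java they are represented by Writable types such as Text.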
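To make the streaming idea concrete, here is a minimal sketch of a logstat-style Map step in Python. It is not the provided logstat code: the function names `map_ip` and `main`, the tab-separated output, and the assumption that the IP address is the first whitespace-separated field are all illustrative.

```python
import sys

def map_ip(lines):
    """Map phase: emit one 'ip<TAB>1' pair per log record."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            # Assumption: the IP address is the first whitespace-separated field.
            yield f"{fields[0]}\t1"

def main():
    # Under Hadoop Streaming the input split arrives on standard input,
    # and every printed line becomes one intermediate key/value pair.
    for pair in map_ip(sys.stdin):
        print(pair)

# Local check on made-up records (no cluster needed).
sample = [
    '10.0.0.1 - - [16/Oct/2020:00:15:32 -0400] "GET /a HTTP/1.1" 200 10',
    '',
    '10.0.0.2 - - [16/Oct/2020:00:20:01 -0400] "GET /b HTTP/1.1" 200 10',
]
pairs = list(map_ip(sample))
```

When used as a real streaming mapper, the script would call `main()` at the bottom instead of running the local check.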


In the Reduce phase, we count the records that share the same IP address. After that, we can sort the results and print them. As Figs. 5 and 6 show, the Map phase for logstat2 differs from that of logstat because it must also consider the time. Since the data has already been processed in the Map phase, the intermediate Map output is at the granularity of an hour. Therefore, the Reduce phase is the same as in logstat.
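The point that only the Map key changes can be sketched as follows. This is an illustration, not the provided logstat2 code: the timestamp layout, the 'hour ip' key format, and the names `HOUR_PATTERN`, `map_hour_ip`, and `reduce_counts` are assumptions. The reducer only sums counts per key, so it would work unchanged for keys that are plain IP addresses.

```python
import re
from itertools import groupby

# Assumption: timestamps look like [16/Oct/2020:00:15:32 -0400].
HOUR_PATTERN = re.compile(r"\[[^:\]]+:(\d{2}):")

def map_hour_ip(lines):
    """logstat2-style Map: key each record by 'hour ip' instead of 'ip' alone."""
    for line in lines:
        fields = line.split()
        hour = HOUR_PATTERN.search(line)
        if fields and hour:
            yield f"{hour.group(1)} {fields[0]}\t1"

def reduce_counts(sorted_pairs):
    """Reduce: sum the counts of consecutive pairs that share a key."""
    split = (pair.rstrip("\n").split("\t") for pair in sorted_pairs)
    for key, group in groupby(split, key=lambda kv: kv[0]):
        yield key, sum(int(count) for _, count in group)

# Made-up records for a local check.
log = [
    '10.0.0.1 - - [16/Oct/2020:00:15:32 -0400] "GET /a HTTP/1.1" 200 10',
    '10.0.0.2 - - [16/Oct/2020:00:20:01 -0400] "GET /b HTTP/1.1" 200 10',
    '10.0.0.1 - - [16/Oct/2020:00:59:59 -0400] "GET /c HTTP/1.1" 404 10',
    '10.0.0.1 - - [16/Oct/2020:01:00:00 -0400] "GET /a HTTP/1.1" 200 10',
]
# sorted() stands in for the shuffle step, which hands the reducer its keys in sorted order.
result = list(reduce_counts(sorted(map_hour_ip(log))))
```

Because `groupby` only merges adjacent items, the reducer relies on its input being sorted by key, which is what the framework provides between the two phases.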

Lab 1 Assignment: Part 1 and 2

The given logstat and logstat2 programs are written in both Java and Python.

Lab 1 consists of the following two parts.

1. Output the top-3 IP addresses at the granularity of an hour

You should read the two examples.

Develop your code based on the examples. The program may take more than one round of MapReduce.
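One possible shape for a second round is sketched below; it is only an outline under stated assumptions, not a reference solution. It assumes the first round produced lines of the form 'hour ip<TAB>count', and the names `map_by_hour` and `reduce_top_n` are hypothetical.

```python
import heapq
from itertools import groupby

def map_by_hour(lines):
    """Round-2 Map: re-key 'hour ip<TAB>count' records by hour only."""
    for line in lines:
        key, count = line.rstrip("\n").split("\t")
        hour, ip = key.split(" ")
        yield hour, ip, int(count)

def reduce_top_n(sorted_records, n=3):
    """Round-2 Reduce: keep the n most-visited IPs within each hour."""
    for hour, group in groupby(sorted_records, key=lambda rec: rec[0]):
        top = heapq.nlargest(n, group, key=lambda rec: rec[2])
        yield hour, [(ip, count) for _, ip, count in top]

# Made-up round-1 output for a local check.
round1_output = [
    "00 10.0.0.1\t5", "00 10.0.0.2\t9", "00 10.0.0.3\t2", "00 10.0.0.4\t7",
    "01 10.0.0.1\t1",
]
# sorted() stands in for the shuffle between Map and Reduce.
top3 = dict(reduce_top_n(sorted(map_by_hour(round1_output))))
```

Using `heapq.nlargest` keeps at most n candidates per hour in memory, which matters when an hour contains many distinct IP addresses.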

2. Make your program like a database search

Your program should be able to accept parameters from users, such as 0-1, which means from time 00:00 to 01:00, and output the top-3 IP addresses in the given time period.

Run it at the same time as three other examples (WordCount, Sort, and Grep), and test the fair and capacity schedulers.
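The time-window parameter from Part 2 (such as 0-1) could be handled along the following lines. This is a sketch under assumptions: the helper names `parse_hour_range` and `in_range` are hypothetical, and the end hour is treated as exclusive, so 0-1 covers 00:00 up to but not including 01:00.

```python
def parse_hour_range(arg):
    """Turn '0-1' into (0, 1), meaning 00:00 up to (not including) 01:00."""
    start_text, end_text = arg.split("-")
    start, end = int(start_text), int(end_text)
    if not 0 <= start < end <= 24:
        raise ValueError(f"invalid hour range: {arg!r}")
    return start, end

def in_range(hour, hour_range):
    """True if the record's hour falls inside the requested window."""
    start, end = hour_range
    return start <= hour < end

window = parse_hour_range("0-1")
# Hours of the day that a mapper would keep for this window.
hours = [h for h in range(24) if in_range(h, window)]
```

A mapper could apply `in_range` to each record's hour and drop everything outside the window before emitting, so later stages only see the requested period.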

Grading Rubric

You should complete the lab in groups of 4 students.

(80%) Part 1;

(15%) Part 2;

(5%) Report about your design and experiments. Please include screenshots of your code running on the cloud.

Submission

You should email your submission (one per group) by the end of Oct. 16.
