Search the whole station

大数据分析考试代考 CIS 545代写 Big Data Analytics代写

CIS 545: Big Data Analytics

Sample Final

大数据分析考试代考 This exam will be administered over 80 minutes, and it is worth 80 total points. You should budget at most 1 minute per point on average.

_____________________________________

This exam will be administered over 80 minutes, and it is worth 80 total points. You should budget at most 1 minute per point on average. A single “cheat sheet” is allowed, and otherwise the exam is closed-book, closed-devices, closed-notes.

Your name:

_________________________________________________________

Your PennKey (Penn email ID) for disambiguation:

________________________________

Please sign: I understand and agree to follow Penn’s code of academic integrity in taking this exam.

__________________________________________________________

NOTES:

  1. The MCQs ask to select the single best answer from the options (per your best judgement)
  2. For many of the short/long answer questions, we only provide sample or model answers. Many other possible answers were also accepted

Part I: Multiple Choice. 大数据分析考试代考

For the following multiple-choice questions, please select the single best answer. Each is worth 3 points.

1.

If we find multicollinearity in our data and are using regression, what can we do?

a. apply PCA to map the data to a new subspace

b. use regularization

c. drop some of the correlated features

d. all of the above

2.

A random forest is most similar to which type of ensemble method?

a. stacking

b. bagging

c. boosting

d. logging

3.

What components do we need to perform a gradient descent operation?

a. weights and input, cost function, error, step size

b. weights and input, error, step size

c. weights and input, derivative of cost function, error, step size

d. weights and input, derivative cost function, error

4. 大数据分析考试代考

If you spot many more negative than positive instances in your training data, you can:

a. up-sample the positive instances

b. down-sample the positive instances

c. up-sample the negative instances

d. add random data

5.

When performing (ordinary least-squares without regularization) linear regression, we need to be careful about:

a. multicollinearity

b. scale

c. order in which the data instances appear

d. balance between two classes

6.

In a CNN, which types of layers have associated weights?

a. convolution

b. pooling

c. dense

d. convolution and dense only

e. all of the above

7. 大数据分析考试代考

What is a local receptive field?

a. a technique that converts from color to grayscale images

b. a sliding window over an image, connected to a single activation unit

c. a type of activation function

d. a mechanism for back-propagation

8.

High-level distributed stream processing systems can only join stream data that is (choose the best answer):

a. Infinite

b. Static

c. Dynamic

d. within a window

e. Sharding doesn’t produce speedup.

9.

Stream processing and message passing systems are similar in that the programmer:

a. has to handle issues in which data goes out of order

b. writes callback functions

c. does microbatching

d. writes computations in SQL

10. 大数据分析考试代考

When downsampling time-varying data to a lower frequency, we should expect to:

a. average multiple samples

b. interpolate a new value between existing samples

c. insert null values

d. pad with copies of an existing value

11.

What types of data might be suitable to archive in a data lake (choose the best answer):

a. JSON data

b. tabular data

c. text files

d. PDFs

e. all of the above

大数据分析考试代考
大数据分析考试代考

a. (0, black, 0, 2), (3, orange, 3, 2)

b. (0, black, 0, 3), (0, black, 0, 2), (3, orange, 3, 1), (3, orange, 3, 2)

c. (0, black, 0, 3), (3, orange, 3, 1)

d. no output

13.

In a distributed computing environment, whether among servers, Spark cluster nodes, or distributed mxnet cluster nodes — as we increase the number of nodes, the biggest bottleneck to performance is ultimately:

a. coordination and synchronization

b. data size

c. dimensionality

d. the lack of GPUs

14.

Operational DBMSs often are tuned for:

a. machine learning workloads

b. aggregation queries

c. transactional updates

d. data warehouses

Part II. Short-Answer. 大数据分析考试代考

Please give brief answers inside the provided area.

15.As part of a company project, you create a visualization over big data collected from the web. List 4 of the types of information you would need to capture, to have enough information to reconstruct (and thus allow someone else to verify) your visualization? (8 points)

17.You should select which value for cluster coefficient k? (5 points)

A decision tree greedily determines the attribute on which to split by maximizing what measure? (5 points)

Part III. Long-form questions. 大数据分析考试代考

Each question is worth 10 points.

18.Suppose you have an input dataset with labels, and you wish to train a logistic regression classifier on this. You’ll be using ridge regularization.

Outline the “best practice” steps you would take to prepare the training data and to train the classifier. What do you check for, what data transformations should you apply, etc.

19.Suppose you are building a computer vision algorithm for a Jurassic Park-themed

amusement park. Explain the steps involved in taking a convolutional neural network trained on ImageNet and using domain adaptation to train it to recognize carnivorous dinosaurs, versus herbivorous ones. (Assume you have access to many labeled dinosaur images.)

大数据分析考试代考
大数据分析考试代考

(Image copyright Prime 1 Studio.)

更多代写:C++代考推荐  duolingo代考  英国bio生物学网课代上价格  ECO essay代写  Dissertation学位论文代写 如何写论文摘要

合作平台:essay代写 论文代写 写手招聘 英国留学生代写

The prev: The next:

Related recommendations

1
您有新消息,点击联系!