
CIS 545: Big Data Analytics

Sample Final


_____________________________________

This exam will be administered over 80 minutes, and it is worth 80 total points. You should budget at most 1 minute per point on average. A single “cheat sheet” is allowed, and otherwise the exam is closed-book, closed-devices, closed-notes.

Your name:

_________________________________________________________

Your PennKey (Penn email ID) for disambiguation:

________________________________

Please sign: I understand and agree to follow Penn’s code of academic integrity in taking this exam.

__________________________________________________________

NOTES:

For the multiple-choice questions, select the single best answer from the options (per your best judgement).
For many of the short- and long-answer questions, we provide only sample or model answers. Many other possible answers were also accepted.

Part I: Multiple Choice.

For the following multiple-choice questions, please select the single best answer. Each is worth 3 points.

1.

If we find multicollinearity in our data and are using regression, what can we do?

a. apply PCA to map the data to a new subspace

b. use regularization

c. drop some of the correlated features

d. all of the above

2.

A random forest is most similar to which type of ensemble method?

a. stacking

b. bagging

c. boosting

d. logging

3.

What components do we need to perform a gradient descent operation?

a. weights and input, cost function, error, step size

b. weights and input, error, step size

c. weights and input, derivative of cost function, error, step size

d. weights and input, derivative of cost function, error

4.

If you spot many more negative than positive instances in your training data, you can:

a. up-sample the positive instances

b. down-sample the positive instances

c. up-sample the negative instances

d. add random data

5.

When performing (ordinary least-squares without regularization) linear regression, we need to be careful about:

a. multicollinearity

b. scale

c. order in which the data instances appear

d. balance between two classes

6.

In a CNN, which types of layers have associated weights?

a. convolution

b. pooling

c. dense

d. convolution and dense only

e. all of the above

7.

What is a local receptive field?

a. a technique that converts from color to grayscale images

b. a sliding window over an image, connected to a single activation unit

c. a type of activation function

d. a mechanism for back-propagation

8.

High-level distributed stream processing systems can only join stream data that is (choose the best answer):

a. Infinite

b. Static

c. Dynamic

d. within a window

e. Sharding doesn’t produce speedup.

9.

Stream processing and message passing systems are similar in that the programmer:

a. has to handle issues in which data goes out of order

b. writes callback functions

c. does microbatching

d. writes computations in SQL

10.

When downsampling time-varying data to a lower frequency, we should expect to:

a. average multiple samples

b. interpolate a new value between existing samples

c. insert null values

d. pad with copies of an existing value

11.

What types of data might be suitable to archive in a data lake (choose the best answer):

a. JSON data

b. tabular data

c. text files

d. PDFs

e. all of the above

12. [Question text and input data missing from the source.]

a. (0, black, 0, 2), (3, orange, 3, 2)

b. (0, black, 0, 3), (0, black, 0, 2), (3, orange, 3, 1), (3, orange, 3, 2)

c. (0, black, 0, 3), (3, orange, 3, 1)

d. no output

13.

In a distributed computing environment (whether among servers, Spark cluster nodes, or distributed MXNet cluster nodes), as we increase the number of nodes, the biggest bottleneck to performance is ultimately:

a. coordination and synchronization

b. data size

c. dimensionality

d. the lack of GPUs

14.

Operational DBMSs often are tuned for:

a. machine learning workloads

b. aggregation queries

c. transactional updates

d. data warehouses

Part II. Short-Answer.

Please give brief answers inside the provided area.

15. As part of a company project, you create a visualization over big data collected from the web. List 4 of the types of information you would need to capture in order to reconstruct (and thus allow someone else to verify) your visualization. (8 points)

17. You should select which value for the cluster coefficient k? (5 points)

A decision tree greedily determines the attribute on which to split by maximizing what measure? (5 points)
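As one possible illustration for the decision-tree question, the sketch below scores a candidate split by entropy-based information gain. The function names `entropy` and `information_gain` and the toy labels are assumptions made for illustration only; other impurity measures (for example, Gini impurity) are also used in practice, and this is not presented as the course's model answer.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy data: four labeled instances and two candidate splits.
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # separates the classes
useless_split = [["yes", "no"], ["yes", "no"]]   # leaves each child mixed

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A greedy tree builder would evaluate such a score for every candidate attribute and split on the one with the highest value; here the first split scores higher than the second.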

Part III. Long-form questions.

Each question is worth 10 points.

18. Suppose you have an input dataset with labels, and you wish to train a logistic regression classifier on it. You’ll be using ridge regularization.

Outline the “best practice” steps you would take to prepare the training data and to train the classifier. What do you check for, what data transformations should you apply, etc.
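A minimal sketch of two of the steps such an answer might mention: standardizing features (the ridge penalty is sensitive to feature scale) and fitting an L2-regularized logistic regression by gradient descent. The helper names, the toy data, and the values of `lam`, `step`, and `epochs` are all assumptions for illustration. A fuller answer would likely also cover checks such as class balance, missing values, and a held-out split for choosing the regularization strength.

```python
import math

def standardize(rows):
    """Z-score each feature; return scaled rows plus the means and stds used."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return scaled, means, stds

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_ridge_logistic(X, y, lam=0.1, step=0.5, epochs=500):
    """Batch gradient descent on mean log-loss plus (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 penalty is applied to the weights only, not to the bias.
        w = [wj - step * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

def predict(X, w, b):
    """Label 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical toy data: the first feature separates the classes,
# the second is on a much larger scale and mostly noise.
raw = [[1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
       [8.0, 210.0], [9.0, 190.0], [10.0, 205.0]]
labels = [0, 0, 0, 1, 1, 1]

X, means, stds = standardize(raw)
w, b = train_ridge_logistic(X, labels)
preds = predict(X, w, b)
```

In practice the means and standard deviations would be computed on the training split only and reused to transform validation and test data, so that no information leaks from held-out examples.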

19. Suppose you are building a computer vision algorithm for a Jurassic Park-themed amusement park. Explain the steps involved in taking a convolutional neural network trained on ImageNet and using domain adaptation to train it to recognize carnivorous dinosaurs, versus herbivorous ones. (Assume you have access to many labeled dinosaur images.)

(Image copyright Prime 1 Studio.)
