
STA130H1F – Week 10 Problem Set

Instructions

How do I hand in these problems for the 11:59 a.m. ET, November 27th deadline?

Your complete .Rmd file that you create for this problem set AND the resulting .pdf (i.e., the one you ‘Knit to PDF’ from your .Rmd file) must be uploaded to the Quercus assignment by 11:59 a.m. ET on November 27th. Late problem sets and problems submitted another way (e.g., by email) will not be accepted.

Problem set grading

There are two parts to this problem set. One is largely R-based with short written answers, and the other is more focused on writing. We recommend using word-processing software such as Microsoft Word to check your written work for grammatical errors. Note: copying from Word into R Markdown can cause formatting issues, so it may be easier to write in this file first, copy the text into Word, and then make any changes Word flags directly in this file.

Part 1

Question 1

Using data from the Gallup World Poll (and the World Happiness Report), we are interested in which factors can be used to predict life expectancy around the world. These data are in the file happinessdata_2017.csv.

library(tidyverse)  # loads readr, which provides read_csv()
happiness2017 <- read_csv("happinessdata_2017.csv")

(a) Begin by creating a new variable called life_exp_category which takes the value “Good” for countries with a life expectancy higher than 65 years, and “Poor” otherwise.
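A minimal base-R sketch of part (a), offered as one possible approach rather than the required solution. It assumes the life expectancy column in happinessdata_2017.csv is named `life_exp` (an assumption: check `names(happiness2017)` and adjust `col` if needed), and the helper name `add_life_exp_category` is hypothetical. Countries with a missing life expectancy are left as NA rather than being labelled "Poor".

```r
# Hypothetical helper: label each country "Good" if its life expectancy is
# above `cutoff` years and "Poor" otherwise. Assumes the column is `life_exp`;
# rows with a missing life expectancy stay NA instead of becoming "Poor".
add_life_exp_category <- function(df, col = "life_exp", cutoff = 65) {
  df$life_exp_category <- ifelse(df[[col]] > cutoff, "Good", "Poor")
  df
}

# Usage, after reading the CSV:
# happiness2017 <- add_life_exp_category(happiness2017)
```

Note that a country with a life expectancy of exactly 65 is labelled "Poor", matching "higher than 65 years" in the question.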

(b) Divide the data into training (80%) and testing (20%) datasets, using the last 3 digits of your student ID number as the random seed. Then use the training data to build a classification tree that predicts which countries have “Good” vs. “Poor” life expectancy, with social_support as the only predictor.
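One way to do the split and fit the tree in part (b), sketched in base R with the rpart package (a recommended package distributed with R). The helper `split_train_test` and the seed value 123 are placeholders; substitute the last three digits of your student ID. Because different splitting recipes draw random numbers differently, a different approach (for example, one built with dplyr, if that is what your course materials use) will give a different split even with the same seed.

```r
library(rpart)  # classification trees; a recommended package distributed with R

# Hypothetical helper: randomly assign about 80% of the rows of `df` to a
# training set and the remaining rows to a testing set.
split_train_test <- function(df, seed, train_frac = 0.8) {
  set.seed(seed)                                  # makes the split reproducible
  n <- nrow(df)
  train_rows <- sample(seq_len(n), size = round(train_frac * n))
  list(train = df[train_rows, , drop = FALSE],
       test  = df[-train_rows, , drop = FALSE])   # rows not drawn for training
}

# Usage, after part (a); replace 123 with the last 3 digits of your student ID.
# method = "class" asks rpart for a classification (not regression) tree.
# splits <- split_train_test(happiness2017, seed = 123)
# tree_b <- rpart(life_exp_category ~ social_support,
#                 data = splits$train, method = "class")
```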

(c) Use the same training dataset created in (b) to build a second classification tree that predicts which countries have “Good” vs. “Poor” life expectancy, using logGDP, social_support, freedom, and generosity as potential predictors.
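For part (c), only the model formula changes from part (b). The word "potential" matters: rpart picks splits greedily, so the fitted tree may end up using only some of the four variables. The sketch below uses the rpart package; `predictors_used` is a hypothetical helper that reads the `frame$var` component of a fitted rpart tree, where terminal nodes are recorded as "<leaf>", and `train_data` stands in for your training set.

```r
library(rpart)  # recursive partitioning (classification trees)

# Formula for part (c): the four candidate predictors named in the question.
tree_c_formula <- life_exp_category ~ logGDP + social_support + freedom + generosity

# Hypothetical helper: names of the predictors that appear in at least one
# split of a fitted rpart tree (terminal nodes are recorded as "<leaf>").
predictors_used <- function(tree) {
  vars <- as.character(tree$frame$var)
  unique(vars[vars != "<leaf>"])
}

# Usage, where train_data stands for your training set from part (b):
# tree_c <- rpart(tree_c_formula, data = train_data, method = "class")
# predictors_used(tree_c)
```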

(d) Use the testing dataset you created in (b) to calculate the confusion matrix for each of the trees you built in (b) and (c). Report the sensitivity (true positive rate), specificity (true negative rate), and accuracy of each tree. Treat “Good” life expectancy as the positive response/prediction.
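Sensitivity, specificity and accuracy all come from the four cells of the confusion matrix. A small base-R sketch (the helper name `classification_metrics` is hypothetical) builds the matrix with `table()` and computes the three rates, treating "Good" as the positive class; rows with a missing actual or predicted label are dropped first.

```r
# Hypothetical helper: confusion matrix plus sensitivity, specificity and
# accuracy, treating `positive` ("Good" here) as the positive class.
classification_metrics <- function(actual, predicted, positive = "Good") {
  keep <- !is.na(actual) & !is.na(predicted)           # drop unlabelled rows
  actual <- as.character(actual[keep])
  predicted <- as.character(predicted[keep])

  tp <- sum(predicted == positive & actual == positive)  # true positives
  tn <- sum(predicted != positive & actual != positive)  # true negatives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives

  list(confusion_matrix = table(predicted = predicted, actual = actual),
       sensitivity = tp / (tp + fn),                     # true positive rate
       specificity = tn / (tn + fp),                     # true negative rate
       accuracy    = (tp + tn) / (tp + tn + fp + fn))
}

# Usage, where tree_b and test_data stand for your tree from (b) and your
# testing set; repeat with the tree from (c):
# pred_b <- predict(tree_b, newdata = test_data, type = "class")
# classification_metrics(test_data$life_exp_category, pred_b)
```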


Question 2

Two classification trees were built to predict which individuals have a disease using different sets of potential predictors. We use each of these trees to predict disease status for 100 new individuals. Below are confusion matrices corresponding to these two classification trees.

(a) Calculate the accuracy, false-positive rate, and false-negative rate for each classification tree. Here, a “positive” result means we predict that an individual has the disease, and a “negative” result means we predict that they do not.
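For reference, let TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives in a tree's confusion matrix (so that TP + TN + FP + FN = 100 here). The standard definitions of the three quantities are:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}\\[4pt]
\text{False-positive rate} &= \frac{FP}{FP + TN} = 1 - \text{specificity}\\[4pt]
\text{False-negative rate} &= \frac{FN}{FN + TP} = 1 - \text{sensitivity}
\end{aligned}
```

In words, the false-positive rate is the share of people without the disease whom the tree flags as having it, and the false-negative rate is the share of people with the disease whom the tree misses.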

Tree A

• Overall accuracy:

• False-positive rate:

• False-negative rate:

Tree B

• Overall accuracy:

• False-positive rate:

• False-negative rate:

(b) Suppose the disease is very serious if left untreated. Explain which classifier you would prefer to use.

Question 3

Data were collected on 30 cancer patients to investigate the effectiveness (Yes/No) of a treatment. Two quantitative variables, $x_i \in (0, 1)$ for $i = 1, 2$, are considered to be important predictors of effectiveness. Suppose that the rectangles labelled as nodes in the scatterplot below represent nodes of a classification tree.


Part 2

You are working at a news station when your boss comes running into your office and asks whether you have seen the latest report. It turns out the lost city of Atlantis has been found! The first thing everyone does is start collecting information about the city.

The country completed the Gallup World Poll, and the following data were collected:

For some reason, the life expectancy variable was missed when the data were collected. Now everyone is curious: is the life expectancy of its citizens good or poor? Given your recent work with the Gallup World Poll and the World Happiness Report, your boss is looking to you for insights and has asked you to prepare a report with the following information:

• Some background on the data you used

• A description of the methods. Include at least 2 vocabulary words. Remember to explain them in lay terms (i.e., in a way that someone unfamiliar with our field and its specialized language can understand).

• The results

• Include at least one figure, and ensure that it is properly labelled (e.g., it should have an informative title and axis labels)

• A conclusion that provides a key take-home message

• Any limitations in your findings
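Although Part 2 is mainly a writing task, the prediction behind the report can be produced by applying a tree fitted in Question 1 to Atlantis's values. A minimal sketch, assuming `tree_c` is the tree you fitted in Question 1(c) and that the poll values for Atlantis are entered into a one-row data frame with the same column names; the helper `predict_new_country` is hypothetical. Reporting the class probabilities alongside the predicted category gives a sense of how clear-cut the tree's answer is.

```r
library(rpart)  # supplies the predict() method for rpart trees

# Hypothetical helper: classify new observation(s) with a fitted rpart
# classification tree. `prob` holds, for each row, the estimated class
# probabilities of the terminal node that the row falls into (with rpart's
# default settings, the class proportions of the training cases in that node).
predict_new_country <- function(tree, new_data) {
  list(class = predict(tree, newdata = new_data, type = "class"),
       prob  = predict(tree, newdata = new_data, type = "prob"))
}

# Usage: replace each NA with the value reported for Atlantis, and use the
# tree you fitted in Question 1(c) (called tree_c here):
# atlantis <- data.frame(logGDP = NA, social_support = NA,
#                        freedom = NA, generosity = NA)
# predict_new_country(tree_c, atlantis)
```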

Other things to consider:

• Try not to spend more than 20 minutes on this prompt.

• Aim for more than 200 but fewer than 500 words.

• Use full sentences.

• Grammar is not the main focus of this assessment, but it is important that you communicate in a clear and professional manner (i.e., no slang or emojis should appear).

Vocabulary: Classification, Prediction, Predictor(s), Covariate(s), Independent variable(s), Dependent variable(s), Input(s), Output(s), Training set/sample, Testing set/sample, Fitting a model, Confusion matrix, Category, Tree, Terminal node, Stopping rule, Threshold, True positive rate (sensitivity), True negative rate (specificity), False positive, False negative, Accuracy, Classifier, Node(s), Binary, Split(ting)

