
## STA130H1F – Week 10 Problem Set

### Instructions

How do I hand in these problems for the 11:59 a.m. ET, November 27th deadline?

The complete .Rmd file that you create for this problem set AND the resulting .pdf (i.e., the one you ‘Knit to PDF’ from your .Rmd file) must be uploaded to a Quercus assignment (link: https://q.utoronto.ca/courses/184002/assignments/465532) by 11:59 a.m. ET on November 27th. Late problem sets, or problem sets submitted another way (e.g., by email), are not accepted.

There are two parts to your problem set. One is largely R-based with short written answers and the other is more focused on writing. We recommend you use word-processing software such as Microsoft Word to check for grammar errors in your written work. Note: there can be issues copying from Word to R Markdown, so it may be easier to write in this file first and then copy the text to Word. You can then make any changes flagged in Word directly in this file.

### Part 1

#### Question 1

Using data from the Gallup World Poll (and the World Happiness Report), we are interested in predicting which factors influence life expectancy around the world. These data are in the file happinessdata_2017.csv.

`happiness2017 <- read_csv("happinessdata_2017.csv")`

(a) Begin by creating a new variable called life_exp_category which takes the value “Good” for countries with a life expectancy higher than 65 years, and “Poor” otherwise.
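One way the recoding in (a) might look is sketched below. It runs on a small made-up data frame rather than on happinessdata_2017.csv, and the column name `life_exp` is an assumption; the real file may use a different name for life expectancy.

```r
# Toy stand-in for happiness2017; values and the column name life_exp are hypothetical.
toy <- data.frame(
  country  = c("A", "B", "C", "D"),
  life_exp = c(72.1, 58.4, 65.0, 66.3)
)

# "Good" only when life expectancy is strictly higher than 65;
# a country at exactly 65.0 therefore falls into "Poor".
toy$life_exp_category <- ifelse(toy$life_exp > 65, "Good", "Poor")

print(toy)
```

Note that `ifelse()` returns `NA` for rows where life expectancy is missing, so such rows end up in neither category.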

(b) Divide the data into training (80%) and testing (20%) datasets. Build a classification tree using the training data to predict which countries have Good vs Poor life expectancy, using only the social_support variable as a predictor. Use the last 3 digits of your student ID number for the random seed.
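The split-then-fit workflow in (b) could be sketched as follows. This assumes the `rpart` package is used for classification trees (the problem set does not name a package), uses simulated data in place of the real file, and uses 123 as a placeholder seed where the last 3 digits of a student ID would go.

```r
library(rpart)

set.seed(123)  # placeholder; replace with the last 3 digits of your student ID

# Simulated stand-in for the real data: the outcome loosely depends on social_support.
n <- 200
toy <- data.frame(social_support = runif(n, 0.3, 1))
toy$life_exp_category <- factor(
  ifelse(toy$social_support + rnorm(n, sd = 0.1) > 0.7, "Good", "Poor")
)

# Randomly pick 80% of the row indices for training; the rest form the test set.
train_ids <- sample(seq_len(n), size = round(0.8 * n))
train <- toy[train_ids, ]
test  <- toy[-train_ids, ]

# Classification tree with a single predictor.
tree1 <- rpart(life_exp_category ~ social_support, data = train, method = "class")
print(tree1)
```

Setting the seed before calling `sample()` is what makes the split reproducible each time the file is knit.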

(c) Use the same training dataset created in (b) to build a second classification tree to predict which countries have good vs poor life expectancy, using logGDP, social_support, freedom, and generosity as potential predictors.
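For (c), only the model formula changes. The sketch below again assumes `rpart` and simulated data; the rule used to generate the toy outcome is invented purely so that the tree has something to split on. With the real data, the `train` object from (b) would be reused rather than rebuilt.

```r
library(rpart)

set.seed(123)  # placeholder seed

# Simulated stand-in with the four predictors named in the question.
n <- 200
toy <- data.frame(
  logGDP         = rnorm(n, mean = 9, sd = 1),
  social_support = runif(n, 0.3, 1),
  freedom        = runif(n, 0.3, 1),
  generosity     = rnorm(n, mean = 0, sd = 0.15)
)
# Invented outcome rule (not from the real data).
toy$life_exp_category <- factor(
  ifelse(toy$logGDP + 2 * toy$social_support + rnorm(n, sd = 0.5) > 10.3,
         "Good", "Poor")
)

train_ids <- sample(seq_len(n), size = round(0.8 * n))
train <- toy[train_ids, ]

# All four variables are offered as potential predictors;
# the fitted tree may end up splitting on only some of them.
tree2 <- rpart(life_exp_category ~ logGDP + social_support + freedom + generosity,
               data = train, method = "class")
print(tree2)
```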

(d) Use the testing dataset you created in (b) to calculate the confusion matrix for the trees you built in (b) and (c). Report the sensitivity (true positive rate), specificity (true negative rate) and accuracy for each of the trees. Here you will treat “Good” life expectancy as a positive response/prediction.
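The calculations in (d) can be illustrated with a small hand-made example. The `truth` and `pred` vectors below are invented; with a fitted tree, the predictions would typically come from something like `predict(tree, newdata = test, type = "class")`, assuming the `rpart` package.

```r
# Hypothetical true and predicted categories for 10 test-set countries.
lv    <- c("Good", "Poor")
truth <- factor(c("Good", "Good", "Good", "Poor", "Poor",
                  "Good", "Poor", "Poor", "Good", "Poor"), levels = lv)
pred  <- factor(c("Good", "Good", "Poor", "Poor", "Good",
                  "Good", "Poor", "Good", "Good", "Poor"), levels = lv)

# Rows are predictions, columns are the true categories.
cm <- table(predicted = pred, actual = truth)
print(cm)

# "Good" is treated as the positive class.
tp <- cm["Good", "Good"]
fp <- cm["Good", "Poor"]
fn <- cm["Poor", "Good"]
tn <- cm["Poor", "Poor"]

sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
accuracy    <- (tp + tn) / sum(cm)

print(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy))
```

For these invented vectors the counts are 4 true positives, 1 false negative, 2 false positives and 3 true negatives, giving a sensitivity of 0.8, a specificity of 0.6 and an accuracy of 0.7.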

#### Question 2

Two classification trees were built to predict which individuals have a disease using different sets of potential predictors. We use each of these trees to predict disease status for 100 new individuals. Below are confusion matrices corresponding to these two classification trees.

a) Calculate the accuracy, false-positive rate, and false-negative rate for each classification tree. Here, a “positive” result means we predict an individual has the disease and a “negative” result means we predict they do not.

Tree A

• Overall accuracy:

• False-positive rate:

• False-negative rate:

Tree B

• Overall accuracy:

• False-positive rate:

• False-negative rate:
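The three quantities asked for in part a) follow directly from the four cells of a confusion matrix. The sketch below uses made-up counts for 100 individuals; they are not the counts for Tree A or Tree B, and the helper `rates()` is a hypothetical name introduced only for illustration.

```r
# Hypothetical counts for 100 individuals; NOT taken from Tree A or Tree B.
tp <- 30  # predicted disease, has disease
fn <- 10  # predicted no disease, has disease
fp <- 5   # predicted disease, does not have disease
tn <- 55  # predicted no disease, does not have disease

# Hypothetical helper: accuracy and the two error rates from the four counts.
rates <- function(tp, fn, fp, tn) {
  c(
    accuracy            = (tp + tn) / (tp + fn + fp + tn),
    false_positive_rate = fp / (fp + tn),  # share of disease-free people flagged
    false_negative_rate = fn / (fn + tp)   # share of diseased people missed
  )
}

result <- rates(tp, fn, fp, tn)
print(round(result, 3))
```

The false-negative rate is the one that counts people with the disease whom the tree fails to flag.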

b) Suppose the disease is very serious if untreated. Explain which classifier you would prefer to use.

#### Question 3

Data was collected on 30 cancer patients to investigate the effectiveness (Yes/No) of a treatment. Two quantitative variables, x_i ∈ (0, 1), i = 1, 2, are considered to be important predictors of effectiveness. Suppose that the rectangles labelled as nodes in the scatterplot below represent nodes of a classification tree.

### Part 2

You are working at a news station and your boss comes running into your office and asks if you had seen the most recent report. It turns out the lost city of Atlantis has been found! The first thing they do is start to collect information on the city.

The country completed the Gallup World Poll and the following data was collected:

For some reason, in the process of collecting the data the life expectancy variable was missed. Now everyone is curious, is the life expectancy of the citizens good or poor? Given your recent work on the Gallup World Poll and the World Happiness Report, your boss is looking to you to provide some insights. Your boss has asked you to prepare a report with the following information:

• A little bit of background about the data you have used

• A description of the methods. Include at least 2 vocabulary words. Remember to explain in lay terms (i.e., explain it in a way that somebody not familiar with our field and specialized language can understand).

• The results

• Include at least one figure, and ensure that it is properly labeled (e.g. it should have informative titles and axes)

• A conclusion that provides a key take-home message

• Any limitations in your findings

Other things to consider:

• Try to not spend more than 20 minutes on the prompt.

• Aim for more than 200 but fewer than 500 words.

• Use full sentences.

• Grammar is not the main focus of this assessment, but it is important that you communicate in a clear and professional manner (i.e., no slang or emojis should appear).

Vocabulary: Classification – Prediction – Predictor(s) – Covariate(s) – Independent variable(s) – Dependent variable(s) – Input(s) – Output(s) – Training set/sample – Testing set/sample – Fitting a model – Confusion matrix – Category – Tree – Terminal node – Stopping rule – Threshold – True positive (sensitivity) – True negative (specificity) – False positive – False negative – Accuracy – Classifier – Node(s) – Binary – Split(ting)
