Applied Statistics and Data Analysis (AMS 553.414/614)

Practice questions for final exam

1.

The data for this practice question is based on the cars dataset, which ships with R.

(a) Let dist be the response variable and speed be the explanatory variable. Fit a quintic polynomial regression (including the intercept). Which individual coefficient has the highest statistical significance?

(b) Using stepwise backwards elimination, continue to drop the least statistically significant regressors (but do not drop the intercept) until all (non-intercept) regressors have p-values of less than 0.05. Which regressors remain?

(c) Now treat the intercept as just another regressor. Using stepwise backwards elimination, continue to drop the least statistically significant regressors (drop the intercept if it is least significant) until all regressors have p-values of less than 0.05. Which regressors remain?

(d) First regress on the intercept only. Then using stepwise forward selection, continue to include the most statistically significant regressors (up to and including the quintic term) until no more additional regressors would have p-values of less than 0.05. Which regressors are selected for the model?

(e) Which two regressors (plus intercept) give the best fit? And which set of regressors gives the best BIC? (Hint: Use the leaps package.)
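
The mechanics of Question 1 could be set up roughly as follows. This is a sketch under stated assumptions, not a model answer: raw (non-orthogonal) powers of speed are used as regressors, the column names s1 to s5 are invented for this illustration, and the leaps package is assumed to be installed for the last step.

```r
# Raw powers of speed as separate columns; s1..s5 are hypothetical names.
d <- transform(cars, s1 = speed, s2 = speed^2, s3 = speed^3,
               s4 = speed^4, s5 = speed^5)

# Full quintic fit; lm() includes the intercept by default.
fit <- lm(dist ~ s1 + s2 + s3 + s4 + s5, data = d)
print(coef(summary(fit)))

# Backward elimination that never drops the intercept:
# remove the regressor with the largest p-value until all are below 0.05.
repeat {
  ct <- coef(summary(fit))
  ct <- ct[rownames(ct) != "(Intercept)", , drop = FALSE]
  if (nrow(ct) == 0 || max(ct[, "Pr(>|t|)"]) < 0.05) break
  worst <- rownames(ct)[which.max(ct[, "Pr(>|t|)"])]
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}
print(coef(summary(fit)))

# Best-subset search with leaps (assumed installed).
if (requireNamespace("leaps", quietly = TRUE)) {
  rs <- leaps::regsubsets(dist ~ s1 + s2 + s3 + s4 + s5, data = d, nvmax = 5)
  print(summary(rs)$which)  # regressors in the best model of each size
  print(summary(rs)$bic)    # BIC of the best model of each size
}
```

For part (c), the filter on "(Intercept)" would be removed, and dropping the intercept corresponds to adding `- 1` to the formula. Forward selection in part (d) can be built the same way, starting from `dist ~ 1` and using update() with `+ term`.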

2.

The data for this practice question is based on the cars dataset, which ships with R. The intercept will always be included.

(a) Regress dist on speed. What is the AIC?

(b) Do set.seed(0). Use bootstrapping to create 10,000 more AIC statistics. What is their standard deviation? (Hint: Use dplyr::sample_n to appropriately sample rows from a dataframe.)

(c) Plot a histogram of the AIC values. Does the distribution look skewed left, skewed right, or symmetric?
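
A minimal sketch of the bootstrap in Question 2 might look like the following. It resamples rows with base R indexing rather than dplyr::sample_n, which is an assumption made here to avoid the extra dependency; the two approaches may not reproduce the same draws for a given seed, so the numbers need not match a dplyr-based solution.

```r
set.seed(0)

# Part (a): AIC of the simple linear regression.
fit <- lm(dist ~ speed, data = cars)
print(AIC(fit))

# Part (b): refit on resampled rows and collect the AIC each time.
B <- 10000  # number of bootstrap replicates asked for in the question
aics <- replicate(B, {
  # Rows drawn with replacement, same size as the original data.
  # dplyr::sample_n(cars, nrow(cars), replace = TRUE) is the variant named in the hint.
  boot <- cars[sample(nrow(cars), replace = TRUE), ]
  AIC(lm(dist ~ speed, data = boot))
})
print(sd(aics))

# Part (c): histogram of the bootstrap AIC values.
hist(aics, breaks = 50, main = "Bootstrap AIC", xlab = "AIC")
```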

3.

The data for this practice question is based on the Titanic_train.csv which is available in Blackboard. The intercept will always be included.

(a) Logistically regress Survived (the response variable) on the regressors Pclass (treat as a cardinal variable), Sex and Age. What is the least significant regressor?

(b) The difference between the null and residual deviance is distributed as chi-squared with how many degrees of freedom?

(c) Make a box plot of the Pearson residuals versus the Pclass variable. Hint: If you get a mismatched length error, make an adjustment to the appropriate parameter in your glm call. See the R glm() documentation for help.

(d) Find the mean Pearson residual for Pclass = 2. Hint: One method is to regress the Pearson residuals versus Pclass as a categorical variable. Another method is to use the aggregate() function.
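
The steps in Question 3 can be illustrated as below. Because Titanic_train.csv is not reproduced here, the sketch runs on a synthetic stand-in data frame (named toy, with made-up values and the same column names), so its output says nothing about the real answers. Using na.action = na.exclude is one way to keep the residuals aligned with the rows of the data; treat it as a suggestion rather than the only fix.

```r
# Synthetic stand-in for Titanic_train.csv: real column names, made-up values.
# With the real file one would start from read.csv("Titanic_train.csv").
set.seed(1)
n <- 300
toy <- data.frame(
  Pclass = sample(1:3, n, replace = TRUE),
  Sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age    = round(runif(n, 1, 70))
)
eta <- 2 - 0.6 * toy$Pclass - 1.5 * (toy$Sex == "male") - 0.01 * toy$Age
toy$Survived <- rbinom(n, 1, plogis(eta))
toy$Age[sample(n, 40)] <- NA  # mimic missing ages

# na.exclude pads residuals with NA for dropped rows, so their length
# matches nrow(toy) and the length mismatch does not arise.
fit <- glm(Survived ~ Pclass + Sex + Age, family = binomial,
           data = toy, na.action = na.exclude)
print(coef(summary(fit)))

# Degrees of freedom for the null-minus-residual deviance comparison.
print(fit$df.null - fit$df.residual)

# Pearson residuals, box plot by class, and per-class means.
toy$res <- residuals(fit, type = "pearson")
boxplot(res ~ Pclass, data = toy, ylab = "Pearson residual")
print(aggregate(res ~ Pclass, data = toy, FUN = mean))  # NA rows are dropped
```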

4.

The data for this practice question is based on the Titanic_train.csv which is available in Blackboard. The intercept will always be included.

(a) What is the most common value of Embarked?

(b) Do multinomial logistic regression with Embarked as the response variable and Pclass (treat as a cardinal variable), Sex, Age and Survived as the regressors. Use the most common value of Embarked as the reference value. For predicting which passengers embarked from France, what is the most significant regressor? The least significant regressor?

(c) Is a survivor more or less likely to have embarked from France? By how much do the log odds change?
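
One possible route through Question 4 uses nnet::multinom, sketched below. The choice of nnet is an assumption (the question does not name a package), and the data frame toy is a synthetic stand-in with made-up values, so the printed numbers are illustrative only. In the Titanic data the Embarked code C stands for Cherbourg, the French port.

```r
# Synthetic stand-in for Titanic_train.csv: real column names, made-up values.
set.seed(1)
n <- 400
toy <- data.frame(
  Pclass   = sample(1:3, n, replace = TRUE),
  Sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age      = round(runif(n, 1, 70)),
  Survived = rbinom(n, 1, 0.4),
  Embarked = sample(c("C", "Q", "S"), n, replace = TRUE,
                    prob = c(0.2, 0.1, 0.7))
)

# Part (a): frequency table; the most common port becomes the reference level.
tab <- table(toy$Embarked)
print(tab)
toy$Embarked <- relevel(factor(toy$Embarked), ref = names(which.max(tab)))

# Part (b): multinomial logit, one coefficient row per non-reference port.
fit <- nnet::multinom(Embarked ~ Pclass + Sex + Age + Survived,
                      data = toy, trace = FALSE)
s <- summary(fit)

# Wald z statistics and two-sided p-values.
z <- s$coefficients / s$standard.errors
p <- 2 * pnorm(-abs(z))
print(round(p, 4))

# Part (c): the Survived entry in row "C" is the change in the log odds of
# Cherbourg versus the reference port for a survivor.
print(s$coefficients["C", "Survived"])
```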

5.

The data for this practice question is based on the Titanic_train.csv which is available in Blackboard. The intercept will always be included.

(a) How many values of Age are missing?

(b) Do set.seed(0). Using the mice package and the default method, create five imputed datasets. What are the five imputed ages for passenger number 6?

(c) Do set.seed(0) and repeat the above using the norm.boot method. What are the five imputed ages for passenger number 6?
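
The calls involved in Question 5 are sketched below on a synthetic stand-in data frame (toy, with made-up values), so the imputed values shown by this code are not the answers for the real file. The sketch assumes the mice package is installed and that passenger number 6 corresponds to row 6 of the data frame.

```r
# Synthetic stand-in for Titanic_train.csv: real column names, made-up values.
set.seed(1)
n <- 100
toy <- data.frame(
  Pclass   = sample(1:3, n, replace = TRUE),
  Sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age      = round(runif(n, 1, 70)),
  Survived = rbinom(n, 1, 0.4)
)
# Row 6 is made missing on purpose, plus 19 other randomly chosen rows.
toy$Age[c(6, sample(setdiff(1:n, 6), 19))] <- NA

# Part (a): count the missing ages.
n_missing <- sum(is.na(toy$Age))
print(n_missing)

if (requireNamespace("mice", quietly = TRUE)) {
  # Part (b): default imputation method, five imputed datasets.
  set.seed(0)
  imp_default <- mice::mice(toy, m = 5, printFlag = FALSE)
  print(imp_default$imp$Age["6", ])  # five imputed ages for row 6

  # Part (c): same call with the norm.boot method
  # (linear regression using bootstrap).
  set.seed(0)
  imp_boot <- mice::mice(toy, m = 5, method = "norm.boot", printFlag = FALSE)
  print(imp_boot$imp$Age["6", ])
}
```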

