Project 3

You can choose between two versions of this assignment.

  1. Bradley-Terry modeling with NCAA basketball data
  2. Rasch modeling with school security data

Notes:

  • Aim for a length of 6-8 pages (as opposed to the 5-7 we suggested for prior assignments). Assume your audience consists of educated college professionals who are not statisticians but may be quantitatively inclined, and who take an interest either in basketball (version 1) or in school security and related issues (version 2).
  • Option 2 requires some reshaping of data. The mtg3a_supplement notes and/or slides and/or accompanying presentation from FA2020 show you how.
  • Both options call for penalized fitting of a version of your model. (You’ve seen this in other contexts, but not necessarily in the context of the Bradley-Terry or Rasch model. It’s scheduled for discussion in Unit 3 (not later than mtg3d) and/or in a recorded presentation to be made available shortly thereafter. Either way, you can set yourself up for the parts of the assignment involving penalized fitting by doing the other data analyses that the assignment calls for.)
  • For this assignment, you’ll receive full rubric scores from 2 peer reviewers, while the instructor will score a subset of the rubric items, some selected at random and others at our discretion.  Only the instructor scores affect your grade.

I. Bradley-Terry modeling with NCAA basketball data

Fit your own Bradley-Terry models to rank NCAA men’s[1] basketball teams for the 2021-22 season. Use the win-loss data for Division 1 contests during the regular season and through the first 4 rounds of postseason, up to but not including Final Four games: http://stat.lsa.umich.edu/~bbh/s485/data/cbb-mens-2022-03-27.csv.

(The data set has an additional column, “Date”, indicating the date of the game. If you ensure that its class is “Date”, either by making sure it’s read in that way or by coercing it using as.Date(), it’s easier to use it to create new features [variables].) Your analysis should include the following components.
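A minimal sketch of the Date coercion described above. The inline CSV is a toy stand-in for the course file; apart from "Date", the column names (Home, Away, HomeWin) and the cutoff date are assumptions made for illustration.

```r
# Toy stand-in for the course CSV; all column names except "Date" are hypothetical.
csv_text <- "Date,Home,Away,HomeWin
2021-11-09,TeamA,TeamB,1
2022-01-15,TeamB,TeamC,0
2022-03-18,TeamC,TeamA,1"

games <- read.csv(text = csv_text, stringsAsFactors = FALSE)
games$Date <- as.Date(games$Date)   # coerce the character column to class "Date"

# Date arithmetic now works, which makes date-based features easy to build.
games$days_in     <- as.numeric(games$Date - min(games$Date))   # days since first game
games$late_season <- games$Date >= as.Date("2022-02-01")        # hypothetical cutoff
print(games)
```

With the real file, the same two steps (read, then as.Date()) should apply, provided the dates are stored in a format as.Date() recognizes.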

1. Fit a plain-vanilla Bradley-Terry model (fit via maximum likelihood, without penalties or Bayesian priors, and with no home-team advantage) to these data. (Cf. mtg3a.)
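As a rough illustration of step 1, a Bradley-Terry model can be written as a logistic regression with no intercept, where each game's row has +1 for one team and -1 for the other. The sketch below uses invented teams and results; the column names and the +1/-1 coding are assumptions, and the course notes (mtg3a) may set things up differently.

```r
# Toy results; teams, column names and outcomes are invented for illustration.
games <- data.frame(
  Home    = c("A", "B", "C", "A", "B", "C", "A", "C"),
  Away    = c("B", "C", "A", "C", "A", "B", "B", "A"),
  HomeWin = c(1, 1, 0, 1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(games$Home, games$Away)))

# One column per team: +1 if the team is listed first ("Home"), -1 if listed second.
X <- sapply(teams, function(tm) (games$Home == tm) - (games$Away == tm))

# Drop the first team's column so its ability is fixed at 0 (identifiability).
# "- 1" removes the intercept, so no home-team advantage is estimated.
bt_data <- data.frame(HomeWin = games$HomeWin, X[, -1, drop = FALSE])
bt_fit  <- glm(HomeWin ~ . - 1, family = binomial, data = bt_data)

abilities <- c(setNames(0, teams[1]), coef(bt_fit))
print(sort(abilities, decreasing = TRUE))
```

Every toy team here has at least one win and one loss; a team that never loses (or never wins) would have no finite maximum likelihood estimate, which is one motivation for the penalized fits in step 2.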

2. Next, fit a version of the model using a penalized form of logistic regression, such as provided by arm::bayesglm(), brglm::brglm(), glmnet::glmnet()[2] or lme4::glmer()[3].
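To show what a penalty does, here is a hand-rolled ridge-penalized logistic regression minimized with optim(). This is not any of the four packages named above; it is a base-R stand-in, and the toy data, the penalty weight lambda = 0.5, and the function name pen_nll are all assumptions.

```r
# Toy design (first-listed team = +1, second = -1); y = 1 means the first team won.
# Team A wins every game it plays, so the unpenalized MLE for A is not finite.
X <- rbind(c(A = 1, B = -1, C = 0),
           c(1,  0, -1),
           c(0,  1, -1),
           c(0, -1,  1),
           c(1, -1,  0))
y <- c(1, 1, 1, 1, 1)

# Negative log-likelihood of logistic regression plus a ridge penalty.
pen_nll <- function(beta, lambda) {
  eta <- drop(X %*% beta)
  nll <- sum(y * log1p(exp(-eta)) + (1 - y) * log1p(exp(eta)))
  nll + lambda * sum(beta^2)
}

fit <- optim(rep(0, ncol(X)), pen_nll, lambda = 0.5, method = "BFGS")
ridge_abilities <- setNames(fit$par, colnames(X))
print(round(ridge_abilities, 3))
```

Because the penalty pulls all coefficients toward zero, no reference team needs to be dropped, and the undefeated team receives a large but finite ability. The packaged fitters differ in the form of penalty or prior they apply, so their estimates will not match this sketch.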

3. Adapt your model specification to account for game location, and to allow for the possibility that teams’ strength changes over the course of the season. To do this:

a. Incorporate in your model an additional parameter[4] permitting an increased (or decreased) probability of winning for whichever of the two contestants is playing at home. Note, however, that this parameter should affect chances of winning only for pre-tournament games, as during the tournament both teams are generally playing away from home. (The data continue to list one team as Home and the other as Away, but this is arbitrary for games after the end of the regular season.)

b. Also incorporate into your model interactions of the team variables with one or more functions of the game’s date. Design these functions so as to enable you to extract from the fitted model an improved measurement of the relative strengths of the 4 teams that made the Final Four.
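One possible way to build the extra design columns for step 3 is sketched below. The toy schedule, the tournament start date, and the particular time function (scaled so that it equals 0 on the last game date) are all assumptions; other choices of date function may serve the goal in 3b equally well or better.

```r
# Toy schedule; dates, teams and the tournament cutoff are invented.
games <- data.frame(
  Date = as.Date(c("2021-11-10", "2021-12-05", "2022-01-20",
                   "2022-02-15", "2022-03-18", "2022-03-20")),
  Home = c("A", "B", "C", "A", "B", "C"),
  Away = c("B", "C", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)
tourney_start <- as.Date("2022-03-15")   # hypothetical first tournament date

teams <- sort(unique(c(games$Home, games$Away)))
X <- sapply(teams, function(tm) (games$Home == tm) - (games$Away == tm))

# Home-advantage column: 1 in the regular season, 0 once the tournament starts.
home <- as.numeric(games$Date < tourney_start)

# Time function running from -1 (first game) to 0 (last game), so that the
# team main effects describe strength at the end of the observed season.
t_late <- as.numeric(games$Date - max(games$Date)) /
  as.numeric(diff(range(games$Date)))

# Team-by-time interactions: each row of X is multiplied by that game's t_late.
X_time <- X * t_late
colnames(X_time) <- paste0(teams, "_x_time")

# Drop the reference team from both the main effects and the interactions.
design <- cbind(home = home, X[, -1], X_time[, -1])
print(round(design, 2))
```

The resulting matrix could be passed to glm() with a no-intercept formula, or to a penalized fitter, together with the win indicator for the first-listed team.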

4. Once you’ve decided on a model specification including a suitable home court advantage parameter and permitting teams’ strengths to change over the season, try fitting your model using both maximum likelihood and the penalized fitting routine you selected for step 2. (If you can’t get one of them to work, that’s OK, but if neither works you’ll need to revise your model until you get one of them to fit.)

5. Which of the 3-4 fitted models do you prefer? Present appropriate statistical calculations that inform your choice. (Answer using these data only, i.e., without letting your personal preferences, or knowledge of what happened during the championship tournament, interfere.)

6. How does your preferred model rank the teams? Present the top 10, bottom 5 and 0-5 additional teams of interest to you.

7. Picking a favorite of the models you’ve fit, demonstrate the uncertainty accompanying your estimated model by associating a standard error[5] with the difference of any two teams’ fitted ability coefficients. If you are reporting hypothesis tests (see #8), these standard errors don’t need to be reported within your paper.

8. For Version 2 of the paper (tentative): prepare and present either of the following additional analyses:

a. Picking a favorite of the models you’ve fit, demonstrate that the model permits you to test the statistical hypothesis that Team A was stronger than Team B, for any teams A and B in the league. Give a version of the hypothesis test assuming that it is the only test being conducted, and another version of the hypothesis test situating it in a family of hypotheses, one for each ordered pair (A, B) of distinct teams in the league, and maintaining familywise Type 1 error rate α.

b. For each of the 3-4 models you fitted to the regular season data, for paper version 1, evaluate the model’s predictions for postseason games. Do this by treating the post-season games as a holdout sample, evaluating an appropriate loss function as averaged over it. Comment on whether the results of this evaluation validate the model you selected in advance of receiving postseason data, and/or the criteria you used to select that model.
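For the holdout evaluation in 8b, one candidate loss function is the average negative log-likelihood (log loss) of the held-out outcomes. The sketch below applies it to simulated data; the variable names, the 240/60 split, and the choice of log loss itself are assumptions, and other losses (for example the Brier score) may be just as appropriate.

```r
set.seed(3)
n   <- 300
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(0.8 * x))
dat <- data.frame(x = x, y = y, holdout = seq_len(n) > 240)

train <- dat[!dat$holdout, ]   # stands in for regular-season games
post  <- dat[dat$holdout, ]    # stands in for postseason games

m0 <- glm(y ~ 1, family = binomial, data = train)
m1 <- glm(y ~ x, family = binomial, data = train)

# Average negative log-likelihood; probabilities are clipped to avoid log(0).
log_loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

losses <- c(
  intercept_only = log_loss(post$y, predict(m0, newdata = post, type = "response")),
  with_x         = log_loss(post$y, predict(m1, newdata = post, type = "response"))
)
print(losses)
```

Lower values indicate better out-of-sample predictions. A model that predicts 0.5 for every game scores log(2), about 0.693, which makes a useful baseline.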

Your writeup should take the introduction-methods-results-discussion format used in earlier papers, and should also include a technical appendix. The paper can be addressed to non-statistical readers interested in NCAA tournament rankings, such as family members, friends not majoring in statistics, or youth sports officials interested in adopting methods similar to the NCAA’s.

II. Rasch modeling with school security data

Finn and Servoss (2014, Appl. Res. on Children) studied security and student discipline practices at a national sample of U.S. K-12 schools, framing the goals of their research as follows (p.1):

Proponents of high security and strict disciplinary codes in American high schools argue that they make schools safer and create an orderly environment for learning. But the same practices can also create ‘prison like’ conditions that make some students feel ill at ease and others aware that serious misconduct may occur at any time. (Brooks, Schiraldi, & Zeidenberg, 2000) The result can be feelings of defensiveness on the part of students, accompanied by emotional and physical disengagement from school….

This research examined the relationships among student misbehavior, suspensions, and security measures in a nationwide sample of high schools. The purpose was two-fold: first, to identify the characteristics of schools that implemented the most invasive security measures. We asked whether these schools were the largest, had the highest proportion of at-risk students, and whether they were located primarily in urban or high-crime neighborhoods. We also asked if high security in a school was related to increased suspension rates, thus depriving students of continuous instruction and increasing their sense of alienation.

Next, we examined conditions related to racial/ethnic and gender inequities in suspensions, focusing on the role of school security and students’ misbehavior.…

Fundamental to this research is the manner in which the authors quantify schools’ “security measures.”

A small team of social science researchers is preparing a proposal for funding to do a similar study. The new study would use more recent data, and perhaps updated statistical models and methods. They need to decide whether to propose to use Finn and Servoss’s (F. & S.’s) model or a different one, and whether to propose updates to F. and S.’s model-fitting methods. These researchers are trained in basic statistics and regression modeling, and understand well such concepts as Normality and skewness, and sampling distributions versus data distributions; they even know what a Rasch model is. But they lack facility with R or comparable computing platforms, and don’t know how to go about fitting Rasch models; so they’ve retained you as a statistical consultant.

Your assignment is to:

(a) reconstruct measurement models similar to the one used by Finn and Servoss to combine ELS:2002 data elements into an index of the security environment at U.S. schools, using both maximum likelihood and a penalized variant of maximum likelihood for fitting; (b) consider whether simple variations of the model specification might improve it; (c) compare the alternative Rasch models you will have fit, arriving at a recommended model; and (d) compare Finn and Servoss’s school security measurement to your preferred alternative. You’ll convey your findings to the researchers in a memo, following the introduction-methods-results-discussion structure with a reproducibility appendix.

Additional detail on (a)-(d):

a. Finn and Servoss selected 7 of 20 security-related data elements from responses to the ELS:2002 Administrator Questionnaire. A reduction of the data including received responses to all 20 of these questions and sub-questions is available to you via http://dept.stat.lsa.umich.edu/~bbh/s485/data/security_wide.csv (see unit 3 class notes). The full questionnaire is on Perusall (extras/ELS2002_AdminQ_baseyear.pdf) and at the US Education Department’s website, while an extract of it presenting the survey questions behind these items is posted on Perusall (extras/ELS2002_AdminQ_baseyear_securityQs.pdf).

Your first task is to (try to) replicate their construction of a Rasch-type security index in R. Identify ELS:2002 questions contributing to their model, if necessary reshaping and combining these data elements to match transformations they did. Then fit the Rasch model, first using base R’s glm() function, and next using the brglm package’s brglm() function, the arm package’s bayesglm(), the glmnet package (glmnet() and/or cv.glmnet()[6]) or lme4::glmer()[7]. You should now have 2 versions of the model, one fit with glm() and the other with one of bayesglm(), brglm(), glmnet()/cv.glmnet() or glmer().
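The reshape-then-fit workflow can be sketched as follows, using simulated responses rather than the ELS:2002 data. The item names (q1 to q6), the school labels, and the choice to express the Rasch model as a glm() with one factor for schools and one for items are assumptions; the course notes may parameterize the model differently.

```r
set.seed(485)
n_schools <- 40
items <- paste0("q", 1:6)                       # hypothetical item names
theta <- rnorm(n_schools)                       # simulated school security levels
b     <- seq(-1.5, 1.5, length.out = 6)         # simulated item difficulties
resp  <- sapply(b, function(bj) rbinom(n_schools, 1, plogis(theta - bj)))
colnames(resp) <- items
wide <- data.frame(school = sprintf("s%02d", 1:n_schools), resp,
                   stringsAsFactors = FALSE)

# Schools answering all 0s or all 1s have no finite ML estimate; set them aside.
score <- rowSums(wide[items])
wide  <- wide[score > 0 & score < length(items), ]

# Wide (one row per school) to long (one row per school-item response).
long <- reshape(wide, direction = "long",
                varying = items, v.names = "response",
                timevar = "item", times = items, idvar = "school")

# Logistic regression with school and item effects and no separate intercept.
rasch_fit <- glm(response ~ 0 + factor(school) + factor(item),
                 family = binomial, data = long)
print(head(coef(rasch_fit)))
```

Here the school coefficients play the role of index values, and the item coefficients are offsets relative to the first item. Rows with a missing response are dropped by glm() by default, so a school with partial responses still contributes its observed items.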

b. Recommended for paper version 1, required for version 2: be sure to include schools providing responses to some but not all of the seven questions F. and S. selected. Indicate at a suitable place in your writeup the formal name of the missing data mechanism you will have assumed in addressing the missing data issue in this way.

c. Fit a Rasch model using a different collection of ELS security-related questions than the 7 selected by Finn and Servoss. Your questions can overlap with Finn and Servoss’s or be entirely distinct from theirs, provided that the two sets are not precisely the same. Select questions that in your view are equal in validity to those Finn and Servoss selected; include a brief justification of this assessment. Do this two ways, using glm() and whichever of bayesglm(), brglm(), glmnet()/cv.glmnet() or glmer() you selected under (a) above. You should now have 4 fitted Rasch models: two fit with glm() and the other two with your selected alternate fitter; two using Finn and Servoss’s questions and two using another set of questions that you selected.

d. Recommended for paper version 1, required for version 2: Again, include schools providing responses to some but not all of the seven questions F. and S. selected. There may be more or fewer schools presenting complete data on your selected questions as opposed to F. and S.’s, and there may be more or fewer schools with partially complete patterns of response. Indicate how your choice of questions (as opposed to F. and S.’s) affects sample size.

e. Which of the four fitted models do you prefer? If your preference is for the F.-S. model (the model fit by maximum likelihood using the same survey questions Finn and Servoss used), also state your next most preferred version of the model. Either way, present appropriate statistical calculations that inform your choice.

f. Create school security indices from the F.-S. model and your preferred model among the others. Since Rasch index values can be challenging to interpret, it’s common to focus on where objects stand on a Rasch index scale relative to one another, that is, on the ordering of objects that the Rasch index induces[8]. Compare your index’s induced ordering of schools to F. and S.’s. Present a visualization as well as numeric descriptions of the extent to which your and F. and S.’s indices place schools in different orders. (This isn’t a situation where you’ll locate appropriate statistics or charts by searching the Shalizi textbook or course notes; you’ll need to design yourself, or select from options you find on the web or elsewhere, both the visualization and the numerical summary.)
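As one of many possible numeric summaries for comparing two induced orderings, rank correlations and the share of discordant pairs can be computed in base R. The two indices below are simulated stand-ins, and the names index_fs and index_mine are hypothetical; this is a starting point, not a recommended final design.

```r
set.seed(2)
n_schools  <- 50
index_fs   <- rnorm(n_schools)                        # stand-in for the F.-S. index
index_mine <- index_fs + rnorm(n_schools, sd = 0.5)   # stand-in for an alternative

# Rank correlations depend only on the orderings the two indices induce.
rho <- cor(index_fs, index_mine, method = "spearman")
tau <- cor(index_fs, index_mine, method = "kendall")

# Share of school pairs that the two indices put in opposite orders.
pair_idx   <- combn(n_schools, 2)
d_fs       <- index_fs[pair_idx[1, ]]   - index_fs[pair_idx[2, ]]
d_mine     <- index_mine[pair_idx[1, ]] - index_mine[pair_idx[2, ]]
discordant <- mean(sign(d_fs) != sign(d_mine))

print(c(spearman = rho, kendall = tau, discordant_share = discordant))
```

When there are no ties, Kendall's tau equals 1 minus twice the discordant share, so the two numbers carry the same information on different scales. A scatterplot of one index's ranks against the other's is one simple visual companion.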

g. For version 2 of the paper only (tentative), study the sensitivity of your preferred model to bias due to missing data, by fitting a variation of that model incorporating one or more additional independent variables. Compare the fit of your model with additional independent variables to an otherwise similar Rasch model without those variables. Indicate how the use of the additional variable(s) allows this model to weaken the MAR assumption.

h. Your discussion should include your recommendations for follow-up research using the same or similar data sources to construct indices of schools’ security environments, and/or using F. and S.’s or a similar index as an explanatory variable. A table with several additional school-level variables is given here; here are a couple of things to be aware of when working with it.

i. An excerpt of the Education Longitudinal Study of 2002 codebook describing these variables is posted here. Several of these new variables are themselves subject to missingness (coded as -9, a value you’ll need to switch out for “NA”). You can use these in a sensitivity analysis of your original model, but you’ll be introducing one missingness problem to address another one, a limitation you’ll want to note in discussion.

j. You can use dplyr::left_join() to merge these new variables with the security_wide data.
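The recoding and merge steps might look like the sketch below. All table contents and column names (school_id, urbanicity, enrollment) are invented, and base R's merge() with all.x = TRUE is used here as a dependency-free stand-in for a left join; with dplyr loaded, dplyr::left_join(security, extra, by = "school_id") plays the same role.

```r
# Toy tables; the key and variable names are hypothetical.
security <- data.frame(school_id = c(101, 102, 103),
                       q1        = c(1, 0, 1))
extra    <- data.frame(school_id  = c(101, 103, 104),
                       urbanicity = c(2, -9, 1),
                       enrollment = c(-9, 850, 1200))

# Recode the missing-value code -9 as NA in every non-key column.
vars <- setdiff(names(extra), "school_id")
extra[vars] <- lapply(extra[vars], function(v) {
  v[v %in% -9] <- NA
  v
})

# Left join: keep every row of `security`, attach matches from `extra`.
merged <- merge(security, extra, by = "school_id", all.x = TRUE)
print(merged)
```

Schools present in the security data but absent from the supplementary table (school 102 in this toy) end up with NA in the added columns, which is a second source of missingness to keep in mind.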

1. Alternately, do the same for 2021-22 NCAA Women’s basketball teams. As of this writing, we only have the pre-tournament version of the data for women: http://stat.lsa.umich.edu/~bbh/s485/data/cbb-womens-2022-03-17.csv

2. Functions in the glmnet package follow a different format than glm() and company, in particular not accepting formula specifications of models (e.g. y ~ x1 + x2) and instead requiring you to pass a vector dependent variable and a numeric matrix of independent variables. To produce the latter (a numeric “design matrix”) it may be easiest to first fit a glm() to your data set and then extract the matrix from your fitted glm using model.matrix(). This trick is demonstrated in mtg2n_ch7-handout.pdf. Glmnet expects your X matrix not to include an intercept column, as demonstrated in mtg2n. It assumes that you want an intercept and fits one anyway. To override this default behavior, fitting a model without an intercept, specify intercept=FALSE.

A commonality between glmnet functions and base R’s glm() is that both accept a “family” argument, with family="binomial" indicating logistic regression. If you leave out the family argument, then you’ll get an ordinary linear model.

3. The lme4 package now contains the lmer() and glmer() functions that the Gelman and Hill (2006) readings describe as residing in the Matrix package.

4. How to do this is likely to be discussed in class. In the meantime, don’t hesitate to try out your own ideas, which you can always update later if you learn something that looks more promising.

5. The standard errors you’ll need for this won’t appear in a summary() of your fitted model: that will provide standard errors for each coefficient separately, but those aren’t enough to calculate the standard error of a difference of coefficients. The necessary information is encoded in the matrix of estimated sampling covariances of coefficient estimates, i.e. the matrix with (i,j) entry $\mathrm{Cov}(\hat\beta_i, \hat\beta_j)$, which you can extract from a fitted glm using vcov(). If you take this approach, it’s a part of your task to figure out how to arrange and transform entries of this matrix to get a valid standard error for $\hat\beta_i - \hat\beta_j$.

Alternatively, you can use a pre-set routine for testing “contrasts” of coefficients, such as the glht() function in the multcomp package. (A code demonstration of the use of this package is provided on the Canvas site, in Files: problem_sets/u3paperhints_coefcontrasts.pdf. Sections 1 and 2 of the extended examples document provided with the package are also relevant: vignette('multcomp-examples', package='multcomp').)
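For orientation, the variance of a difference follows Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). The sketch below applies that identity to a small simulated logistic regression; the data and the names x1 and x2 are placeholders for two team columns, not part of the assignment data.

```r
set.seed(1)
# Simulated data; x1 and x2 stand in for two teams' design columns.
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 * x1 - 0.25 * x2))
fit <- glm(y ~ x1 + x2, family = binomial)

V <- vcov(fit)   # estimated covariance matrix of the coefficient estimates

# Difference of two coefficients and its standard error.
est <- unname(coef(fit)["x1"] - coef(fit)["x2"])
se  <- sqrt(V["x1", "x1"] + V["x2", "x2"] - 2 * V["x1", "x2"])

# The same standard error written as a contrast: sqrt(c' V c).
cvec <- c("(Intercept)" = 0, x1 = 1, x2 = -1)
se_contrast <- sqrt(drop(t(cvec) %*% V %*% cvec))

print(c(estimate = est, se = se, z = est / se))
```

In a Bradley-Terry fit where one team's ability is fixed at 0, comparisons involving that reference team need separate handling, since it has no row or column in vcov().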

6. Functions in the glmnet package follow a different format than glm() and company, in particular not accepting formula specifications of models (e.g. y ~ x1 + x2) and instead requiring you to pass a vector dependent variable and a numeric matrix of independent variables. To produce the latter (a numeric “design matrix”) it may be easiest to first fit a glm() to your data set and then extract the matrix from your fitted glm using model.matrix(). This trick is demonstrated in mtg2n_ch7-handout.pdf. If there are incomplete cases within the data set you passed to glm(), then you’ll also need to provide glmnet() with a version of the response vector that has these cases removed. glm() will have generated this trimmed-down response in the course of its work, and you can retrieve it using my_glm$y (assuming you saved your glm object under the name my_glm).

Glmnet expects your X matrix not to include an intercept column, as demonstrated in mtg2n. It assumes that you want an intercept and fits one anyway. To override this default behavior, fitting a model without an intercept, specify intercept=FALSE.

A commonality between glmnet functions and base R’s glm() is that both accept a “family” argument, with family="binomial" indicating logistic regression. If you leave out the family argument, then you’ll get an ordinary linear model.
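The extraction trick described in this footnote can be sketched in base R as follows. The toy data frame and its column names are invented; the final glmnet call is shown only as a comment, since it requires the glmnet package to be installed.

```r
# Toy data with two incomplete cases (rows 4 and 5); column names are hypothetical.
dat <- data.frame(
  y  = c(1, 0, 0, NA, 0, 1, 1, 1),
  x1 = c(0.2, -1, 1.5, 0.3, NA, 0.8, -0.4, 1.1),
  g  = factor(c("a", "b", "a", "b", "a", "b", "a", "b"))
)

my_glm <- glm(y ~ x1 + g, family = binomial, data = dat)

# Design matrix actually used by glm(): incomplete cases are already dropped.
X <- model.matrix(my_glm)
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]   # glmnet adds its own intercept

# Response with the same incomplete cases removed.
y <- my_glm$y

dim(X)
length(y)

# X and y now line up row for row and could be passed on, for example:
# glmnet::glmnet(X, y, family = "binomial")
```

Pulling both pieces from the same fitted object keeps the rows of X and the entries of y aligned, which is easy to get wrong when dropping incomplete cases by hand.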

7. The lme4 package now contains the lmer() and glmer() functions that the Gelman and Hill (2006) readings describe as residing in the Matrix package.

8. If you’re puzzled by the concept of “induced ordering,” here is an example to illustrate it. If Jane is taller than Jim but Joe is taller than Jane, then the ordering of the three individuals that the Height variable induces is: {Jim <= Jane <= Joe}. If at the same time Jane is richer than Joe who is in turn richer than Jim, then the ordering Wealth induces is: {Jim <= Joe <= Jane}.

Just as a height variable might induce a very different ordering than a corresponding wealth variable, security indices based off of different questions, or based off of a common model that was fit in different ways, can induce different orderings of schools.
