Take-home assignment #3
- When an assignment involves working with data and/or coding, you do not need to submit the data or the code that you used to solve the assignment. You only need to submit your tables, charts, and discussions.
- Your submitted work should all be within one single document that must be either a pdf document or a Word document. Please note that there are no exceptions to this rule. That is, assignments that are not in pdf or Word format will not be accepted. (Please note that “Google”-format documents are not accepted.)
- If the assignment involves creating charts, I suggest that you take a screenshot of each chart, paste the screenshots (one at a time) into a single Word document, and then submit only that Word document. (Please recall that we ask you to submit only one document.)
- Please note that Turnitin will be used to check for plagiarism. Turnitin checks documents against all submissions in the class and against submissions from other classes. It also checks against documents submitted in past classes.
- Please show all your work to receive full credit. If you provide just the answer, without showing how you derived it, then you will not receive full credit.
1. (Nominal vs. “real” [inflation-adjusted] time series)
The attached Excel spreadsheet, “Price of a Big Mac in U.S. over time.xlsx,” contains monthly time-series data on the nominal price of a Big Mac (the sandwich from McDonald’s) over time in U.S. dollars. The sources of the data are listed immediately on top of the data (within the spreadsheet).
Although this is monthly data, please note that certain months are missing. This is simply a fact of these data. We very rarely have the luxury of working with a perfectly balanced dataset, particularly when there is a time dimension to the data, and this dataset is no exception.
Using the “CPI-U, all items, non-seasonally adjusted” monthly series (i.e., CPI for urban consumers that includes all items), which you can obtain from the Bureau of Labor Statistics (BLS), please convert this nominal price time series to a “real” price time series (i.e., an inflation-adjusted series). Next, please plot both the nominal and the real time series within one single chart. Finally, please write a brief write-up on your findings. For example, when were Big Macs the most vs. the least expensive, in real terms? Were you surprised by any of the results? Do you observe any specific interesting trends and patterns in the data? Etc.
Please note that when constructing a “real” time-series of prices, one wants to inflation-adjust each nominal price to the most recent time period available within the data.
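As an illustration of the adjustment described above, a nominal price can be restated in the dollars of the most recent period by multiplying it by the ratio of the most recent CPI to the CPI of the price's own period. The sketch below is only one way to do this, in Python; the function name `to_real_prices` and all of the numbers are hypothetical placeholders, not actual Big Mac prices or BLS CPI-U readings.

```python
# Hypothetical placeholder values, NOT actual Big Mac prices or CPI-U readings.
# Keys are "YYYY-MM" strings; months without a price are simply absent.
nominal_prices = {"2000-01": 2.00, "2010-01": 3.00, "2020-01": 4.00}
cpi = {"2000-01": 100.0, "2010-01": 125.0, "2020-01": 200.0}


def to_real_prices(nominal, cpi_index):
    """Restate each nominal price in dollars of the most recent period in the data."""
    # "YYYY-MM" strings sort chronologically, so max() gives the latest period.
    base_period = max(nominal)
    base_cpi = cpi_index[base_period]
    return {
        period: price * base_cpi / cpi_index[period]
        for period, price in sorted(nominal.items())
    }


real_prices = to_real_prices(nominal_prices, cpi)
for period, real in real_prices.items():
    print(f"{period}: nominal ${nominal_prices[period]:.2f} -> real ${real:.2f}")
```

Because the base period is the latest one in the data, the real and nominal prices coincide in that period, which is a quick sanity check on the conversion.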
Finally, please make sure that you follow the “do’s” and “don’ts” of charting, because points will be deducted if those guidelines are not followed.
2. (Summarizing data numerically)
One of the most important aspects of analyzing data is the ability to summarize it efficiently. An important tool that allows us to summarize data efficiently is a table of descriptive statistics. This type of table displays, within one single page, some key statistics such as average, standard deviation, skew, min, max, and percentiles for each of the variables within a dataset. Below is an example of such a table, which summarizes a dataset on 270 MSAs (Metropolitan Statistical Areas) across the U.S. That table takes the original dataset, which is at the MSA-level, and provides statistics on the share of the insured population across the MSAs, on real GDP per capita across the MSAs, on the number of microbreweries across the MSAs, and on several other variables. (Please note that the “p”s represent percentiles; for example, “p5” is fifth percentile.)
This question asks you to create a table of descriptive statistics for a different dataset. Specifically, attached to this assignment is a cross-sectional dataset, called “County-level dataset.xlsx,” that contains information at the county level for the year 2012. For each county in the U.S., it contains data on, for example, the percentage of the population that is obese, median income, the number of fast-food restaurants per 100,000 people, the share of the population that is Hispanic, the percentage of the population that has a high school diploma or higher, and so on. This data was compiled from the following sources:
- U.S. Census Bureau
- American Community Survey (ACS)
- Small Area Income and Poverty Estimates (SAIPE)
- U.S. Centers for Disease Control and Prevention (CDC)
- U.S. Bureau of Economic Analysis (BEA)
- U.S. Department of Agriculture (DOA)
- U.S. Department of Health and Human Services (DHHS)
- FBI Uniform Crime Reporting
Please use a statistical software package (R, Excel, Stata, Python, etc.) to create a table of descriptive statistics for this dataset. Please follow the format of the table that is displayed below, because in economics that is the typically used format. In other words, your table should have an informative title, should fit within one single page (and not spill over across multiple pages), should have a footnote, the description of the variables should be in the first column and be self-explanatory, the names of the statistics should be on top (as a row), the table should be as clutter-free as possible (for example, there is no need to display more than two decimal places for each number), it should be typo-free and with no grammatical errors, its columns should be aligned, etc.
In other words, the “look” of the table matters, and its look will be reflected in the grade that you receive.
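For illustration only, the layout described above (variable descriptions in the first column, statistic names across the top, two decimal places) might be produced along the following lines in Python. This is a minimal sketch using the standard library: the variable names and values are hypothetical placeholders rather than the county-level data, and the skew shown is the simple moment-based measure, which may differ slightly from what a given statistical package reports.

```python
import statistics


def describe(values):
    """Return the descriptive statistics for one variable as a dict."""
    n = len(values)
    mean = statistics.mean(values)
    pop_sd = statistics.pstdev(values)
    # Moment-based skewness: mean cubed deviation over population SD cubed.
    skew = sum((x - mean) ** 3 for x in values) / n / pop_sd ** 3
    # 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "Mean": mean, "SD": statistics.stdev(values), "Skew": skew,
        "Min": min(values), "p5": cuts[4], "p25": cuts[24], "p50": cuts[49],
        "p75": cuts[74], "p95": cuts[94], "Max": max(values),
    }


# Hypothetical placeholder data, NOT taken from the county-level dataset.
columns = {
    "Variable A (hypothetical, %)": [1.0, 2.0, 3.0, 4.0, 10.0],
    "Variable B (hypothetical, $)": [10.0, 20.0, 30.0, 40.0, 50.0],
}
stats_table = {name: describe(vals) for name, vals in columns.items()}

header = ["Mean", "SD", "Skew", "Min", "p5", "p25", "p50", "p75", "p95", "Max"]
print(f"{'Variable':<30}" + "".join(f"{h:>8}" for h in header))
for name, row in stats_table.items():
    print(f"{name:<30}" + "".join(f"{row[h]:>8.2f}" for h in header))
```

The printed text is only a starting point; the title, footnote, and final alignment would still need to be finished in the document that is submitted.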
From directly inspecting the table that you just created, and directly using the numbers in the table (and not in the raw data), please answer the following:
i. Which variables (if any) are moderately skewed right? Please explain why.
ii. Which variables (if any) are strongly skewed left? Please explain why.
iii. Which variables (if any) have “extreme” outliers on the right? Please explain and show your calculations.
iv. Suppose there is a particular county whose obesity rate is 19.7%. What is the z-score of this county for obesity rate?
v. Your friend tells you that 45% of the observations for median income fall within the range of $43,039 and $65,799. Is your friend correct? Please explain.
vi. Please note that, for certain variables, the mean is meaningfully larger (say, more than 50% larger) than the median. Please explain, making reference to one of the statistics displayed in the table, why that is the case.
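Several of the questions above rely on calculations that use only the numbers in the table. As a hedged sketch, a z-score is the distance of a value from the mean measured in standard deviations, and one common convention (Tukey's outer fence, assumed here; the course may define “extreme” differently) flags a right-side extreme outlier when the maximum exceeds p75 plus three times the interquartile range. The row of statistics below is a hypothetical placeholder, not a result from the county-level dataset.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def upper_outer_fence(p25, p75, k=3.0):
    """Upper fence p75 + k * IQR; k = 3 is the assumed 'extreme outlier' convention."""
    return p75 + k * (p75 - p25)


# Hypothetical row of table statistics, NOT from the county-level dataset.
row = {"Mean": 30.0, "SD": 4.0, "p25": 27.0, "p75": 33.0, "Max": 55.0}

z = z_score(22.0, row["Mean"], row["SD"])
fence = upper_outer_fence(row["p25"], row["p75"])
has_extreme_right_outlier = row["Max"] > fence

# By definition, the share of observations between two percentiles is their
# difference; for example, p25 to p75 spans 75 - 25 = 50 percent of the data.
share_between_p25_and_p75 = 75 - 25

print(f"z-score: {z:.2f}")
print(f"upper outer fence: {fence:.2f}; extreme right outlier: {has_extreme_right_outlier}")
```

Only table entries (mean, standard deviation, percentiles, max) enter these calculations, which is consistent with the instruction to work from the table rather than from the raw data.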