Read these instructions carefully!!!
This project involves predicting what happens to mortgage loans that were purchased by FNMA during the 3rd quarter of 2001.
The data has 30 predictor variables described in the file PredictorsDescription.csv. All 30 of these variables were known to FNMA when the loans were acquired. Performance information about all of the loans became available relatively recently (around 2021). Some of the loans performed as a lender would expect: monthly payments were made on time up until performance data stopped being collected. Other loans were problematic in that the holder of the mortgage may have foreclosed (payments stopped, the home was taken by the lender, etc.). In many instances, the borrower sold the home and paid the outstanding balance owed before the end of the mortgage term (15 years, 30 years, etc.).
Every loan has a 12-digit loan ID (LID).
Two performance variables are provided in the training set for each loan:
FORCLOSED – a Boolean variable that indicates whether or not foreclosure took place
NMONTHS – a variable giving the number of months that the mortgage remained on the lender's books
A large data set has been randomly split into two files:
TrainingData.csv – this one has all 30 predictors, a loan ID (LID), and the two performance variables (FORCLOSED, NMONTHS) for 588,490 loans.
TestDataYremoved.csv – this one has only the 30 predictors and loan ID (LID) for a different set of loans (196,164 of them).
Your task is to use the 30 predictors to make predictions about performance for the loans in the test set.
Specifically for each loan in the test set:
1) Predict whether the loan will foreclose or not by supplying a Boolean value (True for foreclosure, False for non-foreclosure)
2) Predict the accuracy of your predictions in 1)
3) Predict NMONTHS – this should be a number but needn’t be an integer
4) Predict the accuracy of your predictions in 3).
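One possible way to approach items 2) and 4) is to hold back part of the training data, fit on the rest, and measure the error on the held-back loans. The sketch below is only an illustration under stated assumptions: the helper names (holdout_split, mean_abs_dev), the 25% validation fraction, the toy values, and the median baseline are all hypothetical and are not part of the assignment.

```python
import random
from statistics import median


def holdout_split(rows, frac=0.25, seed=0):
    """Shuffle a copy of rows and split off a validation part of size frac."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * frac)
    # First n_val shuffled rows are held out; the remainder is used for fitting.
    return shuffled[n_val:], shuffled[:n_val]


def mean_abs_dev(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy stand-in for TrainingData.csv: (LID, NMONTHS) pairs with made-up values.
toy = [(f"{i:012d}", float(m))
       for i, m in enumerate([12, 24, 36, 48, 60, 72, 84, 96])]

fit_rows, val_rows = holdout_split(toy, frac=0.25, seed=0)

# Hypothetical baseline: predict the median NMONTHS of the fitting part.
baseline = median(m for _, m in fit_rows)

# Error on held-out loans serves as an estimate of the error on unseen loans.
estimated_mad = mean_abs_dev([m for _, m in val_rows],
                             [baseline] * len(val_rows))
print(f"estimated MAD of the median baseline: {estimated_mad:.2f}")
```

The same split can be reused for the foreclosure prediction: compute TPR and FPR on the held-out loans and report those as the predicted accuracy for 1).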
Important details:
- Your final submission should involve:
o Uploading a .csv file in Blackboard (see Upload CSV File link)
o Uploading the Jupyter notebooks you used to do your calculations
o Answering 3 questions in a Blackboard final submission assessment (see Three Questions link)
- The .csv file should have a header row and one row for every loan in the test dataset. This file should have exactly 3 columns, labeled LID, FORCLOSED, NMONTHS, and the number of rows including the header should be exactly 1 + 196,164 = 196,165.
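As a rough illustration of the file layout described above, the following sketch writes a submission with Python's standard csv module and then checks its shape. The function names (write_submission, check_submission) and the two sample rows are hypothetical; for the real file the expected count would be 196,164 data rows plus one header row.

```python
import csv
import io

EXPECTED_COLUMNS = ["LID", "FORCLOSED", "NMONTHS"]


def write_submission(handle, predictions):
    """Write (LID, FORCLOSED, NMONTHS) tuples as CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(EXPECTED_COLUMNS)
    for lid, foreclosed, nmonths in predictions:
        # Keep LID as a string so leading zeros of the 12-digit ID survive.
        writer.writerow([str(lid), bool(foreclosed), float(nmonths)])


def check_submission(handle, expected_rows):
    """Raise ValueError if the header, column count, or row count is wrong."""
    reader = csv.reader(handle)
    header = next(reader)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    lids = []
    for row in reader:
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(f"expected 3 columns, got {len(row)}")
        lids.append(row[0])
    if len(lids) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(lids)}")
    if len(set(lids)) != len(lids):
        raise ValueError("duplicate LID values found")
    return True


# Made-up predictions; an in-memory buffer stands in for the real file.
sample = [("000000000001", True, 35.5), ("000000000002", False, 120)]
buffer = io.StringIO()
write_submission(buffer, sample)
buffer.seek(0)
is_valid = check_submission(buffer, expected_rows=len(sample))
```

To write to disk instead, the handle could be opened with open(path, "w", newline=""), which is what the csv module documentation recommends for file objects.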
- The final submission questions appear in Blackboard (Three questions) as an assessment where you are asked to provide 3 numbers:
o the true positive rate (TPR) for your predictions in 1) – see definitions below
o the false positive rate (FPR) for your predictions in 1) – see definitions below
o the MAD (mean absolute deviation) between predicted and true values of NMONTHS, i.e. the absolute difference between your predicted value of NMONTHS and the true value of NMONTHS, averaged over all loans in the test dataset.
- The quality of your predictions in 1) will be measured by taking the difference TPR-FPR. You should try to make this as large as you can.
- The quality of your predictions in 3) will be measured by how small the MAD is for your predictions.
Some definitions:
For foreclosure prediction (since I know the true status of each loan) I can fill a 2×2 table with counts of where each loan falls based on its true and predicted foreclosure status:

                          Predicted foreclosure    Predicted no foreclosure
Actually foreclosed       TP (true positives)      FN (false negatives)
Actually not foreclosed   FP (false positives)     TN (true negatives)

So once you submit your predictions the four numbers in the table should add up to 196,164. I can then calculate the quantities

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

and you should be striving to make TPR high and FPR low.
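The counting and the two rates can be sketched in a few lines of Python. This assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the function names and the eight toy loans are hypothetical and only illustrate the arithmetic.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for Boolean truth and prediction sequences."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1   # foreclosed, predicted foreclosure
        elif truth:
            fn += 1   # foreclosed, predicted no foreclosure
        elif pred:
            fp += 1   # not foreclosed, predicted foreclosure
        else:
            tn += 1   # not foreclosed, predicted no foreclosure
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """Return (TPR, FPR); a rate is reported as 0.0 if its denominator is 0."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example: 3 loans that foreclosed and 5 that did not.
truth = [True, True, True, False, False, False, False, False]
pred = [True, True, False, True, False, False, False, False]

tp, fn, fp, tn = confusion_counts(truth, pred)
tpr, fpr = rates(tp, fn, fp, tn)
score = tpr - fpr  # the quantity the assignment asks you to maximize
print(tp, fn, fp, tn)
print(round(tpr, 3), round(fpr, 3), round(score, 3))
```

In this toy case the table holds TP = 2, FN = 1, FP = 1, TN = 4, so TPR = 2/3, FPR = 1/5, and TPR - FPR is about 0.467. Predicting True for every loan gives TPR = 1 and FPR = 1, which is why the difference rather than TPR alone is used as the score.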