MAT 170 FINAL PROJECT 数据分析工具代写 All Final Projects need to be done individually. No teams are allowed. We will refer all suspected acts of collaboration to student judicial All Final ...View details
MA1607 | Project 2
数学算法代写 1 Project background This project builds upon the work done from week 11, where we learned the mathematics underlying the Google PageRank algorithm.
1 Project background
This project builds upon the work done from week 11, where we learned the mathematics underlying the Google PageRank algorithm. Most of the key learning materials needed to implement this project can be found on the Blackboard, in the folder for Project 2. More specifically
Lecture 0 (Week 11) explained the motivation for ranking nodes on a network and outlined the imaginary computer simulation that underpins the PageRank algorithm. During the Lecture 1 (Week 12) we introduced the key concepts needed to represent networks mathematically: adjacency matrices, as well as in- and out-degrees. We learned how to handle adjacency matrices in Matlab during Lab Session 1 (Week 12).
The simulation that motivates the PageRank algorithm describes surfers who browse a network according to some set rules. The simulation of individual surfers is possible, but very computationally inefficient, so direct simulation needs to be replaced by studying the probabilities for surfers to be on every particular node within the network. And studying how these probabilities change from iteration to iteration. Seminar Session 1 (Week 12) was concerned with hands-on calculations of these probabilities.
In Lecture 2 (Week 13) we showed how the effect of iterations on probabilities can be reproduced by multiplying probabilities vector by the conditional probability matrix. Lab Session 2 (Week 13) was devoted to computing this matrix from a given adjacency matrix, see cmat1.m in Lab Session 2 (Week 13). This allowed us to reproduce the effect of simulations more easily, see iterations1.m in Lab Session 2 (Week 13).
Experimentation with script iterations1.m in Lab Session 2 (Week 13) showed that probabilities become stable after a few iterations of the PageRank algorithm. Lecture 3 (Week 14) argues why this ought to be expected for many networks of interest and shows how to reduce the problem of finding the limiting probabilities to the solution of a system of linear equations. This transformation was exemplified for a tiny network considered during Seminar Session 3 (Week 14); a general computer implementation of the same algorithm was covered in Lab Session 3 (Week 14). As the result, we wrote the script direct1.m which takes the adjacency matrix and determines the limiting probabilities for surfers to be on any particular node. The nodes are then ranked in the order of decreasing limiting probabilities (higher probabilities mean higher rankings).
Unfortunately, the simple simulation described in the Lecture 0 (Week 11) that motivated the PageRank algorithm does not lead to useful results for some networks. We illustrated the potential difficulties during Lecture 4 (Week 15) and also proposed generalized surfing rules that address the issue. We shown that these new surfing rules only affect
elements of the conditional probabilities matrix and implement the generalized version of the matrix during Lab Session 4 (Week 15).
Last, but not least, we also need examples of networks to study, so Seminar Session 2 (Week 13) was devoted to the examples of nodes and edges that would be appropriate to a network showing relationships between mathematical topics.
2 Project tasks (Part 1) 数学算法代写
Let u and v be the last two digits of your student number (as read from left to right). The tasks in Parts 1 and 2 of the project will use these digits to personalize your instructions.
In the first group of tasks you will need to compute and analyse the ranking of nodes for a pre-defined network, describing the dependencies between 6 mathematical topics. Each node represents a topic, and each edge represents the dependence. The direction of the edge goes from topic A to topic B if topic B underlies or is employed in some way in topic A; for instance the edge from “Systems of Linear Equations” to “Matrices” indicates that any analysis of systems of linear equations can be performed by studying the corresponding system matrices. If u (the penultimate digit of your student number) is odd, you will be working with the Network 1, otherwise, you will be working with the Network 2, see the diagrams above.
Create script task1a.m, based upon script direct1.m, to compute the vector of limiting probabilities for your network. These are the probabilities that a random surfer following the PageRank algorithm will visit each node. The PageRank will then rank the node with the largest probability first, the node with the second largest probability second, etc (10%).
Task 1(b): 数学算法代写
Write a short section of your report to describe your findings. It must include the mathematical definition of your adjacency matrix (typed using the equation editor), as well as a brief statement of what changes to direct1.m were actually needed. Also write a paragraph to state what are the top three ranked nodes for your network, what are the values of their limiting probabilities and their ranks, and why do you think the associated topics got the highest ranks (10%).
For the next group of tasks you will need to add 3–5 nodes to your network, as well as at least 7 more edges. The resulting network should contain not fewer than 9 and not more than 11 topics.
The following are some suitable examples of mathematical topics that can become new nodes, as well as possible dependencies between them:
• Systems of linear equations and matrix algebra: the latter can be used to solve the former, so if nodes systems of linear equations and matrix algebra are included in your network. There would be a directed edge pointing from systems of linear equations to matrix algebra.
• Infinite series −→ limits: the theory of limits is needed to evaluate infinite series.
• Definite integrals ←− area formulas in geometry: definite integrals can be used to calculate the area under a curve. Is there a link in the opposite direction?
• There is a dependence between discrete distributions (in probability) and infinite series: the moments of a discrete distribution are often calculated using an infinite series. Hence, if nodes discrete distributions and infinite series are included in your network. There would be a directed edge pointing from discrete distributions to infinite series.
Also check out the discussion of suitable topics during Seminar Session 2 (Week 13). You may use one of these examples in your network, but not more than one. These are tasks that you need to perform for the newly created network:
Task 2(a): 数学算法代写
Prepare a graphical representation of your network, with each node numbered and labelled by a mathematical topic. Do not forget to clearly indicate which nodes and edges have already been there, and which ones you contributed. People usually draw their networks using Microsoft Word or Microsoft Powerpoint. But you are free to use any other software if you prefer (10%).
For each edge that you added to the network to connect a pair of nodes, provide a brief justification in your report (1-2 sentences) of why do you believe these nodes must be connected (20%).
Create script task2c.m based upon your earlier created script task1a.m to compute limiting probabilities for your extended network. Try running it. Write a few sentences in your report to briefly describe the results you obtain (10%).
Depending on the configuration of the network you just created, your script from Task 2(c) may fail to report the limiting probabilities (or it may give you warnings about the matrix M). If this is the case, write one or two sentences to explain why this happens for your network.
3 Project tasks (Part 2) 数学算法代写
The tasks in Part 2 are about applying the extended version of the PageRank algorithm to your network and comparing the results with your observations from Task 2(a–b).
Task 3(a): Create script task3c.m, based upon script task2a.m, to compute the vector of limiting probabilities for your network using an extended version of the PageRank algorithm. This will require you to modify task1a.m to use function cmat2() instead of cmat1(). Use the value of α = 0.15 + 0.0v, where v is the last digit of your student number (10%).
Task 3(b): Add a brief section to your report to describe the top three ranked topics for your modified network, what are their limiting probabilities and ranks. And how these results are different (if at all) from your results in Task 2(c). For full marks, get your program to print out top three PageRank values in descending order (10%). The remaining two tasks are slightly harder.
Task 4(a): Create Matlab script task4a.m based upon script task3a.m, to generate a plot showing how limiting probabilities depend on the parameter α (α should change between 0 and 1). Make sure all your modifications are properly commented in the script (10%).
Task 4(b): Write a section in your report that includes the plot obtained during Task 4(a). Explain what specific modifications to task3a.m were needed to generate such a plot and also describe how limiting probabilities change for α ∈ [0, 1] (10%).
4 Project summary 数学算法代写
Overall, the key deliverables for this project are as follows:
• The report that contains answers to Tasks 1(b), 2(b), 3(b) and 4(b), which should be submitted as a .PDF file. The overall report should not be longer than 3 sides of A4 (assuming 2cm margins, at least 11pt font and 1.5 line spacing).
• The network diagram as specified in Task 2(a), which should be submitted either as a part of the report, or as a separate .PDF document.
• All Matlab scripts that were produced to answer Tasks 1(a), 2(c), 3(a) and 4(a), which should be submitted within a single .ZIP file. If you have incomplete answers to some of the tasks, please submit incomplete solutions as well. Because partial marks are often possible. Use your student ID as the name of the .ZIP file.
5 Submission instructions
The submission deadline for this project is 18:00 on Friday 10 June 2022. Your work will have to be submitted via WISEflow, and should include:
1. Your report and network diagram, combined into a single .PDF file.
2. All (both partial and complete) Matlab scripts that you produced while working on the project, combined into a .ZIP file with your student ID as the name. Submit this .ZIP file as an attachment to the WISEflow submission.