Some people have asked about Q1.3. What we’re doing here is introducing you to a concept from machine learning: given a bunch of observations you have made, if you assume the data follows some pattern (a model), what are the best parameters for that model? Imagine the case of flipping a biased coin. You flip it 100 times and see heads 60 times. Which bias p is most likely to explain your observations? For example, if you had two coins with biases p = 0.05 and p = 0.65, it’s incredibly more likely that you would get 60 heads with the p = 0.65 coin. Is this p = 0.65 coin the ‘best’ one (the one most likely to recreate the observed outcome)? No, it’s not. You can actually show that p = 0.6 maximizes Pr[ seeing 60 heads out of 100 flips | Pr[heads] = p]. In Q 1.3, you want to find the value p* of Pr[Yes] such that Pr[Pr[Yes] = p | 75+, 25-] is maximized at p = p*.
So what is the deal with ‘assuming a uniform prior’? If you flip the conditional probability Pr[Pr[Yes] = p | 75+, 25-] around using Bayes’ rule, you’ll get Pr[Pr[Yes] = p] somewhere in there. The uniform prior is the assumption that all values for Pr[Yes] are equally likely. We usually use a uniform prior when we have no insight about what the model should be. When might we not use a uniform prior? Maybe you have some information about drug usage at other schools. In that case you could still consider all values of Pr[Yes], but give preference towards hypotheses in which Pr[Yes] is closer to those values that have been observed elsewhere. You give preference towards certain values of Pr[Yes] by changing the distribution Pr[ Pr[Yes] = p]; for example, you could have Pr[Pr[Yes] = p] = 2(1-p) to give preference towards smaller values of Pr[Yes].
This subproblem just scratches the surface of a class of problems called model fitting. For example, in some of the work that I have done with Twitter data we try to find an independent cascade model that ‘best describes’ our observations. Using this model we can make predictions about how far certain tweets will spread within the network.