Homework 3 is out

October 7, 2011

Grab homework 3 off of the website and make sure to attend class on Tuesday to hear about how to make us go bankrupt!  It’s due at the start of class on October 18.

Please note: We’re going to do Bankrupt Us! on the next homework.  Don’t worry about it for this one.

Exam Details

October 5, 2011

The average on the exam was a 76 and the standard deviation was 12.  The test has been curved so that a 92 is the 100% mark.  Detailed solutions have been posted on the website.  Notice that for the G(n,p) question there’s a discussion about something interesting that some students did.  It’s not quite right in a subtle way, and it would be to your benefit to read over the solution and discussion for that problem.

If you have questions, see Severin for problems 1 and 2, Polo for question 3, U for questions 4 and 5, and Bren for questions 6 and 7.

A few more minor details for the homework

September 28, 2011

In question 2 part 1 I’ve change the problem to specify that max(\emptyset) = average(\emptyset) = 0, otherwise the problem wouldn’t be fully specified (for example, we could have taken F(\emptyset) = -\pi).

Also, if you ever need to, you may assume that the set function which is 0 for every element of its domain is a NN-M-SM-SF.

A Sweet Deal

September 27, 2011

Because you guys have been studying for the exam, parts 4-6 of question 2 will now be worth 45 bonus points (15 bonus points each).  I’ll redistribute the 30 points among the remaining questions and post a version with the new point distribution later.  If you don’t have time or are unable to do the bonus, do not worry.  We will calculate grade cutoffs before any bonus is applied.  Doing extra credit throughout the semester can only help you, and not doing it cannot hurt you in the sense that if you were the only person who didn’t do extra credit, your final grade would be the same as if no one did extra credit.

UPDATE:  The Homework is now due a week from Today (on Tuesday, October 4th).  Hopefully this will enable you to attempt all of the extra credit.

Some clarifications for the homework

September 27, 2011

Some people have asked about Q1.3.  What we’re doing here is introducing you to a concept from machine learning: given a bunch of observations you have made, if you assume the data follows some pattern (a model), what are the best parameters for that model?  Imagine the case of flipping a biased coin.  You flip it 100 times and see heads 60 times.  Which bias p is most likely to explain your observations?  For example, if you had two coins with biases p = 0.05 and p = 0.65, it’s incredibly more likely that you would get 60 heads with the p = 0.65 coin.  Is this p = 0.65 coin the ‘best’ one (the one most likely to recreate the observed outcome)?  No, it’s not.  You can actually show that p = 0.6 maximizes Pr[ seeing 60 heads out of 100 flips | Pr[heads] = p].  In Q 1.3, you want to find the value p* of Pr[Yes] such that Pr[Pr[Yes] = p | 75+, 25-] is maximized at p = p*.

So what is the deal with ‘assuming a uniform prior’?  If you flip the conditional probability Pr[Pr[Yes] = p | 75+, 25-] around using Bayes’ rule, you’ll get Pr[Pr[Yes] = p] somewhere in there.  The uniform prior is the assumption that all values for Pr[Yes] are equally likely.  We usually use a uniform prior when we have no insight about what the model should be.  When might we not use a uniform prior?  Maybe you have some information about drug usage at other schools.  In that case you could still consider all values of Pr[Yes], but give preference towards hypotheses in which Pr[Yes] is closer to those values that have been observed elsewhere.  You give preference towards certain values of Pr[Yes] by changing the distribution Pr[ Pr[Yes] = p]; for example, you could have Pr[Pr[Yes] = p] = 2(1-p) to give preference towards smaller values of Pr[Yes].

This subproblem just scratches the surface of a class of problems called model fitting.  For example, in some of the work that I have done with Twitter data we try to find an independent cascade model that ‘best describes’ our observations.  Using this model we can make predictions about how far certain tweets will spread within the network.

Review for the exam

September 23, 2011

Hi everyone,

I will have some extra office hours on Sunday at 3:00 pm.  Come with questions about the HW or content that could be on the test.  Meet at the 7th floor atrium, or look for a sign indicating where I have gone if I can’t hold them there.

Also, on question 2 of the HW, when I ask you to prove the economies of scale property, you should also assume that e \not\in B (that is, e \in X \setminus B).  If you don’t have this assumption, you might end up with the case where you are trying to prove that F is monotone.

Homework 2 is out

September 21, 2011

Homework 2 is posted on the website and all of the lectures have been updated.  Please take a look at it soon and let us know of any questions you have!  I’ll have extra office hours over the weekend (probably on Sunday) so that you can ask questions about the test and homework.

 

Getting ready to submit your HW

September 14, 2011

Please make sure you’re following all of our instructions when putting final preparations on your HW:

  • Typeset your homework
  • Include the cover sheet on top
  • Staple everything together
  • Email Brendan your artifact (exported pdf) and a description (instructions are at the beginning of the third lecture)
  • Turn in your assignment by the beginning of class on Thursday
Finally, a few common questions about the HW:  In 2.2, the binary tree we are talking about is the binary tree of n levels where every level is full, so it has 2^(n+1) – 1 nodes in it and 2^n leaf nodes.  When talking about betweenness centrality, you should assume that the graphs are connected, and that they are sufficiently large (> 2 vertices).
Please email the staff any other questions you might have!

Office Hours

September 6, 2011

Office hours have been posted on the course website.  Let us know if you can’t make any of the times that we’ve listed.  Class will generally end a bit early, so you should have that opportunity to ask us questions!

Nom nom nom… more data for your consumption

September 2, 2011

Here are a few more sources of data that might be interesting for you:

http://wiki.gephi.org/index.php?title=Datasets — right off of the Gephi download page, it has a fair amount of overlap with Newman’s collection of networks but also includes some cool ones such as a graph of Github users.

http://snap.stanford.edu/data/index.html — these data are probably not very good for this assignment, since they are quite large.  However, if you’re interested in large-scale data, this is the place to find it.

If you find other data sources, please share them with the class in the comments section!


Follow

Get every new post delivered to your Inbox.