<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>15-396 Science of the Web</title>
	<atom:link href="http://scienceoftheweb.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://scienceoftheweb.wordpress.com</link>
	<description>Math, algorithms, and a hint of social networks.</description>
	<lastBuildDate>Wed, 07 Dec 2011 22:53:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='scienceoftheweb.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>15-396 Science of the Web</title>
		<link>http://scienceoftheweb.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://scienceoftheweb.wordpress.com/osd.xml" title="15-396 Science of the Web" />
	<atom:link rel='hub' href='http://scienceoftheweb.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Test 2 solutions posted, notes about the final</title>
		<link>http://scienceoftheweb.wordpress.com/2011/12/07/test-2-solutions-posted-notes-about-the-final/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/12/07/test-2-solutions-posted-notes-about-the-final/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 22:53:35 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=73</guid>
		<description><![CDATA[I posted the second exam and some solutions.  Feel free to look over them before the final.  Remember that the final will not cover Networks I or Natural Language Processing.  Don&#8217;t worry too much about the details from HW 4 appearing on the final, especially those related to the HITS algorithm.]]></description>
				<content:encoded><![CDATA[<p>I posted the second exam and some solutions.  Feel free to look over them before the final.  Remember that the final will not cover Networks I or Natural Language Processing.  Don&#8217;t worry too much about the details from HW 4 appearing on the final, especially those related to the HITS algorithm.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/12/07/test-2-solutions-posted-notes-about-the-final/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>Homework 6 is out</title>
		<link>http://scienceoftheweb.wordpress.com/2011/12/07/homework-6-is-out/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/12/07/homework-6-is-out/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 04:28:21 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=70</guid>
		<description><![CDATA[Homework 6 has been posted.  It&#8217;s due via email by December 11th at 11:59 pm (this Sunday, end of day).  Try to give useful comments; they are appreciated and will earn you full credit.  You do not have to write an extremely detailed account if you don&#8217;t want to, but a one [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Homework 6 has been posted.  It&#8217;s due via email by December 11th at 11:59 pm (this Sunday, end of day).  Try to give useful comments; they are appreciated and will earn you full credit.  You do not have to write an extremely detailed account if you don&#8217;t want to, but a one-sentence response will not get you 50 points.</p>
<p>In the lecture slides for today I grayed out the two subjects, Networks I and NLP, that you voted off the final&#8217;s subject list.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/12/07/homework-6-is-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>Handing in your HW</title>
		<link>http://scienceoftheweb.wordpress.com/2011/11/30/handing-in-your-hw/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/11/30/handing-in-your-hw/#comments</comments>
		<pubDate>Wed, 30 Nov 2011 03:36:57 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=66</guid>
		<description><![CDATA[The solutions to problems 1 and 3 should be handed in (printed out, with the cover sheet attached) at the start of class on Thursday.  The .tar.gz or .zip of your files should be sent to my @cs.cmu.edu account with the subject &#8220;Homework 5 Handin&#8221;. I suggest putting in a README file in case something is [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>The solutions to problems 1 and 3 should be handed in (printed out, with the cover sheet attached) at the start of class on Thursday.  The .tar.gz or .zip of your files should be sent to my @cs.cmu.edu account with the subject &#8220;Homework 5 Handin&#8221;.  I suggest including a README file in case something is wrong; it is in your best interest to describe at a high level what your code does so that I can better assign partial credit!</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/11/30/handing-in-your-hw/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>Clarifying comments on memory usage, and a hint!</title>
		<link>http://scienceoftheweb.wordpress.com/2011/11/30/clarifying-comments-on-memory-usage-and-a-hint/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/11/30/clarifying-comments-on-memory-usage-and-a-hint/#comments</comments>
		<pubDate>Wed, 30 Nov 2011 03:32:18 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=64</guid>
		<description><![CDATA[I&#8217;ve gotten some questions about what is an appropriate amount of memory to use when writing your MapReduce tasks.  My previous post guides you towards writing a solution that is more in the spirit of a streaming algorithm, and less in the spirit of a traditional algorithm in which data is stored in RAM. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve gotten some questions about what is an appropriate amount of memory to use when writing your MapReduce tasks.  My previous post guides you towards writing a solution that is more in the spirit of a streaming algorithm, and less in the spirit of a traditional algorithm in which data is stored in RAM.  For example, storing the whole social graph of Twitter, FB, etc. in RAM is not feasible.  However, that doesn&#8217;t mean you can&#8217;t use some reasonable amount of memory.  Even JBiebs and Lady Gaga have a number of followers and friends on Twitter that could be stored in the mapper or the reducer.  In this case, it&#8217;d be OK to write a reduce task that takes in all of Lady Gaga&#8217;s followers and does something interesting with them (such as computing the average account age of her followers).</p>
<p>With that said, here&#8217;s a bit of a hint:  Suppose I give you two files, one containing user IDs and another containing a directed graph, and I ask you to output the subgraph induced by those vertices (the edges for which both endpoints are in the set).  We&#8217;ll do this in two steps: The first will filter out edges (u,v) for which u isn&#8217;t in the set.  Then we&#8217;ll filter out the remaining edges for which v isn&#8217;t in the set.  We do so as follows:</p>
<p>Mapper 1:</p>
<p>If the line is &#8220;user U&#8221;, we output &#8220;U found&#8221; and &#8220;user U&#8221;, where U is some number.  If the line is &#8220;edge U V&#8221;, we output &#8220;U V&#8221;.</p>
<p>Reducer 1:</p>
<p>If the key is numeric, we store all of the values that we see for that key; otherwise we just echo the line.  We can keep this collection of values in a set, as it will be small relative to the amount of data in the file.  After we read in all of the (U, value) pairs for a given U, we check whether &#8216;found&#8217; is in the set of values.  If so, we write out &#8220;edge U V&#8221; for every V != &#8220;found&#8221; in the set of values for the key U.  Again, if the reducer sees a line of the form &#8220;user U&#8221;, we simply output &#8220;user U&#8221;.</p>
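<p>Here is a minimal Python sketch of Mapper 1 and Reducer 1 as described above.  It is one possible reading, not the reference solution: the function names <code>mapper1</code> and <code>reducer1</code> are made up for illustration, the records are assumed to be space-separated exactly as in the description, and the reducer is assumed to receive its input already sorted so that all lines for a key are adjacent.</p>

```python
from itertools import groupby


def mapper1(lines):
    """Phase 1 map step: tag set members and key each edge by its source."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "user":
            # Mark U as a member of the vertex set, and pass the user line on.
            yield fields[1] + " found"
            yield "user " + fields[1]
        elif fields[0] == "edge":
            # Key the edge (U, V) by its source vertex U.
            yield fields[1] + " " + fields[2]


def reducer1(sorted_lines):
    """Phase 1 reduce step: keep edges whose source vertex is in the set."""
    pairs = (line.split() for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        if not key.isdigit():
            # Non-numeric key: echo the line (these are the "user U" records).
            for _, value in group:
                yield key + " " + value
            continue
        # All values for one vertex U: its out-neighbours, plus maybe "found".
        values = {value for _, value in group}
        if "found" in values:
            # Sorted only to make the output order deterministic.
            for v in sorted(values - {"found"}):
                yield "edge " + key + " " + v
```

<p>Running <code>reducer1(sorted(mapper1(lines)))</code> on a handful of lines is a quick way to check the behavior by hand.  Only the set of values for a single key is held in memory at any time.</p>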
<p>Think about what we&#8217;ve accomplished at this point, and how you can finish the task.  It&#8217;s OK to take a given line and do things such as repeat it, output it in different ways, pack it together with some other data to make a more complex key, etc.</p>
<p>Hopefully this leads you toward a solution in which you store a relatively small amount of data! <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/11/30/clarifying-comments-on-memory-usage-and-a-hint/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>A note about memory requirements and efficiency</title>
		<link>http://scienceoftheweb.wordpress.com/2011/11/28/a-note-about-memory-requirements-and-efficiency/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/11/28/a-note-about-memory-requirements-and-efficiency/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 06:12:13 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=62</guid>
		<description><![CDATA[Something that a few of you have noticed is a comment in the word count reducer that says you wouldn&#8217;t actually want to do things the way that I have.  The reason is that you shouldn&#8217;t make assumptions about the amount of data any particular reduce (or map) task is going to see.  If you [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Something that a few of you have noticed is a comment in the word count reducer that says you wouldn&#8217;t actually want to do things the way that I have.  The reason is that you shouldn&#8217;t make assumptions about the amount of data any particular reduce (or map) task is going to see.  If you have to keep state for potentially every line that you see, that&#8217;s A Bad Thing.  In particular, you really want to keep the amount of memory that you use to a minimum.  A two-phase MapReduce algorithm in which every mapper and reducer uses a bounded amount of memory is much better than one that requires only a single pass but whose memory use could be proportional to the input size.</p>
<p>You should write programs that require a small amount of memory.  Most of the time, if you are storing data in a dictionary whose keys are the MR keys and all you do is increment some small set of values, you&#8217;re doing it &#8216;wrong.&#8217;  Yes, what you wrote is correct, but it will run out of memory if I run your code on a file with billions of distinct keys.  The moral of the story is to take advantage of the fact that you see all of the (key,value) pairs for a given key contiguously.</p>
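<p>As a rough illustration of that moral, here is a sketch of a counting reducer that keeps only the current key and a running total instead of a dictionary over all keys.  The name <code>streaming_count_reducer</code> and the space-separated &#8220;word count&#8221; line format are assumptions made for this example; this is not the reducer from the assignment files.</p>

```python
def streaming_count_reducer(sorted_lines):
    """Sum the counts for each key using constant state.

    Relies on all lines for a given key arriving contiguously.
    """
    current_key = None
    running_total = 0
    for line in sorted_lines:
        key, value = line.split()
        if key != current_key:
            # A new key starts, so the previous key is finished: emit it.
            if current_key is not None:
                yield current_key + " " + str(running_total)
            current_key = key
            running_total = 0
        running_total += int(value)
    # Emit the final key, if there was any input at all.
    if current_key is not None:
        yield current_key + " " + str(running_total)
```

<p>The memory used does not grow with the number of distinct keys, which is the property the dictionary-based version lacks.</p>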
<p>With regard to efficiency, we are most concerned about memory efficiency.  In the real world you want to be conscious of how much intermediate data you are writing to disk: all of those mapper output (key,value) pairs get written somewhere, sent over the network, and sorted somewhere else.  We often measure the complexity of an MR algorithm by the number of phases required and the amount of intermediate data produced.</p>
<p>Finally, the way that we&#8217;re faking the MR framework potentially enforces stronger conditions on the (key,value) ordering.  If you do mapper -&gt; sort -&gt; reducer, the values are also going to be ordered.  I&#8217;m not exactly sure of the behavior of the shuffle task in Hadoop, so just be careful about writing code which requires that the values for a given key appear in some particular order.</p>
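<p>One way to guard against relying on value order when testing locally is to scramble the values within each key before reducing.  The sketch below is only a local test harness and says nothing about how Hadoop itself behaves; <code>run_phase</code>, <code>word_mapper</code>, and <code>count_reducer</code> are hypothetical names, and lines are assumed to be space-separated with the key first.</p>

```python
import random
from itertools import groupby


def run_phase(mapper, reducer, lines, seed=0):
    """Run mapper, group by key with values in scrambled order, then reducer."""
    rng = random.Random(seed)
    mapped = list(mapper(lines))
    # Shuffle first so values within a key arrive in no particular order...
    rng.shuffle(mapped)
    # ...then sort on the key alone.  Python's sort is stable, so the
    # scrambled order of the values within each key is preserved.
    mapped.sort(key=lambda line: line.split()[0])
    return list(reducer(mapped))


def word_mapper(lines):
    """Example mapper: emit "word 1" for every word."""
    for line in lines:
        for word in line.split():
            yield word + " 1"


def count_reducer(sorted_lines):
    """Example reducer: sum the values for each contiguous key."""
    pairs = (line.split() for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key + " " + str(sum(int(value) for _, value in group))
```

<p>If a reducer gives the same output for several different values of <code>seed</code>, that is some evidence (though not proof) that it does not depend on the order of values within a key.</p>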
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/11/28/a-note-about-memory-requirements-and-efficiency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>Homework 5 Sample Data</title>
		<link>http://scienceoftheweb.wordpress.com/2011/11/25/homework-5-sample-data/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/11/25/homework-5-sample-data/#comments</comments>
		<pubDate>Fri, 25 Nov 2011 19:36:37 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=59</guid>
		<description><![CDATA[I put up some small sample data that&#8217;s simple enough that you should be able to debug it by hand.  Check out the directory http://www.scienceoftheweb.org/15-396/assignments_f11/hwk5_files/ for the files. small_graph1.txt &#8211; A small graph with non-consecutive vertex IDs.  The degree count file is small_graph1_degree_dist.txt. The other graph consists of two complete directed graphs, on the vertex sets {1,2,3,4} and {10,11,12}.  There&#8217;s a [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I put up some small sample data that&#8217;s simple enough that you should be able to debug it by hand.  Check out the directory http://www.scienceoftheweb.org/15-396/assignments_f11/hwk5_files/ for the files.</p>
<p>small_graph1.txt &#8211; A small graph with non-consecutive vertex IDs.  The degree count file is small_graph1_degree_dist.txt.</p>
<p>The other graph consists of two complete directed graphs, on the vertex sets {1,2,3,4} and {10,11,12}.  There&#8217;s a hashtag file in which one hashtag (HT) is used by a user not in the edge list, one HT is used by two users in different components, and the other two HTs are used by some subset of users within each component.  Make sure that you print out the empty edge list (&#8220;#HT edges &#8221;) if there aren&#8217;t any edges in the graph between any two users who used a particular HT.  This is illustrated in the output file.</p>
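<p>To make the shape of that second graph concrete, here is a small sketch that builds the edge set of two complete directed graphs on those vertex sets.  It does not reproduce the actual sample file or its on-disk format; <code>complete_directed_edges</code> is a name invented for this example.</p>

```python
from itertools import permutations


def complete_directed_edges(vertices):
    """Every ordered pair (u, v) with u != v: a complete directed graph."""
    return list(permutations(vertices, 2))


# Two components with no edges between them.
edges = complete_directed_edges([1, 2, 3, 4]) + complete_directed_edges([10, 11, 12])
```

<p>The first component contributes 4 * 3 = 12 edges and the second 3 * 2 = 6, so any two users in the same component are connected in both directions and no edge crosses between components.</p>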
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/11/25/homework-5-sample-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>Homework 5 is out</title>
		<link>http://scienceoftheweb.wordpress.com/2011/11/19/homework-5-is-out/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/11/19/homework-5-is-out/#comments</comments>
		<pubDate>Sat, 19 Nov 2011 00:57:50 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=57</guid>
		<description><![CDATA[I just posted homework 5.  It involves programming in Python, so if you&#8217;ve never programmed in Python before, PLEASE get in touch with me.  You have a bit less than two weeks to complete the assignment, and there&#8217;s hopefully an enticing bonus project for you to do. Please check out the homework over the weekend [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I just posted homework 5.  It involves programming in Python, so if you&#8217;ve never programmed in Python before, PLEASE get in touch with me.  You have a bit less than two weeks to complete the assignment, and there&#8217;s hopefully an enticing bonus project for you to do.</p>
<p>Please check out the homework over the weekend and come to class on Tuesday with questions.  I haven&#8217;t yet posted test data for the MapReduce problem and will do so as soon as I prepare a nice data set for you to look at!</p>
<p>tl;dr: Read through the assignment.  Start thinking about the problems.  Come to class with questions.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/11/19/homework-5-is-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>Homework 4 is out!</title>
		<link>http://scienceoftheweb.wordpress.com/2011/11/04/homework-4-is-out/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/11/04/homework-4-is-out/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 15:44:03 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=54</guid>
		<description><![CDATA[Homework 4 is out.  It should be pretty short and sweet and is due on November 15.  Take a look at it over the weekend and come to us with questions next week!  We should also have your tests graded by Tuesday.]]></description>
				<content:encoded><![CDATA[<p>Homework 4 is out.  It should be pretty short and sweet and is due on November 15.  Take a look at it over the weekend and come to us with questions next week!  We should also have your tests graded by Tuesday.</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/11/04/homework-4-is-out/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>All Materials Updated + Notes on Recommendation Systems</title>
		<link>http://scienceoftheweb.wordpress.com/2011/10/27/all-materials-updated-notes-on-recommendation-systems/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/10/27/all-materials-updated-notes-on-recommendation-systems/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 23:14:25 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=52</guid>
		<description><![CDATA[All of the lectures have been posted as well as solutions to HWs 2 and 3. http://public.research.att.com/~volinsky/netflix/sigkddexp.pdf has a nice discussion about the Netflix challenge.  The &#8216;rematch&#8217; was canceled over privacy concerns, but the first one was completed and the $1M reward was given out. This paper details a method by which using both the Netflix Challenge [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>All of the lectures have been posted as well as solutions to HWs 2 and 3.</p>
<p><a href="http://public.research.att.com/~volinsky/netflix/sigkddexp.pdf" target="_blank">http://public.research.att.com/~volinsky/netflix/sigkddexp.pdf</a> has a nice discussion about the Netflix challenge.  The &#8216;rematch&#8217; was canceled over privacy concerns, but the first one was completed and the $1M reward was given out.</p>
<p><a title="This link" href="http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf" target="_blank">This paper</a> describes how, using both the Netflix Challenge dataset and reviews from IMDb, the authors were able to figure out the identities of some of the users!</p>
<p>Finally, you can look at the distribution of certain movie rentals in various cities on <a href="http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html" target="_blank">this NYT interactive page</a>.  As always, feel free to email the staff any questions that you have about the course content!</p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/10/27/all-materials-updated-notes-on-recommendation-systems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
		<item>
		<title>A Note on Web Crawlers</title>
		<link>http://scienceoftheweb.wordpress.com/2011/10/18/a-note-on-web-crawlers/</link>
		<comments>http://scienceoftheweb.wordpress.com/2011/10/18/a-note-on-web-crawlers/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 20:41:35 +0000</pubDate>
		<dc:creator>bmeeder</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scienceoftheweb.wordpress.com/?p=49</guid>
		<description><![CDATA[Here are some links that touch on a few of the ideas we discussed today: Check out http://www.drunkmenworkhere.org/219 to see how different search engines crawl the &#8216;webspider tree of death&#8217;. Webrings: http://dir.webring.org/rw  (http://www.webring.org/hub/drtvgw for example) Directories from the past: http://dir.yahoo.com/]]></description>
				<content:encoded><![CDATA[<p>Here are some links that touch on a few of the ideas we discussed today:</p>
<p>Check out <a href="http://www.drunkmenworkhere.org/219">http://www.drunkmenworkhere.org/219</a> to see how different search engines crawl the &#8216;webspider tree of death&#8217;.</p>
<p>Webrings: <a href="http://dir.webring.org/rw">http://dir.webring.org/rw</a>  (<a href="http://www.webring.org/hub/drtvgw">http://www.webring.org/hub/drtvgw</a> for example)</p>
<p>Directories from the past: <a href="http://dir.yahoo.com/">http://dir.yahoo.com/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://scienceoftheweb.wordpress.com/2011/10/18/a-note-on-web-crawlers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4ede948b7d7fd5857408b0621803dcdb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">bmeeder</media:title>
		</media:content>
	</item>
	</channel>
</rss>
