<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dave Lester's Finding America &#187; Data-mining</title>
	<atom:link href="http://blog.davelester.org/tag/data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.davelester.org</link>
	<description>American Studies, Digital Humanities, Public History, and all that's in between (or not)</description>
	<lastBuildDate>Tue, 01 Jun 2010 04:10:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Approaches to Academic Blog Directories</title>
		<link>http://blog.davelester.org/2007/08/19/approaches-to-academic-blog-directories/</link>
		<comments>http://blog.davelester.org/2007/08/19/approaches-to-academic-blog-directories/#comments</comments>
		<pubDate>Sun, 19 Aug 2007 21:20:00 +0000</pubDate>
		<dc:creator>Dave Lester</dc:creator>
				<category><![CDATA[American Studies]]></category>
		<category><![CDATA[Crossroads Project]]></category>
		<category><![CDATA[Data-mining]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[History]]></category>

		<guid isPermaLink="false">http://www.davelester.org/2007/08/19/approaches-to-academic-blog-directories/</guid>
		<description><![CDATA[Following the recent indexing of Cliopatria&#8217;s History Blogroll, it&#8217;s worth offering a side-by-side comparison of two different approaches to academic blog directories. This follows several months of experimentation toward my goal of establishing an American Studies blog directory as &#8230; <a href="http://blog.davelester.org/2007/08/19/approaches-to-academic-blog-directories/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Following the recent indexing of <a href="http://hnn.us/blogs/2.html">Cliopatria</a>&#8217;s History Blogroll, it&#8217;s worth offering a side-by-side comparison of two different approaches to academic blog directories. This follows several months of experimentation toward my goal of establishing an American Studies blog directory as part of the <a href="http://crossroads.georgetown.edu">Crossroads Project</a>.  The two fundamental differences between the directories I&#8217;ve seen deal with categorization and aggregation.  My purpose isn&#8217;t to criticize any approach, but to spur discussion on how to measure authority and organize the content of academic blogs.</p>
<p><strong>Museum Blogs.org</strong><br />
<a href="http://museumblogs.org/">http://museumblogs.org/</a><br />
Despite its &#8220;forever beta&#8221; tagline that&#8217;s suspiciously similar to <a href="http://www.clioweb.org">Clioweb</a>&#8217;s &#8220;history is a perpetual beta&#8221;, Museum Blogs is the best academic blogging directory I&#8217;ve seen.  The site topically categorizes museum blogs, and aggregates them into one large feed on their homepage.  What&#8217;s interesting is how they use &#8220;authority&#8221; to filter results &#8211; blogs with more authority become more visible.  Authority is determined based upon how many people link to the blog, which is likely an outgrowth of using <a href="http://google.com/coop/cse/">Google&#8217;s custom search</a>.  Anyone can create a Google custom search for free &#8211; allowing them to search the text of specified websites, a terrific tool that&#8217;s easy to use when creating a blog directory.  Several of my readers may want to consider adding their blogs to the directory.<br />
<strong><br />
Cliopatria&#8217;s History Blogroll</strong><br />
<a href="http://hnn.us/blogs/entries/9665.html">http://hnn.us/blogs/entries/9665.html</a><br />
I was pleased to see myself included in the Blogroll, and appreciate the indexing work of Jonathan Dresner.  My first observation was that my blog is listed under United States History &#8211; ok.  True, my background is in American Studies, but my own blog often deviates from US history, dealing more with the digital humanities and ludology among other things.  It&#8217;s obvious that Jonathan was aware of these limitations when indexing it in the first place, writing:</p>
<blockquote><p>Categories are an abstraction. Many blogs do not categorize well. We&#8217;ve done the best we can. Neither category, order or position are intended as value or quality judgements.</p></blockquote>
<p>Despite the limitations of abstraction, I&#8217;ve found the blogroll to be an incredible resource &#8211; finding many terrific history blogs just this afternoon.  Authority is decided by whoever created the blogroll; however, when users have left comments pointing to their individual blogs, those blogs have been included in the blogroll as well.  Individual posts haven&#8217;t been aggregated into one feed, so users must visit each individual blog to read its contents.</p>
<p><strong>The Crossroads Project Blog Directory</strong><br />
I&#8217;ve been working on creating an American Studies blog directory for <a href="http://crossroads.georgetown.edu">the Crossroads Project</a> that combines the best parts of both the Cliopatria History Blogroll and Museum Blogs.  Given the wide range of topics covered within the discipline, it requires a comprehensive solution to make it usable.  I&#8217;ve been working to integrate this blog directory into the <a href="http://lamp.georgetown.edu/asw/">American Studies Web search engine</a> I created last winter as well.  Here&#8217;s the solution I&#8217;ve come up with:</p>
<p>Google&#8217;s custom search is incredibly powerful, allowing you to search the contents of each page/site indexed.  My hope is to integrate this into American Studies Web, so when a blog is added to the directory, it&#8217;s also made entirely searchable.  In addition, blogs will be topically tagged, so they can be included in more than one narrow categorization.  I&#8217;d also like to create a master feed for each tag, where you could read all American Studies blogs tagged as &#8220;gender studies&#8221; or &#8220;material culture.&#8221;  These are all reasonable and relatively simple additions to make.</p>
<p>A step beyond this integration would be to categorize each individual post based upon its contents.  You could use the tags associated with each post; however, bloggers are inconsistent about which tags they use, and whether they tag their entries at all.  To some degree this necessity is diminished by the Google Custom Search.  If anyone can offer any new ideas on how to approach this, I&#8217;d love to hear them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.davelester.org/2007/08/19/approaches-to-academic-blog-directories/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Your &#8216;first life&#8217; on the net</title>
		<link>http://blog.davelester.org/2007/04/26/your-life-on-the-net/</link>
		<comments>http://blog.davelester.org/2007/04/26/your-life-on-the-net/#comments</comments>
		<pubDate>Fri, 27 Apr 2007 04:21:33 +0000</pubDate>
		<dc:creator>Dave Lester</dc:creator>
				<category><![CDATA[Data-mining]]></category>
		<category><![CDATA[Website Showcase]]></category>

		<guid isPermaLink="false">http://www.davelester.org/2007/04/26/your-life-on-the-net/</guid>
		<description><![CDATA[Evolution is a data-mining tool that searches archived information including cached websites, DNS records, phone records, and IP and email addresses. This information has been on the net for a long time; however, the goal of Evolution is to map this &#8230; <a href="http://blog.davelester.org/2007/04/26/your-life-on-the-net/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.paterva.com/evolution.html">Evolution</a> is a data-mining tool that searches archived information including cached websites, DNS records, phone records, and IP and email addresses.  This information has been on the net for a long time; however, the goal of Evolution is to map this information together and visualize a person&#8217;s online activity.  The web version is in beta and the client is not yet downloadable, but this project has some startling possibilities that have me concerned about privacy.</p>
<p>Searching someone&#8217;s name, you can see that they own a domain name. A whois search will give that person&#8217;s home address, which can then be used to search for their home phone number.  Searching their email address, you may find out they use a certain alias on several websites, where you may uncover other accounts and websites they use to find out where they work, or who they&#8217;re dating.  If they have a MySpace or LinkedIn page, those have been cached as well &#8211; don&#8217;t forget their blog too!  You don&#8217;t need Evolution to do these things, but it promises to make this all possible with one click.<br />
<span id="more-35"></span><br />
Searching for myself on Evolution, I was confused with the other Dave Lesters in the world.  (YES &#8211; there are many of us.)  For the record, I am not a musician from Portland or an environmentalist from Michigan, nor do I have a beard and ride motorcycles.  How can I differentiate myself from the other Dave Lesters online so this doesn&#8217;t happen?</p>
<p><a href="http://www.claimid.org">ClaimID</a> is an easy place to start &#8211; it allows you to &#8220;claim&#8221; online content and associate it with your OpenID username.  When your ClaimID page is saved by Google or a tool like Evolution, it&#8217;s clear what content is associated with you.  Another approach is to monopolize every username (with your name) on the face of the Internet.  My recent acquisition of davelester.ORG was an attempt to do this &#8211; plus I&#8217;m not selling anything so davelester.COM wasn&#8217;t really appropriate.</p>
<p>Do you use <a href="http://www.twitter.com">Twitter</a>?  You may want to make that RSS feed &#8220;friends only&#8221; unless you want a running tally of your daily activities associated with your home address, phone number and where you work.  This information is currently not protected.  I&#8217;ve been very careful about my blogging &#8211; mindful of the fact that I may have people reading this in 20 years looking back (or not, that&#8217;s quite presumptuous).  My concern is that these data-mining tools could be used to do instant background checks on anyone.</p>
<p>Projecting into the future, my hunch is that the ability to archive all this data about individuals&#8217; lives, together with increasingly sophisticated and publicly available data-mining tools like Evolution, will soon start to have a large impact.  There is so much drama concerning the personal lives of politicians &#8211; could you imagine reading George Bush&#8217;s blog from college or seeing the photos on his MySpace page?  A friend jokingly remarked that this just means that in 30 years our President will be a farmer from the Midwest who didn&#8217;t grow up with the Internet; perhaps he&#8217;s right.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.davelester.org/2007/04/26/your-life-on-the-net/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
