<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><link rel="alternate" type="application/rss+xml" title="RSS" href="rss.xml"><link rel="shortcut icon" href="/media/img/effbot.ico"><link rel="stylesheet" href="/media/css/effbot-min.css" type="text/css" media="screen"><link rel="stylesheet" href="/media/css/effbotprint-min.css" type="text/css" media="print"><title>Generating SeeAlso Indexes for the Python Library&nbsp;Reference</title></head><body data-page-id="260"><div id="doc2" class="yui-t2"><div id="hd"></div><div id="bd"><div id="yui-main"><div class="yui-b"><div class="content"><div class="yui-g"><h1 class="maintitle">Generating SeeAlso Indexes for the Python Library&nbsp;Reference</h1></div><div class="yui-ge"><div class="yui-u first"><p class="info">December 10, 2005 | Fredrik Lundh</p><p>Here&#8217;s a simple script that uses <s><a href="http://docs.python.org/modindex.html">the global module index</a></s> (dead link)
to generate <a href="idea-seealso.htm">a seealso index</a> for the
Python library reference.</p><p>Note that it queries python.org for all reference pages, so you
shouldn&#8217;t run it more often than needed.  If you just want a sample
file to play with, you can use <a href="seealso-python-library.xml">
this copy</a>.</p></div><div class="yui-u">&#160;</div></div><div class="yui-g"><p class="wide">Required components (elementtree and elementtidy) can be found at
the <a href="/downloads">effbot.org downloads site</a>.

<pre class="example python wide">
<span class="pykeyword">import</span> os, re, urllib, urlparse

<span class="pykeyword">from</span> elementtidy.TidyHTMLTreeBuilder <span class="pykeyword">import</span> parse
<span class="pykeyword">from</span> elementtree.ElementTree <span class="pykeyword">import</span> dump

URI = <span class="pystring">"http://docs.python.org/modindex.html"</span>

<span class="pycomment">#</span>
<span class="pycomment"># helpers</span>

<span class="pykeyword">class</span> <span class="pyclass">NS</span>:
    <span class="pykeyword">def</span> <span class="pyfunction">__init__</span>(self, uri):
        self.uri = uri
    <span class="pykeyword">def</span> <span class="pyfunction">__getattr__</span>(self, name):
        <span class="pykeyword">return</span> self.uri + name

XHTML = NS(<span class="pystring">"{http://www.w3.org/1999/xhtml}"</span>)

<span class="pykeyword">def</span> <span class="pyfunction">innertext</span>(elem):
    text = elem.text <span class="pykeyword">or</span> <span class="pystring">""</span>
    <span class="pykeyword">for</span> elem <span class="pykeyword">in</span> elem.getiterator():
        <span class="pykeyword">if</span> elem.text: text += elem.text
        <span class="pykeyword">if</span> elem.tail: text += elem.tail
    <span class="pykeyword">return</span> text

<span class="pykeyword">def</span> <span class="pyfunction">gettitle</span>(href, default):
    s = urllib.urlopen(href).read(1024)
    m = re.search(<span class="pystring">"&lt;title&gt;[\d\s.]+(.*)&lt;/title&gt;"</span>, s)
    <span class="pykeyword">if</span> m:
        <span class="pykeyword">return</span> m.group(1)
    <span class="pykeyword">return</span> default

<span class="pycomment">#</span>
<span class="pycomment"># get going</span>

tree = parse(urllib.urlopen(URI))

<span class="pycomment"># the index is in the second table on the page</span>
tables = tree.findall(<span class="pystring">".//"</span> + XHTML.table)

<span class="pykeyword">print</span> <span class="pystring">"&lt;seealso target-domain='python'&gt;"</span>

<span class="pykeyword">print</span> <span class="pystring">" &lt;info xmlns:dc='http://purl.org/dc/elements/1.1/'&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">"  &lt;dc:title&gt;Python Library Reference&lt;/dc:title&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">"  &lt;dc:identifier&gt;http://docs.python.org/index.html&lt;/dc:identifier&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">"  &lt;dc:language&gt;en&lt;/dc:language&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">" &lt;/info&gt;"</span>

<span class="pykeyword">for</span> a <span class="pykeyword">in</span> tables[1].findall(<span class="pystring">".//"</span> + XHTML.a):
    href = urlparse.urljoin(URI, a.get(<span class="pystring">"href"</span>))
    text = innertext(a)
    target = text.split()[0]
    <span class="pykeyword">print</span> <span class="pystring">" &lt;item href='%s'&gt;"</span> % href
    <span class="pykeyword">print</span> <span class="pystring">"  &lt;title&gt;%s&lt;/title&gt;"</span> % gettitle(href, text)
    <span class="pykeyword">print</span> <span class="pystring">"  &lt;target&gt;%s&lt;/target&gt;"</span> % target
    <span class="pykeyword">print</span> <span class="pystring">" &lt;/item&gt;"</span>

<span class="pykeyword">print</span> <span class="pystring">"&lt;/seealso&gt;"</span></pre></p></div><div class="yui-g"></div></div></div></div><div class="yui-b"><div id='menu'><ul><li><b><a href="/" title="Go to effbot.org.">::: effbot.org</a></b></li><li><b><a href="." title="Go to zone index page.">::: zone :::</a></b></li></ul></div></div></div><div id="ft"><p><a href="http://www.djangoproject.com/"><img src="/media/img/djangosite80x15.gif" border="0" alt="A Django site." title="A Django site." style="vertical-align: bottom;" width="80" height="15" ></a>
rendered by a <a href="http://www.djangoproject.com/">django</a> application.  hosted by <a href="http://www.webfaction.com/shared_hosting?affiliate=slab">webfaction</a>.</p></div></div><script src="/media/js/effbot-min.js" type="text/javascript"></script></body></html>
