mirror of
https://github.com/almet/notmyidea.git
synced 2025-04-28 19:42:37 +02:00
230 lines
No EOL
10 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en">
|
|
<head>
|
|
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
|
<meta http-equiv="content-type" content="text/html; charset=utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
|
|
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />
|
|
|
|
<title>PyPI on CouchDB - Alexis - Carnets en ligne</title>
|
|
|
|
<meta charset="utf-8" />
|
|
<link href="https://blog.notmyidea.org/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="Alexis - Carnets en ligne Full Atom Feed" />
|
|
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/poole.css"/>
|
|
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/syntax.css"/>
|
|
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/lanyon.css"/>
|
|
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">
|
|
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/styles.css"/>
|
|
|
|
|
|
|
|
<style>
|
|
|
|
h1 {
|
|
font-family: "Avant Garde", Avantgarde, "Century Gothic", CenturyGothic, "AppleGothic", sans-serif;
|
|
padding: 80px 50px;
|
|
text-align: center;
|
|
text-transform: uppercase;
|
|
text-rendering: optimizeLegibility;
|
|
color: #202020;
|
|
letter-spacing: .1em;
|
|
text-shadow:
|
|
-1px -1px 1px #111,
|
|
2px 2px 1px #eaeaea;
|
|
}
|
|
|
|
#main {
|
|
text-align: justify;
|
|
text-justify: inter-word;
|
|
}
|
|
#main h1 {
|
|
padding: 10px;
|
|
}
|
|
|
|
.post-headline {
|
|
padding: 15px;
|
|
}
|
|
</style>
|
|
</head>
|
|
|
|
<body>
|
|
<!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
|
|
styles, `#sidebar-checkbox` for behavior. -->
|
|
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
|
|
<!-- Toggleable sidebar -->
|
|
<div class="sidebar" id="sidebar">
|
|
<div class="sidebar-item">
|
|
<div class="profile">
|
|
<img src="https://blog.notmyidea.org/theme/img/profile.png"/>
|
|
</div>
|
|
</div>
|
|
|
|
<nav class="sidebar-nav">
|
|
<a class="sidebar-nav-item" href="/">Articles</a>
|
|
|
|
<a class="sidebar-nav-item" href="https://www.vieuxsinge.com">Brasserie du Vieux Singe</a>
|
|
<a class="sidebar-nav-item" href="http://blog.notmyidea.org/pages/about.html">About</a>
|
|
<a class="sidebar-nav-item" href="https://twitter.com/ametaireau">Short messages</a>
|
|
<a class="sidebar-nav-item" href="https://github.com/almet">Code</a>
|
|
</nav>
|
|
</div> <div class="wrap">
|
|
<div class="masthead">
|
|
<div class="container">
|
|
<h3 class="masthead-title">
|
|
<a href="https://blog.notmyidea.org/" title="Home">Alexis - Carnets en ligne</a>
|
|
</h3>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="container content">
|
|
<div id="main" class="posts">
|
|
<h1 class="post-title">PyPI on CouchDB</h1>
|
|
<span class="post-date">20 January 2011, in <a class="no-color" href="category/technologie.html">Technology</a></span>
|
|
<img id="illustration" src="" />
|
|
|
|
<div class="post article">
|
|
<h1>🌟</h1>
|
|
|
|
<p>For now, there are two ways to retrieve data from PyPI (the Python
Package Index): you can rely either on XML/RPC or on the "simple" API. The
simple API is not as simple to use as its name suggests, and it has several
drawbacks.</p>
|
|
<p>Basically, if you want to use information coming from the simple API,
you have to parse web pages manually and extract the information using
some black voodoo magic. Sadly, magic has a price, and it is sometimes
impossible to get exactly the information you want from this
index. That is the technique currently used by distutils2,
setuptools and pip.</p>
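<p>To give an idea of the kind of parsing involved, here is a minimal sketch that pulls archive names out of a "simple"-style page with the standard library. The page excerpt, <code>LinkCollector</code> and <code>find_archives</code> are all made up for illustration; real index pages differ in detail, and this is not how distutils2, setuptools or pip actually do it.</p>

```python
from html.parser import HTMLParser

# Hypothetical excerpt of a "simple" index page for one project.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/f/foo/foo-1.0.tar.gz#md5=abc123">foo-1.0.tar.gz</a><br/>
<a href="../../packages/source/f/foo/foo-1.1.tar.gz#md5=def456">foo-1.1.tar.gz</a><br/>
<a href="http://example.org/foo" rel="homepage">1.1 home_page</a><br/>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs for every anchor found in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append((attributes["href"], attributes.get("rel")))


def find_archives(page):
    """Return the file names that look like downloadable archives."""
    collector = LinkCollector()
    collector.feed(page)
    archives = []
    for href, rel in collector.links:
        # Drop the "#md5=..." fragment, then keep the last path segment.
        filename = href.split("#", 1)[0].rsplit("/", 1)[-1]
        # Links carrying a rel attribute (home page, etc.) are not archives.
        if rel is None and filename.endswith((".tar.gz", ".zip")):
            archives.append(filename)
    return archives


print(find_archives(SAMPLE_PAGE))
```

<p>Even in this tiny case, the version number still has to be guessed from the file name, which is where the "magic" starts.</p>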
|
|
<p>On the other hand, while XML/RPC works fine, it requires extra
work from the Python servers each time you request something, which can
lead to outages from time to time. It is also important to point
out that, even though PyPI has a mirroring infrastructure, it only covers
the so-called <em>simple</em> API, not XML/RPC.</p>
|
|
<h2 id="couchdb">CouchDB</h2>
|
|
<p>Here comes CouchDB. CouchDB is a document-oriented database that
speaks REST and JSON. It is easy to use and provides a replication
mechanism out of the box.</p>
|
|
<h2 id="so-what">So, what?</h2>
|
|
<p>Hmm, I'm sure you got it. I've written a piece of software to link
information from PyPI to a CouchDB instance. You can then replicate the
whole PyPI index with a single HTTP request to the CouchDB server. You can
also access the information from the index directly through a REST API
that speaks JSON. Handy.</p>
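<p>As a rough sketch of what that single HTTP request could look like, the snippet below builds (but does not send) a POST to CouchDB's <code>_replicate</code> endpoint. The local server address and the <code>build_replication_request</code> helper are assumptions made for the example; the source URL is the principal node mentioned in this post.</p>

```python
import json
from urllib.request import Request

# The principal node from this post, and a placeholder local CouchDB server.
SOURCE_DB = "http://couchdb.notmyidea.org/pypi/"
LOCAL_SERVER = "http://localhost:5984"  # assumption: adapt to your setup


def build_replication_request(source, server, target_db="pypi"):
    """Build (without sending) the request asking `server` to pull `source`."""
    base = server.rstrip("/")
    body = json.dumps({
        "source": source,
        "target": base + "/" + target_db,
        "create_target": True,  # create the target database if it is missing
    }).encode("utf-8")
    return Request(
        base + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_replication_request(SOURCE_DB, LOCAL_SERVER)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

<p>Actually triggering the replication would mean passing this request to <code>urllib.request.urlopen</code>, most likely with credentials for your own server.</p>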
|
|
<p>So PyPIonCouch uses the PyPI XML/RPC API to get data from PyPI and
generates records in the CouchDB instance.</p>
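<p>To make the idea of "generating records" concrete, here is a small sketch that turns release metadata into a dictionary shaped like a CouchDB document. The sample metadata is hand-written, and both the <code>to_document</code> helper and the "name-version" identifier scheme are assumptions for illustration, not necessarily what PyPIonCouch does.</p>

```python
import json

# Hand-written sample resembling the metadata PyPI exposes for a release;
# a real importer would fetch it over XML/RPC instead.
SAMPLE_RELEASE = {
    "name": "Foo",
    "version": "1.0",
    "summary": "An example project",
    "author": "Jane Doe",
    "keywords": "example demo",
}


def to_document(release):
    """Return a copy of the metadata, ready to be stored as a CouchDB document."""
    document = dict(release)
    # CouchDB identifies documents by "_id"; the "name-version" scheme is
    # an assumption made for this sketch.
    document["_id"] = "{name}-{version}".format(**release).lower()
    return document


document = to_document(SAMPLE_RELEASE)
print(json.dumps(document, indent=2, sort_keys=True))
```

<p>Storing the result would then be a matter of sending this JSON to the database over HTTP.</p>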
|
|
<p>The final goal is to avoid relying on this "simple" API and to rely on a
REST interface instead. I have set up a CouchDB instance on my server,
which is available at
<a href="http://couchdb.notmyidea.org/_utils/database.html?pypi">http://couchdb.notmyidea.org/_utils/database.html?pypi</a>.</p>
|
|
<p>There is not a lot to see there for now, but I did the first import
from PyPI yesterday and all went fine: it is possible to access the
metadata of all PyPI projects via a REST interface. The next step is to
write a client for this REST interface in distutils2.</p>
|
|
<h2 id="example">Example</h2>
|
|
<p>For now, you can use pypioncouch via the command line or via the Python
API.</p>
|
|
<h3 id="using-the-command-line">Using the command line</h3>
|
|
<p>You can do something like this for a full import. This <strong>will</strong> take
a long time, because it fetches all the projects from PyPI and imports their
metadata:</p>
|
|
<div class="highlight"><pre><span></span>$ pypioncouch --fullimport http://your.couchdb.instance/
</pre></div>
|
|
|
|
|
|
<p>If you already have the data on your CouchDB instance, you can just
update it with the latest information from PyPI. <strong>However, I recommend
simply replicating the principal node, hosted at
<a href="http://couchdb.notmyidea.org/pypi/">http://couchdb.notmyidea.org/pypi/</a></strong>, to avoid duplicating
nodes:</p>
|
|
<div class="highlight"><pre><span></span>$ pypioncouch --update http://your.couchdb.instance/
|
|
</pre></div>
|
|
|
|
|
|
<p>For now, the principal node is updated once a day. I'll see whether that is
enough and adjust over time.</p>
|
|
<h3 id="using-the-python-api">Using the Python API</h3>
|
|
<p>You can also use the Python API to interact with pypioncouch:</p>
|
|
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">pypioncouch</span> <span class="kn">import</span> <span class="n">XmlRpcImporter</span><span class="p">,</span> <span class="n">import_all</span><span class="p">,</span> <span class="n">update</span>
|
|
<span class="o">>>></span> <span class="n">import_all</span><span class="p">()</span>
|
|
<span class="o">>>></span> <span class="n">update</span><span class="p">()</span>
|
|
</pre></div>
|
|
|
|
|
|
<h2 id="whats-next">What's next?</h2>
|
|
<p>I want to build a couchapp to make PyPI easy to browse. Here are
some of the features I want to offer:</p>
|
|
<ul>
|
|
<li>List all the available projects</li>
|
|
<li>List all the projects, filtered by specifiers</li>
|
|
<li>List all the projects by author/maintainer</li>
|
|
<li>List all the projects by keywords</li>
|
|
<li>A page for each project</li>
|
|
<li>Provide an equivalent of the PyPI "Simple" API: even though I want to
replace it, I think it would make mirrors really easy to set up
thanks to CouchDB's out-of-the-box replication</li>
|
|
</ul>
|
|
<p>I also still need to polish the import mechanism, so that I can store
the following directly in CouchDB:</p>
|
|
<ul>
|
|
<li>The OPML files for each project</li>
|
|
<li>The upload_time in a CouchDB-friendly format (a list of integers)</li>
<li>The tags as lists (currently they are a single space-separated string)</li>
|
|
</ul>
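<p>The last two conversions could be sketched as follows. The helper names <code>upload_time_as_list</code> and <code>tags_as_list</code> are hypothetical, and the sample values are invented; this only illustrates the target formats, not the actual importer code.</p>

```python
from datetime import datetime


def upload_time_as_list(moment):
    """Return [year, month, day, hour, minute, second] for a datetime."""
    return [moment.year, moment.month, moment.day,
            moment.hour, moment.minute, moment.second]


def tags_as_list(keywords):
    """Split a space-separated keyword string into a list of tags."""
    # str.split() without arguments also swallows repeated spaces.
    return keywords.split()


print(upload_time_as_list(datetime(2011, 1, 20, 14, 30, 0)))
print(tags_as_list("couchdb pypi  packaging"))
```

<p>A list of integers is convenient as a view key because CouchDB compares arrays element by element, which should make it easy to query for a range of dates.</p>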
|
|
<p>The work I've done so far is available at
<a href="https://bitbucket.org/ametaireau/pypioncouch/">https://bitbucket.org/ametaireau/pypioncouch/</a>. Keep in mind that it is
still a work in progress, and anything can break at any time. However,
any feedback will be appreciated!</p>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<label for="sidebar-checkbox" class="sidebar-toggle"></label>
|
|
|
|
<script>
|
|
(function(document) {
|
|
var i = 0;
|
|
// snip empty header rows since markdown can't
|
|
var rows = document.querySelectorAll('tr');
|
|
for(i=0; i<rows.length; i++) {
|
|
var ths = rows[i].querySelectorAll('th');
|
|
var rowlen = rows[i].children.length;
|
|
if (ths.length > 0 && ths.length === rowlen) {
|
|
rows[i].remove();
|
|
}
|
|
}
|
|
})(document);
|
|
</script>
|
|
|
|
<script>
|
|
/* Lanyon & Poole are Copyright (c) 2014 Mark Otto. Adapted to Pelican 20141223 and extended a bit by @thomaswilley */
|
|
(function(document) {
|
|
var toggle = document.querySelector('.sidebar-toggle');
|
|
var sidebar = document.querySelector('#sidebar');
|
|
var checkbox = document.querySelector('#sidebar-checkbox');
|
|
document.addEventListener('click', function(e) {
|
|
var target = e.target;
|
|
if(!checkbox.checked ||
|
|
sidebar.contains(target) ||
|
|
(target === checkbox || target === toggle)) return;
|
|
checkbox.checked = false;
|
|
}, false);
|
|
})(document);
|
|
</script>
|
|
<!-- Piwik -->
|
|
<script type="text/javascript">
|
|
var _paq = _paq || [];
|
|
_paq.push(['trackPageView']);
|
|
_paq.push(['enableLinkTracking']);
|
|
(function() {
|
|
var u="//tracker.notmyidea.org/";
|
|
_paq.push(['setTrackerUrl', u+'piwik.php']);
|
|
_paq.push(['setSiteId', 3]);
|
|
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
|
|
g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
|
|
})();
|
|
</script>
|
|
<noscript><p><img src="//tracker.notmyidea.org/piwik.php?idsite=3" style="border:0;" alt="" /></p></noscript>
|
|
<!-- End Piwik Code -->
|
|
</div>
|
|
</body>
|
|
</html>