<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />
<title>Introducing the distutils2 index crawlers - Alexis - Carnets en ligne</title>
<meta charset="utf-8" />
<link href="https://blog.notmyidea.org/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="Alexis - Carnets en ligne Full Atom Feed" />
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/poole.css"/>
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/syntax.css"/>
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/lanyon.css"/>
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/styles.css"/>
<style>
h1 {
font-family: "Avant Garde", Avantgarde, "Century Gothic", CenturyGothic, "AppleGothic", sans-serif;
padding: 80px 50px;
text-align: center;
text-transform: uppercase;
text-rendering: optimizeLegibility;
color: #202020;
letter-spacing: .1em;
text-shadow:
-1px -1px 1px #111,
2px 2px 1px #eaeaea;
}
#main {
text-align: justify;
text-justify: inter-word;
}
#main h1 {
padding: 10px;
}
.post-headline {
padding: 15px;
text-align: center;
}
</style>
</head>
<body>
<!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
<div class="sidebar-item">
<div class="profile">
<img src="https://blog.notmyidea.org/theme/img/profile.png"/>
</div>
</div>
<nav class="sidebar-nav">
<a class="sidebar-nav-item" href="/">Articles</a>
<a class="sidebar-nav-item" href="https://www.vieuxsinge.com">Brasserie du Vieux Singe</a>
<a class="sidebar-nav-item" href="http://blog.notmyidea.org/pages/about.html">About</a>
<a class="sidebar-nav-item" href="https://twitter.com/ametaireau">Short messages</a>
<a class="sidebar-nav-item" href="https://github.com/almet">Code</a>
</nav>
</div> <div class="wrap">
<div class="masthead">
<div class="container">
<h3 class="masthead-title">
<a href="https://blog.notmyidea.org/" title="Home">Alexis - Carnets en ligne</a>
</h3>
</div>
</div>
<div class="container content">
<div id="main" class="posts">
<h1 class="post-title">Introducing the distutils2 index crawlers</h1>
<span class="post-date">
06 July 2010, in <a class="no-color" href="category/technologie.html">Technologie</a>
</span>
<img id="illustration" class="illustration-Technologie" src="" />
<div class="post article">
<div id="toc_container">
<div class="toc">
<ul>
<li><a href="#introducing-the-distutils2-index-crawlers">Introducing the distutils2 index crawlers</a><ul>
<li><a href="#general-feelings">General feelings</a></li>
<li><a href="#tasks">Tasks</a></li>
<li><a href="#pypi-index-crawling">PyPI index crawling</a><ul>
<li><a href="#using-the-simple-api">Using the simple API</a></li>
<li><a href="#using-xml-rpc">Using xml-rpc</a></li>
</ul>
</li>
<li><a href="#processes">Processes</a></li>
</ul>
</li>
</ul>
</div>
</div>
<h1>🌟</h1>
<p>I've been working on distutils2 for about a month now, even though I've
been a bit busy (I had classes and exams to deal with).</p>
<p>I'll try to sum up my general feelings here, along with the work I've done
so far. If you're interested, you can also find my weekly summaries on <a href="http://wiki.notmyidea.org/distutils2_schedule">a
dedicated wiki page</a>.</p>
<h2 id="general-feelings">General feelings</h2>
<p>First, and this is a really important point: the GSoC is going very well,
for me as well as for the other students, at least from my perspective. It's a
pleasure to work with such enthusiastic people; it makes the overall
atmosphere very pleasant.</p>
<p>I started by spending some time reading the existing codebase, to
understand what we are going to build and the rationale behind it.</p>
<p>It's really clear to me now: what we're building is the foundation of
a packaging infrastructure for Python. Many projects coexist, and each
comes with its own good concepts. Distutils2 tries to take the interesting
parts of each of them and provide them in the Python standard library,
following the recently written PEP about packaging.</p>
<p>With distutils2, it will be simpler to make "things" compatible: if
you come up with a new way to deal with distributions and packaging in
Python, you can build it on top of the Distutils2 APIs.</p>
<h2 id="tasks">Tasks</h2>
<p>My main task on distutils2 is to provide an install command and an
uninstall command, as described in PEP 376. For this, I first need to get
information about the existing distributions (their name, version,
metadata, dependencies, etc.).</p>
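<p>Much of that information lives in each distribution's metadata. PEP 345 defines the metadata file as a set of RFC 822 headers, so the standard library's <code>email</code> parser can read it. The sketch below is only an illustration, not distutils2 code: the <code>read_metadata</code> helper and the sample <code>PKG-INFO</code> content are hypothetical.</p>

```python
from email.parser import HeaderParser

# Made-up PKG-INFO content in the PEP 345 (Metadata-Version 1.2) format.
PKG_INFO = """\
Metadata-Version: 1.2
Name: EggsAndSpam
Version: 1.2
Summary: A hypothetical example distribution
Requires-Dist: Bacon (>=1.0)
Requires-Dist: Sausage
"""


def read_metadata(text):
    """Extract the name, version and dependencies from PKG-INFO-style headers."""
    headers = HeaderParser().parsestr(text)
    return {
        "name": headers["Name"],
        "version": headers["Version"],
        # Requires-Dist can appear several times, so collect every occurrence.
        "requires": headers.get_all("Requires-Dist", []),
    }


print(read_metadata(PKG_INFO))
```

<p>Using <code>get_all</code> rather than plain indexing matters here: indexing a repeated header only returns one of its values, while an installer needs every dependency.</p>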
<p>The main index, which you probably already know and use, is PyPI, available at
<a href="http://pypi.python.org">http://pypi.python.org</a>.</p>
<h2 id="pypi-index-crawling">PyPI index crawling</h2>
<p>There are two ways to get this information from PyPI: through the simple
API, or via XML-RPC calls.</p>
<p>One goal was to use the version specifiers defined
in <a href="http://www.python.org/dev/peps/pep-0345/">PEP 345</a>, and to provide a
way to sort the retrieved distributions according to our needs, so that we
can pick the version we want.</p>
<h3 id="using-the-simple-api">Using the simple API</h3>
<p>The simple API consists of HTML pages, which you can access at
<a href="http://pypi.python.org/simple/">http://pypi.python.org/simple/</a>.</p>
<p>Distribute and Setuptools already provide a crawler for it, but that
crawler is tied to their internal mechanisms, and I found its code less
clear than I would like. That's why I preferred to pick up the good ideas
and some implementation details, while rethinking the overall
architecture.</p>
<p>The rules are simple: each project has a dedicated page, which gives
us information about:</p>
<ul>
<li>the distribution download locations (for some versions)</li>
<li>homepage links</li>
<li>some other useful information, such as the bug tracker address.</li>
</ul>
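<p>To make this concrete, here is a minimal sketch, using only the Python 3 standard library, of how such a page could be split into download links and other links. It is not the distutils2 crawler: the class and function names are hypothetical, the sample page excerpt is made up, and treating a link as a download when its path ends with a known archive extension is a simplifying assumption.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a link counts as a download if its path ends with one of these.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip")


class ProjectPageParser(HTMLParser):
    """Collect the (absolute URL, rel attribute) pair of every link on the page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # Resolve relative links such as "../../packages/..." against the page URL.
            self.links.append((urljoin(self.page_url, href), attributes.get("rel")))


def classify_links(html, page_url):
    """Split a project page's links into download URLs and other links."""
    parser = ProjectPageParser(page_url)
    parser.feed(html)
    parser.close()
    downloads, others = [], []
    for url, rel in parser.links:
        # Ignore a trailing "#md5=..." fragment when checking the extension.
        path = url.split("#", 1)[0]
        if path.endswith(ARCHIVE_SUFFIXES):
            downloads.append(url)
        else:
            others.append((url, rel))  # homepage, bug tracker, ...
    return downloads, others


# Made-up excerpt of a project page, for illustration only.
page = (
    '<a href="../../packages/source/E/EggsAndSpam/EggsAndSpam-1.1.tar.gz#md5=0123">'
    'EggsAndSpam-1.1.tar.gz</a>\n'
    '<a href="http://eggsandspam.example.org/" rel="homepage">1.1 home_page</a>\n'
)
downloads, others = classify_links(page, "http://pypi.python.org/simple/EggsAndSpam/")
print(downloads)
print(others)
```

<p>Relative links are resolved against the page URL with <code>urljoin</code>, and the <code>#md5=...</code> fragment is kept on the download URL so that a caller could still use it to check the archive.</p>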
<p>If you want to find all the distributions of the "EggsAndSpam" project,
you could do the following (don't pay too much attention to the names here,
as the API will probably change a bit):</p>
<div class="highlight"><pre><span></span>&gt;&gt;&gt; index = SimpleIndex()
&gt;&gt;&gt; index.find("EggsAndSpam")
[EggsAndSpam 1.1, EggsAndSpam 1.2, EggsAndSpam 1.3]
</pre></div>
<p>We could also use version specifiers:</p>
<div class="highlight"><pre><span></span>&gt;&gt;&gt; index.find("EggsAndSpam (&lt;=1.2)")
[EggsAndSpam 1.1, EggsAndSpam 1.2]
</pre></div>
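<p>Version specifiers like the one above can be handled by a small parser. The sketch below is a simplified illustration, not the distutils2 implementation: it only accepts purely numeric versions such as <code>1.2</code> (no <code>b1</code> or <code>rc1</code> suffixes), and the helper names are hypothetical.</p>

```python
import operator
import re

# Comparison operators allowed inside the parentheses (PEP 345 style).
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

# "Name" optionally followed by "(constraint, constraint, ...)".
REQUIREMENT = re.compile(r"^\s*([\w.-]+)\s*(?:\(([^)]*)\))?\s*$")
# One constraint: an operator (two-character ones tried first) and a numeric version.
CONSTRAINT = re.compile(r"^\s*(<=|>=|==|!=|<|>)\s*(\d+(?:\.\d+)*)\s*$")


def parse_version(text):
    """Turn "1.2" into (1, 2) so versions compare as numbers, not strings."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # treat "1.2.0" and "1.2" as the same version
    return tuple(parts)


def parse_requirement(text):
    """Split "EggsAndSpam (>=1.1, <1.3)" into a name and (operator, version) pairs."""
    match = REQUIREMENT.match(text)
    if match is None:
        raise ValueError("invalid requirement: %r" % text)
    name, constraints_text = match.groups()
    constraints = []
    for chunk in (constraints_text or "").split(","):
        if not chunk.strip():
            continue
        constraint = CONSTRAINT.match(chunk)
        if constraint is None:
            raise ValueError("invalid constraint: %r" % chunk)
        op, version = constraint.groups()
        constraints.append((OPERATORS[op], parse_version(version)))
    return name, constraints


def matching_versions(requirement, available):
    """Keep the available version strings that satisfy every constraint."""
    _, constraints = parse_requirement(requirement)
    return [v for v in available
            if all(check(parse_version(v), bound) for check, bound in constraints)]


print(matching_versions("EggsAndSpam (<=1.2)", ["1.1", "1.2", "1.3"]))  # ['1.1', '1.2']
```

<p>Comparing tuples of integers rather than raw strings matters: as strings, <code>"1.10"</code> would sort before <code>"1.9"</code>.</p>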
<p>Internally, here is what happens:</p>
<ul>
<li>it processes the project page, for instance <a href="http://pypi.python.org/simple/FooBar/">http://pypi.python.org/simple/FooBar/</a>,
looking for download URLs.</li>
<li>for each download URL found, it creates an object containing the
project name, the version, and the URL where the archive is
located.</li>
<li>it sorts the distributions it found by version number. The default
behavior is to prefer source distributions over binary ones,
and to pick the latest "final" release rather than beta,
alpha, etc. ones.</li>
</ul>
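<p>The last two steps could look roughly like the following Python 3.7+ sketch. It only illustrates the ordering described above and is not distutils2's code: the <code>Distribution</code> class, the file-name regular expression and the sample URLs are assumptions, and the parsing only understands simple names such as <code>EggsAndSpam-1.2.tar.gz</code>, <code>EggsAndSpam-1.3b1.tar.gz</code> or <code>EggsAndSpam-1.2-py2.6.egg</code>.</p>

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical file-name pattern: name, "-", numeric version, optional
# pre-release tag (a1, b2, c1, rc1), then the rest (".tar.gz", "-py2.6.egg", ...).
FILENAME = re.compile(r"^(.+?)-(\d+(?:\.\d+)*)((?:a|b|c|rc)\d+)?(.*)$")
SOURCE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip")


@dataclass
class Distribution:
    name: str
    version: tuple
    prerelease: Optional[str]  # None for a final release
    url: str
    is_source: bool


def from_url(url):
    """Build a Distribution from a download URL, or return None if unrecognised."""
    filename = url.split("#", 1)[0].rsplit("/", 1)[-1]
    match = FILENAME.match(filename)
    if match is None:
        return None
    name, version, prerelease, rest = match.groups()
    return Distribution(
        name=name,
        version=tuple(int(part) for part in version.split(".")),
        prerelease=prerelease,
        url=url,
        is_source=rest in SOURCE_SUFFIXES,
    )


def preference_key(dist):
    # Used with reverse=True: final releases first, then higher versions,
    # then source archives before binary ones for the same version.
    return (dist.prerelease is None, dist.version, dist.is_source)


def sort_distributions(urls):
    dists = [d for d in map(from_url, urls) if d is not None]
    return sorted(dists, key=preference_key, reverse=True)


base = "http://pypi.python.org/packages/source/E/EggsAndSpam/"
ranked = sort_distributions([
    base + "EggsAndSpam-1.1.tar.gz",
    base + "EggsAndSpam-1.2-py2.6.egg",
    base + "EggsAndSpam-1.2.tar.gz",
    base + "EggsAndSpam-1.3b1.tar.gz",
])
for dist in ranked:
    print(dist.version, dist.prerelease, dist.is_source)
# (1, 2) None True
# (1, 2) None False
# (1, 1) None True
# (1, 3) b1 True
```

<p>Because the sort key checks "is this a final release?" first, <code>1.3b1</code> ends up last even though it has the highest version number, and the source archive of 1.2 comes before its binary counterpart.</p>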
<p>So, nothing difficult here.</p>
<p>We provide a bunch of other features, such as relying on the new PyPI
mirroring infrastructure or filtering the found distributions by various
criteria. If you're curious, please browse the <a href="http://distutils2.notmyidea.org/">distutils2
documentation</a>.</p>
<h3 id="using-xml-rpc">Using xml-rpc</h3>
<p>We can also make XML-RPC calls to retrieve information from PyPI.
It's a much more reliable way to get information from the index
(since the information is provided by the index itself), but it costs
processing time on the remote PyPI server.</p>
<p>For now, querying PyPI over XML-RPC is not available in
Distutils2, as I'm still working on it. The main pieces are already there
(I'll reuse some of the work I did for the SimpleIndex querying, and <a href="http://github.com/ametaireau/pypiclient">some
code already set up</a>); what I
need to do is provide a mock XML-RPC PyPI server, and that's what
I'm currently working on.</p>
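<p>As an illustration of what such a mock can look like, here is a minimal sketch built on the Python 3 standard library's <code>xmlrpc.server</code> and <code>xmlrpc.client</code> modules. The method names <code>package_releases</code> and <code>release_urls</code> are modelled on PyPI's XML-RPC interface, but the <code>MockPyPI</code> class, the data it serves and the shape of the returned dictionaries are simplified assumptions, not the distutils2 test server.</p>

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Fake data served by the mock index (made up for this example).
RELEASES = {
    "EggsAndSpam": {
        "1.1": [{"filename": "EggsAndSpam-1.1.tar.gz", "packagetype": "sdist"}],
        "1.2": [{"filename": "EggsAndSpam-1.2.tar.gz", "packagetype": "sdist"}],
    },
}


class MockPyPI:
    """Answers a small, simplified subset of PyPI-like XML-RPC methods."""

    def package_releases(self, name, show_hidden=False):
        # Newest first (a plain string sort is only good enough for this toy data).
        return sorted(RELEASES.get(name, {}), reverse=True)

    def release_urls(self, name, version):
        return RELEASES.get(name, {}).get(version, [])


def start_mock_server():
    # Port 0 lets the OS pick a free port; logRequests=False keeps the output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_instance(MockPyPI())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_mock_server()
host, port = server.server_address
with ServerProxy("http://%s:%d/" % (host, port)) as client:
    versions = client.package_releases("EggsAndSpam")
    files = client.release_urls("EggsAndSpam", versions[0])
server.shutdown()
server.server_close()
print(versions, files)
```

<p>Binding to port 0 lets the operating system choose a free port, so each test run can start its own mock server without clashing with another one.</p>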
<h2 id="processes">Processes</h2>
<p>For now, I'm trying to follow the "documentation, then tests, then code"
path, and that seems really necessary when working with a community:
code is harder to read and understand than documentation, and
documentation is easier to change.</p>
<p>When I was working on the simple index crawler, I should have done this:
it would have avoided some API changes and some lost time.</p>
<p>I've also set up <a href="http://wiki.notmyidea.org/distutils2_schedule">a
schedule</a>, to make sure everything will be ready in time for the end of the
summer. (And now, I need to learn to follow schedules...)</p>
</div>
</div>
</div>
<label for="sidebar-checkbox" class="sidebar-toggle"></label>
<script>
(function(document) {
var i = 0;
// snip empty header rows since markdown can't
var rows = document.querySelectorAll('tr');
for(i=0; i<rows.length; i++) {
var ths = rows[i].querySelectorAll('th');
var rowlen = rows[i].children.length;
if (ths.length > 0 && ths.length === rowlen) {
rows[i].remove();
}
}
})(document);
</script>
<script>
/* Lanyon & Poole are Copyright (c) 2014 Mark Otto. Adapted to Pelican 20141223 and extended a bit by @thomaswilley */
(function(document) {
var toggle = document.querySelector('.sidebar-toggle');
var sidebar = document.querySelector('#sidebar');
var checkbox = document.querySelector('#sidebar-checkbox');
document.addEventListener('click', function(e) {
var target = e.target;
if(!checkbox.checked ||
sidebar.contains(target) ||
(target === checkbox || target === toggle)) return;
checkbox.checked = false;
}, false);
})(document);
</script>
<!-- Piwik -->
<script type="text/javascript">
var _paq = _paq || [];
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="//tracker.notmyidea.org/";
_paq.push(['setTrackerUrl', u+'piwik.php']);
_paq.push(['setSiteId', 3]);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<noscript><p><img src="//tracker.notmyidea.org/piwik.php?idsite=3" style="border:0;" alt="" /></p></noscript>
<!-- End Piwik Code -->
</div>
</body>
</html>