blog.notmyidea.org/content/code/2011-01-20-pypioncouch.md
Alexis Métaireau 9df3b183b6 Enhanced UI & UX; Added New ISBN Plugin.
- Added the ability to display book cover for the category "Lectures" if ISBN cover is available.
- Moved author's name into a small tag for better hierarchy and readability.
- Implemented a feature to indicate link sizes depending on the number of articles associated with a given tag.
- Implemented a mini footer element displaying an RSS feed icon.
- Improved category display using description dictionary.
- Added a new plugin "isbn_downloader" to fetch ISBN information when needed.
- Included the count of articles for each category.
- Implemented changes for better layout and readability of tags and categories.
- Adjusted the layout of the webpage, improving the overall look of the page.
- Included "requests" in the requirements.txt for supplanting dependencies required by the new plugin and/or features.
2023-09-29 18:30:09 +02:00

105 lines
3.9 KiB
Markdown

# PyPI on CouchDB
By now, there are two ways to retrieve data from PyPI (the Python
Package Index). You can both rely on xml/rpc or on the "simple" API. The
simple API is not so simple to use as the name suggest, and have several
existing drawbacks.
Basically, if you want to use informations coming from the simple API,
you will have to parse web pages manually, to extract informations using
some black vodoo magic. Badly, magic have a price, and it's sometimes
impossible to get exactly the informations you want to get from this
index. That's the technique currently being used by distutils2,
setuptools and pip.
On the other side, while XML/RPC is working fine, it's requiring extra
work to the python servers each time you request something, which can
lead to some outages from time to time. Also, it's important to point
out that, even if PyPI have a mirroring infrastructure, it's only for
the so-called *simple* API, and not for the XML/RPC.
## CouchDB
Here comes CouchDB. CouchDB is a document oriented database, that knows
how to speak REST and JSON. It's easy to use, and provides out of the
box a replication mechanism.
## So, what ?
Hmm, I'm sure you got it. I've wrote a piece of software to link
informations from PyPI to a CouchDB instance. Then you can replicate all
the PyPI index with only one HTTP request on the CouchDB server. You can
also access the informations from the index directly using a REST API,
speaking json. Handy.
So PyPIonCouch is using the PyPI XML/RPC API to get data from PyPI, and
generate records in the CouchDB instance.
The final goal is to avoid to rely on this "simple" API, and rely on a
REST insterface instead. I have set up a couchdb server on my server,
which is available at
<http://couchdb.notmyidea.org/_utils/database.html?pypi>.
There is not a lot to see there for now, but I've done the first import
from PyPI yesterday and all went fine: it's possible to access the
metadata of all PyPI projects via a REST interface. Next step is to
write a client for this REST interface in distutils2.
## Example
For now, you can use pypioncouch via the command line, or via the python
API.
### Using the command line
You can do something like that for a full import. This **will** take
long, because it's fetching all the projects at pypi and importing their
metadata:
$ pypioncouch --fullimport http://your.couchdb.instance/
If you already have the data on your couchdb instance, you can just
update it with the last informations from pypi. **However, I recommend
to just replicate the principal node, hosted at
<http://couchdb.notmyidea.org/pypi/>**, to avoid the duplication of
nodes:
$ pypioncouch --update http://your.couchdb.instance/
The principal node is updated once a day by now, I'll try to see if it's
enough, and ajust with the time.
### Using the python API
You can also use the python API to interact with pypioncouch:
>>> from pypioncouch import XmlRpcImporter, import_all, update
>>> full_import()
>>> update()
## What's next ?
I want to make a couchapp, in order to navigate PyPI easily. Here are
some of the features I want to propose:
- List all the available projects
- List all the projects, filtered by specifiers
- List all the projects by author/maintainer
- List all the projects by keywords
- Page for each project.
- Provide a PyPI "Simple" API equivalent, even if I want to replace
it, I do think it will be really easy to setup mirrors that way,
with the out of the box couchdb replication
I also still need to polish the import mechanism, so I can directly
store in couchdb:
- The OPML files for each project
- The upload\_time as couchdb friendly format (list of int)
- The tags as lists (currently it's only a string separated by spaces
The work I've done by now is available on
<https://bitbucket.org/ametaireau/pypioncouch/>. Keep in mind that it's
still a work in progress, and everything can break at any time. However,
any feedback will be appreciated \!