# PyPI on CouchDB

For now, there are two ways to retrieve data from PyPI (the Python
Package Index): you can rely either on XML/RPC or on the "simple" API.
The simple API is not as simple to use as its name suggests, and it has
several drawbacks.

Basically, if you want to use information coming from the simple API,
you have to parse web pages manually and extract the information using
some black voodoo magic. Sadly, magic has a price, and it is sometimes
impossible to get exactly the information you want from this index.
That's the technique currently used by distutils2, setuptools and pip.

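To give an idea of what this scraping looks like, here is a minimal sketch in Python 3. The sample page, the `LinkCollector` class and the `guess_versions` helper are all made up for illustration and are not taken from distutils2, setuptools or pip; real index pages may be laid out differently.

```python
from html.parser import HTMLParser

# Hypothetical excerpt of a "simple" index page; real pages may differ.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/f/foo/foo-1.0.tar.gz#md5=abc123">foo-1.0.tar.gz</a><br/>
<a href="../../packages/source/f/foo/foo-1.1.tar.gz#md5=def456">foo-1.1.tar.gz</a><br/>
<a href="http://example.org/foo" rel="homepage">1.1 home_page</a><br/>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from every <a> tag of a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append((attributes["href"], attributes.get("rel")))


def guess_versions(page, project):
    """Guess release versions from archive file names (fragile on purpose)."""
    collector = LinkCollector()
    collector.feed(page)
    prefix = project + "-"
    suffix = ".tar.gz"
    versions = []
    for href, rel in collector.links:
        if rel is not None:
            continue  # links carrying a rel attribute are not archives here
        # Drop the "#md5=..." fragment, then keep only the file name.
        filename = href.split("#")[0].rsplit("/", 1)[-1]
        if filename.startswith(prefix) and filename.endswith(suffix):
            versions.append(filename[len(prefix):-len(suffix)])
    return versions


print(guess_versions(SAMPLE_PAGE, "foo"))  # ['1.0', '1.1']
```

Everything here hinges on file-naming conventions: an archive that is not a `.tar.gz`, or a file name that does not start with the exact project name, already defeats it.
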
On the other hand, while XML/RPC works fine, it requires extra work
from the Python servers each time you request something, which can lead
to outages from time to time. It's also important to point out that,
even though PyPI has a mirroring infrastructure, it only covers the
so-called *simple* API, not XML/RPC.

## CouchDB

Here comes CouchDB. CouchDB is a document-oriented database that knows
how to speak REST and JSON. It's easy to use, and provides a replication
mechanism out of the box.

## So, what?

Hmm, I'm sure you got it. I've written a piece of software to link
information from PyPI to a CouchDB instance. You can then replicate the
whole PyPI index with a single HTTP request to the CouchDB server. You
can also access the information from the index directly through a REST
API, speaking JSON. Handy.

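As a rough sketch of that single request: CouchDB exposes replication through `POST /_replicate`, with a JSON body naming a source and a target. The Python 3 code below only builds the request and does not send it. The local server address, the `pypi` target name and both helper functions are assumptions made for this example, and authentication is left out.

```python
import json
import urllib.parse
import urllib.request

# Source database named in this post; the local server is a placeholder.
SOURCE_DB = "http://couchdb.notmyidea.org/pypi/"
LOCAL_SERVER = "http://localhost:5984"


def build_replication_request(server=LOCAL_SERVER, source=SOURCE_DB, target="pypi"):
    """Build (without sending) the POST /_replicate request."""
    body = {"source": source, "target": target, "create_target": True}
    return urllib.request.Request(
        server + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def document_url(server, database, doc_id):
    """URL of a single document, to be fetched with a plain GET."""
    return "{}/{}/{}".format(server, database, urllib.parse.quote(doc_id, safe=""))


request = build_replication_request()
print(request.get_method(), request.full_url)  # POST http://localhost:5984/_replicate
# To actually start the replication: urllib.request.urlopen(request)
```

How documents are keyed inside the `pypi` database is not stated here, so any `doc_id` passed to `document_url` is a guess to be checked against the real data.
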
So PyPIonCouch uses the PyPI XML/RPC API to get data from PyPI, and
generates records in the CouchDB instance.

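The following Python 3 sketch shows the general shape of such an import. It is not the actual PyPIonCouch code: `to_couch_document`, `iter_documents` and the `<name>-<version>` `_id` scheme are hypothetical. `package_releases` and `release_data` are methods of PyPI's XML/RPC API, which may be deprecated or restricted on today's PyPI, and the endpoint URL is an assumption.

```python
import xmlrpc.client

PYPI_XMLRPC_URL = "https://pypi.org/pypi"  # assumption: XML/RPC endpoint


def to_couch_document(release):
    """Turn a release_data() mapping into a CouchDB-style document.

    The "<name>-<version>" _id scheme is hypothetical.
    """
    document = dict(release)
    document["_id"] = "{}-{}".format(release["name"], release["version"])
    return document


def iter_documents(client, project):
    """Yield one document per release of `project` (performs XML/RPC calls)."""
    for version in client.package_releases(project):
        yield to_couch_document(client.release_data(project, version))


# Offline check with a hand-written record instead of a live server.
sample = {"name": "foo", "version": "1.0", "summary": "An example project"}
print(to_couch_document(sample)["_id"])  # foo-1.0

# Against a live server (not executed here):
# client = xmlrpc.client.ServerProxy(PYPI_XMLRPC_URL)
# documents = list(iter_documents(client, "foo"))
```

Each resulting dictionary could then be sent to CouchDB as JSON, one document per release.
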
The final goal is to stop relying on this "simple" API, and to rely on
a REST interface instead. I have set up a CouchDB server on my own
server, which is available at
<http://couchdb.notmyidea.org/_utils/database.html?pypi>.

There is not a lot to see there for now, but I did the first import
from PyPI yesterday and all went fine: it's possible to access the
metadata of all PyPI projects via a REST interface. The next step is to
write a client for this REST interface in distutils2.

## Example

For now, you can use pypioncouch via the command line or via the Python
API.

### Using the command line

You can do something like this for a full import. This **will** take a
long time, because it fetches all the projects from PyPI and imports
their metadata:

    $ pypioncouch --fullimport http://your.couchdb.instance/

If you already have the data on your CouchDB instance, you can just
update it with the latest information from PyPI. **However, I recommend
simply replicating the principal node, hosted at
<http://couchdb.notmyidea.org/pypi/>**, to avoid duplicating nodes:

    $ pypioncouch --update http://your.couchdb.instance/

The principal node is updated once a day for now. I'll see whether
that's enough, and adjust over time.

### Using the Python API

You can also use the Python API to interact with pypioncouch:

    >>> from pypioncouch import XmlRpcImporter, import_all, update
    >>> import_all()
    >>> update()

## What's next?

I want to make a couchapp in order to navigate PyPI easily. Here are
some of the features I want to offer:

- List all the available projects
- List all the projects, filtered by specifiers
- List all the projects by author/maintainer
- List all the projects by keywords
- A page for each project
- Provide a PyPI "simple" API equivalent. Even if I want to replace
  it, I do think it will make mirrors really easy to set up, thanks
  to CouchDB's out-of-the-box replication

I also still need to polish the import mechanism, so that I can
directly store in CouchDB:

- The OPML files for each project
- The upload\_time in a CouchDB-friendly format (a list of ints)
- The tags as lists (currently they are a single string separated by
  spaces)

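The last two conversions could look like the sketch below, in Python 3. The function names are hypothetical and this is not the PyPIonCouch implementation. A list of ints is convenient in CouchDB because JSON arrays used as view keys are compared element by element, which makes date-range queries straightforward.

```python
from datetime import datetime


def upload_time_as_list(moment):
    """[year, month, day, hour, minute, second], usable as a CouchDB view key."""
    return [moment.year, moment.month, moment.day,
            moment.hour, moment.minute, moment.second]


def keywords_as_list(keywords):
    """Split a space-separated keyword string; empty or missing gives []."""
    return (keywords or "").split()


print(upload_time_as_list(datetime(2010, 6, 1, 12, 30, 0)))  # [2010, 6, 1, 12, 30, 0]
print(keywords_as_list("couchdb pypi  mirror"))  # ['couchdb', 'pypi', 'mirror']
```
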
The work I've done so far is available at
<https://bitbucket.org/ametaireau/pypioncouch/>. Keep in mind that it's
still a work in progress, and everything can break at any time. However,
any feedback will be appreciated!