<!DOCTYPE html>
<html lang="fr">
<head>
<title>
Using DBpedia to get language&nbsp;influences - Alexis Métaireau </title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet"
href="https://blog.notmyidea.org/theme/css/main.css?v2"
type="text/css" />
<link href="https://blog.notmyidea.org/feeds/all.atom.xml"
type="application/atom+xml"
rel="alternate"
title="Alexis Métaireau ATOM Feed" />
</head>
<body>
<div id="content">
<section id="links">
<ul>
<li>
<a class="main" href="/">Alexis Métaireau</a>
</li>
<li>
<a class=""
href="https://blog.notmyidea.org/journal/index.html">Journal</a>
</li>
<li>
<a class="selected"
href="https://blog.notmyidea.org/code/">Code, etc.</a>
</li>
<li>
<a class=""
href="https://blog.notmyidea.org/weeknotes/">Weeknotes</a>
</li>
<li>
<a class=""
href="https://blog.notmyidea.org/lectures/">Readings</a>
</li>
<li>
<a class=""
href="https://blog.notmyidea.org/projets.html">Projects</a>
</li>
</ul>
</section>
<header>
<h1 class="post-title">Using DBpedia to get language&nbsp;influences</h1>
<time datetime="2011-08-16T00:00:00+02:00">16 August 2011</time>
</header>
<article>
<p>While browsing Python&#8217;s Wikipedia page, I found information about
the languages influenced by Python, and the languages that influenced
Python&nbsp;itself.</p>
<p>It&#8217;s interesting to know which languages influenced which others,
and it could be even more interesting to have an overview of the
connections between them, keeping Python as the main&nbsp;focus.</p>
<p>This information is available on the Wikipedia page, but not in an easily
exploitable format. Fortunately, it is provided in the infobox present on
the majority of Wikipedia pages. And… guess what? There is a project whose
goal is to scrape and index all this information in a more queryable way,
using semantic web&nbsp;technologies.</p>
<p>As you may have guessed, the project in question is DBpedia, and it
exposes information in the form of <span class="caps">RDF</span> triples, which are much easier
to work with than plain <span class="caps">HTML</span>.</p>
<p>For instance, let&#8217;s take the page about Python:
<a href="http://dbpedia.org/page/Python_%28programming_language%29">http://dbpedia.org/page/Python_%28programming_language%29</a></p>
<p>The interesting properties here are &#8220;Influenced&#8221; and &#8220;InfluencedBy&#8221;,
which allow us to get a list of languages. Unfortunately, they do not
really use the full power of the Semantic Web here, and the list is
actually a string of comma-separated&nbsp;values.</p>
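<p>Since each property comes back as a single string, the first step is to split it into names. Here is a minimal sketch, assuming the value looks like the made-up sample below; the helper name <code>split_influences</code> is hypothetical and not taken from the script.</p>

```python
def split_influences(raw):
    """Split a comma-separated property value into clean language names."""
    # Trim the whitespace around each piece, then drop empty leftovers
    # (for instance after a trailing comma).
    names = [part.strip() for part in raw.split(",")]
    return [name for name in names if name]


# Illustrative sample only, not real DBpedia output.
print(split_influences("ABC, ALGOL 68, C, Haskell"))
```

<p>Names that themselves contain a comma would be split incorrectly by this approach, so the result is worth a quick look.</p>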
<p>Anyway, we can use a simple rule: the Wikipedia page of a programming
language is either named after the language itself, or has that name
suffixed with &#8220;(programming language)&#8221;, which is the case for&nbsp;Python.</p>
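<p>This naming rule could be sketched as a function returning both candidate addresses for a language. The function name <code>candidate_urls</code> is hypothetical, the <code>/page/</code> prefix is copied from the DBpedia link above, and capitalising the first letter is my assumption about how page titles are written.</p>

```python
from urllib.parse import quote

# Prefix taken from the DBpedia page URL quoted in the article.
DBPEDIA_PAGE = "http://dbpedia.org/page/"


def candidate_urls(language):
    """Return the two page URLs a language may live under."""
    name = language.strip().replace(" ", "_")
    # Assumption: page titles start with an upper-case letter.
    name = name[:1].upper() + name[1:]
    suffixed = name + "_(programming_language)"
    # quote() percent-encodes the parentheses, as in the link above.
    return [DBPEDIA_PAGE + quote(name), DBPEDIA_PAGE + quote(suffixed)]


for url in candidate_urls("python"):
    print(url)
```

<p>A caller would then try each candidate in turn and keep the first one that actually describes a programming language.</p>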
<p>So I&#8217;ve built <a href="https://github.com/ametaireau/experiments/blob/master/influences/get_influences.py">a tiny script to extract the information from
DBpedia</a>
and transform it into a shiny graph using&nbsp;Graphviz.</p>
<p>After a&nbsp;nice:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>python<span class="w"> </span>get_influences.py<span class="w"> </span>python<span class="w"> </span>dot<span class="w"> </span><span class="p">|</span><span class="w"> </span>dot<span class="w"> </span>-Tpng<span class="w"> </span>&gt;<span class="w"> </span>influences.png
</code></pre></div>
<p>The result is the following graph (<a href="http://files.lolnet.org/alexis/influences.png">see it directly
here</a>)</p>
<p><img alt="Graph of the influences of programming languages on one
another." src="http://files.lolnet.org/alexis/influences.png"></p>
<p>While reading this diagram, keep in mind that it a) does not list all
the languages and b) keeps a Python&nbsp;perspective.</p>
<p>This means that you can trust the diagram when following the arrows from
Python to another language and from another language to Python. To keep
things readable, it does not try to show the relationships between all
the languages at the same&nbsp;time.</p>
<p>It would certainly be possible to show all the connections between all
languages (and the resulting script would be simpler), but the
resulting graph would probably be much less&nbsp;readable.</p>
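<p>The Python-centred view described above can be sketched as a function that only emits edges touching the focus language. This is an illustration and not the actual <code>get_influences.py</code>: the function name <code>to_dot</code>, its parameters and the sample names are all hypothetical.</p>

```python
def to_dot(focus, influenced_by, influenced):
    """Build a Graphviz DOT digraph containing only edges that touch `focus`."""
    lines = ["digraph influences {"]
    # Arrows point from the influencing language to the influenced one.
    for source in influenced_by:
        lines.append(f'    "{source}" -> "{focus}";')
    for target in influenced:
        lines.append(f'    "{focus}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative names only, not real DBpedia output.
print(to_dot("Python", ["ABC", "C"], ["Ruby"]))
```

<p>The printed text can be piped to <code>dot -Tpng</code>, as in the command shown earlier, to render the image.</p>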
<p>You can find the script <a href="https://github.com/ametaireau/experiments">on my GitHub
account</a>. Feel free to adapt
it to whatever you want if you feel like&nbsp;hacking.</p>
</article>
<footer>
<a id="feed" href="/feeds/all.atom.xml">
<img alt="RSS Logo" src="/theme/rss.svg" />
</a>
</footer>
</div>
</body>
</html>