blog.notmyidea.org/using-dbpedia-to-get-languages-influences.html
2019-07-02 22:54:50 +00:00

188 lines
No EOL
8.2 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
<link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />
<title>Using dbpedia to get languages influences - Carnets Web</title>
<meta charset="utf-8" />
<link href="https://blog.notmyidea.org/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="Carnets Web Full Atom Feed" />
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/poole.css"/>
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/syntax.css"/>
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/lanyon.css"/>
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">
<link rel="stylesheet" href="https://blog.notmyidea.org/theme/css/styles.css"/>
<meta name="tags" contents="dbpedia" />
<meta name="tags" contents="sparql" />
<meta name="tags" contents="python" />
<style>
h1 {
font-family: "Avant Garde", Avantgarde, "Century Gothic", CenturyGothic, "AppleGothic", sans-serif;
padding: 80px 50px;
text-align: center;
text-transform: uppercase;
text-rendering: optimizeLegibility;
color: #202020;
letter-spacing: .1em;
text-shadow:
-1px -1px 1px #111,
2px 2px 1px #eaeaea;
}
#main {
text-align: justify;
text-justify: inter-word;
}
#main h1 {
padding: 10px;
}
.post-headline {
padding: 15px;
}
</style>
</head>
<body>
<!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
<div class="sidebar-item">
<div class="profile">
<img src="https://blog.notmyidea.org/theme/img/profile.png"/>
</div>
</div>
<nav class="sidebar-nav">
<a class="sidebar-nav-item" href="/">Articles</a>
<a class="sidebar-nav-item" href="https://www.vieuxsinge.com">Brasserie du Vieux Singe</a>
<a class="sidebar-nav-item" href="http://blog.notmyidea.org/pages/about.html">A propos</a>
<a class="sidebar-nav-item" href="https://twitter.com/ametaireau">Messages courts</a>
<a class="sidebar-nav-item" href="https://github.com/almet">Code</a>
</nav>
</div> <div class="wrap">
<div class="masthead">
<div class="container">
<h3 class="masthead-title">
<a href="https://blog.notmyidea.org/" title="Home">Carnets Web</a>
</h3>
</div>
</div>
<div class="container content">
<div id="main" class="posts">
<h1 class="post-title">Using dbpedia to get languages influences</h1>
<span class="post-date">16 août 2011</span>
<img id="illustration" src="" />
<div class="post article">
<h1>🌟</h1>
<p>While browsing the Python's wikipedia page, I found information about the languages
influenced by python, and the languages that influenced python itself.</p>
<p>Well, that's kind of interesting to know which languages influenced others,
it could even be more interesting to have an overview of the connexion between
them, keeping python as the main focus.</p>
<p>This information is available on the wikipedia page, but not in a really
exploitable format. Hopefully, this information is provided into the
information box present on the majority of wikipedia pages. And… guess what?
there is project with the goal to scrap and index all this information in
a more queriable way, using the semantic web technologies.</p>
<p>Well, you may have guessed it, the project in question in dbpedia, and exposes
information in the form of RDF triples, which are way more easy to work with
than simple HTML.</p>
<p>For instance, let's take the page about python:
<a class="reference external" href="http://dbpedia.org/page/Python_%28programming_language%29">http://dbpedia.org/page/Python_%28programming_language%29</a></p>
<p>The interesting properties here are &quot;Influenced&quot; and &quot;InfluencedBy&quot;, which
allows us to get a list of languages. Unfortunately, they are not really using
all the power of the Semantic Web here, and the list is actually a string with
coma separated values in it.</p>
<p>Anyway, we can use a simple rule: All wikipedia pages of programming languages
are either named after the name of the language itself, or suffixed with &quot;(
programming language)&quot;, which is the case for python.</p>
<p>So I've built <a class="reference external" href="https://github.com/ametaireau/experiments/blob/master/influences/get_influences.py">a tiny script to extract the information from dbpedia</a> and transform them into a shiny graph using graphviz.</p>
<p>After a nice:</p>
<pre class="literal-block">
$ python get_influences.py python dot | dot -Tpng &gt; influences.png
</pre>
<p>The result is the following graph (<a class="reference external" href="http://files.lolnet.org/alexis/influences.png">see it directly here</a>)</p>
<img alt="Graph des influances des langages les uns sur les autres." src="http://files.lolnet.org/alexis/influences.png" style="width: 800px;" />
<p>While reading this diagram, keep in mind that it is a) not listing all the
languages and b) keeping a python perspective.</p>
<p>This means that you can trust the scheme by following the arrows from python to
something and from something to python, it is not trying to get the matching
between all the languages at the same time to keep stuff readable.</p>
<p>It would certainly be possible to have all the connections between all
languages (and the resulting script would be easier) to do so, but the resulting
graph would probably be way less readable.</p>
<p>You can find the script <a class="reference external" href="https://github.com/ametaireau/experiments">on my github account</a>. Feel free to adapt it for
whatever you want if you feel hackish.</p>
Vous pouvez également <a onclick="(function(){
let here = document.location;
document.location = `http://pdf.fivefilters.org/simple-print/url.php?size=A4#${here}`;
return false;
})();return false;">télécharger cet article en pdf</a>.
</div>
</div>
</div>
<label for="sidebar-checkbox" class="sidebar-toggle"></label>
<script>
(function(document) {
var i = 0;
// snip empty header rows since markdown can't
var rows = document.querySelectorAll('tr');
for(i=0; i<rows.length; i++) {
var ths = rows[i].querySelectorAll('th');
var rowlen = rows[i].children.length;
if (ths.length > 0 && ths.length === rowlen) {
rows[i].remove();
}
}
})(document);
</script>
<script>
/* Lanyon & Poole are Copyright (c) 2014 Mark Otto. Adapted to Pelican 20141223 and extended a bit by @thomaswilley */
(function(document) {
var toggle = document.querySelector('.sidebar-toggle');
var sidebar = document.querySelector('#sidebar');
var checkbox = document.querySelector('#sidebar-checkbox');
document.addEventListener('click', function(e) {
var target = e.target;
if(!checkbox.checked ||
sidebar.contains(target) ||
(target === checkbox || target === toggle)) return;
checkbox.checked = false;
}, false);
})(document);
</script>
<!-- Piwik -->
<script type="text/javascript">
var _paq = _paq || [];
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="//tracker.notmyidea.org/";
_paq.push(['setTrackerUrl', u+'piwik.php']);
_paq.push(['setSiteId', 3]);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<noscript><p><img src="//tracker.notmyidea.org/piwik.php?idsite=3" style="border:0;" alt="" /></p></noscript>
<!-- End Piwik Code -->
</div>
</body>
</html>