bertails.orghttps://bertails.org/2015-06-17T00:00:00-05:00An RDF abstraction for the JVM2015-06-17T00:00:00-05:00Alexandre Bertailstag:bertails.org,2015-06-17:2015/06/17/an-rdf-abstraction-for-the-jvm/<p><a href="http://commonsrdf.incubator.apache.org/">Commons RDF</a> is an effort from the <a href="https://jena.apache.org/">Jena</a> and <a href="http://rdf4j.org/">Sesame</a> communities <cite>to define a common library for RDF 1.1 on the JVM</cite>. In my opinion, the current proposal suffers from design issues which seriously limit interoperability despite the stated objective. In this article, I will explain the limits of the current design and discuss alternatives to address the flaws.</p>
<p>This article is as much about RDF on the JVM as it is about API design and abstractions in Java. No prior knowledge of RDF is required, as I will introduce the RDF model itself. So you might end up learning what RDF is as a side-effect :-)</p>
<h2 id="the-problem">the problem RDF Commons wants to solve</h2>
<p>For a long time now, if you wanted to do RDF (and SPARQL) stuff in Java, you basically had the choice between <a href="https://jena.apache.org/">Jena</a> and <a href="http://rdf4j.org/">Sesame</a>. Those two libraries were developed independently and didn’t share much, despite the fact that they are both implementations of <a href="http://www.w3.org/standards/techs/rdf#w3c_all">well-defined Web standards</a>.</p>
<p>So people have come up with ways to go back-and-forth between those two worlds: object adapters, meta APIs, ad-hoc APIs, etc. For example, let’s say you wanted to use that awesome asynchronous parser library for <a href="http://www.w3.org/TR/turtle/">Turtle</a>. It returns a Jena graph while your stack is mainly Sesame? Well it’s too bad for you. So you use an adapter which wraps every single object composing your graph.</p>
<p>So let’s say you have the opportunity to solve those problems. What would you do? If you have done software development for a while, especially if it was in Java, your first thought might be about defining a common <a href="http://math.hws.edu/javanotes/c5/s5.html#OOP.5.2">class hierarchy</a> coupled with an <a href="http://en.wikipedia.org/wiki/Abstract_factory_pattern">abstract factory</a>. Then you could go back to the author of the Turtle library with a Pull Request using the new common interfaces, and everybody is happy, right?</p>
<p>Let’s see how this pans out in the case of Commons RDF 0.1.</p>
<h2 id="commons-rdf">Commons RDF</h2>
<p><a href="http://commonsrdf.incubator.apache.org/">Commons RDF</a> closely follows <a href="http://www.w3.org/TR/rdf11-concepts/">the concepts defined in RDF 1.1</a>, including the terms used. It specifically targets plain RDF (as opposed to <a href="http://www.w3.org/TR/rdf11-concepts/#section-generalized-rdf">Generalized RDF</a>) and wants to be as type-safe as possible, e.g. only IRIs and blank nodes are accepted in the subject position of a triple.</p>
<p>Here is an overview of the design of Commons RDF:</p>
<div style="float: right; margin-left: 4em; margin-right: 2em;">
<a href="class-diagram.png">
<img src="class-diagram.png" alt="Commons RDF class diagram" style="height: 30em" />
</a>
<p style="text-align: center; margin-top: 0px;">link to <a href="http://commonsrdf.incubator.apache.org/images/class-diagram.png">original image</a></p>
</div>
<ul>
<li>
<p>each RDF concept is mapped onto a Java interface: <code>Graph</code>, <code>Triple</code>, <code>RDFTerm</code>, <code>IRI</code>, <code>BlankNode</code>, <code>Literal</code></p>
</li>
<li>
<p>there is an additional concept: <code>BlankNodeOrIRI</code></p>
</li>
<li>
<p>there are sub-type relationships between <code>RDFTerm</code>, <code>BlankNodeOrIRI</code>, <code>IRI</code>, <code>BlankNode</code>, and <code>Literal</code></p>
</li>
<li>
<p>the interfaces expose methods to access their components</p>
</li>
<li>
<p>the factory <code>RDFTermFactory</code> knows how to create concrete instances of the interfaces</p>
</li>
</ul>
<div style="clear: both;"></div>
<p>Here is a quick look at what RDF <em>actually is</em> in the Commons RDF world (this is basically copied from the <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-commonsrdf.git;a=tree;f=api/src/main/java/org/apache/commons/rdf/api;h=8f2db3110cd11d18e6128eac40cdf597372b73d0;hb=9ee66b0078da61fed85b5fe0b6d5481e9300b140">source code</a>):</p>
<pre><code class="language-java">package org.apache.commons.rdf.api;

public interface Graph {
  void add(Triple triple);
  boolean contains(Triple triple);
  Stream<? extends Triple> getTriples();
  ...
}

public interface Triple {
  BlankNodeOrIRI getSubject();
  IRI getPredicate();
  RDFTerm getObject();
}

public interface RDFTerm {
  String ntriplesString();
}

public interface BlankNodeOrIRI extends RDFTerm { }

public interface IRI extends BlankNodeOrIRI {
  String getIRIString();
}

public interface BlankNode extends BlankNodeOrIRI {
  String uniqueReference();
}

public interface Literal extends RDFTerm {
  String getLexicalForm();
  IRI getDatatype();
  Optional<String> getLanguageTag();
}

public interface RDFTermFactory {

  default Graph createGraph() throws UnsupportedOperationException { ... }

  default IRI createIRI(String iri)
    throws IllegalArgumentException, UnsupportedOperationException { ... }

  /** The returned blank node MUST NOT be equal to any existing */
  default BlankNode createBlankNode()
    throws UnsupportedOperationException { ... }

  /** All `BlankNode`s created with the given `name` MUST be equivalent */
  default BlankNode createBlankNode(String name)
    throws UnsupportedOperationException { ... }

  default Literal createLiteral(String lexicalForm)
    throws IllegalArgumentException, UnsupportedOperationException { ... }

  default Literal createLiteral(String lexicalForm, IRI dataType)
    throws IllegalArgumentException, UnsupportedOperationException { ... }

  default Literal createLiteral(String lexicalForm, String languageTag)
    throws IllegalArgumentException, UnsupportedOperationException { ... }

  default Triple createTriple(BlankNodeOrIRI subject, IRI predicate, RDFTerm object)
    throws IllegalArgumentException, UnsupportedOperationException { ... }
}
</code></pre>
<p>Everything actually looks good and pretty standard, right? So you might be wondering why I am not that thrilled by this approach. Keep on reading then :-)</p>
<h2 id="class-based-design">class-based design</h2>
<p>As a reminder, in most static languages, <strong>types are only compile-time information</strong>. In Java, classes and interfaces are just a <a href="http://stackoverflow.com/a/5315433/1057315">reified</a> version of types (up to generics, <a href="https://docs.oracle.com/javase/tutorial/java/generics/erasure.html">which get erased by the JVM</a>), meaning that they are an (incomplete <em>by design</em>) abstraction for types that can be manipulated at runtime.</p>
<p>RDF Commons decided to model the RDF types using interfaces. In Java, interfaces and classes rely on what we call <a href="http://en.wikipedia.org/wiki/Nominal_type_system#Nominal_subtyping">nominal subtyping</a>. It means that a concrete implementation is required to <em>explicitly</em> extend (or implement) an interface to be considered a subtype.</p>
<p>In other words, despite <code>java.lang.UUID</code> being a perfectly acceptable candidate for being a <code>BlankNode</code>, it is impossible to use it <em>directly</em> because <code>UUID</code> does not implement <code>BlankNode</code>, so <code>UUID</code> has to be wrapped. There are actually many other cases like that: <code>java.net.URI</code> or <code>akka.http.model.Uri</code> are acceptable candidates for <code>IRI</code>, <code>java.lang.String</code> or <code>java.lang.Integer</code> for <code>Literal</code>, etc.</p>
<p>So here is my first and main complaint about Commons RDF: <strong>it forces implementers to coerce their types into its own class hierarchy</strong> and there is no good reason for doing so.</p>
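<p>To make the cost concrete, here is a minimal sketch, not actual Commons RDF code: the single-method <code>IRI</code> interface below is a simplified stand-in. Because <code>java.net.URI</code> does not implement the interface, every instance has to be wrapped before it can enter the API.</p>

```java
import java.net.URI;

public class WrappingCost {

    // simplified stand-in for the Commons RDF IRI interface
    interface IRI {
        String getIRIString();
    }

    // the unavoidable adapter: one extra allocation per term,
    // just to satisfy the nominal subtyping requirement
    static IRI wrap(URI uri) {
        return uri::toString;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/Alice");
        IRI iri = wrap(uri);  // uri itself can never be used as an IRI
        System.out.println(iri.getIRIString());
    }
}
```

<p>Multiply this by every term of every triple of a large graph, and the adapter layer becomes a real cost, both in allocations and in code to maintain.</p>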
<h2 id="generics">generics</h2>
<p>How can we define abstract types and operations on them without relying on class/interface inheritance? You already know the answer, as it is the same story as with <code>java.util.Comparator<T></code> and <code>java.lang.Comparable<T></code>.</p>
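<p>To recall the analogy: <code>Comparable</code> relies on nominal subtyping (the class itself must implement it), while a <code>Comparator</code> can be supplied externally for any existing type, without touching it. A small sketch:</p>

```java
import java.util.Arrays;
import java.util.Comparator;

public class ExternalOps {
    public static void main(String[] args) {
        String[] names = { "Charlie", "alice", "Bob" };

        // String cannot be retrofitted to implement a new interface,
        // but an ordering can still be defined for it from the outside:
        Comparator<String> caseInsensitive = (a, b) -> a.compareToIgnoreCase(b);

        Arrays.sort(names, caseInsensitive);
        System.out.println(Arrays.toString(names));  // [alice, Bob, Charlie]
    }
}
```

<p>The <code>RDF</code> module below plays exactly the role of <code>Comparator</code>: the operations live outside the types they operate on.</p>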
<p>Let’s see what the factory would look like with this approach:</p>
<pre><code class="language-java">public interface RDFTermFactory<Graph,
                                Triple,
                                RDFTerm,
                                BlankNodeOrIRI extends RDFTerm,
                                IRI extends BlankNodeOrIRI,
                                BlankNode extends BlankNodeOrIRI,
                                Literal extends RDFTerm> {

  /* same factory functions as before go here */

}
</code></pre>
<p>Instead of referring to Java interfaces, we now refer to the newly introduced generics. In a way, <strong>generics are more abstract than interfaces</strong>. Also, generics let you express the subtype relationships using <code>extends</code>.</p>
<p>As you have probably already noticed, that only gives us a way to <em>create inhabitants for those types</em>. We also need a way to <em>access their components</em>.</p>
<h2 id="rdf-module">RDF module</h2>
<p>Accessing components was the role of the methods defined on the interfaces. So we just have to move them into the factory and make them <em>functions</em> instead. And because the factory is now made of all the operations actually defining the RDF model, we can refer to it as the <strong><code>RDF</code> module</strong>.</p>
<pre><code class="language-java">public interface RDF<Graph,
                     Triple,
                     RDFTerm,
                     BlankNodeOrIRI extends RDFTerm,
                     IRI extends BlankNodeOrIRI,
                     BlankNode extends BlankNodeOrIRI,
                     Literal extends RDFTerm> {

  // from org.apache.commons.rdf.api.RDFTermFactory
  BlankNode createBlankNode();
  BlankNode createBlankNode(String name);
  Graph createGraph();
  IRI createIRI(String iri) throws IllegalArgumentException;
  Literal createLiteral(String lexicalForm) throws IllegalArgumentException;
  Literal createLiteral(String lexicalForm, IRI dataType) throws IllegalArgumentException;
  Literal createLiteral(String lexicalForm, String languageTag) throws IllegalArgumentException;
  Triple createTriple(BlankNodeOrIRI subject, IRI predicate, RDFTerm object) throws IllegalArgumentException;

  // from org.apache.commons.rdf.api.Graph
  Graph add(Graph graph, BlankNodeOrIRI subject, IRI predicate, RDFTerm object);
  Graph add(Graph graph, Triple triple);
  Graph remove(Graph graph, BlankNodeOrIRI subject, IRI predicate, RDFTerm object);
  boolean contains(Graph graph, BlankNodeOrIRI subject, IRI predicate, RDFTerm object);
  Stream<? extends Triple> getTriplesAsStream(Graph graph);
  Iterable<Triple> getTriplesAsIterable(Graph graph, BlankNodeOrIRI subject, IRI predicate, RDFTerm object);
  long size(Graph graph);

  // from org.apache.commons.rdf.api.Triple
  BlankNodeOrIRI getSubject(Triple triple);
  IRI getPredicate(Triple triple);
  RDFTerm getObject(Triple triple);

  // from org.apache.commons.rdf.api.RDFTerm
  <T> T visit(RDFTerm t,
              Function<IRI, T> fIRI,
              Function<BlankNode, T> fBNode,
              Function<Literal, T> fLiteral);

  // from org.apache.commons.rdf.api.IRI
  String getIRIString(IRI iri);

  // from org.apache.commons.rdf.api.BlankNode
  String uniqueReference(BlankNode bnode);

  // from org.apache.commons.rdf.api.Literal
  IRI getDatatype(Literal literal);
  Optional<String> getLanguageTag(Literal literal);
  String getLexicalForm(Literal literal);
}
</code></pre>
<p>We are doing exactly the same thing as <a href="http://stackoverflow.com/questions/2709821/what-is-the-purpose-of-self-in-python">Python does with <code>self</code></a>: class methods are just functions where the first argument used to be the receiver (aka the object) of the method.</p>
<p>For the sake of brevity, I am actually showing you the final result for the <code>RDF</code> module. Let’s discuss the other issues that were fixed at the same time.</p>
<h2 id="visitor">visitor</h2>
<p>In Commons RDF 0.1, an <code>RDFTerm</code> is either an <code>IRI</code> or a <code>BlankNode</code> or a <code>Literal</code>. It is not clear to me how a user can dispatch a function over an <code>RDFTerm</code> based on its actual nature.</p>
<p>My best guess is that one is expected to use <code>instanceof</code> to discriminate between the possible interfaces. In practice, this cannot really work. As a counter-example, consider this implementation of <code>RDFTerm</code> which relies on the <a href="http://www.w3.org/TR/n-triples/">N-Triples</a> encoding of the term:</p>
<pre><code class="language-java">public class NTriplesBased implements RDFTerm, IRI, BlankNode, Literal {
  private String ntriplesRepresentation;
  ...
}
</code></pre>
<p>So how does one visit a class-hierarchy in Java? By using the <a href="http://en.wikipedia.org/wiki/Design_Patterns">Gang of Four</a>’s <a href="http://en.wikipedia.org/wiki/Visitor_pattern">Visitor Pattern</a> of course! Ah ah, just kidding. It’s 2015, we can now have a <a href="http://logji.blogspot.com/2012/02/correcting-visitor-pattern.html">stateless and polymorphic version of the visitor pattern</a>. Actually, we can do even better using Java 8’s lambdas.</p>
<p>The <code>RDF#visit</code> function defined above in the <code>RDF</code> module is a <a href="https://en.wikipedia.org/wiki/Catamorphism">visitor on steroids</a>:</p>
<pre><code class="language-java"><T> T visit(RDFTerm t,
            Function<IRI, T> fIRI,
            Function<BlankNode, T> fBNode,
            Function<Literal, T> fLiteral);
</code></pre>
<p>The contract for <code>RDF#visit</code> is pretty simple: dispatch the right function – <code>fIRI</code> or <code>fBNode</code> or <code>fLiteral</code> – by case, depending on what the <code>RDFTerm t</code> actually is. Note that the function itself is parameterized on the return type, so that any computation can be defined. And finally, as explained before, the <a href="http://en.wikipedia.org/wiki/Visitor_pattern#Details"><em>element</em> part of the visitor</a> – the <code>RDFTerm</code> itself – has become the first argument of the function, instead of being the receiver of a method.</p>
<p>Finally, here is what it looks like on the user side:</p>
<pre><code class="language-java">RDFTerm term = ???;

String someString = rdf.visit(term,
                              iri -> rdf.getIRIString(iri),
                              bnode -> rdf.uniqueReference(bnode),
                              literal -> rdf.getLexicalForm(literal));
</code></pre>
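<p>For completeness, here is a sketch of what an <em>implementer</em> might do to provide <code>visit</code> for one concrete binding of the generics. The term classes below are hypothetical, but they show that the <code>instanceof</code> checks stay hidden inside the module, on the implementer’s side only:</p>

```java
import java.util.function.Function;

public class VisitSketch {

    // hypothetical concrete term classes of one particular implementation
    static class Iri   { final String value; Iri(String v)   { value = v; } }
    static class BNode { final String id;    BNode(String i) { id = i; } }
    static class Lit   { final String lex;   Lit(String l)   { lex = l; } }

    // the catamorphism: dispatch by case on the runtime type,
    // parameterized on the return type T
    static <T> T visit(Object term,
                       Function<Iri, T> fIRI,
                       Function<BNode, T> fBNode,
                       Function<Lit, T> fLiteral) {
        if (term instanceof Iri)   return fIRI.apply((Iri) term);
        if (term instanceof BNode) return fBNode.apply((BNode) term);
        if (term instanceof Lit)   return fLiteral.apply((Lit) term);
        throw new IllegalArgumentException("not an RDF term: " + term);
    }

    public static void main(String[] args) {
        Object term = new Lit("Alice");
        String s = visit(term, i -> i.value, b -> b.id, l -> l.lex);
        System.out.println(s);  // prints "Alice"
    }
}
```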
<h2 id="downcasting">downcasting</h2>
<p><code>RDFTermFactory</code> follows the <a href="http://en.wikipedia.org/wiki/Abstract_factory_pattern">Abstract Factory pattern</a>, which is very limited in practice. Pretty often, seeing only the generic interface is just not enough, and people end up <a href="http://stackoverflow.com/questions/380813/downcasting-in-java">downcasting</a> anyway because other functionalities may need to be exposed from the sub-types.</p>
<p>In my opinion, this is a big issue in something like Commons RDF and <a href="https://www.artima.com/interfacedesign/PreferPoly.html">we can do better</a>. In fact, it comes for free in the <code>RDF</code> module defined above, as <strong>the user sees the types that were actually bound to the generics</strong>.</p>
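<p>Here is a sketch of why the downcasting problem disappears: once the generics are bound to concrete types, the user manipulates those types directly. The <code>MyIri</code> class and its extra <code>scheme()</code> method below are hypothetical, standing in for whatever extra functionality a real implementation exposes:</p>

```java
public class BindingSketch {

    // a hypothetical implementation's own IRI type, with extra
    // functionality beyond any common interface
    static class MyIri {
        final String value;
        MyIri(String v) { value = v; }
        String scheme() { return value.substring(0, value.indexOf(':')); }
    }

    // a one-method slice of the generic factory
    interface Factory<IRI> {
        IRI createIRI(String iri);
    }

    public static void main(String[] args) {
        // IRI is bound to MyIri, so the factory returns MyIri directly:
        Factory<MyIri> factory = MyIri::new;
        MyIri iri = factory.createIRI("http://example.com/Alice");
        // no downcast needed to reach the implementation-specific method
        System.out.println(iri.scheme());  // http
    }
}
```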
<h2 id="immutable-graph">immutable graph</h2>
<p>If you look at <code>Graph#add(Triple)</code> you’ll see that it returns <code>void</code>: graphs in Commons RDF 0.1 <em>have to be mutated in place and there is no way around it</em>. This is wrong, but do not expect me to use this post to make the case for allowing immutable graphs: it’s 2015 and I should not have to do that.</p>
<p>Especially when the fix is very simple: <strong>just make <code>add</code> return a new <code>Graph</code></strong>. That’s actually what <code>Graph RDF#add(Graph,Triple)</code> does.</p>
<p>Note that with this approach, one can still manipulate mutable graphs. It’s just that code using <code>RDF#add</code> should always use the returned <code>Graph</code>, even if it happens to have been mutated in place.</p>
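<p>Here is a minimal sketch of the immutable-friendly contract, using plain strings as triples for brevity (none of this is actual Commons RDF code): <code>add</code> returns a fresh <code>Graph</code> and leaves its argument untouched.</p>

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ImmutableGraphSketch {

    static class Graph {
        final Set<String> triples;  // triples as plain strings, for brevity
        Graph(Set<String> ts) { triples = Collections.unmodifiableSet(ts); }
    }

    // returns a NEW graph; the original graph is never mutated
    static Graph add(Graph g, String triple) {
        Set<String> copy = new HashSet<>(g.triples);
        copy.add(triple);
        return new Graph(copy);
    }

    public static void main(String[] args) {
        Graph g0 = new Graph(new HashSet<>());
        Graph g1 = add(g0, "<#alice> <#name> \"Alice\"");
        // g0 still has 0 triples, g1 has 1
        System.out.println(g0.triples.size() + " " + g1.triples.size());
    }
}
```

<p>A mutable implementation could instead mutate the underlying store and return the same graph; caller code written against <code>RDF#add</code> works unchanged either way, as long as it always uses the returned graph.</p>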
<h2 id="stateless-bnode-generator">stateless blank node generator</h2>
<p>This is how one can create new blank nodes in Commons RDF 0.1 (<a href="https://git-wip-us.apache.org/repos/asf?p=incubator-commonsrdf.git;a=blob;f=api/src/main/java/org/apache/commons/rdf/api/RDFTermFactory.java;h=2801814832b5961769c6d2fbde02d1e494db1124;hb=9ee66b0078da61fed85b5fe0b6d5481e9300b140#l42">full javadoc here</a>):</p>
<pre><code class="language-java">/** The returned blank node MUST NOT be equal to any existing */
default BlankNode createBlankNode()
  throws UnsupportedOperationException { ... }

/** All `BlankNode`s created with the given `name` MUST be equivalent */
default BlankNode createBlankNode(String name)
  throws UnsupportedOperationException { ... }
</code></pre>
<p>The contract on the second <code>createBlankNode</code> is problematic, as <em>a map from names to previously allocated <code>BlankNode</code>s has to be maintained somewhere</em>. Of course, I am ruling out strategies relying on hashes, e.g. <code>UUID#nameUUIDFromBytes</code>, because the <code>BlankNode</code>s would no longer be scoped: two different blank nodes <code>_:b1</code> from two different <a href="http://www.w3.org/TR/turtle/">Turtle</a> documents would map to the same “equivalent <code>BlankNode</code>”. So that means that <strong><code>RDFTermFactory</code> is not stateless</strong>, and whether the state lives inside the factory or somewhere shared is not relevant.</p>
<p><strong>I believe that this is outside of the RDF model and that it has no place in the framework.</strong> The mapping from <code>name</code> to <code>BlankNode</code> can always be maintained on the user side, using the strategy that fits best. Still, you can see that I defined <code>BlankNode RDF#createBlankNode(String)</code>. It’s because I think another contract can be useful here, where a <code>String</code> can be passed as a hint to be retrieved later, e.g. when using <code>RDF#uniqueReference</code>. But it’s only a hint; it has no impact on the model itself.</p>
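<p>A stateless implementation of that hint-based contract could look like the following sketch (the class and the naming scheme are mine, not part of any proposal): the name is only embedded in the unique reference, and no name-to-node map is kept anywhere.</p>

```java
import java.util.UUID;

public class BNodeHintSketch {

    static class BlankNode {
        final String ref;
        BlankNode(String ref) { this.ref = ref; }
    }

    // stateless: two calls with the same hint still yield distinct nodes,
    // but the hint can be recovered later from the unique reference
    static BlankNode createBlankNode(String nameHint) {
        return new BlankNode(nameHint + "-" + UUID.randomUUID());
    }

    public static void main(String[] args) {
        BlankNode b1 = createBlankNode("b1");
        BlankNode b2 = createBlankNode("b1");
        System.out.println(b1.ref.equals(b2.ref));  // false: distinct nodes
    }
}
```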
<h2 id="UnsupportedOperationException">UnsupportedOperationException</h2>
<p>I just do not understand the value of specifying methods that can throw an <code>UnsupportedOperationException</code> in the context of Commons RDF. I mean, how am I expected to recover from such an exception? Does it make sense to allow for partial implementations?</p>
<p>Until I see a good use case for that, I have simply removed those exception declarations from the functions defined in the <code>RDF</code> module.</p>
<h2 id="user-side">user side</h2>
<p>Finally, let’s see how a library user could define a parser/serializer using the <code>RDF</code> module:</p>
<pre><code class="language-java">public class WeirdTurtle<Graph,
                         Triple,
                         RDFTerm,
                         BlankNodeOrIRI extends RDFTerm,
                         IRI extends BlankNodeOrIRI,
                         BlankNode extends BlankNodeOrIRI,
                         Literal extends RDFTerm> {

  private RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf;

  WeirdTurtle(RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf) {
    this.rdf = rdf;
  }

  /* a very silly parser */
  public Graph parse(String input) {
    Triple triple =
      rdf.createTriple(rdf.createIRI("http://example.com/Alice"),
                       rdf.createIRI("http://example.com/name"),
                       rdf.createLiteral("Alice"));
    Graph graph = rdf.createGraph();
    return rdf.add(graph, triple);
  }

  /* a very silly serializer */
  public String serialize(Graph graph) {
    Triple triple = rdf.getTriplesAsIterable(graph, null, null, null).iterator().next();
    RDFTerm o = rdf.getObject(triple);
    return rdf.visit(o,
                     iri -> rdf.getIRIString(iri),
                     bn -> rdf.uniqueReference(bn),
                     lit -> rdf.getLexicalForm(lit));
  }
}
</code></pre>
<h2 id="summary">summary</h2>
<p>Please allow me to be harsh: <strong>I believe that Commons RDF is mostly useless</strong> in its current form as it suffers from the many flaws I have described in this article.</p>
<p>As you can expect, <a href="http://mail-archives.apache.org/mod_mbox/commonsrdf-dev/201505.mbox/%3CCANvn8kzfnDoA=sgZLmHyN=bE2Gmba=1oNO0Haq72mi_GFkLRng@mail.gmail.com%3E">I have already shared those concerns on the Commons RDF mailing list</a> but I was told that it would be <cite href="http://mail-archives.apache.org/mod_mbox/commonsrdf-dev/201505.mbox/%3CCAOfJQJ0Bnm0Z+J1F6NZMU8oDk0O7J1H+rM1fLL2kUpTtp_9ECQ@mail.gmail.com%3E">much more valuable to see a patch about your proposal than a quick hack from scratch</cite>. Sadly this is no “quick hack” and there is no small patch.</p>
<p>The good news is that <strong>the approach described here works with any RDF implementation on the JVM</strong>, including Jena, Sesame, or <a href="https://github.com/w3c/banana-rdf">banana-rdf</a>. And more importantly, <strong>it works today!</strong></p>
<p>So if you are interested in a classless – but still classy – and immutable-friendly RDF abstraction for the JVM, I invite you to get in touch with me so that we can define that abstraction together.</p>
How to read a PGP-encrypted email from the command-line2015-02-21T00:00:00-05:00Alexandre Bertailstag:bertails.org,2015-02-21:2015/02/21/decrypt-pgp-email-from-command-line/<p>I received a PGP-encrypted email a couple days ago with confidential information. As I use GMail, I do not have direct support for PGP. Here are the steps I followed in order to extract the message and the files in it, from the command-line.</p>
<p>Disclaimer: I already knew the <a href="https://www.gnupg.org/gph/en/manual.html">big picture for PGP</a> but it was the first time I had to effectively use it.</p>
<p>The message body was empty, with just a <code>msg.asc</code> file attached:</p>
<pre><code>$ head msg.asc
-----BEGIN PGP MESSAGE-----
Version: GnuPG v1
hQEMA5Rm9tOuXUEGAQgAlcrBh++K7tBf6UhLPR3MM1S3N94xfSRamHWLXMBj5dp6
9fg+a2GuQDRnta+QRgmlkgXha/6vU9eFzqx9Fh7neeFOC2aOc+8wq7KSNXjUaX0o
wRdm1Jbh7fKy9ygNKGcTkikrpuVtYj1GrLjKD5CJ0gdGvv9vQIr8bUuVE+WwKgOr
hIv4sWDXChiWahDtY8A/LktfAWd0eVZ47FzQQ/LKo89v8POxvqPACmyzDRNKkNhy
AJSu2kjA44k/f79n880lMKZ89GMYjzKISxkxWYi4ccZPOmXgYFIrx5SFDhJNPhaw
1gd3InrLpBdTYGuJZxwRcZ1SpY4v5siDLoXQHnuHONLtAZh02Viq/F0cwWuRyMk5
km2lb3OREW2bHEzHTL5U4/Vb71cup0U7js7J7WvxOR7TCzizShX4w+uRAbfuLmH+
</code></pre>
<p>So I basically knew this had been encrypted using PGP with <a href="http://keyserver.ubuntu.com/pks/lookup?op=vindex&search=bertails&fingerprint=on">my public PGP key</a>. I already had one because I need to <a href="http://www.scala-sbt.org/release/docs/Using-Sonatype.html#First+-+PGP+Signatures">sign artifacts when publishing on Sonatype</a>.</p>
<p>First, I install the GPG toolkit for Ubuntu:</p>
<pre><code>$ sudo apt-get install pgpgpg
</code></pre>
<p>Then I get the sender’s public key from his website and add it to my keyring (<a href="http://www.math.utah.edu/~beebe/PGP-notes.html">source</a>):</p>
<pre><code>$ pgp -ka send.pubkey
</code></pre>
<p>And I can finally extract the content.</p>
<pre><code>$ pgp msg.asc
</code></pre>
<p>The resulting file <code>msg</code> is a typical email attachment:</p>
<pre><code>Content-Type: multipart/mixed; boundary="a8Wt8u1KmwUX3Y2C"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
--a8Wt8u1KmwUX3Y2C
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
...
--a8Wt8u1KmwUX3Y2C
Content-Type: application/pdf
Content-Disposition: attachment; filename="the-file.pdf"
Content-Transfer-Encoding: base64
...
--a8Wt8u1KmwUX3Y2C--
</code></pre>
<p>I still need to extract the PDF. For that, I used <code>munpack</code>:</p>
<pre><code>$ sudo apt-get install mpack
$ munpack msg
the-file.pdf (application/pdf)
</code></pre>
<p>Et voilà!</p>
<h2>Bonus</h2>
<p>If you have forgotten your passphrase to unlock your PGP key, you can use this command <a href="http://stackoverflow.com/a/11484411">found on stackoverflow</a>:</p>
<pre><code>$ echo "1234" | gpg --no-use-agent -o /dev/null --local-user 'Alexandre Bertails <alexandre@bertails.org>' --no-greeting -as - && echo "The correct passphrase was entered for this key"
</code></pre>
<p>You can also use the <code>--passphrase</code> parameter if needed.</p>
Abstract Algebraic Data Type2015-02-15T00:00:00-05:00Alexandre Bertailstag:bertails.org,2015-02-15:2015/02/15/abstract-algebraic-data-type/<p>Scala’s sealed class hierarchies (aka. <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">Algebraic Data Types</a>) are for sure one of its most praised features. Yet, they have one downside: they don’t let us abstract over the type hierarchy as <code>trait</code>s and <code>class</code>es are all about constructing new <strong>concrete</strong> types.</p>
<p>In this post, we will explore how we can relax this constraint so that we can get an abstracted version of <code>scala.Option</code>, which would allow us to switch implementations.</p>
<h2 id="deconstructing">Deconstructing Scala's algebraic data types</h2>
<p>As a reminder, here is <a href="https://github.com/scala/scala/blob/2.11.x/src/library/scala/Option.scala">how Scala’s <code>Option</code>s are implemented</a>:</p>
<pre><code class="language-scala">sealed abstract class Option[+A]
final case class Some[+A](x: A) extends Option[A]
case object None extends Option[Nothing]
</code></pre>
<p>Here is the corresponding scaladoc diagram:</p>
<p><a href="scala-option.png"><img src="scala-option.png" alt="scala.Option" /></a></p>
<p>There are quite a few things happening here:</p>
<ul>
<li>we need to be able to speak about the <strong>types and their relationships</strong>;</li>
<li>then we need a way to <strong>inject</strong> values in those types;</li>
<li>finally we need a way to inspect the values for those types to <strong>extract</strong> their content.</li>
</ul>
<h2 id="type-and-relationships">On types and subtyping</h2>
<p>There is a <strong>subtyping relationship</strong> between <code>Some</code>/<code>None</code> and <code>Option</code>.</p>
<p>Actually, <code>None</code> itself is not a type but a value, whose type is <code>None.type</code>, a subtype of <code>Option</code>. Also, <code>Option</code> and <code>Some</code> are not technically types, but <strong>type constructors</strong> (aka. <a href="https://stackoverflow.com/questions/6246719/what-is-a-higher-kinded-type-in-scala">higher kinded types</a>): we need to provide a type <code>A</code> to produce an <code>Option[A]</code> type.</p>
<p>Finally, <code>Option</code> is covariant in its parameterized type <code>A</code>, so that <code>Option[Nothing]</code> is a subtype of <code>Option[A]</code> because <a href="https://stackoverflow.com/questions/1728541/if-the-nothing-type-is-at-the-bottom-of-the-class-hierarchy-why-can-i-not-call"><code>Nothing</code> is a subtype of any type</a>.</p>
<h2 id="injectors">On injectors</h2>
<p>We have two (here somewhat equivalent) ways of <strong>injecting</strong> a value of type <code>A</code> into a <code>Some[A]</code>:</p>
<ul>
<li>we can do that through the class constructor, eg. <code>new Some(42)</code></li>
<li>or more naturally through <code>Some.apply</code>, eg. <code>Some(42)</code></li>
</ul>
<p><code>None</code> is a singleton object, therefore it is the <em>only inhabitant</em> of <code>None.type</code>.</p>
<h2 id="extractors">On extractors</h2>
<p>Given an <code>Option[A]</code>, we can reason by cases using <strong>pattern matching</strong>.</p>
<p>This is achieved through the <a href="http://docs.scala-lang.org/tutorials/tour/extractor-objects.html"><code>unapply</code> extractor methods</a> on the <code>Option</code> companion object. And because <code>Option</code> is sealed, the type checker will be able to check for exhaustiveness.</p>
<p>Finally, given a <code>Some[A]</code>, we can retrieve its content through the <code>x</code> field accessor, or again using the <code>Some.unapply</code> extractor.</p>
<h2 id="abstracting-over-types">Abstracting over types</h2>
<p>My colleague <a href="https://twitter.com/dwhjames">Dan</a> explored <a href="http://io.pellucid.com/blog/scalas-modular-roots">how to encode modules in Scala</a> in a previous blog article. If you haven’t read it yet, I warmly recommend you do so, even if it is not strictly necessary for understanding what is going on here. Here, I will choose yet another encoding, using a <a href="https://en.wikipedia.org/wiki/Type_class">typeclass</a> approach.</p>
<p>First, let’s define the entire type hierarchy in one place:</p>
<pre><code class="language-scala">import scala.language.higherKinds

trait OptionSig {
  type Option[+_]
  type Some[+A] <: Option[A]
  type None <: Option[Nothing]
}
</code></pre>
<p>We used the <code>Sig</code> suffix as if <code>OptionSig</code> were an <a href="http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora131.html">ML module signature</a>, but this is <strong>not the complete signature</strong>, as there are no functions defined in this trait.</p>
<p>This is just a convenient way to gather several types into a single one, a bit like a record, but for types. Given an <code>OptionSig</code>, we can now speak about one of the types it contains using a type projection, eg. <code>OptionSig#Option[A]</code>.</p>
<h2 id="abstracting-over-operations">Abstracting over operations</h2>
<p>Now that we have a type hierarchy, we can complete our signature with the operations that must be defined over it:</p>
<pre><code class="language-scala">abstract class OptionOps[Sig <: OptionSig] {
  def some[A](x: A): Sig#Some[A]
  def none: Sig#None
  def fold[A, B](opt: Sig#Option[A])(ifNone: => B, ifSome: A => B): B
}
</code></pre>
<p>You might be wondering why we need this <code>Sig</code> as a subtype for <code>OptionSig</code>, as this is usually not needed for typeclasses. It’s because we need to be able to project its inner types.</p>
<p><code>some[A]</code> is the injector for <code>Sig#Some[A]</code>. <code>none</code> doesn’t take any parameter, so it really acts as a singleton value for <code>Sig#None</code>.</p>
<p><code>fold[A, B]</code> is the essence of the <code>Sig#Option[A]</code> type: given the two passed functions, it can react on the actual type for <code>opt</code> at runtime:</p>
<ul>
<li>if <code>opt</code> was a <code>Sig#None</code>, then the value for <code>ifNone</code> is returned (notice that it is a <em>lazy parameter</em> which is only computed if needed)</li>
<li>if <code>opt</code> was a <code>Sig#Some[A]</code>, then the <code>ifSome</code> function has access to the contained value to compute its result</li>
</ul>
<p>By the way, an algebra defined through a <code>fold</code> is called a <a href="https://en.wikipedia.org/wiki/Catamorphism">catamorphism</a>!</p>
<p>Finally, we can define a helper to retrieve an instance of <code>OptionOps[Sig]</code> given a signature, if it is available:</p>
<pre><code class="language-scala">object OptionOps {
  def apply[Sig <: OptionSig](implicit ops: OptionOps[Sig]): OptionOps[Sig] = ops
}
</code></pre>
<h2 id="functor">Functions over <code>OptionSig</code>/<code>OptionOps</code></h2>
<p>We now want to define new structures that depend on our module. For this, we need something similar to an <a href="http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora131.html">ML functor</a>.</p>
<p>For example, let’s define a functor that can construct instances of <code>scalaz.Show</code> for us:</p>
<pre><code class="language-scala">import scalaz.Show

class OptionShow[Sig <: OptionSig : OptionOps] {

  def optionShow[A : Show]: Show[Sig#Option[A]] = {
    // retrieving the typeclass instances
    val showA = Show[A]
    val ops = OptionOps[Sig]
    val instance = new Show[Sig#Option[A]] {
      override def shows(opt: Sig#Option[A]): String = ops.fold(opt)(
        "none",
        x => s"some(${showA.shows(x)})"
      )
    }
    instance
  }

}

object OptionShow {
  implicit def apply[Sig <: OptionSig : OptionOps]: OptionShow[Sig] = new OptionShow[Sig]
}
</code></pre>
<p>That is a lot of weird Scala notations that you may not be familiar with. Let’s decompose them.</p>
<p><code>OptionShow[Sig <: OptionSig : OptionOps]</code> means that <code>OptionShow</code> is parameterized by a <code>Sig</code>, which is required to be a subtype of <code>OptionSig</code>. Also an instance of <code>OptionOps[Sig]</code> must be <strong>implicitly</strong> available.</p>
<p><code>def optionShow[A : Show]: Show[Sig#Option[A]]</code> means that if we can provide an instance of <code>Show[A]</code>, then <code>optionShow</code> can construct an instance of <code>Show[Sig#Option[A]]</code> for us.</p>
<p><code>scalaz.Show</code> is a simple yet powerful typeclass from Scalaz. It simply provides a <code>shows</code> function for instances of the provided type (here <code>Sig#Option[A]</code>). The trick here is that unlike <code>Object#toString()</code>, our <code>Show</code> instances are <strong>driven by types</strong>, so we can rely on a <code>Show[A]</code> being available.</p>
<h2 id="simple-implementation">A simple implementation</h2>
<p>We almost have everything we need in place. We just need to provide an implementation for our module.</p>
<p><code>scala.Option</code> looks like a good candidate for a first implementation; after all, that’s where we started from:</p>
<pre><code class="language-scala">trait ScalaOption extends OptionSig {
  type Option[+A] = scala.Option[A]
  type Some[+A] = scala.Some[A]
  type None = scala.None.type
}

object ScalaOption {
  implicit object ops extends OptionOps[ScalaOption] {
    def some[A](x: A): ScalaOption#Some[A] = scala.Some(x)
    val none: ScalaOption#None = scala.None
    def fold[A, B](opt: ScalaOption#Option[A])(ifNone: => B, ifSome: A => B): B =
      opt match {
        case scala.None    => ifNone
        case scala.Some(x) => ifSome(x)
      }
  }
}
</code></pre>
<p>Nothing fancy here. We just plugged (i.e. <a href="http://docs.scala-lang.org/tutorials/tour/abstract-types.html">aliased</a>) our abstract types onto the concrete ones. <code>some</code> and <code>none</code> respectively delegate to the <code>Some.apply</code> function and the <code>None</code> singleton. Finally, the <code>fold</code> implementation relies on pattern matching.</p>
<p>Just note that the typeclass instance for <code>OptionOps[ScalaOption]</code> is made available in the companion object for <code>ScalaOption</code> so that <a href="http://eed3si9n.com/implicit-parameter-precedence-again">it will <strong>always</strong> be picked up by Scala when looking for such an implicit</a>.</p>
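<p>That resolution rule is easy to check in isolation: when looking for a <code>TC[X]</code>, the compiler always searches the companion objects of both <code>TC</code> and <code>X</code>, with no import required. A minimal sketch with hypothetical names (<code>TC</code>, <code>Foo</code>):</p>
<pre><code class="language-scala">trait TC[X] { def label: String }

trait Foo
object Foo {
  // lives in the companion object of Foo, hence in the
  // implicit scope of TC[Foo]: always found, never imported
  implicit object fooTC extends TC[Foo] { val label = "TC[Foo] from companion" }
}

def summonTC[X](implicit tc: TC[X]): TC[X] = tc
</code></pre>
<p>Calling <code>summonTC[Foo]</code> anywhere in the program resolves to <code>Foo.fooTC</code>, which is exactly why <code>OptionOps[ScalaOption]</code> is placed in <code>ScalaOption</code>'s companion.</p>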
<h2 id="program">Using our option</h2>
<p>Finally, we can write a program using our shiny abstractions :-)</p>
<pre><code class="language-scala">class Program[Sig <: OptionSig : OptionOps] extends App {

  val ops = OptionOps[Sig]
  import ops._

  // a little dance to derive our Show instance
  import scalaz.std.anyVal.intInstance
  val showOptOptInt = {
    implicit val showOptInt = OptionShow[Sig].optionShow[Int]
    OptionShow[Sig].optionShow[Sig#Option[Int]]
  }

  // scalaz's syntax tricks are awesome
  import showOptOptInt.showSyntax._

  val optOpt = some(some(42))
  println("optOpt: " + optOpt.shows)

  val optNone = some(none)
  println("optNone: " + optNone.shows)

}
</code></pre>
<p>And we plug everything together:</p>
<pre><code class="language-scala">scala> object MainWithScalaOption extends Program[ScalaOption]
defined object MainWithScalaOption
scala> MainWithScalaOption.main(Array())
optOpt: some(some(42))
optNone: some(none)
</code></pre>
<h2 id="custom-implementation">Our own module implementation</h2>
<p>Turns out there are many ways to implement our module.</p>
<p>Here is a version of our module where we provide our own classes:</p>
<pre><code class="language-scala">object MyOption extends OptionSig {

  sealed abstract class Option[+A]
  final case class Some[+A](x: A) extends Option[A]
  sealed abstract class None extends Option[Nothing]
  case object None extends None

  implicit object ops extends OptionOps[MyOption.type] {
    def some[A](x: A): MyOption.type#Some[A] = Some(x)
    val none: MyOption.type#None = None
    def fold[A, B](opt: MyOption.type#Option[A])(ifNone: => B, ifSome: A => B): B =
      opt match {
        case None    => ifNone
        case Some(x) => ifSome(x)
      }
  }

}
</code></pre>
<p>Notice that our signature lies in the singleton type <code>MyOption.type</code>. Scala will have no issue finding the implicit instance in itself because <em>the companion object for a singleton object is itself</em>!</p>
<p>We have introduced an <code>abstract class None</code> so that we don’t need to define a type alias <code>type None = None.type</code>. It is also interesting to see that Scala doesn’t require us to define our classes outside of <code>MyOption</code> only to alias them later: we just do everything at once.</p>
<h2 id="java8-implementation">Java8-based implementation</h2>
<p>Now, let’s reuse Java 8 <code>java.util.Optional</code>!</p>
<pre><code class="language-scala">import java.util.Optional

trait Java8Option extends OptionSig {
  type Option[+A] = Optional[_ <: A]
  type Some[+A] = Optional[_ <: A]
  type None = Optional[Nothing]
}

object Java8Option {
  implicit object ops extends OptionOps[Java8Option] {
    def some[A](x: A): Java8Option#Some[A] = Optional.of(x)
    val none: Java8Option#None = Optional.empty()
    def fold[A, B](opt: Java8Option#Option[A])(ifNone: => B, ifSome: A => B): B = {
      import java.util.function.{ Function => F, Supplier }
      def f = new F[A, B] { def apply(a: A): B = ifSome(a) }
      def supplier = new Supplier[B] { def get(): B = ifNone }
      opt.map[B](f).orElseGet(supplier)
    }
  }
}
</code></pre>
<p><a href="http://docs.oracle.com/javase/8/docs/api/java/util/Optional.html">Java 8’s <code>Optional</code></a> has only one class for the two cases, and it was made invariant. Still, we can easily fix that on the Scala side with <code>[_ <: A]</code>.</p>
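<p>The variance trick deserves a closer look. Here is a hedged, standalone sketch showing how the existential upper bound makes the invariant <code>Optional</code> usable covariantly; <code>CovarianceDemo</code> and <code>CovariantOptional</code> are illustrative names only:</p>
<pre><code class="language-scala">import java.util.Optional

object CovarianceDemo {

  // Optional itself is invariant, but `Optional[_ <: A]` varies with A
  type CovariantOptional[+A] = Optional[_ <: A]

  val noneOfNothing: Optional[Nothing] = Optional.empty()

  // compiles because Nothing <: String: the single empty value
  // can be used wherever a CovariantOptional[String] is expected
  val asString: CovariantOptional[String] = noneOfNothing

  // the non-empty case also widens as expected
  val someHello: CovariantOptional[String] = Optional.of("hello")
}
</code></pre>
<p>This is the same shape as <code>Java8Option#Option[+A]</code> above: a single <code>Optional[Nothing]</code> value plays the role of <code>None</code> for every element type.</p>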
<h2 id="any-implementation"><code>Any</code>-based implementation</h2>
<p>Remember all the flame wars over <code>Option</code> vs <code>null</code>? Or the problem with boxing? Look at this:</p>
<pre><code class="language-scala">trait NullOption extends OptionSig {
  type Option[+A] = Any
  type Some[+A] = Any
  type None = Null
}

object NullOption {
  implicit object ops extends OptionOps[NullOption] {
    def some[A](x: A): NullOption#Some[A] = x
    val none: NullOption#None = null
    def fold[A, B](opt: NullOption#Option[A])(ifNone: => B, ifSome: A => B): B = {
      if (opt == null) ifNone
      else ifSome(opt.asInstanceOf[A])
    }
  }
}
</code></pre>
<p>Yes, that’s right, we are relying on <code>null</code> for the <code>None</code> case while the <code>Some</code> case is the value itself :-)</p>
<p>But this is <strong>completely typesafe</strong> as it never leaks outside of the abstraction. The trick is that <code>Null</code> is a subtype of <code>Any</code>. And you can note that there is <strong>no wrapping involved</strong>.</p>
<h2 id="final">Back to our program</h2>
<p>We now have four implementations of our option module, behaving almost identically. The one visible difference is that <code>NullOption</code> prints <code>none</code> for <code>some(none)</code>, since both are represented by <code>null</code>:</p>
<pre><code class="language-scala">scala> object MainWithScalaOption extends Program[ScalaOption]
defined object MainWithScalaOption
scala> MainWithScalaOption.main(Array())
optOpt: some(some(42))
optNone: some(none)
scala> object MainWithJava8Option extends Program[Java8Option]
defined object MainWithJava8Option
scala> MainWithJava8Option.main(Array())
optOpt: some(some(42))
optNone: some(none)
scala> object MainWithMyOption extends Program[MyOption.type]
defined object MainWithMyOption
scala> MainWithMyOption.main(Array())
optOpt: some(some(42))
optNone: some(none)
scala> object MainWithNullOption extends Program[NullOption]
defined object MainWithNullOption
scala> MainWithNullOption.main(Array())
optOpt: some(some(42))
optNone: none
</code></pre>
<p>How cool is that?</p>
<h2 id="summary">Summary</h2>
<p>In the process, we have shown that typeclasses are a great alternative to the cake pattern when it comes to encoding modules in Scala.</p>
<p>In practice, some variations are possible. For example, we could have ignored the subtyping relationships altogether. We would have ended up with something closer to what happens in OCaml or Haskell, as the constructors would both return an <code>OptionSig#Option[A]</code> instead of a subtype. Also, it would be easy to define some syntax enhancement, so that one could directly write something like <code>myOption.fold("42", x => x.toString)</code>.</p>
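<p>Such a syntax enhancement could look like the following hedged sketch. To stay self-contained it uses a plain higher-kinded parameter rather than the <code>Sig#</code> projections, and the operation is named <code>cata</code> instead of <code>fold</code> so that it does not clash with <code>scala.Option</code>'s built-in <code>fold</code>; <code>OptionAlg</code> and <code>CataOps</code> are illustrative names:</p>
<pre><code class="language-scala">trait OptionAlg[F[_]] {
  def fold[A, B](opt: F[A])(ifNone: => B, ifSome: A => B): B
}

object OptionAlg {
  // an instance for scala.Option, found via the companion object
  implicit object scalaOptionAlg extends OptionAlg[Option] {
    def fold[A, B](opt: Option[A])(ifNone: => B, ifSome: A => B): B =
      opt match {
        case None    => ifNone
        case Some(x) => ifSome(x)
      }
  }
}

object syntax {
  // pimp `cata` onto any F[A] that has an OptionAlg instance
  implicit class CataOps[F[_], A](val self: F[A]) {
    def cata[B](ifNone: => B, ifSome: A => B)(implicit alg: OptionAlg[F]): B =
      alg.fold(self)(ifNone, ifSome)
  }
}
</code></pre>
<p>After <code>import syntax._</code>, an expression like <code>(Some(42): Option[Int]).cata("none", x => x.toString)</code> works directly (the ascription to <code>Option[Int]</code> helps inference pick <code>F = Option</code>).</p>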
<p>Finally, if you are interested in a more complex example using the techniques described here, have a look at <a href="https://github.com/w3c/banana-rdf">Banana-RDF</a> and its <a href="https://github.com/w3c/banana-rdf/blob/master/rdf/common/src/main/scala/org/w3/banana/RDF.scala">data model for RDF</a>. The project provides five different implementations: (1) Jena and (2) Sesame, two competing Java libraries for RDF, (3) a pure Scala implementation that compiles down to JVM bytecode as well as (4) to Javascript through <a href="http://www.scala-js.org/">Scala-js</a>, and finally (5) a pure Javascript implementation bound to <a href="https://github.com/antoniogarrote/rdfstore-js">rdfstore-js</a>, again using Scala-js.</p>
Available for hire2015-02-04T00:00:00-05:00Alexandre Bertailstag:bertails.org,2015-02-04:2015/02/04/for-hire/<p>Yesterday, the whole engineering team at Pellucid got laid off. <strong>I am now looking for new adventures</strong>.</p>
<p><a href="http://bertails.org/resume">My resume is available here on my website</a>. Please contact me at <a href="mailto:alexandre@bertails.org">alexandre@bertails.org</a>.</p>
<p>I am open to relocation (almost) anywhere in the world, especially for interesting projects relying on Scala <strong>and</strong> RDF.</p>
<h2>why you should hire me</h2>
<p>I am a <strong>strong Scala developer</strong> with many years of experience and a good background in Computer Science. I love learning new skills and I get involved in <a href="https://github.com/betehess">Open Source projects</a>, leading <a href="https://github.com/w3c/banana-rdf">banana-rdf</a>. I frequently give presentations (my next talk is at <a href="http://event.scaladays.org/scaladays-sanfran-2015#eventid-6548">Scala Days San Francisco</a> in just a few weeks) and organize conferences (<a href="http://nescala.org/">nescala</a>).</p>
<p>I am a <strong>Linked Data expert</strong> who worked at the W3C closely with Director Tim Berners-Lee, <a href="http://www.w3.org/People/Berners-Lee/">inventor of the World Wide Web</a>, and gained a thorough and practical knowledge of HTTP and REST APIs. I am also the editor of two major <a href="http://www.w3.org/TR/tr-editor-all#tr_Alexandre_Bertails">Web standards: RDB2RDF Direct Mapping and Linked Data Patch Format</a>.</p>
<p>Bonus: I have a lovely French accent :-)</p>
<h2>what might make a difference</h2>
<p>My first interest is in your product and the technologies you use. I am looking for a position where I will bring my expertise to the table but also where I will be challenged.</p>
<p>Then I will look at how you work as a team. I have learned over the years how culture can shape teamwork and I am eager to discuss with you what values and attitudes you encourage and nurture in your workplace.</p>
Scala.JS will be for Javascript what Scala is for Java2015-02-01T00:00:00-05:00Alexandre Bertailstag:bertails.org,2015-02-01:2015/02/01/scala-js-prediction/<p>I am writing this article on my way back to New York, after a wonderful <a href="http://www.nescala.org/">nescala 2015</a> in Boston. Definitely a <em>grand cru</em>. One of the hot topics there was <a href="http://www.scala-js.org/">Scala.JS</a>, which is a technology we have recently started to use in <a href="https://github.com/w3c/banana-rdf">banana-rdf</a>. The various discussions and interactions I had during the conference made me realize this:</p>
<blockquote class="twitter-tweet tw-align-center" lang="en"><p>Prediction: Scala.JS will be for Javascript what Scala is for Java/JVM.</p>— Alexandre Bertails (@bertails) <a href="https://twitter.com/bertails/status/561765670019674113">February 1, 2015</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>As one could have expected when someone makes such a prediction about programming languages, this sparked an <a href="https://twitter.com/bertails/status/561765670019674113">interesting thread on Twitter</a> :-) So let me try to refine what I think the value proposition is for Scala.JS and how I base it on what happened to Scala.</p>
<p>I don’t know many people who got interested in Scala for its own merits (I am not sure I know any…). In fact, we hear many voices pointing out its quirks, and they are real, but that misses the point: I don’t think that Scala would have become as mainstream as it is today if it was not for Java. Many of us came to Scala from Java because it hit a sweet spot: <strong>1)</strong> it enables serious <strong>functional programming</strong> (no, lambdas are not enough…), <strong>2)</strong> it gives us a richer and more robust <strong>static type system</strong>, and <strong>3)</strong> it remains completely <strong>interoperable with Java</strong>. About that last point: we could code in Scala as if it were Java and easily interact with existing libraries.</p>
<p>My claim is that Scala.JS is doing something similar for Javascript, so let’s see how the previous points apply to it.</p>
<p><strong>1)</strong> Functional programming has become more prevalent in the IT industry. Developers not only know it exists, they learn its merits and are trained to practice it. Actually, we have seen this trend in Javascript itself and two examples come to my mind: imperative callbacks are being replaced by more composable <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise">Promises</a> and <a href="http://facebook.github.io/immutable-js/">immutable datastructures are now a thing</a>. Now, despite the fact that Javascript is becoming more functional, I don’t think it feels very natural yet for FP practitioners, while Scala is already offering a better solution in that area, both in the language itself and in its standard library.</p>
<p><strong>2)</strong> I would claim that functional programming becomes interesting only when you are given a way to speak <strong>statically</strong> about the things you manipulate. This is why a robust and powerful type system is so important for so many people. <a href="https://github.com/milessabin/shapeless">Scala shines in that area</a>. Look at projects like <a href="https://github.com/scala-js/scala-js-jquery">scala-js-jquery</a> and imagine how easy it becomes to write jQuery code, being guided by the types while having the compiler checking for you that you are using the library correctly.</p>
<p><strong>3)</strong> Scala is extremely versatile and captures Javascript’s specificities surprisingly well. At the language level, everything you can do in Javascript can be mapped to Scala almost 1-to-1, and Scala’s <a href="http://www.scala-js.org/api/scalajs-library/0.6.0-RC2/#scala.scalajs.js.Dynamic">dynamic capabilities</a> even let you interact with the lack of types when working with Javascript libraries. Based on my experience, writing typed facades for existing libraries is straightforward, and the main challenge is actually figuring out how to properly use the underlying libraries because there are no types to guide you.</p>
<p>Then there is the <em>obvious stuff</em>: all of a sudden, plenty of efficient immutable datastructures and libraries from the Scala world become available in the browser; tools like IDEs finally become usable with code completion and type checking; the code can be optimized because the types are statically known; and finally, Scala.JS being just Scala, it comes with a rich ecosystem and community.</p>
<blockquote class="twitter-tweet tw-align-center" data-conversation="none" lang="en"><p><a href="https://twitter.com/bertails">@bertails</a> Isn't that what Coffee was supposed to be?</p>— Robin Berjon (@robinberjon) <a href="https://twitter.com/robinberjon/status/561790252671844352">February 1, 2015</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<blockquote class="twitter-tweet tw-align-center" data-conversation="none" lang="en"><p><a href="https://twitter.com/bertails">@bertails</a> <a href="https://twitter.com/mandubian">@mandubian</a> Clojure(Script) looks to me a better candidate on top of JS than Scala. much more close (loosely typed, functional)</p>— Gaëtan Renaudeau (@greweb) <a href="https://twitter.com/greweb/status/561866804763820032">February 1, 2015</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Just like with Java, many people will be happy to write plain Javascript for possibly quite a long time. But let’s say you disagree with one or more of my points above: you still have <a href="https://github.com/jashkenas/coffeescript/wiki/list-of-languages-that-compile-to-js">plenty of contenders to choose from</a>. But my gut feeling is that there is a huge community out there waiting for a compelling alternative that would bring the triptych functional-programming/static-typing/good-js-interop. Scala.JS just hits that sweet spot and the most exciting times are ahead!</p>
Why LD Patch2014-09-20T00:00:00-05:00Alexandre Bertailstag:bertails.org,2014-09-20:2014/09/20/why-ldpatch/<p>The <a href="http://www.w3.org/2012/ldp/">LDP Working Group</a> recently published <a href="http://www.w3.org/TR/ldpatch/">LD Patch</a>, <cite>a format for describing changes to apply to Linked Data. It is suitable for use with <a href="http://tools.ietf.org/html/rfc5789">HTTP PATCH</a>, a method to perform partial modifications to Web resources.</cite></p>
<p>After explaining the need for a PATCH format for Linked Data, I will go through all the <a href="http://www.w3.org/TR/2014/WD-ldpatch-20140918/#alternative-designs">other candidate technologies that the group considered</a>, before explaining the rationale behind LD Patch. It is fair to remind the reader that the group is still eager for feedback, and that <strong>not all the group participants would agree with the views expressed in this post</strong>.</p>
<h2 id="genesis">Genesis</h2>
<p>Despite strong interest from the group participants in a way to partially update LDP Resources with <a href="http://tools.ietf.org/html/rfc5789">HTTP PATCH</a>, settling on which format to use proved to be more difficult than expected. The group could only agree on standardising the use of PATCH over POST, and decided to wait for concrete proposals while allowing the main specification to reach completion.</p>
<p>Work on a PATCH format for LDP was in limbo for a while, and concretely resumed during the <a href="http://www.w3.org/2012/ldp/wiki/F2F5#Day_3_-_Thursday_April_17">5th LDP face-to-face in Boston, MA</a>, where I presented all the proposals <a href="https://www.w3.org/2012/ldp/wiki/LDP_PATCH_Proposals">the group had gathered so far</a>. I had completed the implementations of both <a href="http://www.w3.org/People/Eric/">Eric</a>’s <a href="http://www.w3.org/2001/sw/wiki/SparqlPatch">SparqlPatch</a> and <a href="http://liris.cnrs.fr/~pchampin/en/">Pierre-Antoine</a>’s <a href="https://github.com/pchampin/rdfpatch">rdfpatch</a> in <a href="https://github.com/w3c/banana-rdf">banana-rdf</a> at that time. Those two proposals were for me the only two serious challengers.</p>
<h2 id="patch-format">A PATCH format for LDP</h2>
<p>Enough talking. What do we even mean by a <em>PATCH format for LDP</em>? Consider the following RDF graph:</p>
<pre><code>$ GET -S -H 'Accept: text/turtle' http://www.w3.org/People/Berners-Lee/card
200 OK
@prefix schema: <http://schema.org/> .
@prefix profile: <http://ogp.me/ns/profile#> .
@prefix ex: <http://example.org/vocab#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://www.w3.org/People/Berners-Lee/card#i> a schema:Person ;
  schema:alternateName "TimBL" ;
  profile:first_name "Tim" ;
  profile:last_name "Berners-Lee" ;
  schema:workLocation [ schema:name "W3C/MIT" ] ;
  schema:performerIn _:b1, _:b2 ;
  ex:preferredLanguages ( "en" "fr" ) .

_:b1 schema:name "F2F5 - Linked Data Platform" ;
  schema:url <https://www.w3.org/2012/ldp/wiki/F2F5> .

_:b2 a schema:Event ;
  schema:name "TED 2009" ;
  schema:startDate "2009-02-04" ;
  schema:url <http://conferences.ted.com/TED2009/> .
</code></pre>
<p>Even if you are not well-versed in <a href="http://www.w3.org/TR/rdf11-primer/">RDF</a> and <a href="http://www.w3.org/TR/ldp/">Turtle</a>, I bet you can still understand that this piece of data is about a person named Tim Berners-Lee, identified by the URI <code><http://www.w3.org/People/Berners-Lee/card#i></code>. Also, TimBL seems to have been a participant in two events, each of them having some data attached to them. Also, do you see how those <code>_:b1</code> and <code>_:b2</code> identifiers give you more flexibility than plain JSON? They are <strong>identifiers local to this graph</strong> and are called <a href="http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-blank-nodes">blank nodes</a>.</p>
<p>Other blank nodes get handled by the Turtle syntax, as you can see if you click on the following graph for a full-size visual representation of the data:</p>
<p><a href="timbl-card.png"><img src="timbl-card.png" alt="TimBL's card" /></a></p>
<p>As a side note, let me draw your attention to the URIs being used here: they all resolve to actual documents on the Web, including the vocabularies from <a href="https://schema.org/">schema.org</a> and Facebook’s <a href="http://ogp.me/">Open Graph Protocol</a>.</p>
<p>Now, let’s imagine that TimBL wants to add some geo coordinates to the TED event.</p>
<h2 id="simple-patch">RDF Patch / TurtlePatch</h2>
<p>Here is what TimBL could do with RDF Patch:</p>
<pre><code>$ cat query.rdfp
Add _:b2 <http://schema.org/location> _:loc .
Add _:loc <http://schema.org/name> "Long Beach, California" .
Add _:loc <http://schema.org/geo> _:geo .
Add _:geo <http://schema.org/latitude> "33.7817" .
Add _:geo <http://schema.org/longitude> "-118.2054" .
$ cat query.rdfp | PATCH -S -c 'Content-Type: application/rdf-patch' http://www.w3.org/People/Berners-Lee/card
204 No Content
</code></pre>
<p>Well, this actually does not work.</p>
<p>Remember when I said that the blank node <code>_:b2</code> was a <strong>local identifier</strong> for the graph? This means that TimBL <strong>cannot refer directly</strong> to the TED event from outside the document. That would require the server and the client to agree on a stable identifier for that blank node. That process is called <a href="http://www.w3.org/wiki/BnodeSkolemization">skolemization</a>. It puts a significant burden on the server, which has to manage those stable identifiers. Also, while the use of blank nodes is mostly transparent in Turtle and <a href="http://www.w3.org/TR/json-ld/">JSON-LD</a> as they are hidden in the syntax, skolemization would break the syntax.</p>
<p><a href="http://www.w3.org/2001/sw/wiki/TurtlePatch">TurtlePatch</a> has similar expressive power compared to RDF Patch, but it is defined as a subset of SPARQL Update. It also defines <a href="http://www.w3.org/2001/sw/wiki/TurtlePatch#Handling_Blank_Nodes">skolemization as being part of the protocol</a>, where the client can ask for a skolemized version of the graph, which would then be required before PATCHing.</p>
<p>Because <a href="http://www.websemanticsjournal.org/index.php/ps/article/view/365">blank nodes occur very frequently</a> and skolemization was a no-go for several participants of the group, the <a href="http://www.w3.org/2013/meeting/ldp/2014-04-17#line0243">results</a> of one of the strawpolls we had on this subject were welcomed with surprise:</p>
<blockquote>
<p><strong>STRAWPOLL:</strong> I’d rather have a solution that (a) doesn’t address certain pathological graphs, or (b) requires the server to maintain Skolemization maps</p>
</blockquote>
<p>The participants were largely in favor of (a), while (b) had basically no support. Knowing that, the group could now focus on alternative proposals, such as SparqlPatch.</p>
<h2 id="sparqlpatch">SparqlPatch</h2>
<p><a href="http://www.w3.org/2001/sw/wiki/SparqlPatch">SparqlPatch</a> was proposed by <a href="http://www.w3.org/People/Eric/">Eric Prud’hommeaux</a>, one of the editors for the <a href="http://www.w3.org/TR/sparql11-query/">SPARQL query language</a>. SparqlPatch is a profile for SPARQL Update, as it is defined as a subset of it: a valid SparqlPatch query will always be a valid SPARQL Update query, sharing the same semantics.</p>
<p>Why not full SPARQL Update? Well, SPARQL Update comes with a complex machinery for matching nodes in a graph store. Complexity is not a bad thing when it is justified, which is the case for most SPARQL applications. But it is definitely overkill in the context of LDP, hence Eric’s proposal.</p>
<p>With SparqlPatch, TimBL would be able to update his profile using the following query:</p>
<pre><code>$ cat query.sparql-patch
PREFIX schema: <http://schema.org/>
INSERT {
  ?ted <http://schema.org/location> _:loc .
  _:loc <http://schema.org/name> "Long Beach, California" .
  _:loc <http://schema.org/geo> _:geo .
  _:geo <http://schema.org/latitude> "33.7817" .
  _:geo <http://schema.org/longitude> "-118.2054" .
}
WHERE {
  ?ted schema:url <http://conferences.ted.com/TED2009/>
}
$ cat query.sparql-patch | PATCH -S -c 'Content-Type: text/sparqlpatch' http://www.w3.org/People/Berners-Lee/card
204 No Content
</code></pre>
<p>The <code>WHERE</code> clause binds the variable <code>?ted</code> to the node that satisfies the <code>schema:url</code> constraint, and that variable can now be used to <code>INSERT</code> new triples.</p>
<p>This is definitely better and worth considering, as we now have a way to PATCH graphs with blank nodes. But this is still not perfect…</p>
<p>The runtime complexity for matching nodes in a graph is known to be extremely bad in some cases. While SparqlPatch is better than SPARQL Update in that regard, there are still some issues, which become apparent only when you start implementing and thinking about the runtime semantics. The main data structure in the SPARQL semantics is the <a href="http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#sparqlSolutions">Solution Mapping</a>, which keeps track of which concrete nodes from the graph can be mapped to which variables, applying to each clause in the <code>WHERE</code> statement. So the <a href="http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#BGPsparql">semantics of the Basic Graph Pattern</a> (i.e. all the clauses in the SparqlPatch’s <code>WHERE</code>) involves a lot of costly cartesian products.</p>
<p>Also, it would be nice to change the evaluation semantics of the Basic Graph Pattern such that the evaluation order of the clauses is <strong>exactly</strong> the one from the query. It makes a lot of sense to let the client have some control over the evaluation order in the context of a PATCH.</p>
<p><span id="confusing-semantics">SPARQL Update can also be confusing</span> in that <strong>if a graph pattern doesn’t match anything, the query still succeeds with no effect on the graph</strong>. I have seen many engineers get puzzled by this (perfectly well defined) behaviour, because they were expecting the query to fail: this would happen every time a predicate gets typoed. I am jumping a bit ahead but that is one reason why <strong>LD Patch cannot be compiled down to SPARQL Update while preserving the semantics</strong>.</p>
<p>Finally, SparqlPatch has no support for <code>rdf:list</code>s. On one hand, SPARQL is heavily triple-focused and has never played very well with <code>rdf:list</code>. List matching improved in SPARQL 1.1 with <a href="http://www.w3.org/TR/sparql11-query/#propertypaths">Property Paths</a> but their support is not native, in that <strong>common operations such as slice manipulation, update, or even a simple append, need to be encoded in the query</strong>.</p>
<p>On the other hand, lists are a common data structure in all applications. They come with native support in syntaxes like Turtle or JSON-LD. Append is a very common operation and the user should not have to think about the <a href="http://www.w3.org/2006/Talks/0524-Edinburgh-IH/#(54)">RDF list encoding</a> for such simple operations.</p>
<p>Limited node matching capabilities and native <code>rdf:list</code> support are two <strong>features</strong> of LD Patch.</p>
<h2 id="ld-patch">LD Patch</h2>
<p>LD Patch was originally proposed by <a href="http://liris.cnrs.fr/~pchampin/en/">Pierre-Antoine Champin</a>. The format described in the <a href="http://www.w3.org/TR/2014/WD-ldpatch-20140918/">First Public Working Draft</a> is very close to his original proposal. I became an editor for the specification to make some syntactical enhancements and to make sure that we could provide a <a href="https://github.com/w3c/banana-rdf/blob/2fb79a94c9cb52201daab4bc8608ea819706b5c1/ldpatch/src/main/scala/Semantics.scala#L13">clean formal semantics</a> for it.</p>
<p>Pierre-Antoine maintains a Python implementation. On my side, I have a Scala implementation working with <a href="https://jena.apache.org/">Jena</a>, <a href="http://www.openrdf.org/">Sesame</a>, and plain Scala. <a href="https://deiu.rww.io/profile/card#me">Andrei Sambra</a>, the third editor, is working on <a href="http://golang.org/">Go</a> and Javascript implementations.</p>
<p>A potential drawback for LD Patch is that some RDF graphs cannot be patched. They are deemed <a href="http://www.w3.org/TR/ldpatch/#pathological-graph">pathological</a> and are <a href="http://www.websemanticsjournal.org/index.php/ps/article/view/365">very rare in practice</a>: Linked Data applications should never be concerned. This may not be true for some SPARQL applications, but this is not our use-case here.</p>
<p>Let’s see what TimBL’s query would look like using LD Patch:</p>
<pre><code>$ cat query.ld-patch
@prefix schema: <http://schema.org/> .
Bind ?ted <http://conferences.ted.com/TED2009/> / ^schema:url .
Add ?ted schema:location _:loc .
Add _:loc schema:name "Long Beach, California" .
Add _:loc schema:geo _:geo .
Add _:geo schema:latitude "33.7817" .
Add _:geo schema:longitude "-118.2054" .
$ cat query.ld-patch | PATCH -S -c 'Content-Type: text/ldpatch' http://www.w3.org/People/Berners-Lee/card
204 No Content
</code></pre>
<p>Unlike SparqlPatch, the <code>Bind</code> statement does not operate on triples. Instead, an <a href="http://www.w3.org/TR/ldpatch/#path-expression">LD Path expression</a> (<code>/ ^schema:url</code>) is evaluated against a concrete starting node (<code><http://conferences.ted.com/TED2009/></code>). The result node gets bound to a variable (<code>?ted</code>) which can then be used in the following statements. That is the main difference when compared to SparqlPatch semantics.</p>
<p><span id="similarities">Note</span>: LD Path expressions are very similar to the <a href="http://tools.ietf.org/html/rfc6901">JSON Pointers</a> used in <a href="http://tools.ietf.org/html/rfc6902">JSON Patch</a>, and to the <a href="http://tools.ietf.org/html/rfc5261#ref-W3C.REC-xpath-19991116">XPath selectors</a> used in <a href="http://tools.ietf.org/html/rfc5261">XML Patch</a>.</p>
<p>The runtime semantics for LD Path expressions only rely on a node set. The final set must contain a unique value to be successfully bound to the variable, <strong>otherwise it results in an error</strong>. A path expression is processed from left to right, and can have nested paths for filtering nodes.</p>
<p>Given that semantics, you can imagine that it is 1. easy to reason about, 2. <a href="https://github.com/w3c/banana-rdf/blob/2fb79a94c9cb52201daab4bc8608ea819706b5c1/ldpatch/src/main/scala/Semantics.scala#L158-L192">easy to implement</a>, and 3. very efficient. I would even argue that you cannot remove functionalities from the path expressions without throwing away a whole class of interesting RDF graphs that LD Patch is able to patch.</p>
<p>Writing a parser for LD Patch proved to be of similar difficulty to writing one for SparqlPatch, as they share most of their respective grammars with Turtle. Most of the code for the engine itself actually lies in the support for <code>rdf:list</code>, which basically encodes what users would have to do in their queries if they didn’t have native support for list manipulations. So this ends up being done in one place, once and for all, and that is indeed a very good thing.</p>
<p>The <code>UpdateList</code> operation is very similar to <a href="https://docs.python.org/3/reference/expressions.html#slicings">how slicing is done in Python</a>. I invite you to read the <a href="http://www.w3.org/TR/ldpatch/#update-list-statement">corresponding section in the specification</a> for more examples. LD Patch slicing is very intuitive and so far it has met no resistance in the Working Group.</p>
<h2 id="subjectivity">Subjectivity</h2>
<p>It took a very long time before the group was able to publish LD Patch. I still regret that <em>any opportunity</em> would be taken by a few people to challenge the whole technology, often without even stating which requirements they would like to see addressed.</p>
<p>For example, the main criticism seems to be about the syntax. Yes, it is a new one, even though 68% of the grammar is shared with Turtle. In particular, it is different from the SPARQL Update syntax. But apparently, it doesn’t matter to some folks that the semantics are not the same.</p>
<p>I have given my list of requirements many, many times on the LDP mailing list (it is not only mine: those requirements are of course shared by others), but somehow they were never really challenged, and the arguments about syntax keep coming back. So, for the record, here <span id="requirements">they</span> are:</p>
<ul>
<li>the context is Linked Data, and especially the Linked Data Platform</li>
<li>bare minimum for LDP Resource diff, that is, no high-level features</li>
<li>support for blank nodes, but pathological graphs are ok</li>
<li>no skolemization</li>
<li>first-class citizen <code>rdf:list</code> manipulations</li>
<li>reasonable runtime complexity</li>
<li>easy to implement <strong>without</strong> the need for an existing SPARQL Update implementation</li>
<li>not being able to bind a node is a failure</li>
<li>being a reasonable alternative for the <a href="https://web-payments.org/specs/source/identity-credentials/#h2_accessing-the-identity">JSON-LD folks using JSON Patch</a>, because they have nothing better</li>
</ul>
<p>If you want to make counter proposals, please make sure that those requirements are addressed. Also, you should accept the fact that if you have a different set of requirements, then LD Patch is probably not what you want. Finally, if you think that the above requirements are <strong>wrong in the context of LDP</strong>, then you should make an official complaint to the group explaining your reasoning.</p>
<p>I would like to emphasize that <strong>relying on an existing syntax (such as SPARQL) was never a requirement for me</strong>. While reusing bits of SPARQL Update in LD Patch whenever it makes sense is reasonable, it should be done sparingly. For example, <a href="http://lists.w3.org/Archives/Public/public-ldp-wg/2014Jul/thread.html#msg81">I argued on the LDP mailing list</a> that shared syntax with different (runtime) semantics could break some user expectations.</p>
<h2 id="faq">Frequently Asked Questions</h2>
<p><span id="dbooth-questions">Thanks to <a href="http://dbooth.org/">David Booth</a></span> for <a href="http://lists.w3.org/Archives/Public/public-ldp/2014Sep/0014.html">providing me with well-formulated questions and concerns</a>. Here are some answers; they merely complement the arguments made in the other sections of this post.</p>
<p><em>Are there any concerns about inventing a new syntax?</em> What if SPARQL, or a profile of it, could <strong>not</strong> address <a href="#requirements">all the requirements</a>? What if a subset of the syntax was no longer aligned with the superset semantics?</p>
<p><em>Isn’t this yet another syntax similar to SPARQL, which ends up confusing newcomers?</em> Of course they are similar: exactly 68% of the grammar rules for LD Patch come directly from Turtle, and SPARQL made a similar choice.</p>
<p><em>Would using a single language decrease development and maintenance costs?</em> I would like to see actual evidence of that claim. Some people actually have <a href="http://martinfowler.com/bliki/OneLanguage.html">a more nuanced opinion</a> on that subject, and I tend to agree, as I find myself using whichever language or framework is best suited to a given job.</p>
<p><em>Can implementers simply plug in an existing general-purpose SPARQL engine to get a new system up and running quickly?</em> Not so easy. You still need to reject the valid SPARQL Update queries that are not valid LD Patch queries. And you can be sure that the test suite will include tests for exactly that :-) Also, because I have done it, I can claim that unlike full SPARQL, LD Patch is quick and easy to implement.</p>
<p><em>Would implementers have the option of supporting additional SPARQL 1.1 Update operations?</em> There is definitely a use-case for querying data in LDP Containers using SPARQL, or using a <a href="http://blog.pellucid.com/post/95282190715/exposing-resources-in-datomic-using-linked-data">more ad-hoc query language with support for ordering, filtering, and aggregation</a>. And it is true that bulk updating could be addressed with SPARQL Update. But those use-cases are different from PATCH.</p>
<h2 id="next-steps">Next steps</h2>
<p>The First Public Working Draft just got published. As expected, the document is getting reviewed by experts, who have already started to provide feedback to the group.</p>
<p>In the meantime, the editors are working on completing the semantics section of the document. A proposed approach was to provide a translation from LD Patch to SPARQL Update. While this is definitely useful for people with a SPARQL background, <a href="#confusing-semantics">this cannot be used as a formal semantics</a>. We are trying to find a good trade-off between the usual tooling from formal semantics theory, and something that could be read by people without such a theoretical background.</p>
<p>And finally, after the specification gets completed, we will focus on providing a test suite. The plan is to make it part of the <a href="https://github.com/w3c/ldp-testsuite">LDP one</a>.</p>
<p>That’s all, folks! (and thanks to Andrei for reviewing drafts of this post)</p>
<h2 id="finally-my-own-blog">Finally my own blog (Alexandre Bertails, 2014-09-16)</h2>
<p>I have finally found the time and the motivation to put together my own blog \o/ I had actually planned to do so for about 10 years, basically ever since I have owned <code>bertails.org</code>… It is not completely ready yet, but I prefer to release it now and work out the issues later. Otherwise it would never happen.</p>
<p>Sooo, how does this work? I wanted something as easy to use as possible, so I have settled on <a href="http://docs.getpelican.com/">Pelican</a>, at least for now. As I don’t want to pollute my environment with Python dependencies, I am using <a href="https://www.docker.com/">Docker</a> to generate the static version of this website. The <a href="https://github.com/betehess/my-pelican/">project</a> started as a clone of <a href="https://github.com/jderuere/docker-pelican">https://github.com/jderuere/docker-pelican</a>, but I quickly rewrote everything, including the <a href="https://github.com/betehess/my-pelican/blob/master/Dockerfile">Dockerfile</a>. I run Pelican within the container but against the mounted <code>website</code> directory, and I propagate my user from the host to avoid permission issues (Docker runs as root by default). So I do something like</p>
<pre><code class="language-bash">docker run --name=pelican -d -v `pwd`/website:/srv/pelican-website \
-p 8000:8000 betehess/pelican
</code></pre>
<p>The theme is directly based on <a href="http://paulrouget.com/">Paul Rouget’s website</a>, with a few adaptations. The most important ones are the fixed-width font (<a href="https://www.google.com/fonts/specimen/Ubuntu+Mono">Ubuntu Mono</a>) and the greenish colour for the links. I only use 2 templates from Pelican: <code>index.html</code> and <code>article.html</code>. The blog now becomes the main entry point for <a href="http://bertails.org">http://bertails.org</a>. The previous index page has moved to <a href="http://bertails.org/alex">http://bertails.org/alex</a> as I intend to use <a href="http://bertails.org/alex#me">http://bertails.org/alex#me</a> as my <a href="http://www.w3.org/wiki/WebID">WebID</a>.</p>
<p>My mugshot <a href="https://www.flickr.com/photos/amyvdh/5837280596/">was taken by</a> my friend and ex-<a href="http://www.w3.org">W3C</a> colleague <a href="https://twitter.com/amyvdh">Amy van der Hiel</a>. There are very few pictures of me on the Web :-) I cannot remember where the font icons come from though :-/</p>
<p>There are no comments at the moment. The reason is that I couldn’t find anything that I liked. I have the markup and the CSS ready though. So it should land here in just a few weeks.</p>
<p>What will you find on this blog? Mainly articles about <a href="http://www.scala-lang.org/">Scala</a> and <a href="http://en.wikipedia.org/wiki/Linked_data">Linked Data</a>. I will maintain RSS feeds for those subjects when the time comes.</p>
<p>Stay tuned!</p>