An RDF library for Haskell

1 Querying the semantic web with Haskell

haskell-sweb.png

  • Resource description framework (RDF): data interchange on the web
  • Web data interoperability
  • URI names relationships between things
  • RDF graphs are directed, labelled graphs where edge names links between nodes

2 Functionally programming the semantic web

Implementing simple RDF & SPARQL libraries in Haskell

  • API design for rdf4h and hsparql
  • Library tests does it do the right thing?
  • Graph implementation performance /graph implemantation tradeoffs?
  • Real world semantic web programming examples

3 Semantic web stack

Semantic_web_stack.svg.png

4 Semantic web stack

Semantic_web_stack-haskell.svg.png

5 rdf4h: a simple semantic web RDF library

  • RDF parsing and serialisation
  • RDF graph querying
  • Multiple graph implementations

RDF_example.png

6 API overview

  • Haskell types for RDF nodes and graphs
  • Querying graphs using (s,p,o) values
  • Adding and removing triples
  • Parsing from files and web URLs
  • Serialising graphs in RDF formats

7 RDF types

data Node = BNode Text 
          | BNodeGen Int
          | LNode LValue
          | UNode Text -- e.g. "http://www.example.com/rob"

data LValue = PlainL Text       -- "Family Guy"
            | PlainLL Text Text -- "Family Guy@en"
            | TypedL  Text Text -- "13"^^xsd:int

type Subject   = Node
type Predicate = Node
type Object    = Node
data Triple    = Triple Subject Predicate Object

plainL  :: Text -> LValue
plainLL :: Text -> Text -> LValue
unode   :: Text -> Node
bnode   :: Text ->  Node
lnode   :: LValue ->  Node

-- sanity checks, e.g. subject cannot be a literal
triple  :: Subject -> Predicate -> Object -> Triple

8 Modifying RDF graphs

addTriple    :: (Rdf a) => RDF a -> Triple -> RDF a
removeTriple :: (Rdf a) => RDF a -> Triple -> RDF a
> triplesOf gr1
[Triple (UNode "http://example.com/rob_stewart")
        (UNode "http://xmlns.com/foaf/0.1/interest")
        (UNode "http://dbpedia.org/resource/Haskell_(programming_language)")]

> let s = unode "http://example.com/rob_stewart"
> let p = unode "http://xmlns.com/foaf/0.1/interest"
> let o = unode "http://dbpedia.org/resource/Semantic_Web"
> let t = triple s p o
> let gr2 = addTriple gr1 t

> triplesOf gr2
[Triple (UNode "http://example.com/rob_stewart")
        (UNode "http://xmlns.com/foaf/0.1/interest")
        (UNode "http://dbpedia.org/resource/Semantic_Web")
,Triple (UNode "http://example.com/rob_stewart")
        (UNode "http://xmlns.com/foaf/0.1/interest")
        (UNode "http://dbpedia.org/resource/Haskell_(programming_language)")]

9 Querying RDF graphs

query :: (Rdf a) => RDF a -> Maybe Node -> Maybe Node -> Maybe Node -> Triples

> map objectOf $
   query gr2
     (Just (unode ("http://example.com/rob_stewart")))     -- subject
     (Just (unode ("http://xmlns.com/foaf/0.1/interest"))) -- predicate
     Nothing -- object

[ UNode "http://dbpedia.org/resource/Semantic_Web"
, UNode "http://dbpedia.org/resource/Haskell_(programming_language)"]

select :: (Rdf a) => RDF a -> NodeSelector -> NodeSelector -> NodeSelector -> Triples

> let objSelector = Just
    (\node -> case node of
         (UNode uri) -> Data.Text.isInfixOf "Haskell" uri;
         _           -> False)
> select gr2 Nothing Nothing selector
[Triple (UNode "http://example.com/rob_stewart")
        (UNode "http://xmlns.com/foaf/0.1/interest")
        (UNode "http://dbpedia.org/resource/Haskell_(programming_language)")]

10 Real world example: ESWC 2015

Returns full names of members on the ESWC 2015 conference programme committee.

eswcCommitteeURI, heldByProp :: T.Text
eswcCommitteeURI = "http://data.semanticweb.org/conference/eswc/2015/program-committee-member"
heldByProp       = "swc:heldBy"

eswcCommitteeMembers :: RDF TList -> [T.Text]
eswcCommitteeMembers graph =
  let triples = query graph (Just (unode eswcCommitteeURI)) (Just (unode heldByProp)) Nothing
      memberURIs = map objectOf triples
  in map
     (\memberURI ->
              let (LNode (PlainL firstName)) =
                    objectOf $ head $ query graph (Just memberURI) (Just (unode "foaf:firstName")) Nothing
                  (LNode (PlainL lastName))  =
                    objectOf $ head $ query graph (Just memberURI) (Just (unode "foaf:lastName")) Nothing
              in (T.append firstName (T.append (T.pack  " ") lastName)))
     memberURIs

11 ESWC 2015 committee members

main :: IO ()
main = do
  result <- parseURL
    (XmlParser Nothing Nothing)
    "http://data.semanticweb.org/dumps/conferences/eswc-2015-complete.rdf"
  case result of
    Left err -> error "Unable to parse RDF content from that URL"
    Right rdfGraph -> do
      let eswcMemberNames = eswcCommitteeMembers rdfGraph
      mapM_ (putStrLn . T.unpack) eswcMemberNames

> main
Vadim Ermolayev
Karl Aberer
Giorgos Stoilos
Birgitta König-Ries
Giuseppe Rizzo
...

12 Parsing RDF

parsing-triples.png

Serialisation Reading Writing
NTriples yes yes
Turtle yes yes
RDF/XML yes no

13 Parsing RDF

Parsing

class RdfParser p where
  parseString :: (Rdf a) => p -> T.Text -> Either ParseFailure (RDF a)
  parseFile   :: (Rdf a) => p -> String -> IO (Either ParseFailure (RDF a))
  parseURL    :: (Rdf a) => p -> String -> IO (Either ParseFailure (RDF a))

-- usage example
readRdf rdfFileame =
  parseFile (XmlParser Nothing Nothing) rdfFilename

Serialising

class RdfSerializer s where
  hWriteRdf :: (Rdf a) => s -> Handle -> RDF a -> IO ()

-- usage example
saveRdf rdfFilename rdfGraph =
  withFile rdfFilename WriteMode (\h -> hWriteRdf NTriplesSerializer h rdfGraph)

14 Implementing the RDF parsers

W3C provides BNF rules, and docs explaining how to modify parser state.

turtle-bnf.jpg

15 Parser combinators for BNF grammar rules

The library uses parsec, a Haskell parser combinator library.

The Turtle parser:

parseFile :: (Rdf a) => Maybe BaseUrl -> Maybe T.Text -> String -> IO (Either ParseFailure (RDF a))
parseFile bUrl docUrl fpath = do
  TIO.readFile fpath >>= \bstr -> return $
     handleResult bUrl (runParser t_turtleDoc initialState (maybe "" T.unpack docUrl) bstr)
  where initialState = (bUrl, docUrl, 1, PrefixMappings Map.empty, [], [], [], [], False, Seq.empty,Map.empty)

-- grammar rule: [1] turtleDoc
t_turtleDoc :: GenParser ParseState (Seq Triple, PrefixMappings)
t_turtleDoc = ...

-- [7] predicateObjectList ::= verb objectList (';' (verb objectList)?)*
t_predicateObjectList :: GenParser ParseState ()
t_predicateObjectList = do
  void (sepEndBy1
        (optional (try (do { t_verb
                           ; many1 t_ws
                           ; t_objectList
                           ; void popPred})))
        (try (many t_ws >> char ';' >> many t_ws)))

16 Rdf type class

Functions optimised for each RDF graph implementation:

class Rdf a where
  baseUrl           :: RDF a -> Maybe BaseUrl
  prefixMappings    :: RDF a -> PrefixMappings
  addPrefixMappings :: RDF a -> PrefixMappings -> Bool -> RDF a
  empty             :: RDF a
  mkRdf             :: Triples -> Maybe BaseUrl -> PrefixMappings -> RDF a
  addTriple         :: RDF a -> Triple -> RDF a
  removeTriple      :: RDF a -> Triple -> RDF a
  triplesOf         :: RDF a -> Triples
  uniqTriplesOf     :: RDF a -> Triples
  select            :: RDF a -> NodeSelector -> NodeSelector -> NodeSelector -> Triples
  query             :: RDF a -> Maybe Node -> Maybe Node -> Maybe Node -> Triples
  showGraph         :: RDF a -> String

Other utility query functions available in Data.RDF.Query

17 Graph implementations

Exposed as RDF type family instances

data family RDF a

-- function provided a (s,p,o) triples based RDF graph.
foo :: RDF TList -> Bool
foo rdfGraph = ...

-- function provided a hash based adjacency map RDF graph.
bar :: RDF AdjHashMap -> [Triple]
bar rdfGraph = ...

18 List based graph implementation

RDF type family instances for graph implementations

-- List based graph implementation
data instance RDF TList = TListC (Triples, Maybe BaseUrl, PrefixMappings)

-- Simple graph creation
mkRdf :: Triples -> Maybe BaseUrl -> PrefixMappings -> RDF TList
mkRdf ts baseURL pms = TListC (ts, baseURL, pms)

Places triples list into TListC constructor, i.e. no computation in mkRdf for TList.

19 Adjacency map based graph implementation

-- Adjacency map implementation
data instance RDF AdjHashMap = AdjHashMap (TMaps, Maybe BaseUrl, PrefixMappings)
type TMaps       = (TMap, TMap)
type TMap        = HashMap Node AdjacencyMap
type Adjacencies = HashSet Node

mkRdf :: Triples -> Maybe BaseUrl -> PrefixMappings -> RDF AdjHashMap
mkRdf ts baseURL pms = AdjHashMap (mergeTs (HashMap.empty, HashMap.empty) ts, baseURL, pms)

-- 20 lines of HashMap and HashSet API calls 
mergeTs :: TMaps -> [Triple] -> TMaps
mergeTs = ...

Computation needed to map triples list to graph implementation.

20 Adjacency map example

-- Triples in the graph
s: "http://example.com/rob_stewart", p: "http://xmlns.com/foaf/0.1/interest", o: "http://dbpedia.org/resource/Semantic_Web"
s: "http://example.com/rob_stewart", p: "http://xmlns.com/foaf/0.1/interest", o: "http://dbpedia.org/resource/Haskell_(programming_language)"

type TMaps       = (TMap, TMap) -- for (SPO map, OPS map)
type TMap        = HashMap Node AdjacencyMap
type Adjacencies = HashSet Node

-- SPO map
k: -1527025807618695980 -> v: (k: -2021146143382594279 -> v:[3091419593178925190,-8705743210846359529])

-- OPS map
k: -8705743210846359529 -> v: (k: -2021146143382594279 -> v:[-1527025807618695980])
k: 3091419593178925190  -> v: (k: -2021146143382594279 -> v:[-1527025807618695980])
URI Hash
"http://example.com/rob_stewart" -1527025807618695980
"http://xmlns.com/foaf/0.1/interest" -2021146143382594279
"http://dbpedia.org/resource/Semantic_Web" 3091419593178925190
"http://dbpedia.org/resource/Haskell_(programming_language)" -8705743210846359529

21 RDF graph implementation benchmarks

criterion-scaled.jpg

  • query AdjHashMap faster than TList
  • select TList faster than AdjHashMap
  • addTriples & removeTriples TList much faster than AdjHashMap

See: http://robstewart57.github.io/rdf4h/rdf4h-bench-13112016.html

22 Testing the library

  • Property based API tests using QuickCheck
    • does the rdf4h API perform as expected?
    • generate random tests and run them
  • Unit tests for RDF parsers using HUnit

23 RDF W3C parser tests

In manifest.ttl

<#prefixed_IRI_predicate> rdf:type rdft:TestTurtleEval ;
   mf:name      "prefixed_IRI_predicate" ;
   rdfs:comment "prefixed IRI predicate" ;
   rdft:approval rdft:Approved ;
   mf:action    <prefixed_IRI_predicate.ttl> ;
   mf:result    <IRI_spo.nt> ;
   .

Turtle to be parsed in prefixed_IRI_predicate.ttl

@prefix p: <http://a.example/>.
<http://a.example/s> p:p <http://a.example/o> .

Expected NTriples output in IRI_spo.nt

<http://a.example/s> <http://a.example/p> <http://a.example/o> .

24 Property based API tests

The property based approach:

  1. Define valid possible random RDF values
  2. Define properties that must hold
  3. QuickCheck generates 100's of RDF triples and graphs
  4. Checks properties against randomly generated data

25 Arbitrary RDF instances

  • Provide QuickCheck with valid RDF values
instance Arbitrary BaseUrl where
  arbitrary = oneof $ map
    (return . BaseUrl . T.pack)
    ["http://example.com/a", "http://asdf.org/b", "http://asdf.org/c"]

instance Arbitrary PrefixMappings where
  arbitrary = oneof
    [ return $ PrefixMappings Map.empty
    , return $ PrefixMappings $ Map.fromAscList
        [ (T.pack "eg1", T.pack "http://example.com/1")
        , (T.pack "eg2", T.pack "http://example.com/2")
        , (T.pack "eg3", T.pack "http://example.com/3") ]
    ]

arbitraryS, arbitraryP, arbitraryO :: Gen Node
arbitraryS = oneof $ map return $ unodes ++ bnodes
arbitraryP = oneof $ map return unodes
arbitraryO = oneof $ map return $ unodes ++ bnodes ++ lnodes

instance Arbitrary Triple where
  arbitrary = liftM3 triple arbitraryS arbitraryP arbitraryO

26 QuickCheck properties

  1. Add randomly generated list of triples to an empty graph with addTriple, 1 by 1.
  2. Remove triples from the populated graph with removeTriple, 1 by 1.
  3. Check that triplesOf returns [] for the emptied graph.
p_add_then_remove_triples
  :: (Rdf rdf)
  => RDF rdf -- an empty graph
  -> Triples -- triples to add and then remove
  -> Bool    -- should be True
p_add_then_remove_triples _empty generatedTriples =
  let emptyGraph = _empty
      populatedGraph = foldr (\triple gr -> addTriple gr triple)    emptyGraph     generatedTriples
      emptiedGraph   = foldr (\triple gr -> removeTriple gr triple) populatedGraph generatedTriples
  in triplesOf emptiedGraph == []

27 Test results

$ stack test --test-arguments="--pattern property-tests"
...
      remove_triple_from_singleton_graph_query_o: OK
        +++ OK, passed 100 tests.
      p_add_then_remove_triples:                  OK (0.01s)
        +++ OK, passed 100 tests.

All 56 tests passed (1.54s)
$ stack test --test-arguments="--pattern parser-w3c-tests"
...
      literal_with_UTF8_boundaries:                                       OK
      langtagged_string:                                                  OK
      lantag_with_subtag:                                                 OK
      minimal_whitespace:                                                 OK

64 out of 521 tests failed (1.19s)
  • Development priority: W3C unit test compliance for Turtle and RDF/XML parsers.

28 A SPARQL DSL inside Haskell

  • Query generation
  • SPARQL query execution
type EndPoint = String
data BindingValue = Bound Node | Unbound

-- SPARQL SELECT returns bound values
selectQuery :: EndPoint -> Query SelectQuery -> IO (Maybe [[BindingValue]])

-- SPARQL CONSTRUCT returns RDF graph
constructQuery :: Rdf a => EndPoint -> Query ConstructQuery -> IO (RDF a)

-- SPARQL ASK
askQuery :: EndPoint -> Query AskQuery -> IO Bool

-- SPARQL UPDATE
updateQuery :: EndPoint -> Query UpdateQuery -> IO Bool

-- SPARQL DESCRIBE
describeQuery :: Rdf a => EndPoint -> Query DescribeQuery -> IO (RDF a)

29 SPARQL query generation

data ConstructQuery = ConstructQuery { queryConstructs :: [Pattern] }
data AskQuery       = AskQuery       { queryAsk :: [Pattern] }
data UpdateQuery    = UpdateQuery    { queryUpdate :: [Pattern] }
data SelectQuery    = SelectQuery    { queryVars :: [Variable] }
data DescribeQuery  = DescribeQuery  { queryDescribe :: IRIRef }

-- Permit variables and values for 'triple' and similar functions
class TermLike a where ...

prefix :: Text -> IRIRef -> Query Prefix

-- Some examples...

-- query actions
triple :: (TermLike a, TermLike b, TermLike c) => a -> b -> c -> Query Pattern
union  :: Query a -> Query b -> Query Pattern

-- limit handling
limit :: Int -> Query Limit

-- order handling
orderNextAsc :: TermLike a => a -> Query ()

-- relations
(.<.) :: (TermLike a, TermLike b) => a -> b -> Expr

30 SPARQL SELECT example

Web browsers according to Wikipedia:

simpleSelect :: Query SelectQuery
simpleSelect = do
    resource <- prefix "dbprop"  (iriRef "http://dbpedia.org/resource/")
    dbpprop  <- prefix "dbpedia" (iriRef "http://dbpedia.org/property/")
    foaf     <- prefix "foaf"    (iriRef "http://xmlns.com/foaf/0.1/")
    x    <- var
    name <- var
    triple x (dbpprop .:. "genre") (resource .:. "Web_browser")
    triple x (foaf .:. "name") name
    return SelectQuery { queryVars = [name] }

main = do
  -- get a list of bound variables
  (Just boundValues) <- selectQuery "http://dbpedia.org/sparql" simpleSelect
  mapM_ print boundValues 

> main
[[Bound (LNode (PlainLL "Amaya" "en"))]
,[Bound (LNode (PlainLL "Sleipnir" "en"))]
,[Bound (LNode (PlainLL "Midori" "en"))]
....

31 SPARQL CONSTRUCT example

Web browsers according to Wikipedia, as triples:

simpleConstruct :: Query ConstructQuery
simpleConstruct = do
    resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/")
    dbpprop  <- prefix "dbprop"  (iriRef "http://dbpedia.org/property/")
    foaf     <- prefix "foaf"    (iriRef "http://xmlns.com/foaf/0.1/")
    example  <- prefix "example" (iriRef "http://www.example.com/")
    x    <- var
    name <- var
    construct <- constructTriple x (example .:. "hasName") name
    triple x (dbpprop .:. "genre") (resource .:. "Web_browser")
    triple x (foaf .:. "name") name
    return ConstructQuery { queryConstructs = [construct] }

main = do
  (rdfGraph :: RDF TList) <- constructQuery "http://dbpedia.org/sparql" simpleConstruct
  mapM_ print (triplesOf rdfGraph)

> main
Triple (UNode "http://dbpedia.org/resource/Amaya_(web_editor)") (UNode "http://www.example.com/hasName") (LNode (PlainLL "Amaya" "en"))
Triple (UNode "http://dbpedia.org/resource/Sleipnir_(web_browser)") (UNode "http://www.example.com/hasName") (LNode (PlainLL "Sleipnir" "en"))
Triple (UNode "http://dbpedia.org/resource/Midori_(web_browser)") (UNode "http://www.example.com/hasName") (LNode (PlainLL "Midori" "en"))
...

32 Using the libraries

Both compatible with cabal and stack tools.

Download libraries manually, e.g.:

$ cabal install rdf4h

Or depend on them in applications in <projectName>.cabal

Build-depends: rdf4h >= 3.0.1

An in your stack.yaml

extra-deps:
- rdf4h-3.0.1

33 Finding the libraries

34 Future work

  • 100% compliance with W3C parser tests for Turtle and RDF/XML
  • New high performance RDF graph implementations
  • Parser performance improvements, e.g. ticket 35
  • Criterion profile guided API performance optimisations
  • Support for RDF quads
  • Support for JSON RDF serialisation ticket 34
  • RDFS/OWL inference engines above rdf4h

Date: November 14, 2016

Author: Rob Stewart

Created: 2016-11-15 Tue 22:33

Validate