An RDF library for Haskell
1 Querying the semantic web with Haskell
- Resource description framework (RDF): data interchange on the web
- Web data interoperability
- URI names relationships between things
- RDF graphs are directed, labelled graphs where edge names links between nodes
2 Functionally programming the semantic web
Implementing simple RDF & SPARQL libraries in Haskell
- API design for
rdf4h
andhsparql
- Library tests does it do the right thing?
- Graph implementation performance /graph implemantation tradeoffs?
- Real world semantic web programming examples
3 Semantic web stack
4 Semantic web stack
5 rdf4h: a simple semantic web RDF library
- RDF parsing and serialisation
- RDF graph querying
- Multiple graph implementations
6 API overview
- Haskell types for RDF nodes and graphs
- Querying graphs using (s,p,o) values
- Adding and removing triples
- Parsing from files and web URLs
- Serialising graphs in RDF formats
7 RDF types
data Node = BNode Text | BNodeGen Int | LNode LValue | UNode Text -- e.g. "http://www.example.com/rob" data LValue = PlainL Text -- "Family Guy" | PlainLL Text Text -- "Family Guy@en" | TypedL Text Text -- "13"^^xsd:int type Subject = Node type Predicate = Node type Object = Node data Triple = Triple Subject Predicate Object plainL :: Text -> LValue plainLL :: Text -> Text -> LValue unode :: Text -> Node bnode :: Text -> Node lnode :: LValue -> Node -- sanity checks, e.g. subject cannot be a literal triple :: Subject -> Predicate -> Object -> Triple
8 Modifying RDF graphs
addTriple :: (Rdf a) => RDF a -> Triple -> RDF a removeTriple :: (Rdf a) => RDF a -> Triple -> RDF a
> triplesOf gr1 [Triple (UNode "http://example.com/rob_stewart") (UNode "http://xmlns.com/foaf/0.1/interest") (UNode "http://dbpedia.org/resource/Haskell_(programming_language)")] > let s = unode "http://example.com/rob_stewart" > let p = unode "http://xmlns.com/foaf/0.1/interest" > let o = unode "http://dbpedia.org/resource/Semantic_Web" > let t = triple s p o > let gr2 = addTriple gr1 t > triplesOf gr2 [Triple (UNode "http://example.com/rob_stewart") (UNode "http://xmlns.com/foaf/0.1/interest") (UNode "http://dbpedia.org/resource/Semantic_Web") ,Triple (UNode "http://example.com/rob_stewart") (UNode "http://xmlns.com/foaf/0.1/interest") (UNode "http://dbpedia.org/resource/Haskell_(programming_language)")]
9 Querying RDF graphs
query :: (Rdf a) => RDF a -> Maybe Node -> Maybe Node -> Maybe Node -> Triples > map objectOf $ query gr2 (Just (unode ("http://example.com/rob_stewart"))) -- subject (Just (unode ("http://xmlns.com/foaf/0.1/interest"))) -- predicate Nothing -- object [ UNode "http://dbpedia.org/resource/Semantic_Web" , UNode "http://dbpedia.org/resource/Haskell_(programming_language)"] select :: (Rdf a) => RDF a -> NodeSelector -> NodeSelector -> NodeSelector -> Triples > let objSelector = Just (\node -> case node of (UNode uri) -> Data.Text.isInfixOf "Haskell" uri; _ -> False) > select gr2 Nothing Nothing selector [Triple (UNode "http://example.com/rob_stewart") (UNode "http://xmlns.com/foaf/0.1/interest") (UNode "http://dbpedia.org/resource/Haskell_(programming_language)")]
10 Real world example: ESWC 2015
Returns full names of members on the ESWC 2015 conference programme committee.
eswcCommitteeURI, heldByProp :: T.Text eswcCommitteeURI = "http://data.semanticweb.org/conference/eswc/2015/program-committee-member" heldByProp = "swc:heldBy" eswcCommitteeMembers :: RDF TList -> [T.Text] eswcCommitteeMembers graph = let triples = query graph (Just (unode eswcCommitteeURI)) (Just (unode heldByProp)) Nothing memberURIs = map objectOf triples in map (\memberURI -> let (LNode (PlainL firstName)) = objectOf $ head $ query graph (Just memberURI) (Just (unode "foaf:firstName")) Nothing (LNode (PlainL lastName)) = objectOf $ head $ query graph (Just memberURI) (Just (unode "foaf:lastName")) Nothing in (T.append firstName (T.append (T.pack " ") lastName))) memberURIs
11 ESWC 2015 committee members
main :: IO () main = do result <- parseURL (XmlParser Nothing Nothing) "http://data.semanticweb.org/dumps/conferences/eswc-2015-complete.rdf" case result of Left err -> error "Unable to parse RDF content from that URL" Right rdfGraph -> do let eswcMemberNames = eswcCommitteeMembers rdfGraph mapM_ (putStrLn . T.unpack) eswcMemberNames > main Vadim Ermolayev Karl Aberer Giorgos Stoilos Birgitta König-Ries Giuseppe Rizzo ...
12 Parsing RDF
Serialisation | Reading | Writing |
---|---|---|
NTriples | yes | yes |
Turtle | yes | yes |
RDF/XML | yes | no |
13 Parsing RDF
Parsing
class RdfParser p where parseString :: (Rdf a) => p -> T.Text -> Either ParseFailure (RDF a) parseFile :: (Rdf a) => p -> String -> IO (Either ParseFailure (RDF a)) parseURL :: (Rdf a) => p -> String -> IO (Either ParseFailure (RDF a)) -- usage example readRdf rdfFileame = parseFile (XmlParser Nothing Nothing) rdfFilename
Serialising
class RdfSerializer s where hWriteRdf :: (Rdf a) => s -> Handle -> RDF a -> IO () -- usage example saveRdf rdfFilename rdfGraph = withFile rdfFilename WriteMode (\h -> hWriteRdf NTriplesSerializer h rdfGraph)
14 Implementing the RDF parsers
W3C provides BNF rules, and docs explaining how to modify parser state.
- NTriples: https://www.w3.org/TR/n-triples/
- Turtle: https://www.w3.org/TR/turtle/
- RDF/XML: https://www.w3.org/TR/rdf-syntax-grammar/
15 Parser combinators for BNF grammar rules
The library uses parsec, a Haskell parser combinator library.
The Turtle parser:
parseFile :: (Rdf a) => Maybe BaseUrl -> Maybe T.Text -> String -> IO (Either ParseFailure (RDF a)) parseFile bUrl docUrl fpath = do TIO.readFile fpath >>= \bstr -> return $ handleResult bUrl (runParser t_turtleDoc initialState (maybe "" T.unpack docUrl) bstr) where initialState = (bUrl, docUrl, 1, PrefixMappings Map.empty, [], [], [], [], False, Seq.empty,Map.empty) -- grammar rule: [1] turtleDoc t_turtleDoc :: GenParser ParseState (Seq Triple, PrefixMappings) t_turtleDoc = ... -- [7] predicateObjectList ::= verb objectList (';' (verb objectList)?)* t_predicateObjectList :: GenParser ParseState () t_predicateObjectList = do void (sepEndBy1 (optional (try (do { t_verb ; many1 t_ws ; t_objectList ; void popPred}))) (try (many t_ws >> char ';' >> many t_ws)))
16 Rdf type class
Functions optimised for each RDF graph implementation:
class Rdf a where baseUrl :: RDF a -> Maybe BaseUrl prefixMappings :: RDF a -> PrefixMappings addPrefixMappings :: RDF a -> PrefixMappings -> Bool -> RDF a empty :: RDF a mkRdf :: Triples -> Maybe BaseUrl -> PrefixMappings -> RDF a addTriple :: RDF a -> Triple -> RDF a removeTriple :: RDF a -> Triple -> RDF a triplesOf :: RDF a -> Triples uniqTriplesOf :: RDF a -> Triples select :: RDF a -> NodeSelector -> NodeSelector -> NodeSelector -> Triples query :: RDF a -> Maybe Node -> Maybe Node -> Maybe Node -> Triples showGraph :: RDF a -> String
Other utility query functions available in Data.RDF.Query
17 Graph implementations
Exposed as RDF
type family instances
data family RDF a -- function provided a (s,p,o) triples based RDF graph. foo :: RDF TList -> Bool foo rdfGraph = ... -- function provided a hash based adjacency map RDF graph. bar :: RDF AdjHashMap -> [Triple] bar rdfGraph = ...
18 List based graph implementation
RDF
type family instances for graph implementations
-- List based graph implementation data instance RDF TList = TListC (Triples, Maybe BaseUrl, PrefixMappings) -- Simple graph creation mkRdf :: Triples -> Maybe BaseUrl -> PrefixMappings -> RDF TList mkRdf ts baseURL pms = TListC (ts, baseURL, pms)
Places triples list into TListC
constructor, i.e. no computation in mkRdf
for TList
.
19 Adjacency map based graph implementation
-- Adjacency map implementation data instance RDF AdjHashMap = AdjHashMap (TMaps, Maybe BaseUrl, PrefixMappings) type TMaps = (TMap, TMap) type TMap = HashMap Node AdjacencyMap type Adjacencies = HashSet Node mkRdf :: Triples -> Maybe BaseUrl -> PrefixMappings -> RDF AdjHashMap mkRdf ts baseURL pms = AdjHashMap (mergeTs (HashMap.empty, HashMap.empty) ts, baseURL, pms) -- 20 lines of HashMap and HashSet API calls mergeTs :: TMaps -> [Triple] -> TMaps mergeTs = ...
Computation needed to map triples list to graph implementation.
20 Adjacency map example
-- Triples in the graph s: "http://example.com/rob_stewart", p: "http://xmlns.com/foaf/0.1/interest", o: "http://dbpedia.org/resource/Semantic_Web" s: "http://example.com/rob_stewart", p: "http://xmlns.com/foaf/0.1/interest", o: "http://dbpedia.org/resource/Haskell_(programming_language)" type TMaps = (TMap, TMap) -- for (SPO map, OPS map) type TMap = HashMap Node AdjacencyMap type Adjacencies = HashSet Node -- SPO map k: -1527025807618695980 -> v: (k: -2021146143382594279 -> v:[3091419593178925190,-8705743210846359529]) -- OPS map k: -8705743210846359529 -> v: (k: -2021146143382594279 -> v:[-1527025807618695980]) k: 3091419593178925190 -> v: (k: -2021146143382594279 -> v:[-1527025807618695980])
URI | Hash |
---|---|
"http://example.com/rob_stewart" | -1527025807618695980 |
"http://xmlns.com/foaf/0.1/interest" | -2021146143382594279 |
"http://dbpedia.org/resource/Semantic_Web" | 3091419593178925190 |
"http://dbpedia.org/resource/Haskell_(programming_language)" | -8705743210846359529 |
21 RDF graph implementation benchmarks
- query
AdjHashMap
faster thanTList
- select
TList
faster thanAdjHashMap
- addTriples & removeTriples
TList
much faster thanAdjHashMap
See: http://robstewart57.github.io/rdf4h/rdf4h-bench-13112016.html
22 Testing the library
- Property based API tests using QuickCheck
- does the
rdf4h
API perform as expected? - generate random tests and run them
- does the
- Unit tests for RDF parsers using HUnit
- do the parsers consume RDF content correctly?
- use W3C unit tests (currently 521 tests)
- https://github.com/w3c/rdf-tests
23 RDF W3C parser tests
In manifest.ttl
<#prefixed_IRI_predicate> rdf:type rdft:TestTurtleEval ; mf:name "prefixed_IRI_predicate" ; rdfs:comment "prefixed IRI predicate" ; rdft:approval rdft:Approved ; mf:action <prefixed_IRI_predicate.ttl> ; mf:result <IRI_spo.nt> ; .
Turtle to be parsed in prefixed_IRI_predicate.ttl
@prefix p: <http://a.example/>. <http://a.example/s> p:p <http://a.example/o> .
Expected NTriples output in IRI_spo.nt
<http://a.example/s> <http://a.example/p> <http://a.example/o> .
24 Property based API tests
The property based approach:
- Define valid possible random RDF values
- Define properties that must hold
- QuickCheck generates 100's of RDF triples and graphs
- Checks properties against randomly generated data
25 Arbitrary RDF instances
- Provide QuickCheck with valid RDF values
instance Arbitrary BaseUrl where arbitrary = oneof $ map (return . BaseUrl . T.pack) ["http://example.com/a", "http://asdf.org/b", "http://asdf.org/c"] instance Arbitrary PrefixMappings where arbitrary = oneof [ return $ PrefixMappings Map.empty , return $ PrefixMappings $ Map.fromAscList [ (T.pack "eg1", T.pack "http://example.com/1") , (T.pack "eg2", T.pack "http://example.com/2") , (T.pack "eg3", T.pack "http://example.com/3") ] ] arbitraryS, arbitraryP, arbitraryO :: Gen Node arbitraryS = oneof $ map return $ unodes ++ bnodes arbitraryP = oneof $ map return unodes arbitraryO = oneof $ map return $ unodes ++ bnodes ++ lnodes instance Arbitrary Triple where arbitrary = liftM3 triple arbitraryS arbitraryP arbitraryO
26 QuickCheck properties
- Add randomly generated list of triples to an empty graph with
addTriple
, 1 by 1. - Remove triples from the populated graph with
removeTriple
, 1 by 1. - Check that
triplesOf
returns[]
for the emptied graph.
p_add_then_remove_triples :: (Rdf rdf) => RDF rdf -- an empty graph -> Triples -- triples to add and then remove -> Bool -- should be True p_add_then_remove_triples _empty generatedTriples = let emptyGraph = _empty populatedGraph = foldr (\triple gr -> addTriple gr triple) emptyGraph generatedTriples emptiedGraph = foldr (\triple gr -> removeTriple gr triple) populatedGraph generatedTriples in triplesOf emptiedGraph == []
27 Test results
$ stack test --test-arguments="--pattern property-tests" ... remove_triple_from_singleton_graph_query_o: OK +++ OK, passed 100 tests. p_add_then_remove_triples: OK (0.01s) +++ OK, passed 100 tests. All 56 tests passed (1.54s)
$ stack test --test-arguments="--pattern parser-w3c-tests" ... literal_with_UTF8_boundaries: OK langtagged_string: OK lantag_with_subtag: OK minimal_whitespace: OK 64 out of 521 tests failed (1.19s)
- Development priority: W3C unit test compliance for Turtle and RDF/XML parsers.
28 A SPARQL DSL inside Haskell
- Query generation
- SPARQL query execution
type EndPoint = String data BindingValue = Bound Node | Unbound -- SPARQL SELECT returns bound values selectQuery :: EndPoint -> Query SelectQuery -> IO (Maybe [[BindingValue]]) -- SPARQL CONSTRUCT returns RDF graph constructQuery :: Rdf a => EndPoint -> Query ConstructQuery -> IO (RDF a) -- SPARQL ASK askQuery :: EndPoint -> Query AskQuery -> IO Bool -- SPARQL UPDATE updateQuery :: EndPoint -> Query UpdateQuery -> IO Bool -- SPARQL DESCRIBE describeQuery :: Rdf a => EndPoint -> Query DescribeQuery -> IO (RDF a)
29 SPARQL query generation
data ConstructQuery = ConstructQuery { queryConstructs :: [Pattern] } data AskQuery = AskQuery { queryAsk :: [Pattern] } data UpdateQuery = UpdateQuery { queryUpdate :: [Pattern] } data SelectQuery = SelectQuery { queryVars :: [Variable] } data DescribeQuery = DescribeQuery { queryDescribe :: IRIRef } -- Permit variables and values for 'triple' and similar functions class TermLike a where ... prefix :: Text -> IRIRef -> Query Prefix -- Some examples... -- query actions triple :: (TermLike a, TermLike b, TermLike c) => a -> b -> c -> Query Pattern union :: Query a -> Query b -> Query Pattern -- limit handling limit :: Int -> Query Limit -- order handling orderNextAsc :: TermLike a => a -> Query () -- relations (.<.) :: (TermLike a, TermLike b) => a -> b -> Expr
30 SPARQL SELECT example
Web browsers according to Wikipedia:
simpleSelect :: Query SelectQuery simpleSelect = do resource <- prefix "dbprop" (iriRef "http://dbpedia.org/resource/") dbpprop <- prefix "dbpedia" (iriRef "http://dbpedia.org/property/") foaf <- prefix "foaf" (iriRef "http://xmlns.com/foaf/0.1/") x <- var name <- var triple x (dbpprop .:. "genre") (resource .:. "Web_browser") triple x (foaf .:. "name") name return SelectQuery { queryVars = [name] } main = do -- get a list of bound variables (Just boundValues) <- selectQuery "http://dbpedia.org/sparql" simpleSelect mapM_ print boundValues > main [[Bound (LNode (PlainLL "Amaya" "en"))] ,[Bound (LNode (PlainLL "Sleipnir" "en"))] ,[Bound (LNode (PlainLL "Midori" "en"))] ....
31 SPARQL CONSTRUCT example
Web browsers according to Wikipedia, as triples:
simpleConstruct :: Query ConstructQuery simpleConstruct = do resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/") dbpprop <- prefix "dbprop" (iriRef "http://dbpedia.org/property/") foaf <- prefix "foaf" (iriRef "http://xmlns.com/foaf/0.1/") example <- prefix "example" (iriRef "http://www.example.com/") x <- var name <- var construct <- constructTriple x (example .:. "hasName") name triple x (dbpprop .:. "genre") (resource .:. "Web_browser") triple x (foaf .:. "name") name return ConstructQuery { queryConstructs = [construct] } main = do (rdfGraph :: RDF TList) <- constructQuery "http://dbpedia.org/sparql" simpleConstruct mapM_ print (triplesOf rdfGraph) > main Triple (UNode "http://dbpedia.org/resource/Amaya_(web_editor)") (UNode "http://www.example.com/hasName") (LNode (PlainLL "Amaya" "en")) Triple (UNode "http://dbpedia.org/resource/Sleipnir_(web_browser)") (UNode "http://www.example.com/hasName") (LNode (PlainLL "Sleipnir" "en")) Triple (UNode "http://dbpedia.org/resource/Midori_(web_browser)") (UNode "http://www.example.com/hasName") (LNode (PlainLL "Midori" "en")) ...
32 Using the libraries
Both compatible with cabal
and stack
tools.
Download libraries manually, e.g.:
$ cabal install rdf4h
Or depend on them in applications in <projectName>.cabal
Build-depends: rdf4h >= 3.0.1
An in your stack.yaml
extra-deps: - rdf4h-3.0.1
33 Finding the libraries
- rdf4h
- API docs: http://hackage.haskell.org/package/rdf4h
- Repository: https://github.com/robstewart57/rdf4h
- GitHub pages: http://robstewart57.github.io/rdf4h
- Contributors: Calvin Smith, Slava Kravchenko
- hsparql
- API docs: http://hackage.haskell.org/package/hsparql
- Repository: https://github.com/robstewart57/hsparql
- Contributors: Jeff Wheeler, Slava Kravchenko
34 Future work
- 100% compliance with W3C parser tests for Turtle and RDF/XML
- New high performance RDF graph implementations
- Parser performance improvements, e.g. ticket 35
- Criterion profile guided API performance optimisations
- Support for RDF quads
- Support for JSON RDF serialisation ticket 34
- RDFS/OWL inference engines above rdf4h