When working with Linked Data directly from a thin client (which involves loading lots of published RDF files), you often end up loading and parsing tens of thousands of triples into memory when you only need a small fraction of them for display and processing.
For example, most generated RDF documents from DBpedia contain multiple translations of their literal values.
If you are loading hundreds of these documents, this can become a serious issue from both a memory management and a processing standpoint.
We are now tweaking the parsers in SemanticFlash to support some sort of granular tuning so that they can be instructed to skip complete sub-structures in the AST processing phase according to simple declarative rules.
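To make the idea concrete, here is a minimal sketch in Python (not the SemanticFlash code itself) of what such declarative skip rules could look like. Every name in it (`Literal`, `Triple`, `SkipRules`, `filter_triples`) is hypothetical, and it filters a stream of already-parsed triples; a real parser would apply the same check earlier, before the skipped sub-structures are ever materialized.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, NamedTuple, Optional, Union


class Literal(NamedTuple):
    """A literal value with an optional language tag (hypothetical type)."""
    value: str
    lang: Optional[str] = None


class Triple(NamedTuple):
    """A triple whose object is either an IRI string or a Literal."""
    subject: str
    predicate: str
    obj: Union[str, Literal]


@dataclass(frozen=True)
class SkipRules:
    """Declarative rules: which predicates and literal languages to keep."""
    keep_predicates: FrozenSet[str]
    keep_langs: FrozenSet[str]

    def accepts(self, triple: Triple) -> bool:
        # Drop every triple whose predicate was not asked for.
        if triple.predicate not in self.keep_predicates:
            return False
        # Drop language-tagged literals in languages we do not display.
        if isinstance(triple.obj, Literal) and triple.obj.lang is not None:
            return triple.obj.lang in self.keep_langs
        return True


def filter_triples(triples: Iterable[Triple], rules: SkipRules) -> Iterator[Triple]:
    """Lazily yield only the triples the rules accept."""
    for triple in triples:
        if rules.accepts(triple):
            yield triple


LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
CAMBRIDGE = "http://dbpedia.org/resource/Cambridge"

# Illustrative sample data only, not the real contents of the document.
sample = [
    Triple(CAMBRIDGE, LABEL, Literal("Cambridge", "en")),
    Triple(CAMBRIDGE, LABEL, Literal("Cambridge", "de")),
    Triple(CAMBRIDGE, COMMENT, Literal("A city in England.", "en")),
    Triple(CAMBRIDGE, COMMENT, Literal("Eine Stadt in England.", "de")),
    Triple(CAMBRIDGE, SEE_ALSO, "http://example.org/cambridge"),
]

rules = SkipRules(
    keep_predicates=frozenset({LABEL, COMMENT}),
    keep_langs=frozenset({"en"}),
)
kept = list(filter_triples(sample, rules))
```

With these rules, only the English label and the English description survive, which is all the map marker needs.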
How important is this?
Well, suppose you just want to mash up some markers on a map with labels and a short description from Wikipedia. In this particular case (Cambridge), you would end up processing and storing 546 triples instead of just two!
(Want to see what these 546 triples look like?)
Now multiply that by 100 markers. That's 54,600 triples, and probably a couple of MB of memory once you count the literals. And that is assuming you are not running any inference or rule-based processing on the data, which would make this even more expensive.
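The arithmetic is simple enough to spell out. This is only a back-of-the-envelope sketch: it assumes every marker's document is about the size of the Cambridge one, and the variable names are made up for illustration.

```python
# Figures from the Cambridge example; assuming every marker's document is similar.
TRIPLES_PER_DOCUMENT = 546  # triples parsed and stored per document
TRIPLES_NEEDED = 2          # label + description actually displayed
MARKERS = 100

loaded = TRIPLES_PER_DOCUMENT * MARKERS  # 54600 triples held in memory
needed = TRIPLES_NEEDED * MARKERS        # 200 triples actually used
wasted_fraction = 1 - needed / loaded    # share of the loaded triples never used

print(f"loaded={loaded}, needed={needed}, wasted={wasted_fraction:.1%}")
```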
OK, you get the point.
In the future we will explore some literal compression techniques, bitmap indexing and storage, and how we can make this sort of behavior interact with incremental loading and federated querying to offload as much working memory as possible.
But it is a tricky problem in the end.
I have experimented with this in the past, and the Open World Assumption's non-determinism starts knocking on the door as soon as you throw in some inference and smushing. The bottom line is that it is impossible to stay totally monotonic AND OWA-friendly AND cope with finite memory resources. Old problem. New face.
Remember: The Web of Linked Data is Huge. Your User Agent is Not.
