
XML is now widely accepted as a markup standard for data exchange. Given a large collection of closely related DTDs and an XML document that does not specify which DTD it conforms to, a basic problem is to identify the DTD against which the document can be validated. This problem is challenging when DTD identification must be performed over a high-volume stream of XML documents. In this paper, we present the DTDMatch system, which solves this problem by dynamically maintaining a set of sub-structures that are common to DTDs and documents. By using a sequence of filters that combine positive and negative information (which sub-structures in a newly arriving document can or cannot be instantiated by a DTD), the DTDMatch system quickly reduces the number of DTDs relevant to each incoming document, while guaranteeing no false negatives in the resulting set of candidate DTDs.
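The filtering idea can be illustrated with a minimal sketch. This is not the DTDMatch implementation: it assumes, purely for illustration, that a sub-structure is a parent-child element pair and that each DTD is summarized by the set of pairs it permits. The names `DTDS`, `document_edges`, `build_index`, and `candidate_dtds` are hypothetical. A DTD is discarded as soon as the document contains a pair that the DTD cannot instantiate, so any DTD that would validate the document always remains in the candidate set (no false negatives), although some surviving candidates may still fail full validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical summaries: each DTD is reduced to the (parent, child) element
# pairs it allows; (None, tag) marks an allowed root element.
DTDS = {
    "order_v1": {(None, "order"), ("order", "item"), ("item", "sku")},
    "order_v2": {(None, "order"), ("order", "item"), ("item", "sku"), ("item", "qty")},
    "invoice": {(None, "invoice"), ("invoice", "item"), ("item", "sku")},
}


def document_edges(xml_text):
    """Collect the (parent, child) element pairs that occur in a document."""
    root = ET.fromstring(xml_text)
    edges = {(None, root.tag)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def build_index(dtds):
    """Inverted index: pair -> names of the DTDs that can instantiate it."""
    index = {}
    for name, allowed in dtds.items():
        for edge in allowed:
            index.setdefault(edge, set()).add(name)
    return index


def candidate_dtds(xml_text, dtds, index):
    """Keep only the DTDs that permit every pair seen in the document."""
    candidates = set(dtds)
    for edge in document_edges(xml_text):
        # Positive information: DTDs listed for this pair can instantiate it.
        # Negative information: every other DTD is ruled out.
        candidates &= index.get(edge, set())
        if not candidates:
            break
    return candidates


INDEX = build_index(DTDS)
doc = "<order><item><sku>A1</sku><qty>2</qty></item></order>"
print(candidate_dtds(doc, DTDS, INDEX))
```

In this sketch the index is built once and reused for every incoming document, so the per-document cost depends on the number of distinct pairs in the document rather than on the number of DTDs. For the sample document, only `order_v2` survives, because it is the only summary that permits a `qty` child under `item`.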
Page Count: 32
Publication Date: 2005-01-01