Background |
The increasing volume of XML data creates a need for efficient storage and retrieval. A straightforward solution is to map XML data into relational tables and XML queries into SQL. This mapping incurs a heavy cost, and some semantics are lost. A native solution, by contrast, has the benefit that there is no need to map XML data to some other data structure: the data is stored as XML and retrieved as XML. This is especially valuable for very complex XML structures, which would be difficult or impossible to map to a more structured database. |
Acknowledgements |
Berkeley DB XML is implemented as a C++ library on top of Berkeley DB. BDB XML is distributed as a shared library that is embedded into the client application. The BDB XML library exposes APIs that enable C++ and Java applications to interact with the XML data containers. Figure 1 illustrates the Berkeley DB XML system architecture.
BDB XML uses Berkeley DB for data storage and transaction management. Client applications can also store data directly to a Berkeley DB database. Although BDB XML hides much of the internal use of Berkeley DB, some understanding of the underlying Berkeley DB API is required, as some BDB XML API methods accept Berkeley DB object handles as parameters. In particular, transactional applications need to fully understand the Berkeley DB database management interfaces for operations such as backup and restore, archiving, database recovery, etc.
The BDB XML library comprises several main components: document storage, XML indexing and index management, query optimization, and query execution.
----------------------------------------------------------------------------------------------------------------------------------------------------
ARCHITECTURE OF ORIENTX |
OrientX adopts a client-server architecture. The client provides graphical interfaces for users to manage and retrieve data. The server provides an API for accessing the database. The two communicate over sockets. The overall architecture of OrientX is shown in Figure 1. Some modules are introduced briefly here, and the more important ones are covered in the following sections. |
Architecture |
File Manager: The underlying file manager communicates with the file system to create, delete, open, and close data files, in units of a fixed size such as 8 MB.
Storage Manager: The storage manager manages the storage space of a file in units of physical pages, each set to 8 KB. Its main tasks include allocating and freeing physical pages, creating and deleting datasets, etc.
Buffer Manager: The buffer mechanism has two layers: the lower layer is the page buffer, and the higher layer is the record buffer. As in an RDBMS, the page buffer manager manages physical pages with the LRU (Least Recently Used) method. Unlike in an RDBMS, a record in OrientX is a tree structure that must be generated from a byte stream, which may cost some CPU time. The record buffer caches these tree structures to reduce the generation time. Another main goal of the record buffer is to let OrientX query large documents: through the record buffer, documents can be read in pieces (records), and unoccupied records can be freed to accommodate new ones. In OrientX, the record buffer is called treefrog, meaning that the current cursor can jump from record to record on the XML tree.
Access Manager: The access manager provides a uniform access interface to the data manager, index manager, and schema manager. Details of the buffer manager and storage manager are hidden.
Data Manager: The data manager provides functions for importing and exporting documents, retrieving the root of a document, etc. It formats a record (a memory object) into, and from, a byte stream.
Schema Manager: A schema-independent system can import XML data without a schema, but to accelerate query processing such a system needs to extract the schema from the data. The extracted schema may be even larger and more complex than the data. Moreover, such a schema does not constrain the data, which limits its use in cases such as type checking in queries and updates. Like a traditional database, OrientX is schema-based: the schema strictly constrains the type and structure of the data, so data retrieval, update, and storage are all guided by the schema. Schema information can be used in data layout, in the choice of index, in type checking, in user access control, and in query optimization. The schema in OrientX is consistent with the XML Schema standard. Schema information is stored as a special data set in the database. Because a schema saved as a tree structure is itself semi-structured, it can restrict XML data without breaking the features of XML data. The schema manager provides a uniform interface for other modules to access the schema information.
Data Processor: The data processor includes the query evaluator and the data updater. The former will be described in Section 5; the latter is introduced briefly here. In an RDBMS, relationships between records are represented by foreign keys, and in an OODBMS, relationships between objects are represented by object containment. XML supports both: identity references and nested structure. OrientX maintains referential integrity during updates. When a complex element is deleted, all of its nested elements and values are removed. When an element referenced by other elements is deleted, the corresponding references are found through the value index and then deleted. Deleting a reference directly is also supported.
In our storage prototype, elements are stored as variable-length records. Each record holds a pointer to its parent record or neighboring sibling record. Records may change address as their contents grow or shrink during update operations, which in turn changes the pointers. To reduce pointer modification, we introduce the oid (object id). Each element has a unique id, and an oid table stores each oid with its corresponding storage address. A record stores the oids of its parent and children as pointers rather than their storage addresses. Therefore, if the storage address of a record changes due to an update, only the oid table needs to be updated. To reduce address changes for updated records, we set a preserve factor for each page to reserve space for record updates. We also supply a garbage collection mechanism for space reuse. |
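To make the page buffer described under Buffer Manager more concrete, here is a minimal Python sketch of LRU eviction over 8 KB pages. It is an illustration only, not OrientX code: the names `PageBuffer`, `read_page`, and `fake_disk` are hypothetical, and the real system's buffer interface is not given in the text.

```python
from collections import OrderedDict

PAGE_SIZE = 8 * 1024  # 8 KB physical page, as stated in the text


class PageBuffer:
    """Fixed-capacity page cache with least-recently-used eviction (hypothetical)."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # assumed callback: page_no -> bytes
        self.pages = OrderedDict()   # page_no -> bytes, least recently used first
        self.misses = 0

    def get(self, page_no):
        if page_no in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Miss: load the page, then evict the oldest page if over capacity.
        self.misses += 1
        data = self.read_page(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return data


def fake_disk(page_no):
    """Stand-in for the storage manager: returns a zero-filled page."""
    return bytes(PAGE_SIZE)


buf = PageBuffer(capacity=2, read_page=fake_disk)
for page_no in (1, 2, 1, 3):
    buf.get(page_no)
# Page 2 was the least recently used when page 3 arrived, so it was evicted.
```

A record buffer could be layered on top in the same way, caching parsed tree records instead of raw pages, but its eviction policy is not specified in the text.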
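The oid indirection described above can be sketched in a few lines of Python. This is a simplified illustration under assumed names (`OidTable`, `Record`, and the `(page_no, slot)` address form are all hypothetical); it only shows why moving a record touches the oid table and nothing else.

```python
class OidTable:
    """Maps each element's oid to its current storage address (hypothetical)."""

    def __init__(self):
        self.addr = {}      # oid -> (page_no, slot), an assumed address form
        self.next_oid = 1

    def allocate(self, address):
        oid = self.next_oid
        self.next_oid += 1
        self.addr[oid] = address
        return oid

    def relocate(self, oid, new_address):
        # The only write needed when a record moves.
        self.addr[oid] = new_address

    def resolve(self, oid):
        return self.addr[oid]


class Record:
    """An element record that links to its relatives by oid, not by address."""

    def __init__(self, name, parent_oid=None):
        self.name = name
        self.parent_oid = parent_oid
        self.child_oids = []


table = OidTable()
book_oid = table.allocate((0, 0))
title_oid = table.allocate((0, 1))

book = Record("book")
title = Record("title", parent_oid=book_oid)
book.child_oids.append(title_oid)

# Suppose the title record grows and no longer fits in page 0,
# so it is moved to page 7. Neither record object is modified.
table.relocate(title_oid, (7, 0))
```

With direct address pointers, the same move would require rewriting the parent record and every sibling that points at the moved record. The cost of the indirection is one extra table lookup per pointer traversal.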
OrientX (Native XML Database Management System) XML Group, WAMDM, Renmin University of China |