dc.contributor.author
Faulstich, Lukas C.
dc.date.accessioned
2018-06-08T00:41:57Z
dc.date.available
2000-03-16T00:00:00.649Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/12312
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-16510
dc.description
Title page, contents
1 Introduction
1.1 Integration of semistructured information sources Virtual Web Sites
1.2 The HyperView approach
1.2.1 Data Model and View Mechanism
1.2.2 Architecture
1.2.3 Application of the HyperView Technology
1.3 Related Work (Overview)
1.4 Overview
2 HyperView by Example: Wrapping Publisher Web Sites
2.1 Digital Libraries of Electronic Journals
2.1.1 The DARWIN project
2.1.2 Use cases
2.2 Modeling publisher Web Sites
2.2.1 Generic approach
2.2.2 Graph Schemata
2.2.3 The HyperView Database Schema
2.2.4 ACR Schemata of Example Web Sources
2.2.5 Representing HTML Pages as HTML graphs
2.3 Building Views on publisher Web Sites
2.3.1 Queries and Rules
2.3.2 Defining a View over the HTML Graphs
2.3.3 Defining a View over the ACR Graphs
2.3.4 Querying the HyperView system
2.4 The Architecture of DARWIN
2.5 Summary
3 Formal Framework
3.1 Clustered Graph Data Model (CGDM)
3.1.1 Motivation
3.1.2 Basic definitions
3.1.3 Schemata and instances
3.2 Rules
3.2.1 Rule application
3.3 Queries and Oracles
3.3.1 Applying a rule to a virtual data graph
3.3.2 Hyperviews
3.3.3 Using a rule to answer a subquery
3.3.4 Chaining rules to answer a query
3.4 Reuse of existing subgraphs
3.5 Bibliography on Graph-Transformation
3.6 Summary
4 The HyperView System
4.1 Encoding of Graphs
4.1.1 Plain Graphs
4.1.2 Clustered Graphs
4.1.3 Type checking
4.2 Encoding of Queries
4.3 Encoding of Rules
4.4 Rule Activation
4.5 Query execution
4.6 Complexity and Performance
4.7 Metadata management
4.7.1 Schema clusters
4.7.2 The `meta` cluster
4.7.3 WWW meta data
4.8 The HyperView System prototype
4.9 Summary
5 The HVQL Query Language
5.1 Introduction
5.2 Basic Notations
5.3 Graph Patterns
5.4 Graph Literals
5.5 Queries
5.5.1 Syntax
5.5.2 Semantics
5.5.3 Implementation
5.6 Rules
5.6.1 Syntax
5.6.2 Semantics
5.6.3 Implementation
5.6.4 Example
5.7 Meta Edges
5.8 HTML Edges
5.9 Embedding of HVQL in the HyperView System
5.10 Summary
6 Support for Web Interfaces
6.1 Introduction
6.2 Architecture of the HyperView Web server
6.3 Conceptual model of the virtual HyperView Web site
6.4 HTML Code Generation
6.4.1 Phase 1: Preparation
6.4.2 Phase 2: Generation of a HTML skeleton
6.4.3 Phase 3: HTML dump and generation of variable HTML code
6.4.4 HVQL notation for HTML rules
6.5 The HyperView Browser
6.5.1 Customization
6.6 Summary
7 Case Study: Town Information
7.1 Introduction
7.2 Scenario
7.2.1 Use Case
7.3 Developing a cultural event calendar
7.3.1 Conceptual schema
7.3.2 Wrapping town information sites
7.4 The cultural calendar Web site
7.5 Summary
8 The HyperView Methodology
8.1 User roles
8.2 Content Specification
8.3 The Design Space of HyperView
8.4 Schema development
8.4.1 HTML layer
8.4.2 ACR layer
8.4.3 Database layer
8.4.4 UI layer
8.5 View development
8.5.1 Implementing HTML views
8.5.2 ACR Views
8.5.3 DB Views
8.6 Maintenance
8.6.1 Robustness
8.6.2 Error detection
8.6.3 Adaption
8.7 Summary
9 Discussion and Outlook
9.1 Related Work
9.1.1 Data models and schemata for semistructured data
9.1.2 Data Extraction from Semistructured Documents
9.1.3 Querying the Web
9.1.4 Integration of Heterogeneous Data Sources
9.1.5 Related applications of Graph-Transformation techniques
9.1.6 Comparison with HyperView
9.2 Future Applications: XML & RDF
9.2.1 XML
9.2.2 XML Parsing
9.2.3 XML DTDs and schemata
9.2.4 XPointer and XQL
9.2.5 Extensible Stylesheet Language
9.2.6 Channel Definition Format
9.2.7 Resource Description Framework (RDF)
9.2.8 RDF Schemata
9.2.9 Summary
9.3 Open Issues
9.3.1 Theoretical Issues
9.3.2 Integration Issues
9.3.3 Implementation and Performance Issues
9.3.4 Interface Issues
9.4 Contributions and Outlook
9.5 Acknowledgments
Bibliography
Table of Mathematical Symbols
Zusammenfassung der Ergebnisse
Lebenslauf
Verwendete Hilfsmittel
dc.description.abstract
Using the World Wide Web to answer a specific question often requires
information to be collected from multiple heterogeneous Web sites. Virtual Web
sites are a promising approach to automate this task for particular, focused
application domains.
A virtual Web site serves pages containing concentrated information that has
been extracted, homogenized, and combined from several underlying Web sites.
The HyperView approach to the integration of semistructured data presented in
this thesis provides a methodology, a formal framework, and a software
environment for building such virtual Web sites.
The HyperView approach treats the three steps of data extraction, integration,
and presentation uniformly as consecutive views that map between different
levels of abstraction. These levels are reflected by the architectural layers
of the system. The contents of Web sites as well as the consecutive views are
represented as graphs. Views are defined by sets of graph transformation
rules. A demand-driven rule activation mechanism has been formally described
and implemented. This mechanism incrementally materializes views in response
to queries issued against them.
The HyperView System has been implemented in Prolog. Graph transformation
rules are compiled into efficient Prolog predicates. Java servlets are used to
support virtual Web sites.
The main contributions of this thesis are:
1. the key idea of applying the same view mechanism uniformly to the
problems of extraction, integration, and presentation,
2. the HyperView methodology for modeling and integrating Web sites,
3. the formal framework defining the data model, rule concept, and the
demand-driven view materialization mechanism of HyperView,
4. the HyperView System prototype providing a platform for building virtual
integrated Web sites, and
5. the validation of the HyperView methodology and system in case studies on
Digital Libraries and Town Information.
de
dc.description.abstract
Answering specific questions via the World Wide Web often requires gathering
and combining information from several Web sites. Virtual Web sites promise to
automate this task, at least for limited application domains. A virtual Web
site offers information that has been extracted, homogenized, and integrated
from underlying Web sites.
The HyperView approach to the integration of semistructured data consists of
a methodology, a mathematical formalism, and a software environment for
building virtual Web sites. In the HyperView approach, the three steps of data
extraction, integration, and presentation are treated as consecutive views
that map the abstraction levels of the HyperView architecture onto one
another. The content of each layer is represented by graphs. Views are defined
by sets of graph transformation rules. A demand-driven mechanism for
activating these rules has been formally described and implemented. This
mechanism materializes views incrementally, in response to queries.
The HyperView System is implemented in Prolog. Graph transformation rules are
compiled into efficient Prolog predicates. Java servlets are used to generate
HTML pages.
The main results of this thesis are:
1. the demonstration that the problems of data extraction, integration, and
presentation can be solved with a uniform mapping mechanism,
2. the HyperView methodology for modeling and integrating Web sites,
3. the formal definition of the data model, the rule concept, and the
demand-driven mechanism for materializing views,
4. the implementation of the HyperView System as a platform for building
virtual Web sites, and
5. the validation of the HyperView methodology and the HyperView System in
case studies on digital libraries and town information.
de
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
dc.subject
data integration
dc.subject
semistructured data
dc.subject.ddc
000 Informatik, Informationswissenschaft, allgemeine Werke::000 Informatik, Wissen, Systeme::004 Datenverarbeitung; Informatik
dc.title
The HyperView Approach to the Integration of Semistructured Data
dc.contributor.firstReferee
Prof. Dr. Heinz Schweppe
dc.contributor.furtherReferee
Prof. Dr. Herbert Weber
dc.contributor.furtherReferee
Prof. Dr. Hartmut Ehrig
dc.date.accepted
2000-02-15
dc.date.embargoEnd
2000-08-24
dc.identifier.urn
urn:nbn:de:kobv:188-2000000333
dc.title.translated
Der HyperView-Ansatz zur Integration semistrukturierter Daten
de
refubium.affiliation
Mathematik und Informatik
de
refubium.mycore.fudocsId
FUDISS_thesis_000000000226
refubium.mycore.transfer
http://www.diss.fu-berlin.de/2000/33/
refubium.mycore.derivateId
FUDISS_derivate_000000000226
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access