ICEIS 2003 Abstracts

 

Abstract of Accepted Papers



Co-organized by:

École Supérieure d'Électronique de l'Ouest
and
Departamento de Sistemas e Informática da Escola Superior de Tecnologia de Setúbal (EST-Setúbal/IPS), Instituto Politécnico de Setúbal

 

ICEIS 2003 Sites
www.est.ips.pt/iceis/

www.iceis.org

DBLP bibliography

 

Area 1 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
Area 2 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS
Area 3 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION
Area 4 - Software Agents and Internet Computing

Area 1 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

Title:

O2PDGS: AN APPROACH FOR UNDERSTANDING OBJECT ORIENTED PROGRAMS

Author(s):

Hamed Al-Fawareh

Abstract: In this paper, we provide a description of dependence graphs for representing meaningful dependencies between components of object-oriented programs. A formal description of the dependence relations of interest is given before giving a representative illustration of object-oriented program dependence graphs (O2PDGs). The paper also discusses an approach for understanding object-oriented programs through the use of O2PDGs.

Title:

ERP SYSTEMS IMPLEMENTATION DETERMINANTS AND SUCCESS MEASURES IN CHINA: A CASE STUDY APPROACH

Author(s):

Christy Cheung, Zhe Zhang, Matthew Lee, Liang Zhang

Abstract: With intensifying global competition and the integration of the world economy, manufacturing firms have to reduce inventory levels and operating costs and improve customer service to gain a competitive advantage. Manufacturing companies are forced to adopt new methods to achieve these objectives, and enterprise resource planning (ERP) systems are one of the most widely accepted choices. AMR predicts the total ERP market will reach $66.6 billion by 2003, growing an estimated 32% annually over the next five years. Significant benefits such as improved customer service, better production scheduling, and reduced manufacturing costs can accrue from the successful implementation of ERP (Ang et al., 1995). However, the rate of successful implementation is extremely low, especially in China, and many firms have not achieved their intended goals. It is therefore necessary for ERP practitioners and researchers to investigate why the implementation success rate of ERP systems in China is so low. Prior studies mainly focus on critical success factors or on a single ERP implementation success measure, without theoretical support. This study attempts to combine the Ives, Hamilton, and Davis (1980) MIS research model with DeLone & McLean’s (1992) IS success model to develop an ERP implementation success model, identifying both generic and unique factors that affect ERP systems implementation success in China and using multiple success measures to assess whether an ERP implementation is a success or a failure. The multiple case study research method allows more detailed information about ERP implementations to be collected. Moreover, it addresses the problems of construct validity and reliability that occur frequently in single case studies.
The results of this research can help ERP-related researchers, practitioners, and companies to better understand ERP systems implementation issues; given enough attention to these issues, the chance of ERP implementation success could be increased.

Title:

DATA WAREHOUSING: A REPOSITORY MODEL FOR METADATA STORAGE AND RETRIEVAL BASED ON THE HUMAN INFORMATION PROCESSING

Author(s):

Enrique Luna-Ramírez, Félix García-Merayo, Covadonga Fernández-Baizán

Abstract: The information on the creation, management and use of a data warehouse is stored in what is called the metadata repository, making this repository the single most important component of the data warehouse. Accordingly, the metadata repository plays a fundamental role in the construction and maintenance of the data warehouse, as well as in accessing the data it stores. In this paper, we propose a repository model conceived to store and retrieve the metadata of a corporate data warehouse. To achieve this objective, the model, composed of an approach for modelling the repository structure and a metamodel for retrieving metadata, is based on the human information processing paradigm. The model thus considers a series of distinctive functionalities that can be built into a repository system to ensure that it works efficiently. These functionalities refer to the use of two memories for storing the repository metadata and a set of structures and processes for retrieving the information passing from one memory to the other. One of the memories in particular is used to store the most frequently used metadata in a corporate environment, which can be rapidly retrieved with the help of the above-mentioned structures and processes. These structures and processes also serve to contextualise the information of a data warehouse according to the projects or business areas to which it refers.

Title:

HOSPITAL CASE RECORDS INFORMATION SYSTEM: CASE STUDY OF A KNOWLEDGE-BASED PRODUCT

Author(s):

A. Neelameghan, M. Vasudevan

Abstract: Briefly discusses knowledge management and use of knowledge-based products in enterprises. Enumerates the information resources of a hospital and describes the design and development of a patients’ case records system, specifically for a hospital specializing in surgical cases of tumors of the central nervous system. Each case record has data / information on over 150 attributes of patient, facility for hypertext linking relevant images (CT scan, X-ray, NMR, etc.) and access to electronic documents from other websites. The collaborative roles of the hospital doctors and a consultant information specialist in the development of the system are indicated. Output of a case record with links to related CT scan pictures and a web document is presented as example. Concludes mentioning the various uses of the system.

Title:

MODELS FOR IMPLEMENTATION OF ONLINE REAL TIME IT-ENABLED SERVICE FOR ENTRY TO PROFESSIONAL EDUCATION

Author(s):

Natesan T.R., V. Rhymend Uthariaraj, George Washington D.

Abstract: Any agency selecting candidates for admission to professional education has to administer a common entrance examination, evaluate the responses and offer seats according to merit. This task has two parts, viz., conduct of the examination and admission. In this paper, a process-oriented data model for the conduct of the examination and the admission process has been developed and implemented, based on statistical and mathematical models. The schedule for online real-time registration for the examination at various centres is based on a statistical model, and the centres for the conduct of counselling are selected based on a mathematical programming model. This system has been implemented through an online real-time distributed database with a secured Virtual Private Network (VPN).

Title:

STORAGE OF COMPLEX BUSINESS RULES IN OBJECT DATABASES

Author(s):

Dalen Kambur, Mark Roantree

Abstract: True integration of large systems requires sharing of information stored in databases beyond sharing of pure data: business rules associated with this data must also be shared. This research focuses on providing a mechanism for defining, storing and sharing business rules across different information systems, in an area where existing technologies are weak. In this paper, we present the pre-integration stage, where individual business rules are stored in the database for subsequent exchange between applications and information systems.

Title:

A GRAPHICAL LANGUAGE FOR DEFINING VIEWS IN OBJECT ORIENTED DATABASES

Author(s):

Elias Choueiri, Marguerite Sayah

Abstract: Within the framework of an Object Oriented Database Graphical Query Environment for casual end users, this paper proposes a View Definition Mechanism conceived for users who are experts in their application domain but not necessarily computer specialists. The mechanism emphasizes the strength of the graphical view definition language and the user-friendliness of the user interface. The view definition language offers operations for adapting to the work context and restructuring operations on both attributes and classes that take into account the nesting structure and inheritance of the database classes. The user-friendliness of the interface rests on the graphical visualization of the portion of the database schema that represents the domain of interest for a user group, and on the use of the graphical language for view definition. To eliminate crossings between different links of the visualized composition hierarchy, a method for graphical visualization is introduced.

Title:

A TRANSPARENT CLIENT-SIDE CACHING APPROACH FOR APPLICATION SERVER SYSTEMS

Author(s):

Daniel Pfeifer, Zhenyu Wu

Abstract: In recent years, application server technology has become very popular for building complex but mission-critical systems. However, the resulting solutions tend to suffer from serious performance and scalability bottlenecks because of their distributed nature and their various software layers. This paper deals with the problem by presenting a new approach to transparently caching results of a service interface's read-only methods on the client side. Cache consistency is provided by a descriptive cache invalidation model which may be specified by an application programmer. As the cache layer is transparent to the server as well as to the client code, it can be integrated with relatively low effort even in systems that have already been implemented. Early experimental results show that the approach is effective in improving a server's response times and its transactional throughput. Roughly speaking, the overhead for cache maintenance is small when compared to the cost of method invocations on the server side. The cache's performance improvements are dominated by the fraction of read method invocations and the cache hit rate. Moreover, the cache can be smoothly integrated with traditional caching strategies acting on other system layers (e.g., caching of dynamic Web pages on a Web server). The presented approach as well as the related prototype are not restricted to application server scenarios but may be applied to any kind of interface-based software layer.
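
The caching idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the `CachingProxy` class, the `AccountService` stand-in, and the dictionary-based invalidation table are all hypothetical names chosen for illustration, and the "server" is an in-process object rather than a remote application server.

```python
class CachingProxy:
    """Client-side wrapper that caches results of read-only service methods.

    Hypothetical sketch: `invalidates` stands in for a declarative
    invalidation model (write method -> read methods whose cached
    results become stale).
    """

    def __init__(self, service, read_only, invalidates):
        self._service = service
        self._read_only = set(read_only)
        self._invalidates = invalidates
        self._cache = {}  # (method name, args) -> cached result

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself,
        # i.e. the service interface's methods.
        target = getattr(self._service, name)

        def call(*args):
            if name in self._read_only:
                key = (name, args)
                if key not in self._cache:
                    self._cache[key] = target(*args)  # cache miss: call the server
                return self._cache[key]
            result = target(*args)  # write method: always forwarded
            stale = self._invalidates.get(name, ())
            self._cache = {k: v for k, v in self._cache.items()
                           if k[0] not in stale}
            return result

        return call


class AccountService:
    """Stand-in for a remote service interface (assumption)."""

    def __init__(self):
        self.balances = {"alice": 100}
        self.reads = 0  # counts server-side read invocations

    def get_balance(self, user):
        self.reads += 1
        return self.balances[user]

    def deposit(self, user, amount):
        self.balances[user] += amount


service = AccountService()
client = CachingProxy(service,
                      read_only={"get_balance"},
                      invalidates={"deposit": {"get_balance"}})

first = client.get_balance("alice")   # forwarded to the service
second = client.get_balance("alice")  # served from the client-side cache
client.deposit("alice", 50)           # invalidates cached get_balance results
third = client.get_balance("alice")   # forwarded again after invalidation
```

Because the proxy exposes the same method names as the service, neither the calling code nor the service needs to change, which is the sense in which such a layer can be called transparent.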

Title:

EFFICIENT STORAGE FOR XML DATABASES

Author(s):

Weiyi Ho, Dave Elliman, Li Bai

Abstract: The widespread activity involving the Internet and the Web causes a huge amount of electronic data to be generated every day. This includes, in particular, semi-structured textual data such as electronic documents, computer programs, log files, transaction records, literature citations, and emails. Storing and manipulating the data thus produced has proven difficult. As conventional DBMSs are not suitable for handling semi-structured data, there is a strong demand for systems that are capable of handling large volumes of complex data in an efficient and reliable way. The Extensible Markup Language (XML) provides such a solution. In this paper, we present the concept of the ‘vertical view model’ and its uses as a mapping mechanism for converting complex XML data to relational database tables, and as a standalone data model for storing complex XML data.

Title:

DATA MANAGEMENT: THE CHALLENGE OF THE FUTURE

Author(s):

Alan Hodgett

Abstract: There has been an explosion in the generation of data in organizations. Much of this data is both unstructured and decentralized. This raises a number of issues for data management in organizations. This paper reports on an investigation that was undertaken in Australia to study the way in which organizations were dealing with the growth and proliferation of data and were planning for the future. The results show a high level of awareness of the issues but indicate a prevalent optimism that technology will continue to provide solutions to present and future problems facing organizations. It appears that much magnetically recorded data will inevitably be lost over the next few decades unless positive actions are taken now to preserve the data.

Title:

TOWARDS A TIMED-PETRI NET BASED APPROACH FOR THE SYNCHRONIZATION OF A MULTIMEDIA SCENARIO

Author(s):

Abdelghani GHOMARI

Abstract: This article proposes a new approach for the synchronization of a multimedia scenario based on a new class of p-temporal Petri nets called p-RdPT+. One essential phase in the synchronization of a multimedia scenario is the characterization of its logical and temporal structure. This structure is expressed through a set of composition rules and synchronization constraints that depend on user interactions. An inconsistent situation is detected when some of the constraints specified by the author cannot be met during the presentation. Hence, our approach permits verification of the specification either by temporal simulation of the automatically generated Petri net or by analysing the reachability graph derived from the generated p-RdPT+ model.

Title:

PLANNING FOR ENTERPRISE COMPUTING SERVICES: ISSUES AND NECESSITIES ANALYZED

Author(s):

Jason Tseng, Emarson Victoria

Abstract: While planning, simulation and modeling tools exist for fields like network management and capacity/workload planning, little is known about automated planning tools for computing services. Considering the complexities and difficulties in deploying and managing computing infrastructure and services, we need to examine their planning processes in order to augment existing enterprise management and planning solutions. In this paper, we present the motivation for and advantages of a planning tool that automates the planning of computing services. This requires us to consider the issues and problems in deploying and managing computing services and their infrastructure, which in turn allows us to understand why and how such a planning tool can be used to alleviate, if not eliminate, some of these problems. The planning tool works by actively abstracting properties of actual computing components using an information model/framework and formulating rules to analyze and automate the planning activity, using only abstracted component representations. This will pave the way for plans that closely reflect the actual computing environment, thus allowing users to leverage the flexibility and virtualization in the planning environment.

Title:

EXTENDING GROUPWARE FOR OLAP

Author(s):

Sunitha Kambhampati, Daniel Ford, Vikas Krishna, Stefan Edlund

Abstract: While applications built on top of groupware systems are capable of managing mundane tasks such as scheduling and email, they are not optimised for certain kinds of applications, for instance generating aggregated summaries of scheduled activities. Groupware systems are primarily designed with online transaction processing in mind, and are highly focused on maximizing throughput when clients concurrently access and manipulate information on a shared store. In this paper, we give an overview and discuss some of the implementation details of a system that transforms groupware Calendaring & Scheduling (C&S) data into a relational OLAP database optimised for these kinds of analytical applications. We also describe the structure of the XML documents that carry incremental update information between the source groupware system and the relational database, and show how the generic structure of the documents enables us to extend the infrastructure to other groupware systems as well.

Title:

REPCOM: A CUSTOMISABLE REPORT GENERATOR COMPONENT SYSTEM USING XML-DRIVEN, COMPONENT-BASED DEVELOPMENT APPROACH

Author(s):

Sai Peck Lee, Chee Hoong Leong

Abstract: It is undeniable that report generation is one of the most important tasks in many companies regardless of the size of the company. A good report generation mechanism can increase a company’s productivity in terms of effort and time. This is more obvious in some startup companies, which normally use in-house report generators. Application development can be complex, and software developers might therefore require substantial effort to maintain application program code. In addition, most report generators use different formats to store the report model. An application is no longer considered an enterprise-level product if XML is not being used elsewhere. This paper introduces an XML-driven, component-based development approach to report generation with the purpose of promoting portability, flexibility and genericity. In this approach, report layout is specified using user-defined XML elements together with queries that retrieve data from different databases. A report is output as an HTML document, which can be viewed using an Internet browser. This paper presents the approach using an example and discusses the usage of the XML-driven report schema and how the proposed reusable report engine of a customisable report generator component system works to output an HTML report format. The customisable report generator component system is implemented to support heterogeneous database models.

Title:

E-LEARNING INFORMATION MANAGEMENT ISSUES IN XML-BASED MEDIATION

Author(s):

Boris Rousseau, Eric Leray, Micheal O'Foghlu

Abstract: Advances in XML-based mediation have had a significant impact on the area of E-Learning. Search engines have now been provided with new ways to improve resource discovery and new tools to customise the resulting content. In the early days of XML, this work was undertaken within the context of the European-funded project GESTALT (Getting Educational Systems Talking Across Leading-Edge Technologies). Building on this experience, further improvements came from the European-funded project GUARDIANS (Gateway for User Access to Remote Distributed Information And Network Services). However, due to the lack of support for native XML databases and XML querying languages, search facilities were limited. This paper builds upon the achievements of both projects and proposes a solution for XML querying in XQuery.

Title:

THE KINDS OF IT SERVICES MOST APPROPRIATE FOR A PARTICULAR SOURCING STRATEGY

Author(s):

Patrick Wall, Larry Stapleton

Abstract: IT processes and services often differ with regard to which sourcing strategy suits them best. The significance of IT within any given organization and the ability of that organization to provide an efficient and innovative information system on its own often determine what sourcing strategy it chooses. However, it is viewed as a better strategy to identify certain IT processes that can be maintained internally and then outsource those that the firm believes would be better maintained by an external vendor. This paper identifies the most commonly insourced, outsourced and selectively sourced IT activities and then asks why this is the case.

Title:

ERP IMPLEMENTATION, CROSS-FUNCTIONALITY AND CRITICAL CHANGE FACTORS

Author(s):

Rolande Marciniak, Redouane El Amrani, Frantz Rowe, Marc Bidan, Bénédicte Geffroy-Maronnat

Abstract: ERP (Enterprise Resource Planning) systems are characterised by particular features such as functional coverage, interdependent relationships, a single database, and standard management and processing rules, all of which are capable of bringing about various degrees of change within the company and, potentially, of encouraging a more cross-functional overview of it. However, few quantitative studies have been conducted to measure these effects. This is the background to this paper, which studied 100 French companies to arrive at an assessment of ERP adoption. It then goes on to test the relationships between the factors influencing the ERP lifecycle (preparation (organizational vision, process re-engineering), engineering (specific developments), and implementation strategy (functional coverage and speed)), the perception of a more cross-functional overview of the company and, more globally, the scope of the change this technology brings about within the company. All these factors play significant roles, with functional coverage appearing to be a particularly important consideration, which should be addressed in future research.

Title:

LAB INFORMATION MANAGEMENT SYSTEM FOR QUALITY CONTROL IN WINERIES

Author(s):

Manuel Urbano Cuadrado, Maria Dolores Luque de Castro, Pedro Perez Juan

Abstract: The great number of analyses that must be carried out during wine production, as well as the storage, treatment and careful study and discussion of the data these analyses provide, are of paramount importance for making correct decisions that improve the quality of both the winery and the wine it produces. We describe a system devoted to the overall management of the information generated in wine production processes. The system, based on object-oriented technology, allows quality control of wine production in wineries and enables the integration of semiautomated and automated analytical processes.

Title:

INFORMATION SYSTEMS IN MEDICAL IMAGERY: CASE OF THE HOSPITAL OF BAB EL OUED

Author(s):

Abdelkrim MEZIANE

Abstract: Digital medical images obtained by the different existing modalities, and processed by powerful computers, have become a very powerful means of diagnosis and of cost saving. In Algeria, the patient is responsible for the images that are delivered to him. These images are, most of the time, lost, not identified (name, date, …), or simply damaged for many reasons. Doctors and radiologists are sometimes, if not most of the time, obliged to ask the same patient to undergo the same radiography several times. The Algerian stock of medical imaging equipment is not well known or exhaustively assessed. The Algerian government reserves an important part of its budget for health care. A part of this budget goes to complementary medical tests, such as very expensive images paid for by the taxpayer. Some solutions do exist to reduce these costs by investing a small amount of money at the beginning.

Title:

SHIFTING FROM LEGACY SYSTEMS TO A DATA MART AND COMPUTER ASSISTED INFORMATION RESOURCES NAVIGATION FRAMEWORK

Author(s):

Nikitas Karanikolas, Christos Skourlas

Abstract: Computer Assisted Information Resources Navigation (CAIRN) was specified, in the past, as a framework that allows the end-users to import and store full text and multimedia documents and then retrieve information using Natural Language or field based queries. Our CAIRN system is a general tool that has focused on medical information covering the needs of physicians. Today, concepts related to Data Mining and Data Marts have to be incorporated into such a framework. In this paper a CAIRN-DAMM (Computer Assisted Medical Information Resources Navigation & Diagnosis Aid Based On Data Marts & Data Mining) environment is proposed and discussed. This integrated environment offers: document management, multimedia documents retrieval, a Diagnosis–aid subsystem and a Data Mart subsystem that permits the integration of legacy system’s data. The diagnosis is based on the International Classification of Diseases and Diagnoses, 9th revision (ICD-9). The document collection stored in the CAIRN-DAMM system consists of data imported from the Hospital Information System (HIS), laboratory tests extracted from the Laboratory Information System (LIS), patient discharge letters, ultrasound, CT and MRI images, statistical information, bibliography, etc. There are also methods permitting us to propose, evaluate and organize in a systematic way uncontrolled terms and to propose relationships between these terms and ICD-9 codes. Finally, our experience from the use of the tool for creating a Data Mart at the ARETEION University Hospital is presented. Experimental results and a number of interesting observations are also discussed.

Title:

ON OPERATIONS TO CONFORM OBJECT-ORIENTED SCHEMAS

Author(s):

Alberto Abelló, Elena Rodríguez, Marta Oliva, José Samos, Fèlix Saltor, Eladio Garví

Abstract: To build a Cooperative Information System from several pre-existing heterogeneous systems, the schemas of these systems must be integrated. Operations used for this purpose include conforming operations, which change the form of a schema. In this paper, a set of primitive conforming operations for Object-Oriented schemas is presented. These operations are organized in matrices according to the Object-Oriented dimensions (Generalization/Specialization, Aggregation/Decomposition) on which they operate.

Title:

A MULTI-LEVEL ARCHITECTURE FOR DISTRIBUTED OBJECT BASES

Author(s):

Markus Kirchberg

Abstract: The work described in this article arises from two needs. First, there is still a need for providing more sophisticated database systems than just relational ones. Secondly, there is a growing need for distributed databases. These needs are addressed by fragmenting schemata of a generic object data model and providing an architecture for its implementation. Key features of the architecture are the use of abstract communicating agents to realize database transactions and queries, the use of an extended remote procedure call to enable remote agents to communicate with one another, and the use of multi-level transactions. Linguistic reflection is used to map database schemata to the level of the agents. Transparency for the users is achieved by using dialogue objects, which are extended views on the database.

Title:

INVESTIGATING THE EFFECTS OF IT ON ORGANISATIONAL DESIGN VARIABLES: TOWARDS A THEORETICAL FRAMEWORK

Author(s):

Rahim Ghasemiyeh, Feng Li

Abstract: Over the past decades many papers have been published about the effects of Information Technology (IT) on organisations. However, despite the fact that IT has become a fundamental variable for organisational design, very few studies have explored this vital issue in a systematic and convincing fashion. The small amount of information and the few theories available on the effects of IT on organisational design are surprising. One major deficiency of previous studies is also the lack of empirical evidence. This has led researchers to describe IT in general ways and has resulted in different and very often contradictory findings. Many researchers have become very concerned about the lack of comprehensive studies on organisational design and IT, which has been apparent for decades. One objective of this research is to fill this gap. Aiming to develop a theoretical framework to evaluate the effects of IT on organisational design, this study will investigate three questions. What are the effects of IT on organisational design variables? How does IT influence organisational design variables? Which effects result from which information technologies? These questions can be considered the most important features of this study, and they distinguish it from the previous literature.

Title:

SERVICES PROVIDERS’ PATTERNS FOR CLIENT/SERVER APPLICATIONS

Author(s):

Samar TAWBI, Bilal CHEBARO

Abstract: In this paper, we define two patterns that fall under the category of the architectural patterns described in (Shaw, 1996), to provide solutions for client-server applications. The first pattern defines the structure of a client-server application by defining the server's functionality in the form of standardized services, and the second defines the structure of a service in this type of application. The solution follows the patterns’ definition prototype used in (Gamma, 1995).

Title:

A DISTRIBUTED JOB EXECUTION ENVIRONMENT USING ASYNCHRONOUS MESSAGING AND WEB TECHNOLOGIES

Author(s):

Rod Fatoohi, Nihar Gokhale

Abstract: This is a project for developing an asynchronous approach to distributed job execution of legacy code. A job execution environment is a set of tools used to run jobs generated to execute a legacy code and to handle different input and output values for each run. Current job execution and problem solving environments are mostly based on synchronous messaging and customized APIs that need to be ported to different platforms. Here we introduce an Internet-based job execution environment using off-the-shelf J2EE (Java 2 Enterprise Edition) components. The environment allows the execution of computational algorithms utilizing standard Internet technologies such as Java, XML, and asynchronous communication protocols. Our environment is based on a four-tier client/server architecture and uses Java messaging for inter-process communication and XML for job specification. It has been tested successfully using several legacy simulation codes on pools of Windows 2000 and Solaris systems.

Title:

DRUID: COUPLING USER WRITTEN DOCUMENTS AND DATABASES

Author(s):

André Flory, Frédérique Laforest, Youakim BADR

Abstract: Most database applications capture their data using graphical forms. Text fields have limited size and predefined types. Data in fields are associated with constraints and must be modeled to conform to a rigid schema. Unfortunately, too many constraints on data are not convenient in human activities, most of which are document-centric. In fact, documents have become a natural medium for human production and consumption. Nowadays, there is increasing interest in managing data with irregular structures, exchanging documents over the net, and manipulating their contents as efficiently as structured data. In this paper, we introduce DRUID, a comprehensive document capturing and wrapping system. It ensures flexible and well-adapted information capture based on a Document User Interface and, at the same time, information retrieval based on databases. DRUID relies on a wrapper that transforms document contents into relevant data. It also provides an expressive specification language for end-users to write domain-related extraction patterns. We validate our information system with a prototype of different modules; this first implementation is promising for a wide range of applications that use documents as a means to store, exchange and query information.

Title:

TOWARD A FRAMEWORK FOR MANAGING INTERNET-ORIENTED DATABASE RESOURCES

Author(s):

Guozhou Zheng, Chang Huang, Zhaohui Wu

Abstract: The term “Grid” is used to describe architectures that manage distributed resources across the Internet. This paper introduces the Database Grid, an Internet-oriented resource management architecture for database resources. We identify the basic requirements on databases in two major application domains: e-science and e-business. Next, we illustrate how a layered service architecture can fulfil these emerging data sharing and data management requirements from Grid computing applications. We introduce a series of protocols to define the proposed services.

Title:

A FRAMEWORK FOR GENERATING AND MAINTAINING GLOBAL SCHEMAS IN HETEROGENEOUS MULTIDATABASE SYSTEMS

Author(s):

Rehab Duwairi

Abstract: The problem of creating a global schema over a set of heterogeneous databases is becoming more and more important due to the availability of multiple databases within organizations. The global schema should provide a unified representation of the (possibly heterogeneous) local schemas by analyzing them (to exploit their semantic contents), resolving semantic and schematic discrepancies among them, and producing a set of mapping functions that translate queries posed on the global schema to queries posed on the local schemas. In this paper, we provide a general framework that supports the integration of local schemas into a global one. The framework takes into consideration the fact that local schemas are autonomous and may evolve over time, which can make the definition of the global schema obsolete. We define a set of integration operators that integrate local schemas, based on the semantic relevance of their classes, into a set of virtual classes that constitute the global schema. We also define a set of modifications that can be applied to local schemas as a consequence of their local autonomy. For every local modification, we define a propagation rule that will automatically disseminate the effects of that modification to the global schema without having to regenerate it from scratch via integration.

Title:

A SCALABLE DISTRIBUTED SEARCH ENGINE FOR INTRANET INFORMATION RETRIEVAL

Author(s):

Minoru Uehara, Minoru Udagawa, Yoshifumi Sakai, Hideki Mori, Nobuyoshi Sato

Abstract: Intranet information retrieval is very important for corporations. They try to discover useful knowledge from hidden web pages by using data mining, knowledge discovery and similar techniques. A search engine is useful in this process. However, conventional search engines, which are based on a centralized architecture, are not suited to intranet information retrieval because intranet information is frequently updated, and centralized search engines take a long time to collect web pages with crawlers, robots and so on. We have therefore developed a distributed search engine, called Cooperative Search Engine (CSE), in order to retrieve fresh information. In CSE, a local search engine located in each Web server builds an index of local pages, and a Meta search server integrates these local search engines to realize a global search engine. With this design, communication delay occurs at retrieval time, so we have developed several speedup techniques to realize fast retrieval. As a result, we have succeeded in increasing the scalability of CSE. In this paper, we describe these speedup techniques and evaluate them.
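
The division of work described in this abstract, local engines that index only their own pages and a meta search server that merges their answers, can be illustrated with a minimal Python sketch. The class names, URLs and the simple word-level index are assumptions made for illustration only; they are not taken from CSE, and none of the paper's speedup techniques are shown.

```python
from collections import defaultdict


class LocalSearchEngine:
    """Hypothetical local engine: indexes only the pages of one web server."""

    def __init__(self, site):
        self.site = site
        self.index = defaultdict(set)  # term -> set of local page URLs

    def add_page(self, url, text):
        # Index every whitespace-separated word of the page, lowercased.
        for term in text.lower().split():
            self.index[term].add(url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


class MetaSearchServer:
    """Hypothetical meta server: forwards a query to every local engine."""

    def __init__(self, engines):
        self.engines = engines

    def search(self, term):
        results = []
        for engine in self.engines:
            # In a real deployment each call is a network round trip,
            # which is where the communication delay comes from.
            results.extend(engine.search(term))
        return sorted(results)


sales = LocalSearchEngine("sales.example")
sales.add_page("http://sales.example/q1", "quarterly sales report")
support = LocalSearchEngine("support.example")
support.add_page("http://support.example/faq", "support report archive")

meta = MetaSearchServer([sales, support])
hits = meta.search("report")
```

Because each server re-indexes only its own pages, an update becomes searchable without waiting for a central crawl; the price is one request per local engine at query time.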

Title:

A WEB APPLICATION FOR ENGLISH-CHINESE CROSS LANGUAGE PATENT RETRIEVAL

Author(s):

Wen-Yuan Hsiao, Jiangping Chen, Elizabeth Liddy

Abstract: This paper describes an English-Chinese cross language patent retrieval system built on commercial database management software. The system makes use of various software products and lexical resources to help native English speakers search for Chinese patent information. This paper reports the overall system design and the cross language information retrieval (CLIR) experiments conducted for performance evaluation. The experimental results and the follow-up analysis demonstrated that commercial database systems could be used as an IR system with reasonable performance. Better performance could be achieved if the translation resources were customized to the document collection of the system, or if more sophisticated translation disambiguation strategies were applied.

Title:

TRIGGER-BASED COMPENSATION IN WEB SERVICE ENVIRONMENTS

Author(s):

Randi Karlsen, Thomas Strandenaes

Abstract: In this paper we describe a technique for implementing compensating transactions, based on the active database concept of triggers. This technique enables specification and enforcement of compensation logic in a manner that facilitates consistent and semi-automatic compensation. A web service, with its loosely-coupled nature and autonomy requirements, represents an environment well suited for this compensation mechanism.
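
As a rough illustration of the idea of driving compensation from triggers, the following Python sketch uses the standard `sqlite3` module. The table names, the trigger name and the `compensate` helper are hypothetical, and logging an undo statement per insert is only one possible reading of the approach, not the authors' mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking (id INTEGER PRIMARY KEY, customer TEXT, seats INTEGER);
CREATE TABLE compensation_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    undo_sql TEXT NOT NULL
);
-- Hypothetical trigger: each insert records the statement that undoes it.
CREATE TRIGGER booking_insert_comp AFTER INSERT ON booking
BEGIN
    INSERT INTO compensation_log (undo_sql)
    VALUES ('DELETE FROM booking WHERE id = ' || NEW.id);
END;
""")


def compensate(connection):
    """Run the logged undo statements in reverse order, then clear the log."""
    rows = connection.execute(
        "SELECT undo_sql FROM compensation_log ORDER BY seq DESC"
    ).fetchall()
    for (undo_sql,) in rows:
        connection.execute(undo_sql)
    connection.execute("DELETE FROM compensation_log")
    connection.commit()


conn.execute("INSERT INTO booking (customer, seats) VALUES ('alice', 2)")
conn.execute("INSERT INTO booking (customer, seats) VALUES ('bob', 1)")
conn.commit()  # the work is committed, so a rollback is no longer possible

bookings_before = conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0]
compensate(conn)
bookings_after = conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0]
log_after = conn.execute("SELECT COUNT(*) FROM compensation_log").fetchone()[0]
```

The point of the sketch is that the compensation logic lives next to the data: the service that owns the table decides how its committed work is undone, which suits autonomous, loosely coupled services.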

Title:

AN ARCHITECTURE OF A SECURE DATABASE FOR NETWORKED COLLABORATIVE ACTIVITIES

Author(s):

Akira Baba, Michiharu Kudo, Kanta Matsuura

Abstract: Open networks can be used for many purposes, such as e-commerce or e-government. In contrast to those conventional applications, we consider networked collaborative activities, for example networked research activities. This application might be very useful, and research activities could be significantly promoted. However, many security problems must be addressed. Among those problems, we focus in this paper on an architecture of a secure database. The design of such an architecture is not a trivial task, since the data sets in the database could be composed of a wide range of data types, and each data type needs to satisfy its own security properties, including not only security but also appropriate management of intellectual property rights. Thus, we design an architecture of a secure database, considering data types and various security operations.

Title:

USING INFORMATION TECHNOLOGIES FOR MANAGING COOPERATIVE INFORMATION AGENT-BASED SYSTEMS

Author(s):

Nacereddine ZAROUR, Mahmoud BOUFAIDA, Lionel SEINTURIER

Abstract: One of the most important problems encountered in the cooperation among distributed information systems is heterogeneity, which is often not easy to deal with. This problem requires the use of the best combination of software and hardware components for each organization. However, the few approaches suggested for managing virtual factories have not been satisfactory. Along with motivating the importance of such systems, this paper describes the major design goals of an agent-based architecture for supporting the cooperation of heterogeneous information systems. It also shows how this architecture can be implemented using a combination of XML and CORBA technologies. This combination guarantees the interoperability of legacy systems regardless of the heterogeneity of their data models and platforms and, therefore, improves the cooperation process. Examples are given from the supply chains of manufacturing enterprises.

Title:

MODELING A MULTIVERSION DATA WAREHOUSE: A FORMAL APPROACH

Author(s):

Tadeusz Morzy, Robert Wrembel

Abstract: A data warehouse is a large centralized repository that stores a collection of data integrated from external data sources (EDSs). The purpose of building a data warehouse is to provide integrated access to distributed and usually heterogeneous information, and to provide a platform for data analysis and decision making. EDSs are autonomous in most cases. As a consequence, their content and structure change over time. In order to keep the content of a data warehouse up to date after source data have changed, various warehouse refreshing techniques have been developed, mainly based on incremental view maintenance. A data warehouse will also need refreshing after the schema of an EDS has changed. This problem has, however, received little attention so far. Few approaches have been proposed, and they tackle the problem mainly by using temporal extensions to a data warehouse. Such techniques show their limitations in multi-period querying. Moreover, in order to support predictions of trends by decision makers, what-if analysis is often required. For these purposes, multiversion data warehouses seem to be very promising. In this paper we propose a model of a multiversion data warehouse, and show our prototype implementation of such a multiversion data warehouse.

Title:

TRADING PRECISION FOR TIMELINESS IN DISTRIBUTED REAL-TIME DATABASES

Author(s):

Bruno SADEG

Abstract: Many information systems do not need to obtain complete or exact answers to queries submitted via a DBMS (Database Management System). Indeed, in certain real-time applications, incomplete results obtained on time are more useful than complete results obtained late. When the applications are distributed, the DBMSs on which these applications are based face the main problem of managing transactions (concurrency control and commit processes). Since these processes must be completed on time (such that each transaction meets its deadline), committing transactions on time seems to be the main issue. In this paper, we deal with the global distributed transaction commit and the local concurrency control problems in applications where transactions may be decomposed into a mandatory part and an optional part. In our model, these parts are determined by a weight parameter which is assigned to each subtransaction. It is used to help the coordinator process execute the commit phase when a transaction is close to its deadline. Another parameter, the estimated execution time, is used by each participant site in combination with the weight to resolve the conflicts that may occur between local subtransactions. The mechanism used to deal with these issues is called the RT-WEP (Real-Time-Weighted Early Prepare) protocol. Some simulations have been made to compare the RT-WEP protocol with two other protocols designed for the same purpose. The results have shown that the RT-WEP protocol may be applied efficiently in a distributed real-time context by allowing more transactions to meet their deadlines.

Title:

A MODEL-DRIVEN APPROACH FOR ITEM SYNCHRONIZATION AND UCCNET INTEGRATION IN LARGE E-COMMERCE ENTERPRISE SYSTEMS

Author(s):

Santhosh Kumaran, Fred Wu, Simon Cheng, Mathews Thomas, Amaresh Rajasekharan, Ying Huang

Abstract: The pervasive connectivity of the Internet and the powerful architecture of the WWW are changing many market conventions and creating a tremendous opportunity for conducting business on the Internet. Digital marketplace business models and the advancement of Web related standards are tearing down walls within and between different business artifacts and entities at all granularities and at all levels, from devices, operating systems and middleware to directory, data, information, application, and finally the business processes. As a matter of fact, business process integration (BPI), which entails the integration of all the facets of business artifacts and entities, is emerging as a key IT challenge. In this paper, we describe our effort in exploring a new approach to address the complexities of BPI. More specifically, we study how to use a solution template based approach for BPI and explore the validity of this approach with a frequently encountered integration problem, the item synchronization problem for large enterprises. The proposed approach can greatly reduce the complexities of the business integration task and reduce the time and amount of effort of the system integrators. Different customers are deploying the described Item Synchronization system.

Title:

DATA POSITION AND PROFILING IN DOMAIN-INDEPENDENT WAREHOUSE CLEANING

Author(s):

Ajumobi Udechukwu, Christie Ezeife

Abstract: A major problem that arises from integrating different databases is the existence of duplicates. Data cleaning is the process for identifying two or more records within the database, which represent the same real world object (duplicates), so that a unique representation for each object is adopted. Existing data cleaning techniques rely heavily on full or partial domain knowledge. This paper proposes a positional algorithm that achieves domain independent de-duplication at the attribute level. The paper also proposes a technique for field weighting through data profiling, which, when used with the positional algorithm, achieves domain-independent cleaning at the record level. Experiments show that the positional algorithm achieves more accurate de-duplication than existing algorithms.

Title:

OPTIMIZING ACCESS IN A DATA INTEGRATION SYSTEM WITH CACHING AND MATERIALIZED DATA

Author(s):

Bernadette Farias Lóscio, Ana Carolina Salgado, Maria da Conceição Moraes Batista

Abstract: Data integration systems are intended to offer uniform access to data from heterogeneous and distributed sources. Two basic approaches have been proposed in the literature to provide integrated access to multiple data sources. In the materialized approach, data are accessed, cleaned, integrated and stored in the data warehouse in advance, and the queries submitted to the integration system are evaluated in this repository without direct access to the data sources. In the virtual approach, the queries posed to the integration system are decomposed into queries addressed directly to the sources. The data obtained from the sources are integrated and returned to the user. In this work we present a data integration environment that integrates data distributed over multiple web data sources and combines features of both approaches, supporting the execution of virtual and materialized queries. Another distinguishing feature of our environment is the use of a cache system to answer the most frequently asked queries. All these resources are put together with the goal of optimizing the overall query response time.
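
The combination of cached, materialized and virtual query answering can be pictured with a small Python sketch. The `IntegrationSystem` class, the query names and the policy of caching every answer are illustrative assumptions; the abstract does not specify how the environment decides what to cache or materialize.

```python
class IntegrationSystem:
    """Hypothetical mediator combining a cache, a warehouse and live sources."""

    def __init__(self, warehouse, sources):
        self.cache = {}              # query -> previously computed answer
        self.warehouse = warehouse   # query -> materialized answer
        self.sources = sources       # callables: query -> list of rows

    def answer(self, query):
        # 1. Cheapest path: the answer is already cached.
        if query in self.cache:
            return self.cache[query], "cache"
        # 2. Materialized path: evaluate against the warehouse only.
        if query in self.warehouse:
            result, origin = self.warehouse[query], "materialized"
        else:
            # 3. Virtual path: send the query to each source and merge rows.
            result = [row for source in self.sources for row in source(query)]
            origin = "virtual"
        # Simplification: cache everything, not only frequent queries.
        self.cache[query] = result
        return result, origin


warehouse = {"customers_by_country": [("PT", 120), ("FR", 340)]}
sources = [lambda q: [("site1", q)], lambda q: [("site2", q)]]
system = IntegrationSystem(warehouse, sources)

first = system.answer("open_orders")
second = system.answer("open_orders")
stored = system.answer("customers_by_country")
```

The second call for the same query never touches the sources, which is the response-time saving the cache is meant to provide.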

Title:

GLOBAL QUERY OPTIMIZATION BASED ON MULTISTATE COST MODELS FOR A DYNAMIC MULTIDATABASE SYSTEM

Author(s):

Qiang Zhu

Abstract: Global query optimization in a multidatabase system (MDBS) is a challenging issue since some local optimization information such as local cost models may not be available at the global level due to local autonomy. It becomes even more difficult when dynamic environmental factors are taken into consideration. In our previous work, a qualitative approach was suggested to build so-called multistate cost models to capture the performance behavior of a dynamic multidatabase environment. It has been shown that a multistate cost model can give a good cost estimate for a query run in any contention state in the dynamic environment. In this paper, we present a technique to perform query optimization based on multistate cost models for a dynamic MDBS. Two relevant algorithms are proposed. The first one selects a set of representative system environmental states for generating an execution plan with multiple versions for a given query at compile time, while the second one efficiently determines the best version to invoke for the query at run time. Experiments demonstrate that the proposed technique is quite promising for performing global query optimization in a dynamic MDBS. Compared with related work on dynamic query optimization, our approach has an advantage of avoiding the high overhead for modifying or re-generating an execution plan for a query based on dynamic run-time information.
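
The run-time half of this scheme, picking one of several precompiled plan versions according to the current contention state, might look like the following Python sketch. The load thresholds, plan names and the use of a single load figure to identify the state are assumptions for illustration; the paper's algorithms for choosing representative states and versions are not reproduced here.

```python
import bisect

# Hypothetical contention states, each identified by the upper bound of the
# system load it covers: low (<= 0.3), medium (<= 0.7), high (<= 1.0).
STATE_UPPER_BOUNDS = [0.3, 0.7, 1.0]

# One plan version per state, assumed to be generated at compile time.
PLAN_VERSIONS = [
    "hash_join_at_site_A",
    "merge_join_at_site_B",
    "ship_to_global_site",
]


def choose_plan(current_load):
    """Return the precompiled plan version for the observed load."""
    state = bisect.bisect_left(STATE_UPPER_BOUNDS, current_load)
    state = min(state, len(PLAN_VERSIONS) - 1)  # clamp loads above 1.0
    return PLAN_VERSIONS[state]


low = choose_plan(0.1)
medium = choose_plan(0.5)
high = choose_plan(0.95)
```

Selection is a lookup rather than a re-optimization, which reflects the abstract's claim that the approach avoids the overhead of modifying or regenerating a plan at run time.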

Title:

A DATA, COMPUTATION, KNOWLEDGE GRID: THE CASE OF THE ARION SYSTEM

Author(s):

Spyros Lalis, Manolis Vavalis, Kyriakos Kritikos, Antonis Smardas, Dimitris Plexousakis, Marios Pitikakis, Catherine Houstis, Vassilis Christophides

Abstract: The ARION system provides basic e-services for the search and retrieval of objects in scientific collections, such as datasets, simulation models and tools necessary for statistical and/or visualization processing. These collections may represent application software of scientific areas; they reside in geographically dispersed organizations and constitute the system content. The user may invoke on-line computations of scientific datasets when the latter are not found in the system. Thus, ARION provides the basic infrastructure for accessing and deriving scientific information in an open, distributed and federated system.

Title:

SCANNING A LARGE DATABASE ONCE TO MINE ASSOCIATION RULES

Author(s):

Frank Wang

Abstract: Typically 95% of the data in transaction databases are zero. When the data are sparse, performance quickly degrades due to the heavy I/O overheads of sorting and merging intermediate results. In this work, we first introduce a list representation in main memory for storing and computing datasets. The sparse transaction dataset is compressed as the empty cells are removed. Accordingly, we propose a ScanOnce algorithm for association rule mining on the platform of the list representation, which needs to scan the transaction database only once to generate all the possible rules. In contrast, the well-known Apriori algorithm requires repeated scans of the database, resulting in heavy I/O accesses, particularly when considering large candidate datasets. Owing to the integrity of its data structure, the complete itemset counter tree can be stored in a (one-dimensional) vector without any gaps, whose direct-addressing capability ensures fast access to any counter. In our opinion, this new algorithm using the list representation economizes on storage space and accesses. The experiments show that the ScanOnce algorithm beats the classic Apriori algorithm for large problem sizes, by factors ranging from 2 to more than 6.
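
To make the idea of a gap-free counter vector with direct addressing concrete, here is a minimal Python sketch of single-scan itemset counting. It is not the ScanOnce algorithm itself: the bitmask addressing scheme, the item catalogue and the function names are assumptions chosen to show how every itemset can own one slot in a one-dimensional vector.

```python
from itertools import combinations

ITEMS = ["bread", "milk", "beer"]  # hypothetical item catalogue
BIT = {item: 1 << i for i, item in enumerate(ITEMS)}


def count_itemsets(transactions):
    """Scan the transactions once; counters[mask] counts the itemset
    whose members are the set bits of mask (slot 0 is the empty set)."""
    counters = [0] * (1 << len(ITEMS))
    for transaction in transactions:      # list form: only items present
        present = sorted(set(transaction))
        for size in range(1, len(present) + 1):
            for subset in combinations(present, size):
                mask = sum(BIT[item] for item in subset)
                counters[mask] += 1       # direct addressing, no search
    return counters


def support(counters, itemset):
    """Look up the count of an itemset by computing its vector address."""
    return counters[sum(BIT[item] for item in set(itemset))]


transactions = [
    ["bread", "milk"],
    ["bread", "beer"],
    ["bread", "milk", "beer"],
    ["milk"],
]
counters = count_itemsets(transactions)
confidence_bread_to_milk = (
    support(counters, ["bread", "milk"]) / support(counters, ["bread"])
)
```

The vector has one slot per possible itemset, so this naive layout grows exponentially with the number of items and the subset enumeration grows exponentially with transaction length; the sketch only illustrates the addressing idea on a toy catalogue.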

Title:

INTEGRATION OF DISTRIBUTED SOFTWARE PROCESS MODELS

Author(s):

Mohamed Ahmed-nacer, Nabila Lardjane

Abstract: Developing software-in-the-large involves many developers, with experts in various aspects of software development and in various aspects of the application area. This paper presents an approach to integrate software process models in a distributed context. It is based on the fusion of process fragments (components) defined with the UML notation (Unified Modelling Language). The integration methodology presented allows unifying the various fragments both at the static level as well as at the dynamic level (behavioural). We consider various possible semantic conflicts; formal definitions of the inter-fragments properties are formulated and solutions for these conflicts are proposed. This integration approach provides multiple solutions for the integration conflicts and gives the possibility to improve and design new software process models by a merging of reusable process fragments.

Title:

A BITEMPORAL STORAGE STRUCTURE FOR A CORPORATE DATA WAREHOUSE

Author(s):

Alberto Abelló, Carme Martín

Abstract: This paper brings together two research areas involving the representation of time, namely Data Warehouses and Temporal Databases. Looking at temporal aspects within a data warehouse, more similarities than differences between temporal databases and data warehouses have been found. The first point of closeness between these areas is the possibility of redefining a data warehouse in terms of a bitemporal database. A bitemporal storage mechanism is proposed in this paper. In order to meet this goal, a temporal study of data sources is developed. Moreover, we show how Object-Oriented temporal data models help provide the integration and subject-orientation required by a data warehouse.
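
For readers unfamiliar with bitemporal data, the following Python sketch shows the general notion of keeping both a valid-time and a transaction-time interval on each row. The `BitemporalRow` layout, the half-open intervals and the `as_of` helper are generic illustrations, not the storage structure proposed in the paper.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # open-ended upper bound


@dataclass(frozen=True)
class BitemporalRow:
    key: str
    value: float
    valid_from: date      # valid time: when the fact holds in reality
    valid_to: date
    recorded_from: date   # transaction time: when the store believed it
    recorded_to: date = FOREVER


def as_of(rows, key, valid_on, known_on):
    """Value valid on `valid_on` according to what was recorded on `known_on`."""
    for row in rows:
        if (row.key == key
                and row.valid_from <= valid_on < row.valid_to
                and row.recorded_from <= known_on < row.recorded_to):
            return row.value
    return None


rows = [
    # Price first loaded as 10.0, later found to be wrong.
    BitemporalRow("P1", 10.0, date(2003, 1, 1), FOREVER,
                  date(2003, 1, 5), date(2003, 2, 1)),
    # Correction loaded on 1 February; the old row is closed, not deleted.
    BitemporalRow("P1", 12.0, date(2003, 1, 1), FOREVER,
                  date(2003, 2, 1)),
]

believed_in_january = as_of(rows, "P1", date(2003, 1, 15), date(2003, 1, 20))
believed_in_march = as_of(rows, "P1", date(2003, 1, 15), date(2003, 3, 1))
```

Keeping the superseded row makes it possible to reproduce a report exactly as it was computed before the correction, which is the kind of non-volatile history a data warehouse is expected to preserve.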

Title:

TOWARD A DOCUMENTARY MEMORY

Author(s):

Christine JULIEN, Max CHEVALIER, Kais Khrouf

Abstract: An organisation must enable its employees to share knowledge and information in order to optimise their tasks. However, the volume of information contained in documents is a major concern for these companies. Indeed, companies must be fully reactive to any new information and must follow the fast evolution of disseminated information. A documentary memory, which stores this information and allows end-users to access or analyse it, is therefore a necessity for every enterprise. In this paper, we propose the architecture of such a system, based on a document warehouse, allowing the storage of relevant documents and their exploitation via the techniques of information retrieval, factual data interrogation and multidimensional information analysis.

Title:

DISTRIBUTED OVERLOAD CONTROL FOR REAL-TIME REPLICATED DATABASE SYSTEMS

Author(s):

Samia Saad-Bouzefrane, C. Kaiser

Abstract: In order to meet their temporal constraints, current applications such as Web-based services and electronic commerce use the technique of data replication. To benefit from replication, we need to develop concurrency control mechanisms that perform well even when the distributed system is overloaded. In this paper, we present a protocol that uses a new notion called importance value, which is associated with each real-time transaction. Under conditions of overload, this value is used to select the most important transactions with respect to the application transactions in order to pursue their execution; the other transactions are aborted. Our protocol RCCOS (Replica Concurrency-Control for Overloaded Systems) augments the protocol MIRROR, a concurrency control protocol designed for firm-deadline applications operating on replicated real-time databases, in order to manage transactions efficiently when the distributed system is overloaded. A platform has been developed to measure the number of transactions that meet their deadlines when the processor load of each site is controlled.
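
The role of the importance value under overload can be sketched in a few lines of Python. The tuple layout, the capacity figure and the greedy admission rule are assumptions for illustration; the abstract does not describe how RCCOS actually computes or applies importance values.

```python
def select_under_overload(transactions, capacity):
    """Keep the most important transactions that fit in `capacity`;
    the remaining ones are aborted.

    transactions: list of (name, importance, load) tuples (hypothetical).
    """
    kept, aborted, used = [], [], 0.0
    # Consider transactions from most to least important.
    for name, importance, load in sorted(transactions, key=lambda t: -t[1]):
        if used + load <= capacity:
            kept.append(name)
            used += load
        else:
            aborted.append(name)
    return kept, aborted


pending = [
    ("audit", 1, 0.4),
    ("payment", 9, 0.5),
    ("report", 3, 0.3),
    ("order", 7, 0.4),
]
kept, aborted = select_under_overload(pending, capacity=1.0)
```

With a processor capacity of 1.0, the two most important transactions are admitted and the other two are aborted rather than being allowed to miss their deadlines.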

Title:

INCREMENTAL HORIZONTAL FRAGMENTATION OF DATABASE CLASS OBJECTS

Author(s):

Christie Ezeife, Pinakpani Dey

Abstract: Horizontal fragments of a class in an object-oriented database system contain subsets of the class extent or instance objects. These fragments are created from a set of system input data consisting of the application queries, their access frequencies, and the object database schema with its components: the class inheritance and class composition hierarchies as well as the instance objects of classes. When these system inputs to the fragmentation process change enough to affect system performance, a re-fragmentation is usually done from scratch. This paper proposes an incremental re-fragmentation method that uses mostly the updated part of the input data and the previous fragments to define new fragments more quickly, saving system resources and making the data at distributed sites more available for network and web access.

Title:

GEONIS - FRAMEWORK FOR GIS INTEROPERABILITY

Author(s):

Leonid Stoimenov, Slobodanka Djordjevic-Kajan

Abstract: This paper presents research on the interoperability of Geographic Information Systems. It also describes our development work and introduces an interoperability framework called GeoNis, which uses the proposed technologies to perform integration tasks between GIS applications and legacy data sources over the Internet. Our approach provides integration of distributed GIS data sources and legacy information systems in a local community environment.

Title:

BUSINESS CHANGE IMPACTS ON SYSTEM INTEGRATION

Author(s):

Fabio Rollo, Gabriele Venturi, Gerardo Canfora

Abstract: Large organizations have disparate legacy systems, applications, processes, and data sources, which interact by means of various kinds of interconnections. Merging of companies can increase the complexity of system integration, with the need to integrate applications like Enterprise Resource Planning and Customer Relationship Management. Even if sometimes these applications provide a kind of access to their underlying data and business logic, Enterprise Application Integration (EAI) is still a challenge. In this paper we analyse the needs that drive EAI with the aim of identifying the features that EAI platforms must exhibit to enable companies to compete in the new business scenarios. We discuss the limitations of current EAI platforms and their evaluation methods, mainly economies of scale and economies of scope, and argue that a shift is needed towards the economies of learning model. Finally, we outline an EAI architecture that addresses current limitations enabling economies of learning.

Title:

TECHNICAL USE QUALITY IN A UNIVERSITY ENTERPRISE RESOURCE PLANNING SYSTEM: PERCEPTIONS OF RESPONSE TIME AND ITS STRATEGIC IMPORTANCE

Author(s):

Michelle Morley

Abstract: Enterprise Resource Planning Systems (ERPs) are large, complex enterprise-wide information systems that offer benefits of integration and data-richness to organisations. This paper explores the quality issue of response times, and the impact of poor response times on the ability of the organisation studied to achieve its strategy. The PeopleSoft ERP was implemented within the International Centre (for international student recruitment and support) at an Australian University, as part of a University-wide implementation. To achieve the goal of increased international student enrolments, fast turnaround times on student applications are critical. The ERP offers poor response times and this makes it difficult for the International Centre to achieve high conversion rates (from applications to enrolments) and hence reduces the perceived value, or ‘business quality’ (Salmela 1997), of the system to the organisation. The paper uses a quality model developed from Eriksson and Toern’s (1990) SOLE model, Lindroos’ (1997) Use Quality and Salmela’s (1997) Business Quality model.

Title:

INTEGRATING AUTOMATION DESIGN INFORMATION WITH XML

Author(s):

Seppo Kuikka, Mika Viinikkala

Abstract: Due to the number of parties participating in the design phase of an automation project, various design, engineering and operational systems are needed. At the moment, the means to transfer information from one system to another, so that it can be further processed or reused, are not efficient. An integration approach in which XML technologies are used to implement systems integration is introduced. The data content of systems is defined by XML Schema instances. XML messages containing automation design information are transformed using transformation stylesheets employing a generic standard vocabulary. Loosely coupled, platform independent, data content-oriented integration is enabled by XML technologies. A case study that proceeds according to the approach is also described. It consists of both a software prototype responsible for communication and data content including XML Schema instances and transformation stylesheets for the systems covered in the study. It is found that XML technologies seem to be a part of the right solution. However, some issues related to schema design and transformations are problematic. If complex systems are integrated, XML technologies alone are not sufficient. Future developments include a general purpose web-service solution that is to answer questions that were not dealt with by this case study.

Title:

IMPRECISION BASED QUERIES OVER MATERIALIZED AND VIRTUAL INTEGRATED VIEWS

Author(s):

Alberto Trombetta, Danilo Montesi

Abstract: The Global-As-View approach to data integration has focused on the (semi-automatic) definition of a global schema starting from a given set of known information sources. In this paper, we investigate how to employ concepts and techniques to model imprecision in defining mappings between the global schema and the source schemas and to answer queries posed over the global schema. We propose an extended relational algebra using fuzzy sets for defining SQL-like query mappings. Such mappings explicitly take into account the similarities between global and source schemas to discard source data items with low similarity and to express the relevance of different sources in populating the global schema. In the case where the global schema is not materialized, we propose a query rewriting technique for expressing over the sources the queries posed over the global schema.
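
A very small Python sketch can illustrate what relational operators over fuzzy sets look like when low-similarity source items are discarded. The use of `min` and `max` for combining membership degrees follows the standard fuzzy set operations; the relation layout, the sample data and the threshold are assumptions, and the paper's extended algebra may define its operators differently.

```python
def fuzzy_union(left, right):
    """Union of two fuzzy relations: equal tuples keep the larger degree."""
    merged = {}
    for row, degree in list(left) + list(right):
        merged[row] = max(degree, merged.get(row, 0.0))
    return sorted(merged.items())


def fuzzy_select(relation, predicate, threshold):
    """Keep tuples whose degree, combined with the predicate degree by min,
    reaches the threshold; everything below it is discarded."""
    result = []
    for row, degree in relation:
        combined = min(degree, predicate(row))
        if combined >= threshold:
            result.append((row, combined))
    return result


# Hypothetical sources: each tuple carries a similarity degree in [0, 1].
source_a = [(("Rossi", "Milan"), 0.9), (("Bianchi", "Rome"), 0.4)]
source_b = [(("Rossi", "Milan"), 0.6), (("Verdi", "Turin"), 0.8)]

global_relation = fuzzy_union(source_a, source_b)
relevant = fuzzy_select(global_relation, lambda row: 1.0, threshold=0.5)
```

Here the tuple found in both sources keeps the higher of its two degrees, and the tuple with degree 0.4 falls below the threshold and does not populate the global relation.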

Title:

THE HAMLET DILEMMA ON EXTERNAL DATA IN DATA WAREHOUSES

Author(s):

Mattias Strand, Marcus Olsson

Abstract: Data warehouses are currently given a lot of attention, both by academics and practitioners, and the amount of literature describing different aspects of data warehousing is ever-increasing. Much of this literature covers the characteristics and the origin of the data in the data warehouse, and the importance of external data is often pointed out. Still, the descriptions of external data remain at a general level and the extent of external data usage is not given much attention. Therefore, in this paper, we describe the results of an interview study, partly aimed at outlining the current usage of external data in data warehouses. The study was directed towards Swedish data warehouse developers and the results show that the usage of external data in data warehouses is not as frequent as expected. Only 58% of the respondents had worked in projects that had an objective of integrating external data. Reasons given for the rather low usage were problems in assuring the quality of the external data and a lack of data warehouse maturity amongst the user organizations.

Title:

PERFORMANCE IMPROVEMENT OF DISTRIBUTED DATABASE MANAGEMENT SYSTEMS

Author(s):

Josep Maria Muixi, August Climent

Abstract: Distributed databases offer a complete range of desirable features: availability, reliability, and responsiveness. However, all of these benefits come at the expense of some extra management; the main issues considered in the literature as the basis of a tuned distributed database system include data replication and synchronization, concurrent access, distributed query optimization and performance improvement. The work presented here tries to provide some clues on the last point by considering an issue which, in our opinion, has not been taken sufficiently into account: load balancing of these distributed systems. We try to show how the right load balancing policy influences the performance of a distributed database management system, and more specifically a shared-nothing one.

Title:

EMPIRICAL VALIDATION OF METRICS FOR UML STATECHART DIAGRAMS

Author(s):

David Miranda, Marcela Genero, Mario Piattini

Abstract: It is widely recognised that the quality of Object Oriented Software Systems (OOSS) must be assessed from the early stages of their development. OO conceptual models are key artifacts produced at these early phases, which cover not only static aspects but also dynamic aspects. Therefore, focusing on quality aspects of conceptual models could contribute to producing better quality OOSS. While quality aspects of structural diagrams, such as class diagrams, have been widely researched, the quality of behavioural diagrams such as statechart diagrams has been neglected. This fact led us to define a set of metrics for measuring their structural complexity. In order to gather empirical evidence that the structural complexity of statechart diagrams is closely related to their understandability, we carried out a controlled experiment in a previous work. The aim of this paper is to present a replication of that experiment. The findings obtained in the replication corroborate the results of the first experiment in the sense that, to some extent, the number of transitions, the number of states and the number of activities influence the understandability of statechart diagrams.

Title:

A SOLUTION FOR CONTEXTUAL INTEGRATION BASED ON THE CALCULATION OF A SEMANTIC DISTANCE

Author(s):

Fabrice JOUANOT, Kokou Yétongnon, Nadine Cullot

Abstract: Achieving the interoperation of heterogeneous data sources with respect to their context and rich semantics remains a real challenge. Users need to integrate useful information and query coupled data sources in a transparent way. We propose a solution to help the integration of heterogeneous sources according to their context. We present a model to define contextual information associated with local data and a mechanism which uses this semantics to compare local contexts and integrate relevant data. Our contextual integration approach, using a rule-based language, allows us to build virtual objects in a semi-automatic way. These objects act as transparent interfaces for end-users.

Title:

DATA WAREHOUSE – PROCESS TO DEVELOP

Author(s):

Prasad N. Sivalanka, Rakesh Agarwal

Abstract: Building a data warehouse involves complex details of analysis and design of an enterprise-wide decision support system. Dimensional modeling can be used to design effective and usable data warehouses. The paper highlights the steps in the implementation of a data warehouse in a client project. All the observations and phases mentioned in this document refer to a project carried out on medium-to-large multi-dimensional databases for a client in a controlled test environment. The recommendations, conclusions and observations made in this document may not be generalized to all cases unless verified and tested.

Title:

CREATING THE DOCSOUTH PUBLISHER

Author(s):

Tony Bull

Abstract: In this Case Study, a Systems Integration problem is solved using Object-Oriented Perl, XML/XSLT, and Java. Over the last two years, the world-renowned Digitization Project ‘Documenting the American South’ has been slowly converting its SGML-based Legacy system to an XML-centric system. As of September 2002, the “DocSouth Publisher” has been the latest change in realizing the new XML environment.

Title:

A COMPARISON OF DATABASE SYSTEMS FOR STORING XML DOCUMENTS

Author(s):

Roger Davies, Miguel Mira da Silva, Rui Cerveira Nunes

Abstract: As the need to store large quantities of increasingly complex XML documents grows, the requirements for database products that claim to support XML also increase. For example, it is no longer acceptable to store XML documents without using indices for efficient retrieval of large collections. In this paper we analyse the current versions of products representing the three main approaches to XML storage: native XML databases, XML support by relational databases, and object-oriented databases with XML support. Several products are analysed and compared, including performance tests. Our main conclusion is that the market urgently needs a standard query language and API, analogous to SQL and ODBC, which were probably the main drivers for the success of relational databases.

Title:

AUTOMATED DATA MAPPING FOR CROSS ENTERPRISE DATA INTEGRATION

Author(s):

Stefan Böttcher, Sven Groppe

Abstract: Currently, multiple different classifications for product descriptions are used in enterprise-internal applications and cross-enterprise applications, e.g. E-procurement systems. A key problem is to run applications developed for one catalogue on product descriptions that are stored in a different classification. A common solution is that a catalogue specialist manually maps different classifications onto each other. Our approach avoids unnecessary manual mapping work and automatically generates mappings between different classifications wherever possible. This allows us to run E-procurement applications on different catalogues with much less manual mapping work, which we consider to be an important step towards enterprise application integration.

Title:

XML-BASED OLAP QUERY PROCESSING IN A FEDERATED DATA WAREHOUSES

Author(s):

Wolfgang Essmayr, Edgar Weippl, Johannes Huber, Oscar Mangisengi

Abstract: Today, XML is the format of choice to implement interoperability between systems. This paper addresses XML-based query processing for heterogeneous OLAP data warehouses in a federated architecture. In our approach, XML, as an intermediary representation, can be used as a basis for federated queries and queries for local OLAP data warehouses, whereas an XML DTD can be used for query language definition and validation of an XML federated query.

Title:

THE ENHANCED GREEDY INTERCHANGE ALGORITHM FOR THE SELECTION OF MATERIALIZED VIEWS UNDER A MAINTENANCE COST CONSTRAINT IN DATA WAREHOUSES

Author(s):

Omar Karam, Osman Ibrahim, Rasha Ismail, Mohamed El-Sharkawy

Abstract: A Data Warehouse is a central repository of integrated information available for the purpose of efficient decision support or OLAP queries. One of the important decisions when designing a data warehouse is the selection of views to materialize and maintain in it. The goal is to select an appropriate set of materialized views so as to minimize the total query response time and the cost of maintaining the selected views under the constraint of a given total view maintenance time. In this paper, the maintenance cost is incorporated into the Greedy Interchange Algorithm (GIA). The performance and behavior of the Greedy Algorithm considering maintenance cost (GAm) and the proposed Greedy Interchange Algorithm considering maintenance cost (GIAm) are examined through experimentation. The GIAm improves the results over the GAm by 56.5%, 60.6% and 80% for maintenance time constraints of 100%, 75% and 40% of the total maximum maintenance time, respectively. An enhancement to the GIAm is also proposed: the GIA is applied to a selected subset of views rather than to all the views of a view graph. This selection is based upon view dependencies and results in a substantial reduction in run time.
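
As a rough illustration of the kind of selection problem this abstract describes, the following Python sketch picks views greedily by benefit per unit of maintenance cost and then tries single-view swaps. It is not the authors' GAm or GIAm: the function names, the data layout and the assumption that each view has a fixed, independent benefit are simplifications made here (in a real view graph, the benefit of a view depends on which other views are materialized).

```python
from typing import Dict, List, Tuple

# Hypothetical input: view name -> (query benefit, maintenance cost).
# Costs are assumed to be strictly positive.
Views = Dict[str, Tuple[float, float]]

def greedy_select(views: Views, budget: float) -> List[str]:
    """Repeatedly pick the affordable view with the best benefit/cost ratio."""
    chosen: List[str] = []
    remaining = budget
    candidates = dict(views)
    while True:
        fitting = [n for n, (_, cost) in candidates.items() if cost <= remaining]
        if not fitting:
            return chosen
        best = max(fitting, key=lambda n: candidates[n][0] / candidates[n][1])
        chosen.append(best)
        remaining -= candidates[best][1]
        del candidates[best]

def interchange(views: Views, chosen: List[str], budget: float) -> List[str]:
    """Swap one chosen view for one unchosen view while that raises the
    total benefit and keeps the total maintenance cost within the budget."""
    selected = set(chosen)
    improved = True
    while improved:
        improved = False
        cost = sum(views[v][1] for v in selected)
        for out in sorted(selected):
            for inn in sorted(set(views) - selected):
                new_cost = cost - views[out][1] + views[inn][1]
                gain = views[inn][0] - views[out][0]
                if new_cost <= budget and gain > 0:
                    selected.remove(out)
                    selected.add(inn)
                    improved = True
                    break
            if improved:
                break
    return sorted(selected)
```

Each swap strictly increases the total benefit, so the interchange loop terminates. With a tight budget the greedy step can settle on a cheap view that a single swap then replaces with a more beneficial one.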

Title:

RANKING AND SELECTING COMPONENTS TO BUILD SYSTEMS

Author(s):

Alberto Sillitti, Paolo Predonzani, Giampiero Granatella, Tullio Vernazza, Giancarlo Succi

Abstract: Component-Based Software Engineering (CBSE) allows developers to build systems using existing components. Developers need to find the best set of components that implements most of the required features. Retrieving components manually can be very complicated and time-consuming. Tools that partially automate this task help developers to build better systems with less effort. This paper proposes a methodology for ranking and selecting components to build an entire system instead of retrieving just a single component. This methodology was developed in the European project CLARiFi (CLear And Reliable Information For Integration).

Title:

A CASE STUDY FOR A QUERY-BASED WAREHOUSING TOOL

Author(s):

Rami Rifaieh, Nabila Aicha Benharkat

Abstract: Data warehousing is an essential element of decision support. In order to supply a decisional database, meta-data is needed to enable communication between the various functional areas of the warehouse, and an ETL tool (Extraction, Transformation, and Load) is needed to define the warehousing process. Developers use a mapping guideline to specify, in the ETL tool, the mapping expression of each attribute. In this paper, we define a model covering different types of mapping expressions and use it to create an active ETL tool. In our approach, we use queries to carry out the warehousing process: SQL queries represent the mapping between the source and the target data. Thus, we allow the DBMS to play an expanded role as a data transformation engine as well as a data store. This approach enables a complete interaction between mapping meta-data and the warehousing tool. In addition, this paper investigates the efficiency of a query-based data warehousing tool. It describes a query generator for reusable and more efficient data warehouse (DW) processing. Besides presenting the advantages of this approach, the paper reports a case study based on real-scale commercial data to verify our tool's features.
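
The idea of expressing attribute mappings as SQL and letting the DBMS do the transformation can be sketched as follows. This is a minimal illustration, not the tool from the paper: the table names, column names, the `MAPPING` dictionary and the helper functions are all hypothetical, and SQLite (via Python's standard `sqlite3` module) merely stands in for whatever DBMS is used.

```python
import sqlite3

# Hypothetical mapping meta-data: target column -> SQL expression over the source.
MAPPING = {
    "customer_name": "UPPER(first_name || ' ' || last_name)",
    "total_eur": "ROUND(amount_cents / 100.0, 2)",
}

def build_load_query(source: str, target: str, mapping: dict) -> str:
    """Generate one INSERT ... SELECT statement from the mapping meta-data.
    Identifiers and expressions are assumed to come from trusted meta-data."""
    columns = ", ".join(mapping)
    expressions = ", ".join(mapping.values())
    return f"INSERT INTO {target} ({columns}) SELECT {expressions} FROM {source}"

def run_demo():
    """Load one source row into the target table and return the target rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE src_orders (first_name TEXT, last_name TEXT, amount_cents INTEGER)"
    )
    con.execute("CREATE TABLE dw_orders (customer_name TEXT, total_eur REAL)")
    con.execute("INSERT INTO src_orders VALUES ('Ada', 'Lovelace', 1250)")
    # The DBMS itself performs the transformation described by the mapping.
    con.execute(build_load_query("src_orders", "dw_orders", MAPPING))
    rows = con.execute("SELECT customer_name, total_eur FROM dw_orders").fetchall()
    con.close()
    return rows
```

Because the query is generated from the mapping, changing a mapping expression changes the load step without touching any transformation code.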

Title:

EXTENDING TREE AUTOMATA TO MODEL XML VALIDATION UNDER ELEMENT AND ATTRIBUTE CONSTRAINTS

Author(s):

D. Laurent, D. Duarte, B. Bouchou, Mírian Halfeld Ferrari Alves

Abstract: Algorithms for validation play a crucial role in the use of XML as the standard for interchanging data among heterogeneous databases on the Web. Although much effort has gone into formalizing the treatment of elements, attributes have been neglected. This paper presents a validation model for XML documents that takes into account the element and attribute constraints imposed by a given DTD. Our main contribution is the introduction of a new formalism to deal with both kinds of constraints. We deem that our formalism has interesting characteristics: it handles finite trees with attributes and elements; it is simple, since it is just an extension of regular tree automata; and it allows the construction of a deterministic automaton having the same expressive power as a DTD. Moreover, our formalism can be implemented easily, giving rise to an efficient validation method.
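
To give a feel for validating element and attribute constraints together, here is a small Python sketch in the spirit of a bottom-up tree automaton. It is not the formalism of the paper: the `RULES` table, its layout and the `validate` function are assumptions made for illustration, each element name acts as its own state (as in a DTD), and text content is ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: tag -> (content model as a regex over child tags,
#                             required attributes, optional attributes).
# Each child tag is followed by one space in the word that the regex must match.
RULES = {
    "library": (r"(book )*", set(), set()),
    "book": (r"title (author )+", {"isbn"}, {"lang"}),
    "title": (r"", set(), set()),
    "author": (r"", set(), set()),
}

def validate(node: ET.Element) -> bool:
    """Accept a node if its attributes and its sequence of children satisfy
    the rule for its tag and every child is accepted as well."""
    rule = RULES.get(node.tag)
    if rule is None:
        return False
    content, required, optional = rule
    attrs = set(node.attrib)
    # Attribute constraint: all required present, nothing undeclared.
    if not (required <= attrs and attrs <= (required | optional)):
        return False
    # Bottom-up step: every subtree must be accepted first.
    if not all(validate(child) for child in node):
        return False
    # Element constraint: the word of child tags must match the content model.
    word = "".join(child.tag + " " for child in node)
    return re.fullmatch(content, word) is not None
```

For example, `validate(ET.fromstring('<library><book isbn="1"><title/><author/></book></library>'))` accepts the document, while dropping the `isbn` attribute or the `author` child makes it fail.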

Title:

AN ARCHITECTURAL FRAMEWORK FOR WEB APPLICATIONS

Author(s):