Advances in Computers Chapter

Published by Tomas Vitvar on April 28, 2009 in Publications, Research

Elsevier published our work on Semantic Web Services with Lightweight Descriptions of Services in its Advances in Computers, volume 76 (co-authored by me, Jacek Kopecky, Jana Viskova, Adrian Mocan, Mick Kerrigan and Dieter Fensel). To publish in this book, authors need an invitation from Elsevier, and we were glad to receive ours in 2008. The volume is broadly about the Semantic Web, its foundations, and applications such as the social web.

It is nice to see that, apart from our contribution, there is also a chapter by my former colleagues from DERI Galway, John Breslin et al., on The Future of Social Websites: Sharing Data and Trusted Applications with Semantics.

Google’s Plans on Semantics

Published by Tomas Vitvar on April 17, 2009 in Research

Google’s VP of Research, Alfred Spector, reveals plans to exploit huge amounts of data to build a “database of concepts and relationships between them” for better search results. He envisions that Google should be able to learn such information from interactions and a very large amount of information. This differs from traditional AI, where such an ontology is usually imposed on a system and controlled by an expert.

The use of such an ontology is then quite obvious:

Let’s imagine our search software is responding to a query on pets, but we find articles on dogs and cats, but without the word pets. This database of relationships would let Google know that the article is probably about pets because there are multiple instances of a subcategory of “pet.” The database would enable much better search and better language translation because there’d be a better understanding of the meaning of the words.

From this interview it is clear that Google recognizes the importance of semantics for search. The major challenge, however, lies in constructing ontologies that describe the huge and dynamic environment on which such intelligent search would have to operate reliably.

Our Innovative B2B SOA Solution Wins €700 Prize

Published by Tomas Vitvar on February 6, 2009 in Publications, Research

Our solution, an innovative SOA technology that we apply to various real-world B2B scenarios, won the €700 prize in the industrial track of ASWC in Bangkok, Thailand. Development of the underlying technology (called WSMX) started in early 2004 as part of the DERI and STI R&D activities, and from 2006 we significantly improved it by solving various real-world B2B scenarios defined by the SWS Challenge initiative. Apart from implementing the technology and showcasing its benefits for B2B (see details here), we also have a significant number of research papers that came out of this activity.

My collaboration with Maciej Zaremba, co-architect and software engineer of the solution, turned out to be very successful: we managed not only to build a working solution for the B2B challenge scenarios but also to publish a significant number of articles at very competitive research venues (acceptance rates usually range from 15 to 20 percent).

Excluding Self-traffic from your Website’s Access Reports

Published by Tomas Vitvar on January 13, 2009 in Ideas, Programming

I use Google Analytics to track my website’s traffic. Since I often use my website to quickly find and refer to a paper I wrote, I access it quite often. For this reason I need to exclude my own traffic from the reports generated by Google Analytics. Google Analytics offers two ways to do this, but neither of them really suits me. The first option is to exclude all traffic from one or more IP addresses. I can set a filter to exclude traffic from my work and home networks, but such a filter would also exclude traffic from my colleagues, which I do not want (my workplace has a single public IP shared by all outgoing connections). The other option is to set a variable (cookie) on all pages you want to exclude and create a filter based on that variable. This option is not any better, as I would need to set the variable on every new page I create: call a specific JavaScript method when the page loads, deploy the page to my web server, access the page to set the cookie for my own access, and then remove the JavaScript method call and redeploy the page for public access.

Fortunately, a very simple solution came to mind (it currently only works in Firefox). First, I create a custom variable in the Firefox configuration settings called general.useragent.extra.private and set its value to “my_agent” (please use your own unique identification; type about:config in your browser’s address bar to create and set such a variable). This adds the “my_agent” string to the browser’s user agent identification, which you can read in JavaScript through the userAgent property of the navigator object (navigator.userAgent). After that, you just add a simple condition to the JavaScript code on your page that calls the Google Analytics methods to track the page’s traffic. The script could look like this:

<script type="text/javascript">
    if (navigator.userAgent.indexOf('my_agent') == -1) {
        var pageTracker = _gat._getTracker("UA-xxx");
        pageTracker._trackPageview();
    }
</script>
Update: In Safari, you can set a custom user agent string by enabling the Develop menu (go to "Preferences->Advanced->Show Develop Menu in Menu Bar") and then choosing "User Agent->Other..." in the Develop menu to set the user agent string.
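
If you use different marker strings in different browsers (say, one in Firefox and another in Safari), the check can be pulled into a small helper. The following is only a sketch: the function name isOwnVisit and the idea of passing a list of markers are assumptions made for illustration and are not part of Google Analytics.

```javascript
// Hypothetical helper (not part of Google Analytics): returns true when the
// user agent string contains any of the private marker strings.
function isOwnVisit(userAgent, markers) {
    for (var i = 0; i < markers.length; i++) {
        if (userAgent.indexOf(markers[i]) !== -1) {
            return true;
        }
    }
    return false;
}

// In the page, the tracking call would then be guarded like this:
// if (!isOwnVisit(navigator.userAgent, ['my_agent'])) {
//     var pageTracker = _gat._getTracker("UA-xxx");
//     pageTracker._trackPageview();
// }
```

Keeping the check in a function that takes the user agent string as a parameter also makes it easy to try out with sample strings outside the browser.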

hRESTS — a Microformat for RESTful Services

Published by Tomas Vitvar on December 1, 2008 in Publications, Research

Our work on hRESTS, a microformat for describing RESTful services (co-authored by Jacek Kopecky, Karthik Gomadam and me), has been accepted to the Web Intelligence conference to be held in Sydney, Australia, in December this year (the acceptance rate was around 18%). You can download the full paper here.

The value of today’s Web applications is no longer only in providing content to consumers but also in exposing functionality through public APIs designed for machine consumption. Typically, both Web applications and APIs today follow the Web architecture style called REST, and public APIs on the Web are often called “RESTful Web services”. The major problem with today’s RESTful APIs is that they are usually described only in plain, unstructured HTML documentation that is useful only to a human developer. For this reason, finding suitable services, composing them (“mashing them up”), mediating between different data formats, etc. are currently completely manual tasks.

hRESTS is a microformat for machine-readable descriptions of Web APIs, backed by a simple service model. In general, a microformat is an approach for annotating human-oriented Web pages so that key information is machine-readable. On top of microformats, GRDDL is a mechanism for extracting RDF information from Web pages, particularly suitable for processing microformats. There are already microformats for contact information, geographic coordinates, calendar events, etc.

The figure above depicts the model that hRESTS uses for HTML annotation. It is derived from the fact that every web application that uses hyperlinks to link its pages can be seen as a service. Obviously, not every web application can be considered a RESTful service, as it does not necessarily follow the REST architectural style. There are a lot of examples of badly designed RESTful architectures, such as here. We use RDF to represent the model, which can be further extended with additional information such as the WSMO-Lite service ontology (see my previous post and our paper about WSMO-Lite). In its basic form, the hRESTS annotation for a hotel service is shown below.

In order to extract the metadata from an HTML document annotated with hRESTS, one needs to know the hRESTS annotation mechanism. For this purpose, and in accordance with GRDDL, we also provide an XSLT stylesheet that extracts the metadata as RDF from XHTML pages. You can download the XSLT stylesheet here.

Once hRESTS is used by RESTful service providers, one can easily build a focused search engine for RESTful services, for example with Yahoo! BOSS web search, in the same way that BOSS can be used to search LinkedIn public profiles annotated with the hResume microformat. We further plan to submit the hRESTS microformat to microformats.org as well as to build extensions towards semantic annotations, which we call MicroWSMO.

Formal Model for Semantic-Driven Service Execution

Published by Tomas Vitvar on August 8, 2008 in Publications, Research

Our work on a formal model for semantic-driven service execution (co-authored by me, Adrian Mocan, and Maciej Zaremba) will be published in the proceedings of the 7th International Semantic Web Conference (ISWC), to be held in Karlsruhe, Germany, in November this year (acceptance rate 16%). You can access the full paper here.

In this work we define a model and an algorithm for the execution of services whose interfaces are modeled using Abstract State Machines (ASM) that use ontological concepts for their vocabularies (we call this description a choreography). Ontologically enhanced ASMs make it possible to model services’ interfaces with more descriptive information (as opposed to, e.g., interfaces in WSDL, which only define a set of operations with input and output messages and message exchange patterns for those operations). We build an additional layer of ASM descriptions on top of WSDL descriptions (XML Schema and Interface) and show how a conversation between two services can be executed.

The important aspect of service execution is to maintain services’ interoperability at the data and process levels. Data interoperability needs to be ensured when services use different information models used to define services’ input and output messages, and process interoperability needs to be ensured when one service expects to exchange messages in an order that is not directly matching the order of the other service. We illustrate the usage of the execution model on the case scenario implemented on our WSMO, WSML, and WSMX technologies as the figure below depicts.

The scenario describes a mediation problem defined by the SWS Challenge initiative. In the scenario, a trading company called Moon uses a Customer Relationship Management system (CRM) and an Order Management System (OMS) to manage its order processing. Moon has signed agreements to exchange Purchase Order (PO) messages with a company called Blue using the RosettaNet standard PIP3A4. There are two interoperability problems in the scenario. At the data level, Blue uses PIP3A4 to define the PO request and confirmation messages, while Moon uses a proprietary XML Schema for its OMS and CRM systems. At the process level, Blue follows the PIP3A4 Partner Interface Protocol (PIP), i.e. it sends out a PIP3A4 PO message, including all items to be ordered, and expects to receive a PIP3A4 PO confirmation message. Moon, on the other hand, must perform several interactions with its CRM and OMS systems in order to process the order: get the internal ID of the customer from the CRM system, create the order in the OMS system, add line items to the order, close the order, and send back the PO confirmation.

By using ASMs and ontologies we automatically adjust the order of messages so that it conforms to both services’ descriptions, while at the same time we resolve data interoperability conflicts by using ontology alignments between the services’ information models. The following figure depicts the model in the form of a state diagram.

In our paper we describe in detail the algorithm for service execution that operates according to the model depicted in the figure above. The inputs to the algorithm are two services, both with choreographies defined as ontologized ASMs over WSDL interface operations (in addition, a grounding between ASM rules and the underlying WSDL interface operations is also defined), and ontology alignments between the two services’ ontologies. In a nutshell, when a message is available from one service, the algorithm processes it: it transforms the message to the other service’s ontology and passes it to that service’s choreography for evaluation.
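
To make the idea more tangible, here is a toy sketch in JavaScript. It is not the algorithm from the paper and uses no WSMX, WSML or WSDL API. Every name in it (runConversation, the rule objects with name, needs and invoke, the align function, the concept names) is an assumption made up for illustration. It only shows the two ingredients mentioned above: a mapping step that stands in for ontology alignment, and rules that fire as soon as the data they need is available, which is what reorders the message exchange.

```javascript
// Toy sketch: all names and data shapes are illustrative assumptions.
function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

function runConversation(incoming, align, rules) {
    var memory = {};   // concepts known to the receiving choreography
    var fired = {};    // names of rules that have already fired
    var calls = [];    // operations invoked, in the order they fired

    incoming.forEach(function (message) {
        // Data mediation: express the message in the receiver's ontology.
        var mapped = align(message);
        memory[mapped.concept] = mapped.value;

        // Process mediation: keep firing rules until none is enabled.
        var progress = true;
        while (progress) {
            progress = false;
            rules.forEach(function (rule) {
                var enabled = !has(fired, rule.name) &&
                    rule.needs.every(function (c) { return has(memory, c); });
                if (enabled) {
                    var result = rule.invoke(memory);
                    memory[result.concept] = result.value;
                    fired[rule.name] = true;
                    calls.push(rule.name);
                    progress = true;
                }
            });
        }
    });
    return calls;
}

// Usage with made-up rules loosely following the Moon scenario.
var moonRules = [
    { name: 'confirmOrder', needs: ['orderId'],
      invoke: function () { return { concept: 'confirmation', value: true }; } },
    { name: 'createOrder', needs: ['customerId'],
      invoke: function () { return { concept: 'orderId', value: 'O-1' }; } },
    { name: 'getCustomerId', needs: ['purchaseOrder'],
      invoke: function () { return { concept: 'customerId', value: 'C-1' }; } }
];
var calls = runConversation(
    [{ type: 'PIP3A4PurchaseOrder', body: { buyer: 'Blue' } }],
    function (m) { return { concept: 'purchaseOrder', value: m.body }; },
    moonRules
);
```

Although the rules are listed in reverse, calls ends up as getCustomerId, createOrder, confirmOrder, because each rule can only fire once the concepts it needs are in memory.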

It is important to note that our algorithm works well when the two choreographies are compatible, that is, when interoperability can be achieved by adjusting the order of messages. In its current form the algorithm will not work when one service requests a message that the other service never sends. In such cases, the message would have to be created on the fly from background knowledge that the algorithm would need to possess, and solutions to such problems would need to build on the concrete semantics of messages. For example, the algorithm should know that a message is an acknowledgment message. These questions remain open for our future research.

Planned Book on Semantic Technologies for E-Government

Published by Tomas Vitvar on July 29, 2008 in Publications, Research

Our proposal for the book entitled “Semantic Technologies for E-Government: An European Perspective” has been accepted by Springer and we are currently in the process of signing a contract. It is an edited book with three editors: Vassilios Peristeras (DERI, Galway, Ireland), Tomas Vitvar (STI Innsbruck, Austria) and Konstantinos Tarabanis (CERTH, Thessaloniki, Greece).

The goal of the book is to describe the current status of research and development in e-government empowered by semantic technologies, mainly carried out in the context of EU-funded R&D projects. The contributing authors include researchers and practitioners from academia and industry around Europe. In addition, we will have an overview of activities in the US.

Apart from the book editing, we will also contribute to the book with major results of our EU FP6 SemanticGov project. In particular, we will describe an architecture for integration of cross-border e-government services based on the Semantic Web Services architecture and Public Administration ontology based on the Web Service Modeling Ontology (WSMO) conceptual model called WSMO-PA.

The book should be available in mid-2009.

URL-identifiable Content with AJAX

Published by Tomas Vitvar on July 15, 2008 in Programming

AJAX (Asynchronous JavaScript and XML) is a great technology that enables a rich user experience in web browsers. With AJAX, the page does not need to be reloaded every time the user requests new content from a server. Instead, only the parts of the page relevant to the user’s request are fetched in the background. This leads to better performance and response times of web applications.

Although content fetched dynamically with AJAX is very useful and popular today, there are some drawbacks. One of them is maintaining the application state: when the user requests a page using some URL, the server sends that page and the browser displays it along with its valid URL. Then the user performs an action that fetches new content from the server, which the AJAX script dynamically embeds in the original page. This changes the content of the page (the state of the application), but the URL shown in the browser still refers to the original page’s state. For obvious security reasons, a script is not allowed to modify the URL dynamically without changing the browser’s content.

One of the main problems related to this drawback is that it is not possible to refer to such dynamic pages: you cannot pass a reference to such a page to others, and you cannot bookmark a particular application state, unless you implement additional features that allow you to do so. Below I show how I solved this problem on my website.

If you look at my list of publications (see the picture above), there are two tabs: “Selected” and “All Publications”. When you click on a tab, the AJAX script dynamically fetches the content of the tab and embeds it in the original page with no change to the URL in the browser. To identify the new content, I have included a “link to this page” on the right-hand side of the page, which the AJAX script modifies whenever the user clicks on a tab. The URL that identifies the new content is simple: it adds a query part of the form “?t=tabId” to the page URL, where tabId is the id of the newly generated content.
The tabs are represented using HTML <ul> and <li> tags as follows.

<ul id="tabul">
	<li><a id="tab1" href="javascript:changeTab(...)">Selected</a></li>
	<li><a id="tab2" href="javascript:changeTab(...)">All Publications</a></li>
</ul>

Every tab is represented by an anchor element that points to a JavaScript function. When activated by a click on the anchor, the function changes the tab: it loads the tab content from the server, places the content in the appropriate element of the page and updates the URL of the “link to this page” anchor. For brevity this script is not shown here.
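
The step that updates the “link to this page” URL could look roughly like the sketch below. This is not the omitted script: the function name buildTabLink and the example.org address are assumptions used only to illustrate the “?t=tabId” scheme described above.

```javascript
// Hypothetical helper: builds the "link to this page" URL for a tab.
// Any query string or fragment already present in pageUrl is dropped,
// so clicking several tabs in a row does not accumulate parameters.
function buildTabLink(pageUrl, tabId) {
    var base = pageUrl.split('#')[0].split('?')[0];
    return base + '?t=' + encodeURIComponent(tabId);
}

// Example: switching from the first tab to the second one.
var link = buildTabLink('http://example.org/publications?t=tab1', 'tab2');
```

In the page, the result would be assigned to the href attribute of the “link to this page” anchor each time a tab is changed.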

In order to correctly process the “link to this page” URL with its query parameter in the browser window, I have implemented a simple script that looks for the tab parameter and its value in the URL every time the page is loaded. When such a parameter is found, the script automatically activates the appropriate tab. The code below shows this script.

   <script type="text/javascript">
      window.onload = function () {
         // parse the URL search string using the regular expression and get the tab id
         var match = /^\?t=(.*)$/.exec(window.location.search);

         // get the element (tab) that the parameter is referring to
         var e = match ? document.getElementById(match[1]) : null;

         // if the element is not found, fall back to the first tab in the list of tabs
         // (getElementsByTagName skips the whitespace text nodes that childNodes may contain)
         if (e == null)
            e = document.getElementById('tabul').getElementsByTagName('a')[0];

         // evaluate the value of the tab's href attribute
         eval(e.href);
      };
   </script>

The script first parses the URL search string and gets the id of the tab element. If the tab element is found, it evaluates its href attribute (in other words, the script simulates a click on the tab’s anchor element). If the tab element is not found, the script activates the first tab in the list. Using this approach, you can now easily refer to the content of any of the tabs, to my selected publications as well as to all publications.

URL-identifiable content in AJAX is important when you want to bookmark the application state or pass a reference to others. The solution I have shown in this post is part of a system called Konamara that I am developing for website management. I plan to enhance the system with various Web 2.0 and Semantic Web features, building a test-bed for the research activities I work on or plan to work on. I plan to release its source code as open source in the near future.
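
The parsing step can also be written as a small function that takes the search string as a parameter, which makes it easy to check on its own. This is a sketch under assumptions: the name getTabId is made up, and unlike the script above it stops at the first “&” so that additional query parameters do not end up in the tab id.

```javascript
// Hypothetical helper: extracts the tab id from a search string such as
// "?t=tab2". Returns null when the string does not start with "?t=".
function getTabId(search) {
    var match = /^\?t=([^&]*)/.exec(search);
    return match ? decodeURIComponent(match[1]) : null;
}

// In the page it would be called with window.location.search, e.g.:
// var tab = document.getElementById(getTabId(window.location.search) || 'tab1');
```

The fallback to 'tab1' in the commented usage line mirrors the behavior of activating the first tab when no tab parameter is present.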

Success of a PhD Endeavor

Published by Tomas Vitvar on July 11, 2008 in Ideas, Research

Observing several PhD students, both those I have been supervising and friends or colleagues I have been working with, I have asked myself a question: what are the success factors in completing a PhD degree? There is no ideal student who knows what he/she wants from the very beginning till the end, but there are several things a student should learn in order to be well prepared to work as a researcher in the future. Although a PhD degree is awarded based on a successful defense of a PhD thesis (a report on the research results that a student completes during his/her studies), a PhD student should be active in the community, publish papers in conferences and journals, and, most importantly, do some innovative work. A PhD thesis is thus a report describing the results of PhD work, which reflects the student’s life.

In my view, you as a PhD student should ideally:

  • Know what to do. Before you start a PhD you should already know what it should be about. This does not mean that the idea must be clear, but the direction of your work should be. This usually depends on the research group you are working with; however, frequent changes of topic are not a good sign.
  • Be at the right place. It is important that you are affiliated with the right group, one that works in the area of your PhD topic. A student alone is unlikely to do valuable research, as research areas are usually too broad for one person to cover. Talking to professors and attending meetings, lectures, etc. is the most important way of getting enough background knowledge for your thesis.
  • Have the experience. Depending on the research topic, it is sometimes important that you have some experience with “real life”. Research is about creating new methods, technologies, or techniques which can be used for better solutions of real problems. Lack of real-world experience might leave the resulting work without grounding in practice.
  • Have a motivation. Doing a PhD is a long way to go. Getting familiar with the field, learning how to publish and write, finding gaps to work on, etc. are all important aspects of your research. It is very easy to lose motivation along the way. You may feel that what you do does not have any value. You may feel that you cannot do any innovative work because what you do has already been done by hundreds of others. You may feel “down” once your paper gets rejected by a workshop or a conference. At some point you will understand that all of this is part of learning how to do research, how to publish and how to write.
  • Keep deadlines. It is easy to say “there is enough time, I will do it tomorrow” or “there will be another opportunity to publish my paper”. Postponing your deadlines is the start of losing your way in your research and, most importantly, in completing your thesis. External deadlines such as those set by conferences or journals are very important because you cannot change them, so keeping them always brings you a step forward.
  • Know your supervisor. A good supervisor is one of the most important things in your PhD. A supervisor is an expert in your field who gives you feedback on your intermediate results and teaches you about the technical quality of your work. He/she should also give you access to the community, that is, introduce you to people and research groups and point you to publishing opportunities (well-established conferences, journals, magazines, etc.). The supervisor is also usually busy, as he/she might have more students, manage several projects, etc. Whatever your supervisor does or does not do, he/she is the person who approves your work and eventually your thesis. So it is important that you learn how to deal with your supervisor, i.e. what his/her requirements are and what you have to do to fulfill them.
  • Have enough time. Students are usually young, knowledgeable and enthusiastic, so it is easy to commit them to too many things. You can end up teaching, working in projects, managing projects, organizing conferences and meetings, or doing some evangelism for your research field. All of these are important tasks that are certainly worth learning, but you should keep them in line with your original PhD work and not commit to too many of them, as they can easily distract you from your work.

I have also seen that students start their PhD for several reasons. The first group of students just want to extend their student life: they feel they are still too “young” to start a “serious” life and want to stay in touch with the university and the student style of living. The second group of students are naturally born theoreticians and researchers who want to push their idea forward, make it right and make it real. The third group of students love expressing themselves in front of an audience; they love to teach and explain things to others. The fourth group of students want to get their degree because they think it will give them an advantage in finding a good position in the future. There is certainly a big overlap between these groups. What they all have in common is that as a PhD student you are supposed to learn what research is about, so that you are well prepared to work as a researcher in the future.

New Version of my Website

Published by Tomas Vitvar on April 25, 2008 in Personal, Programming

I have recently updated my website with some new features. First of all, I have migrated the whole website to a new location, and at the same time I have rewritten its source in the language and with the tool called Konamara, which I am developing for website content management and publishing (there are currently two websites maintained with Konamara: my personal one and the CMS WG website).

Following are the new features of the website:

  • The WordPress blog is fully and seamlessly integrated with the look of the website, with recent blog posts displayed on the entry page.
  • Events, which I maintain separately in Google Calendar, are displayed on the website. Whenever a new event appears in the calendar, the list of events on the website is updated.
  • Integrated Google search engine using Google AJAX Search API to search the content of my website.
  • List of my publications and presentations is generated from the XML of all my publications. Konamara can easily generate selected publications (those with higher rank) and all publications on separate tabs.
  • Some AJAX features such as tabs for “selected” and “all” publications.

The major driver behind Konamara is to make it easy to develop websites powered by Web 2.0 features such as rich user experience, service mashups, etc. I also plan to use it as a basis for semantic service mashups and to incorporate some of the ongoing work in semantic web services. I will provide more details about Konamara later in my blog.
