Source vs. Resource Ontology

What is a Resource?

The notion of a resource is fundamental to current networked information systems. The term "resource" is used often, particularly in relation to the World Wide Web and the W3C's semantic web activity, in standards such as the Resource Description Framework (RDF), the Uniform Resource Identifier (URI), and others. This relatively simple term masks an exceptional amount of ambiguity.

What is a resource, exactly, in the context of electronic documents served over the web? The ontology developed here attempts to explicate what a resource is and its relation to other related entities.

Although there is a stated definition of a resource in the URI RFC, it is in many respects vague:

"A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content - the entities to which it currently corresponds - changes over time, provided that the conceptual mapping is not changed in the process." [Berners-Lee]

The W3C's semantic web activity in some ways avoids the question of what a resource is, but it also raises very interesting questions, some of which are addressed here. This ontology is limited to web resources (pages, sites, documents, web applications, and web services) and is formulated specifically with dynamically generated web pages as the prime motivating real-world example. The ontology may or may not have further implications for, or applicability to, broader notions of resources.

The Real World

Static pages of HTML are a relatively simple case of web resources. Since they are static, they have the same informational content regardless of the user accessing the resource, regardless of the time of access, and regardless of other changes in the outside world. (Certainly there are exceptions, such as earthquakes that disrupt servers, but this is what the notion of a static page amounts to.)

A significant complication is that even pages we often call static are not static in a strict sense. "Small" edits over time, such as spelling corrections, change the resource content, depending on your perspective. In many cases a "base" of static content is surrounded by dynamic content, for example a news article that contains advertising. The "article" in one sense doesn't change, but it is a dynamic resource in an important sense, because the advertisements change. Another example of a small amount of dynamic content in a generally static resource is the current day and time displayed on a page whose remaining content is unchanged.

Many of the resources we care about are clearly dynamic rather than static. Some examples:

Your LEEP homepage
My Yahoo
New York Times
Citeseer entry for Towards a semantics for XML markup

Most of these dynamic resources are generated using a similar general architecture. A database stores the information necessary for generation of the page. A program residing on a server interacts with the database, as well as with other external data sources, to generate a page in response to a user request. In some cases, this user request contains uniquely identifying information.
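
The architecture just described can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory SQLite database, the "headlines" table, and the function names make_database and generate_page are hypothetical and do not correspond to any of the sites listed above.

```python
import sqlite3


def make_database() -> sqlite3.Connection:
    """Build a tiny in-memory database standing in for the site's data store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE headlines (title TEXT)")  # hypothetical schema
    db.executemany(
        "INSERT INTO headlines VALUES (?)",
        [("Local news",), ("Weather",)],
    )
    db.commit()
    return db


def generate_page(db: sqlite3.Connection, request: dict) -> str:
    """Server-side program: combine stored data with the user request."""
    # The request may carry identifying information; fall back to "guest".
    user = request.get("user", "guest")
    rows = db.execute("SELECT title FROM headlines ORDER BY rowid")
    items = "".join(f"<li>{title}</li>" for (title,) in rows)
    return f"<h1>Hello, {user}</h1><ul>{items}</ul>"


db = make_database()
print(generate_page(db, {"user": "alice"}))
print(generate_page(db, {}))
```

The same stored data yields different pages for different requests, which is the behavior the rest of this document tries to account for.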

Motivation

Moving away from the current technological aspects of web resources, the ontology should help us understand some important distinctions among web resources. Tim Berners-Lee made a distinction between a resource and the entities that represent the "content" of a resource. This distinction between a resource and an "instance" of that resource will be made in the ontology. The beginning intuition is that an instance is an access of a resource in a particular context. What exactly these contexts and instances are will be explicated as well. Furthermore, it is important to distinguish a resource from the sources that make up and inform it.

Entity Definitions

Resources

Given the ambiguity of the notion of resources being "things with identity," it is difficult to fully explain what a resource entity is in a strict, ontologically rigorous way. In an important sense it is the higher-level abstract "thing" that a set of instances belongs to. Berners-Lee implies that the conceptual mapping of a "concept" to a set of other entities is also central to a resource, although this ontology separates the two.

Currently, in this ontology a resource is treated more as a primitive, defined primarily by its relations to other entities.

Mappings

A mapping roughly corresponds to the program logic in a real-world web server example. A resource is distinct from its mapping.

A resource has a mapping function that takes contexts and returns instances; applying this function corresponds to an "access" of that resource. For example:

r is a resource, c is a context
i is an instance of resource r in context c
ResourceAccess(r, c) = i
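
As a rough illustration of this notation, the sketch below models assertions as plain strings and contexts and instances as sets of them. The string encoding, the Resource class, and greeting_mapping are assumptions made for the example, not part of the ontology itself.

```python
from typing import Callable, FrozenSet

# Assumed encoding: an assertion is a "name=value" string.
Assertion = str
Context = FrozenSet[Assertion]
Instance = FrozenSet[Assertion]
Mapping = Callable[[Context], Instance]


class Resource:
    """A resource is kept distinct from its mapping, which it merely holds."""

    def __init__(self, name: str, mapping: Mapping) -> None:
        self.name = name
        self.mapping = mapping


def resource_access(r: Resource, c: Context) -> Instance:
    """ResourceAccess(r, c) = i: apply r's mapping to the context c."""
    return r.mapping(c)


def greeting_mapping(c: Context) -> Instance:
    # Hypothetical program logic: greet a known user, otherwise a guest.
    user = next(
        (a.split("=", 1)[1] for a in sorted(c) if a.startswith("user=")),
        "guest",
    )
    return frozenset({f"body=Hello, {user}"})


homepage = Resource("homepage", greeting_mapping)
i = resource_access(homepage, frozenset({"user=alice", "time=1100000000"}))
print(sorted(i))  # ['body=Hello, alice']
```

Keeping the mapping as a separate attribute mirrors the claim that a resource is distinct from its mapping.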

Contexts

A context is a set of assertions. Sources are assertions that inform resources by being part of a context of access. Some sources that belong to a context might include:

time and date (as noted by seconds from epoch stored in a computer's clock)
the headlines of the New York Times (XML representation in an RSS feed)
your identity (from cookies stored by a web browser)
the geographic location of the computer accessing (from IP address)
inventory information (from a database sitting next to the server)

Note that the level of abstraction of the assertions that make up the contexts is not inherent in the definition of sources or contexts. Nor is the general nature of what these assertions say or mean specified in this definition. In the examples listed above there are general assertions at a high level of abstraction about the world, and in parentheses an example of how that information is usually represented in web-based systems at a lower level of abstraction. This is meant to illustrate that the assertions can accommodate these various levels.

Instances

Instances are also sets of assertions. An instance depends on a resource, the mapping function associated with the resource, and a context of access.

Making instances sets of assertions allows a very large range of possible interpretations of what a resource instance is. Since the level of abstraction and nature of these assertions is not specified, they could be at a very high logical level about complex entities in the world, or they could be assertions about the order of characters in a text file. This flexibility allows the ontology to retain its characteristics independently of one's notion of what the important identifying aspects of a web resource instance are, as long as those notions can be represented as a set of assertions.

Discussion

Questions

Some of the questions this formulation should be able to answer, or at least help clarify:

What sources inform or affect a resource?
Has a resource changed? What does that mean?
Are instance A and instance B the same?
Are instance A and instance B instances of the same resource?
Is a resource static or dynamic?

It may nevertheless be useful to ontologize regardless of intended use, simply for a better understanding of this increasingly important domain that is central to our information systems. This document attempts to use the ontology to answer some of these questions, but the identity questions in particular are not addressed at this time.

Static vs. Dynamic resources

Given our notion of what a "static resource" is, one possible definition in the ontology could be:

A resource r is static if and only if
∃i ∀c ResourceAccess(r, c) = i

However, this definition implies that in all possible contexts the instance is the same. This may be too strong a statement as there may be certain contexts which are "invalid." Introducing a notion of a valid domain of contexts for a resource can solve this problem.
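
When the valid domain of contexts is finite and can be enumerated, this restricted notion of a static resource can be checked directly. The sketch below is one possible reading, not a definitive implementation: contexts and instances are assumed to be frozensets of "name=value" strings, and is_static, static_page, and clock_page are hypothetical names.

```python
from typing import Callable, FrozenSet, Iterable

# Assumed encoding: contexts and instances are sets of "name=value" strings.
Context = FrozenSet[str]
Instance = FrozenSet[str]


def is_static(
    mapping: Callable[[Context], Instance],
    valid_contexts: Iterable[Context],
) -> bool:
    """True if every valid context yields one and the same instance."""
    instances = {mapping(c) for c in valid_contexts}
    # An empty domain is treated as vacuously static.
    return len(instances) <= 1


def static_page(c: Context) -> Instance:
    # Hypothetical mapping that ignores its context entirely.
    return frozenset({"body=About us"})


def clock_page(c: Context) -> Instance:
    # Hypothetical mapping that copies the time assertion into the page.
    return frozenset(a for a in c if a.startswith("time="))


domain = [
    frozenset({"time=1", "user=alice"}),
    frozenset({"time=2", "user=bob"}),
]
print(is_static(static_page, domain))  # True
print(is_static(clock_page, domain))   # False
```

Quantifying only over the supplied domain is what lets "invalid" contexts be excluded from the test.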

It may not solve some other, more practical problems, including temporary server outages, network problems, or changes made to the set of assertions that comprise the instance after they have left the server. Current real-world examples of the latter include the Google Toolbar's Autolink feature and the Greasemonkey extension for the Firefox web browser.

This also introduces another complex issue: "where" are these instances referred to? At what point in the transmission process are we looking? Bits assembled on the server? Sent over the web? The bits received by a client? The electrons on a monitor?

These questions are outside the scope of this discussion. Because instances are sets of assertions, it is likely possible to accommodate all of the above possibilities, as well as others, depending on the interpretation of an instance. Whether your conceptualization of resource accesses considers instances to be bits sent over a network or letters on a screen, a set of assertions that clearly defines those bits or letters can represent that instance, accommodating the conceptualization.

Source and Resource Relationship

Does a source inform or affect a resource? Does a resource depend on a source? These are important notions to explicate.

For resource r and source s:

Informs(s, r) and Depends(r, s) if and only if
∃ c1, c2 such that
c1 and c2 are within the valid domain of r
c1 and c2 differ only in the value (or presence) of s
ResourceAccess(r, c1) = i1
ResourceAccess(r, c2) = i2
i1 ≠ i2

This states that if two valid contexts differ only in a single source s, and the instances returned by accessing the resource in those two contexts differ, then the resource is affected by s.
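
For a finite, enumerable domain of valid contexts, the definition can be tested by comparing every pair of contexts. The following is a minimal sketch under assumptions that are not part of the ontology: each context is encoded as a dictionary from source name to asserted value, and informs, differ_only_in, and news_page are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

Context = Dict[str, str]   # assumed encoding: source name -> asserted value
Instance = FrozenSet[str]


def differ_only_in(c1: Context, c2: Context, s: str) -> bool:
    """True if c1 and c2 disagree on s (value or presence) and nothing else."""
    rest1 = {k: v for k, v in c1.items() if k != s}
    rest2 = {k: v for k, v in c2.items() if k != s}
    return rest1 == rest2 and c1.get(s) != c2.get(s)


def informs(
    s: str,
    mapping: Callable[[Context], Instance],
    domain: Sequence[Context],
) -> bool:
    """Informs(s, r): some pair of valid contexts differing only in s
    produces different instances."""
    return any(
        differ_only_in(c1, c2, s) and mapping(c1) != mapping(c2)
        for c1, c2 in combinations(domain, 2)
    )


def news_page(c: Context) -> Instance:
    # Hypothetical mapping: shows the headline, ignores the visitor's location.
    return frozenset({"headline=" + c.get("headline", "none")})


domain = [
    {"headline": "A", "location": "US"},
    {"headline": "B", "location": "US"},
    {"headline": "A", "location": "UK"},
]
print(informs("headline", news_page, domain))  # True
print(informs("location", news_page, domain))  # False
```

Note that the check can only find a dependence if the domain actually contains two contexts that differ in s alone, which is exactly the difficulty with sources such as time.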

This seems to get at our notion of what it means for a resource to depend on a source, but it raises some important problems. If, for example, time is a part of the context, which seems both reasonable and true in real-world examples, then it may be impossible to actually isolate a source s in question and examine it independently of time. The definition above still seems worthwhile for understanding dependence, although more complex means of defining dependence may be necessary.

A weaker but still important relation between a source and a resource is that of "may depend." A resource may depend on source s if and only if there exists a valid context c for the resource's mapping function such that s is a member of c.
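
Over a finite domain of valid contexts this weaker relation reduces to a membership test. The sketch below assumes, purely for illustration, that each context is a dictionary keyed by source name; may_depend is a hypothetical name.

```python
from typing import Dict, Iterable

Context = Dict[str, str]  # assumed encoding: source name -> asserted value


def may_depend(s: str, valid_contexts: Iterable[Context]) -> bool:
    """True if some valid context contains the source s."""
    return any(s in c for c in valid_contexts)


domain = [
    {"time": "1100000000", "user": "alice"},
    {"time": "1100000060"},
]
print(may_depend("user", domain))      # True
print(may_depend("location", domain))  # False
```

Unlike the stronger relation, this test never applies the mapping function, so it says nothing about whether the source actually changes an instance.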

Open Questions

This ontology makes a distinction between a resource and a mapping function from resources and contexts to instances. If the essence of a resource is a "conceptual mapping," then perhaps these are the same thing. It also seems possible that this "conceptual mapping" is in fact a more abstract entity that may or may not need to be represented in the ontology.

Some of the unanswered questions related to this involve the cardinality of the relation between a resource and its mapping function. It is implied that each resource has a single mapping function. This is probably not a one-to-one relation: a single static document residing on different servers, accessible at multiple URLs, is generally considered to be several different resources, but in this ontology all of them would seem to share a single mapping.

Although contexts are gaining favor in the artificial intelligence community, there are questions and issues related to their use in this ontology. The ontology introduces the notion of a domain of valid contexts for a mapping function, but it specifies no limitations or guidelines for such a domain, nor how to go about defining one.