Thursday, 6 October 2005

CORDRA, Federated Search, Subject Gateway & Repository

In 1999, Ip et al. wrote a paper, "Metasearching or Megasearching: Toward a Data Model for Distributed Resource Discovery", looking at searches based on free text and web crawling, and depth searches using metadata harvesting across multiple repositories. These are alternative, complementary ways of discovering information, and different methods fit different purposes. In 2001, the group followed up with another paper, "Resource Synergy", which looked at the value proposition of specialised collections of resources, called subject gateways at that time.

Subject gateways (SGs) exist to meet the resource needs of their communities of interest. They tend to place a higher value on returning quality resources than general search engines do, however we may interpret 'quality'. The value of any SG lies in finding more efficient and effective ways to build, index and facilitate retrieval from special-interest collections sifted or culled from the massive underlying search space. This sifting and culling depends on the domain knowledge of the SG owners. Such domain expertise is hard to find and replicate in general-purpose search services.

As time went by, the value proposition of subject gateways turned out not to be sustainable, and SGs eventually faded out. Instead, we are seeing a lot of locked repositories where the resources are put behind password-protected walls. However, the value proposition for collaboration among SGs now applies to repositories as well, hence the idea of federated search. Since most repositories are community-based and metadata-enabled, it apparently makes sense to base federated search on metadata. But metadata schemas, each developed to meet the specific needs of a particular community, are very difficult to map onto one another. A solution has been proposed within the ADL/SCORM community, CORDRA:
(Content Object Repository Discovery and Registration/Resolution Architecture): An open, standards-based model for how to design and implement software systems for the purposes of discovery, sharing and reuse of learning content through the establishment of interoperable federations of learning content repositories.

I took the opportunity at "Advancing ADL through Global Collaboration" to ask Dan Rehak for a sample instance of CORDRA which I could play with without needing a user name and password. After thinking for a while, Dan asked me whether I could read Japanese. Apparently the only available CORDRA instance which does not require a user name and password is a Japanese repository.

Business model aside, the value of a federated search is the trust that a user has in the relevancy and quality of the results returned. Federated search will NOT return thousands or millions of potential results. Its value is a limited result set with "trust" associated with it. That is also one of the value propositions of subject gateways I referred to earlier.

The relevancy of federated search is based on a smaller search space with a well-defined and, hopefully, well-populated metadata set. It also depends on how specific the search criteria are that a user can put in. Assuming that the user is part of the community, that is not a problem.

When we tackle the search problem across repositories, we are basically cross-searching (or megasearching, the term used in the 1999 paper) among different communities. Different communities have different metadata schemas, obviously because each community customizes its schema to fit its own needs and requirements. The cross-mapping (or crosswalking) between schemas is problematic, and it introduces a level of "fuzziness" into the result set of a federated search. Hence I cannot buy into the argument that metadata-based federated search will consistently produce a better result set. (Better in the sense of relevancy as "fitness for purpose".)
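
To make the "fuzziness" concrete, here is a minimal sketch in Python of what a crosswalk might do to a single record. The two schemas, the field names and the sample record are all made up for illustration; they are not taken from CORDRA or from any real repository.

```python
# Hypothetical crosswalk from a made-up community schema A to a made-up schema B.
# A target of None means schema B has no equivalent field, so the value is dropped.
CROSSWALK_A_TO_B = {
    "title": "name",
    "subject": "topic",
    "audience": "topic",   # collapses into the same target field as "subject"
    "difficulty": None,    # no counterpart in schema B
}


def crosswalk(record, mapping):
    """Translate a record field by field; return (translated, lost_fields)."""
    translated = {}
    lost = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            # Unmapped field: the information does not survive the crosswalk.
            lost.append(field)
            continue
        # Several source fields may land in one target field.
        translated.setdefault(target, []).append(value)
    return translated, lost


# Illustrative record in schema A (invented data).
record_a = {
    "title": "Intro to Fractions",
    "subject": "mathematics",
    "audience": "primary teachers",
    "difficulty": "beginner",
}

translated, lost = crosswalk(record_a, CROSSWALK_A_TO_B)
print(translated)
print(lost)
```

In this toy case the "difficulty" value is lost, and "subject" and "audience" are merged into a single "topic" field. A search on topic in the target schema can no longer tell the two apart, which is the kind of imprecision I mean.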

I reported on Rollyo last Thursday. I don't know Rollyo's ranking mechanism, but the search space for Rollyo results is limited to the nominated websites. This is where the "trust" in quality is exercised by the searcher.

Will Rollyo be a better alternative to federated search? Your call.

