I was trying to use some mathematical terms to express the "improvement" of the data model. Judging from the comments I have received so far, it did not come across very well. In this post, I try to restate the idea in plain English.

Say we have a set of data, let us call it S_{1}, with elements s_{11}, s_{12}, s_{13}, ... s_{1n}. These elements were referred to as type 1 data in the previous paper.

Let us start with a collection of documents. These documents were referred to as type 1 data in the previous paper.

Now, apply a "meta" operation* on each of the elements in S_{1}, which will produce a set S_{2} with elements s_{21}, s_{22}, s_{23}, ... s_{2n}, where s_{2i} is the metadata of s_{1i}. These elements (s_{21}, s_{22}, ... s_{2n}) were referred to as type 2 data in the previous paper.

"Meta-data" is data about data. Hence we can get the data about each of the documents in our collection. We then have a second collection, whose items are data about the documents. The data in this second collection were called type 2 data in the previous paper.

Note that the elements of S_{2} are data as well. These elements are themselves type 1 data, and hence we can apply a "meta" operation on these as well to produce another set of type 2 data. This is infinitely recursive.

We noted that the data in the second collection are, themselves, documents. Hence, it is also possible for us to get data about these data. This process of getting data about data can be repeated as many times as you like.
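To make the recursion concrete, here is a minimal Python sketch. The `meta` function and the two fields it records (`kind` and `length`) are my own assumptions for illustration; the model does not prescribe any particular meta operation. The point is only that the output of `meta` is itself data, so `meta` can be applied to it again.

```python
import json


def meta(item):
    """A hypothetical meta operation: describe any piece of data by its kind and size."""
    # Non-string data is serialised first so that it can be measured like a document.
    text = item if isinstance(item, str) else json.dumps(item, sort_keys=True)
    return {"kind": type(item).__name__, "length": len(text)}


def meta_levels(items, depth):
    """Apply the meta operation repeatedly; levels[0] is the original collection."""
    levels = [list(items)]
    for _ in range(depth):
        # Each new level holds the data about the items of the previous level.
        levels.append([meta(item) for item in levels[-1]])
    return levels


# Two hypothetical documents (type 1 data) and two rounds of "data about data".
levels = meta_levels(["the cat sat", "a dog barked loudly"], depth=2)
```

Here `levels[1]` plays the role of the second collection (type 2 data), and `levels[2]` is data about that collection. Nothing stops `depth` from being larger.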

What is interesting, and perhaps confusing, is that there exists more than one meta operation. In fact, there are an infinite number of meta operations. Each meta operation will produce a set of type 2 data carrying the implicit characteristics of that meta operation. We further define a meta-meta operation as an operation on a set which extracts the characteristics common to all elements of the set; applied to S_{2}, it produces M_{1}. Since there are an infinite number of possible meta operations, there exist an infinite number of characteristics, M_{1}, M_{2}, M_{3}, ... M_{n}, ... Each of these characteristics, when expressed as data, is what we referred to as type 3 data in the previous paper.

[Note: this is the improvement to the model.] There is more than one type of data about data. The "normal" metadata is only one type of data about the document. Another type may be the number of times a certain word appears in the document (frequency count). When we look at the data about the documents (our second collection), these data carry some particular features due to the method we used to obtain them. If we can express these characteristics in a data format, then we get what we referred to as type 3 data in the previous paper.
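As a rough illustration, the Python sketch below runs two different meta operations over the same documents and then a meta-meta operation over each result. Everything here is an assumption made for the example: the function names, the sample documents, and above all the choice to read "common characteristics" as "the fields that every item shares, with their value types". Other readings of the meta-meta operation are equally possible.

```python
from collections import Counter


def descriptive_meta(doc):
    """A "normal" metadata operation (hypothetical): title and length of the text."""
    return {"title": doc["title"], "length": len(doc["text"])}


def frequency_meta(doc):
    """A different meta operation (hypothetical): word frequency counts."""
    return dict(Counter(doc["text"].lower().split()))


def meta_meta(type2_items):
    """Extract what all items share: the fields present in every item and their types."""
    if not type2_items:
        return {}
    common = set(type2_items[0])
    for item in type2_items[1:]:
        common &= set(item)
    return {key: type(type2_items[0][key]).__name__ for key in sorted(common)}


# Hypothetical type 1 data.
documents = [
    {"title": "A", "text": "the cat saw the dog"},
    {"title": "B", "text": "the dog ran"},
]

# Two sets of type 2 data, one per meta operation.
descriptive = [descriptive_meta(doc) for doc in documents]
frequencies = [frequency_meta(doc) for doc in documents]

# Two pieces of type 3 data, each reflecting the meta operation behind it.
m1 = meta_meta(descriptive)
m2 = meta_meta(frequencies)
```

The two results differ because the two meta operations differ: `m1` describes records with a title and a length, while `m2` describes word counts for the words that occur in every document.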

An example of M_{i} may be the Dublin Core specification, which defines a particular meta operation. The process of producing M_{1} is the meta-meta operation. Different communities of practice will obviously have their own variations of the meta operation (adoption and extension of DC), each producing its own M_{i}.

This paragraph is self-explanatory.

A meta-meta operation applies to data elements. Since type 1 data is data, we can also apply a meta-meta operation on type 1 data. One of the possible characteristics of type 1 data is the link information among the elements. This link information has been an important input for determining the "page-rank" in Google's search results. Again, there are other meta-meta operations which can be applied to type 1 as well as type 2 data.

We also note that the way of describing the "common" characteristics of data about documents is the same as the way of describing the "common" characteristics of any data. We can, therefore, apply the same technique to the original type 1 data, i.e. the documents, to try to get some characteristics of the collection of documents. One of these characteristics may be the linking relationship between the documents.
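A small Python sketch of this idea follows. The `[[name]]` link syntax, the document names, and both functions are assumptions for illustration. The inbound-link count is a deliberately crude stand-in and is not Google's PageRank algorithm.

```python
import re
from collections import Counter

# Hypothetical type 1 data: documents that refer to each other with [[name]] links.
docs = {
    "a": "see [[b]] and [[c]]",
    "b": "see [[c]]",
    "c": "no links here",
}


def link_graph(documents):
    """Collect, for every document, the names of the documents it links to."""
    return {name: re.findall(r"\[\[(\w+)\]\]", text) for name, text in documents.items()}


def inbound_counts(graph):
    """Count how many links point at each document."""
    counts = Counter({name: 0 for name in graph})
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


graph = link_graph(docs)
inbound = inbound_counts(graph)
```

Neither `graph` nor `inbound` describes any single document. Both are characteristics of the collection as a whole, which is what the meta-meta operation is meant to capture.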
