Archive for the ‘Semantic Web’ Category

Here is my post about defining words as the molecular building blocks in the creation and meaning of ideas. However, considering the confusion caused by the term semantics and its unwanted association with linguistics and the Semantic Web, I think I first have to provide a theory explaining how people correlate and interpret their interpersonal reality: the semantics of semiosis in the interdependent reality of being human, i.e., the semantics of our humanity.

The Representational Theory of Mind (RTM; cf. Field 1978, 37; Fodor 1987, 17) is a controversial, though sensible and practical, theory taken up by many, but not all, computer scientists and AI engineers. I wish to take up this theory and raise its power. Fodor and Field developed this representational theory of thought out of Fodor’s Language of Thought Hypothesis (cf. Fodor 1975), which in turn goes back to James (1890). The theory recognizes thoughts as actions, paraphrased thus:

For each biological or psychological act A (inference, intention, disposition, resolution, etc.), one recognizes and partakes of (commits to) a distinct (i.e., dedicated) physical affordance R to operate on one or more physical processes selected by subject S. S acts to influence, or acts under the influence of, experience E, or partakes of process P.

Logically, S bears a relation R to experience E and to physical process P.

Experience with this logical formula induces a cenoscopic type of knowledge that comes from the systematic realization of predictable consequences. These consequences are implied by the way first-order logic takes “reality” in its aspect to the induction or deduction of such logical relations. The scare quotes around “reality” are needed. Really! The subject S bearing the relation R has a limited range of experience E, contending with undefined yet potential actions or constraints R on one or more indefinite processes P of which one must partake to make an interpretation or create an idea.

A problem arises because those who apply the logic do not really question whether any variable introduced into it actually coincides or correlates with life or with any particular objective reality. This is where human beings and direct experience are not of much avail. If one does not know which humanistic affordance offers the most advantage, and which humanistic process P to select (or is to be selected) to create a sustainable idea or manifest a suitable and realistic humane thought, how can any idea be measured against any other?

While RTM makes sense, being inductive of cenoscopic knowledge, its followers have so far failed to identify the distinct sort of physical affordance R that humanity shares and that a subject S commits to, or the operations, objects or functions subject S recognizes to act on or interpret their experience E. They have failed to properly characterize any process P in which subject S partakes to create ideas. There is folk-psychological doctrine and there is talk about beliefs; that is the sum of it. Since its inception, neither the authors of RTM proper nor the proponents of the doctrines that have embraced it have succeeded in helping adherents identify key objects, operations, procedures and processes.

In his Essay Concerning Human Understanding (1823/1963, p. 174), Locke (1690) wrote:

“All that can fall within the compass of human understanding, being either, first, the nature of things, as they are in themselves, their relations, and their manner of operation: or, secondly, that which man himself ought to do, as a rational and voluntary agent, for the attainment of any end, especially happiness: or, thirdly, the ways and means whereby the knowledge of both the one and the other of these is attained and communicated; I think science may be divided properly into these three sorts.”

From my perspective as a layman, I can see that modern computer and social sciences and philosophy have failed humanity in two of the three divisions of science. The RTM referenced above is science of the second division: what man ought to do as a rational, voluntary agent to attain his own ends (whatsoever they may be). Because there is no focused definition of humanism, the actions of the agent are not committed to being humane, or even rational, at all. Without a requirement for humanity, whatever rationality exists arises from either irrational desire or rage; neither is appealing or cultured. It seems to me that, in raising the culture of human understanding, the first commitment one must make is to the humanity from which biophysical affordance R emerges and rationality follows.

Computer, cognitive and social sciences, particularly linguistics and natural language engineering, have failed humanity in both the first and second divisions of science. They have not developed ideoscopic knowledge of the nature of things, or of objects as they are in themselves, in their relations, and in their manner of operation. They have not developed ideoscopic knowledge of what man himself, or woman herself, ought to do, as a rational and voluntary agent, for the attainment of a humane end. Finally and thirdly, because they have failed to articulate ideoscopic knowledge of both divisions one and two, the knowledge being communicated is cenoscopic knowledge, which, while it may often be necessary as excogitated minutiae, is grossly insufficient for formulating a workable theory of thought and a complete knowledge of both the first and second divisions of science.

Ideoscopic knowledge is knowledge that cannot be arrived at or verified without experimentation, like knowing how to swim. We have ideoscopic knowledge of swimming that is shareable. You can verify this claim by looking up the definition of “to swim”, or by entering the Google query define: swim to get the WordNet definition. Then look up the definition of the verb “fly” (Google define: fly). You may notice the difference. Many of us do have ideoscopic knowledge of the states of swimming and flying.

In the case of flying, that ideoscopic knowledge has not yet been attained or recognized by the scientists at Google or by the WordNet authority at Princeton University, and therefore it is not being communicated. In the area of the humanity of thought and thinking, ideoscopic knowledge has to do with the humanistic use of signs and the correlative distinctions humans make or create between objects and things within the semiotic process, or semiosis, carried out in a human mind.

Introducing a Semiotic Theory of Thought:

Mental processing, thinking in particular, takes the form of a triadic system of (supervening) power with relations between a) things (in the lebenswelt; in life) and b) objects in one’s own peculiar objective reality (one’s umwelt). In linguistics, words are signs of things that presuppose objects. According to the RTM, we have a) the subject or signifier S; b) the signified, that being the object signified or presupposed of existential sense data E; and c) the supervening power (process) P and relationships R to which each of us, as human beings, commit ourselves in order to attain to the imputed cause, judgment or interpretation of creation as experienced by S.

That is the static view with experience resting on judgment; here is the dynamic view:

The domain of experience clearly has the potential to be interpreted. The powerful and dynamic relations of this system can be formalized (represented) as an algebraic formula, a recipe, taking creative power P as the main ingredient (the referent) to psychological processes of objectification and condensation. The subject S (the signifying processor) partakes of (selects from) the process P of creation the distinct ability and specific affordance R necessary to distinguish and symbolize existential findings, the things of E (of the lebenswelt, such as P and R and things such as sensible sights and sounds, in and among themselves), as objects in a world of experience (umwelt), and in a state of being (the signified), subject to human being or human signification.

These are the means by which the things and appearances of outer life, outside self-existence (lebenswelt), are objectified and condensed into the true and correlative objects of the world of experience (umwelt) by way of the functions (operations) in the domain and range of perception, creativity, imagination and cognitive activity (of one’s innenwelt). The formulable essence enveloping the three dimensions of one’s personal though objective reality is a semantic field of thought, whereby thoughts are a function f of the interpretive system enformed by the process P of creation, hastened, constrained or halted, as need be, by the affordance or potential of action according to the selection or choice of relationship R.

So, in the sense that every act of S bears R to E and partakes of P, we propose these commitments:

f(P x R): We define P and R according to Adi’s theory of semantics, whereby:

There is a universal set of psychosomatic objects, G = {self, other}, with a potential for engagement and attraction;

there is a universal set of biophysiological objects, T = {open, closed}, with a potential for inhibition and boundary;

R = T x G = { r(j) | j = 1 to 4 } = {(closed, self), (open, self), (closed, other), (open, other)}; and

there is a process P of creation, whereby

P = { p(i) | i = 1, 2, 3 } = {assignment, manifestation, containment}
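As a minimal sketch of the set construction above, assuming the elements are plain string labels, the following Python enumerates R = T x G in the order listed and builds the twelve cells of the P x R matrix. It only shows the combinatorics; the words placed in each cell and their atomic weights come from the table and are not reproduced here.

```python
from itertools import product

# Universal sets as defined above, modeled as plain string labels (an assumption).
G = ["self", "other"]                               # psychosomatic objects: engagement
T = ["closed", "open"]                              # biophysiological objects: boundary
P = ["assignment", "manifestation", "containment"]  # process of creation

# R = T x G, written as (t, g) pairs; iterating G in the outer loop reproduces
# the listed order: (closed, self), (open, self), (closed, other), (open, other)
R = [(t, g) for g, t in product(G, T)]

# The P x R matrix: one row per process p(i), one column per relation r(j).
matrix = [[(p, r) for r in R] for p in P]

if __name__ == "__main__":
    for j, r in enumerate(R, start=1):
        print(f"r({j}) = {r}")
    print(f"{len(P)} x {len(R)} = {len(P) * len(R)} cells")
```

Each cell pairs one process with one relation, which is the sense in which the table below has three rows and four columns.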

The large-scale distribution of ideas depends on ideoscopic knowledge of, and command over, this subtle and creative process or power P of creation: the ability to compare and confer status is part of the creative power of thought, e.g., A=A. Plato knew this. He railed against people to think for themselves. Each person has the power and authority to assign a value (e.g., equality): both command of the process P of creation and the liberty to exercise this ability, together with the mental and linguistic aptitude to manifest or create such an assignment for oneself.

Over the century of the self, people have lost their power and misplaced their values; now kings and governments vie to be the only authority with the power to confer status. Click the link and watch the videos for an uncommon look at how the human thirst for happiness has been used against you to rob you of your humanity. The people who have given up or lost their humanity to the false powers have misplaced values as well. When people restore their humanity, they will find the power they need to confer the status of being humane.

Below is a matrix (PxR) that symbolizes the variable functions of a sign system (English) as a unified semiotic field derived from these definitions. Each cell has an atomic weight (not shown). In this table, each cell is an atomic unity that generates a state of thought, expanding in time and space. It shows the confluence of power and conditions generating the peculiar essence of each listed word/idea. Words can be defined as the molecular building blocks of a language: the molecular particulates needed to construct ideas. All ideas are themselves dependent upon the power P of creation and the potentiality of the relationships R that people have and are committed to. Speech acts are signs of that commitment.

At the top, the objects self and other symbolize psychosomatic relationships that afford a set of engagement conditions. The objects open and closed symbolize physical relationships that afford a set of boundary conditions. Together, they form a universal affordance, formalizing biophysiological boundary and engagement conditions. These relationships R embody the separation of the objects in space-time (open and closed, self and other) in various interdependently unifying configurations. They represent natural relationships R by symbolizing the valence of biophysiological influences on life, such as the fact that opposites attract and inhibitors inhibit, among other interesting features of objects, things and their states of being.

Semiotic Field of Thought

This table demonstrates how the process P of creation (defined as a power set and listed vertically down the left-hand column) is objectified and expressed as the initial conditions in the creation or formation of objects in the world of experience, such as those listed. This biophysiological and psychosomatic potential orients the function f of each speech act, each word; the selected affordance determines the input function, domain and range for one’s own judgment. Consider the making or breaking of bonds in the ideas that are symbolized by the two symmetrical columns on the right side of this matrix.

As you may surmise, the phonetic alphabet is itself a physical symbol system (albeit one composed of atomic particulates) that symbolizes and condenses these powers and conditions into the range of one’s own composure, the affordance of which is pragmatically acquired while learning motor and social skills. Yet, because modern philosophy has failed to develop just this sort of ideoscopic knowledge, the connection is not pointed out by instruction and demonstration; it is left to the imagination of each succeeding generation of children. Many older children miss the connection and thereby lack grounded concepts. This creates adults with many doubts and few anchors in their world of experience.

As a basis for a language like English, anyone can now see how this system evolves coherent states of being that we already know about (and routinely refer to). Any word in existence, from any language, can potentially refute this theory. Because every term of every language can be defined in this way, there are plenty of chances to do so. Falsifiability is a property of a valid scientific theory. So I invite others to try to refute this theory; I welcome their attention and trial.

Please let me know, with your comments, how it goes. Until someone does refute this theory, let us regain our humanity and insist on humanism in all science: the belief in a thinking human being who is capable of partaking of the power of creation; a human being who knows what creative power is and thereby (in that ideoscopic knowledge) becomes a willing participant in the process of creation. Let him or her then be one of those human beings who attend to hastening and accommodating, rather than obstructing and destroying, the creative process for the Good of our humanity.


Search. I suppose there is no denying that the word “search” ascended to significance in the consciousness of more people since the birth of Information Science than perhaps at any other time in history. This supposition is supported by a recent Pew Foundation internet study stating that:

The percentage of internet users who use search engines on a typical day has been steadily rising from about one-third of all users in 2002, to a new high of just under one-half (49%).

Though it may not be obvious, it becomes apparent on closer examination of the phenomenon that the spread and growth in the numbers of words and texts and more formal forms of knowledge, along with the modern development of search technology, had a lot to do with that.

Since people adopted the technology of writing systems, civilizations and societies have flourished. Human knowledge and culture, and technological achievement, have blossomed. No doubt.

Since computers, and ways of linking them over the internet, came along, the numbers of words and writers have increased substantially. It was inevitable that search technology would be needed to search through all those words from all those writers. That is what Vannevar Bush was telling his contemporaries in 1945 when he said the perfection of new instruments called for “a new relationship between thinking man and the sum of our knowledge.”

But somewhere along the line things went wrong; some things went very, very wrong. Previous knowledge and the sum of human experience were swept aside. Search technology became superficial, and consequently writing with words is not considered any kind of technology at all. That superficiality violates the integrity of the meaning of search, and the classification of words merely as arbitrary strings is also wrong, in my view.

Some scientists I know would argue that the invention of writing is right up there at the top of human technological achievement. I guess we just take that for granted these days, and I am nearly certain that the scientists embarking on the new field of information technology in the 1940s and 1950s were not thinking of writing with words as the world’s first interpersonal memory: the original technology of the human mind and its thoughts and interactions.

Most information scientists have not yet fully appreciated words as technical expressions of human experience but treat them as labels instead. By technical, I mean of or relating to the practical knowledge and techniques (of being an experienced human).

Very early in the development of search technology, information scientists and engineers worked out assumptions that continue to influence the outcome, that is, how search technology is produced and applied today. The first time I wrote about this was in 1991 in the proceedings of the Annual Meeting of the American Society for Information Science. There is a copy in most libraries if anyone is interested.

And here we are in 2008, in what some call a state of frenzy and others might call disinformed and confused, looking at the prospects of the Semantic Web. I will get to all that in this post, which I divide into three topics: the passion for search technology, the state of confusion about search technology, and the semantics of search technology.

The term disinformed is my word for characterizing how people are under-served if not totally misled by search engines. A more encompassing view of this sentiment was expressed by Nick Carr in an article appearing in the latest issue of the Atlantic Monthly where he asks: Is Google making us stupid?

I am going to start off with the passion of search.

Writing about the on-line search experience in general, Carmen-Maria Hetrea of Britannica wrote:

… the computing power of statistical text analysis, pattern-matching, and stopwords has distracted many from focusing on (should I say remembering?) what actually makes the world tick. There are benefits and dangers in a world where the information that is served to the masses is reduced to simple character strings, pattern matches, co-location, word frequency, popularity based on interlinking, etc.

( … ) It has been sold to us as “the trend” or as “the way of the future” to be pursued without question or compromise

That sentiment pretty much echoes what I wrote in my last post. You see, computing power was substituted for explanatory power, and the superficiality of computational search was given credibility because it was needed to scale to the size of the World Wide Web.

This is how “good enough” became state of the art. Because search has become such a lucrative business and “good enough” has become the status quo, it has also become nearly impossible for “better” search technology to be recognized, unless it is adopted and backed by one of the market leaders such as Google or Microsoft or Yahoo.

I have argued in dozens of forums and for more than twenty years that search technology has to address the broader logic of inquiry and the use of language in the pursuit of knowledge, learning and enhancing the human experience. It has to accommodate established indexing and search techniques and it has to have explanatory power to fulfill the search role.

Most people who know me know that I am not showing up at this party empty-handed. I have software that does all that, and while my small corporate concern is no market or search engine giant, my passion for search technology is not unique.

In her Britannica Blog post about search and online findability, Carmen-Maria Hetrea summed up her passion for search:

Some of us dared to differ by returning to the pursuit of search as something absolutely basic to the foundations of our human existence: the simple word in all of its complexity — in its semantics and in its findability and its futuristic promise.

You have to ask yourself what you are really searching for before you can find that it is not for keywords or patterns at all. Out in the real world almost everyone is searching for happiness. Some are also searching for truth or relevance. And many search for knowledge and to learn. If your searching doesn’t involve such notions, maybe you don’t mind the tedium of thorough, e.g., exegetical, searching. Or maybe you are someone who doesn’t search at all, but depends on others for information.

How is the search for happiness addressed by online search technology? Should it be a requirement of search technology to find truth or relevance? Should a search be thorough or superficial? Is it about computing power or explanatory power? I am going to try and address each of these questions below as I wade through the causes of confusion, expose the roots of my passion and maybe shed some light on search technology and its applications.

Some people have said that in the online world you have both transactional search and research search, which are not the same. They imply that these search objectives require different instruments or plumbing. I don’t think so. I think it is just a crutch vendors use to justify superficial search. Let’s look at an example of transactional search: say, searching for a new car. There are so many places where you can carry out that transaction that being thorough and complete is not an issue. Here is a search vendor quiz:

Happiness is a ___________ search experience.

Besides searching for objects of information that we know but don’t have at hand, in cyberspace and on the web, we might search for a pizza place in a new destination. Many search for cheap air fares or computer or car parts, or deals on eBay, while others search for news, music, pictures and many other types of media and information. A few others search for knowledge and for explanation. Happiness in the universe of online search is definitely a satisfying search experience irrespective of what you are searching for.

Relevance is paramount to happiness and satisfaction whether searching for pizza in close proximity or doing research with online resources. Search vendors are delivering hit lists from their search engines, where users are expecting relevance and to be happy with the results. Satisfaction, in this sense, has turned out to be a tall order and nonetheless a necessary benefit of search technology that people still yearn for.

Let’s now turn to the state of confusion.

Carmen-Maria mentions that new search technology has to be backward compatible and she also complains that bad search technology is like the wheel that just keeps getting reinvented:

The wheel is being reinvented in a deplorable manner since search technology is deceptive in its manifestation. It appears simple from the outside, just a query and a hitlist, but that’s just the tip of the iceberg. In its execution, good search is quite complex, almost like rocket science.

… The wealth of knowledge gained by experts in various fields – from linguists to classifiers and catalogers, to indexers and information scientists – has been virtually swept off the radar screen in the algorithm-driven search frenzy.

The wheel is certainly being re-invented; that’s part of the business. I am uncertain what Carmen-Maria means by algorithm-driven search frenzy. Algorithms are the stuff of search technology. I believe that some of the problems with search stem from the use of algorithms that are made fast by being superficial, by cutting corners and by other artificial means. The cutting of corners begins with the statistical indexing algorithms, or pre-coordination of text, so retrieval is consequently hobbled by weaknesses in the indexing algorithms. But algorithms are not the cause of the problem.

Old and incorrect assumptions are the real problem.

Modern state-of-the-art search technology (algorithms), invented in the 1960s and 1970s, strips text of its dependence on human experience under something information science (IS) professionals justify as the independence assumption. Information retrieval (IR) professionals, those who design software methods for search engine algorithms, are driven by the independence assumption to treat each text as a bag of words without connection to other words in other texts or to other words in the human consciousness.
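A small Python illustration of the bag-of-words view (not any particular vendor's indexer; the naive tokenizer is an assumption made for the example): once word order and context are discarded, two sentences describing opposite events reduce to the same bag.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a text to term counts, discarding word order and context."""
    # Lowercase and keep runs of letters: a deliberately naive tokenizer.
    return Counter(re.findall(r"[a-z]+", text.lower()))


a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")

# Under the bag-of-words view the two sentences are indistinguishable,
# even though they describe opposite events.
print(a == b)  # True
```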

I don’t think Rich Skrenta was thinking about this assumption when he wrote:

… – the idea that the current state-of-the-art in search is what we’ll all be using, essentially unchanged, in 5 or 10 years, is absurd to me.

Odds are that he intends to sweep a lot of knowledge out of the garage too, and I would place the same odds that any “new” algorithm Rich brings to the table will implicitly apply that old independence assumption as well.

So this illustrates a kind of tug of war between modern experts in search technology and the knowledge of ages of experience. There is also a kind of frenzy or storm over so-called “new” technologies and just what constitutes “semantic” search technology. While some old natural language processing (NLP) technology has debuted on the online search scene, it has not brought any new search algorithms to light. It has only muddied the waters, in my opinion. I have written about this in previous posts.

The underlying current is stirred up by the imbalance between the (significant) history of search technology and the nascence of online search and other modern applications of search technology. Add to that disturbance the dichotomy exacerbated by good (satisfying) and bad (deceptive) search results, multiplied by the number of search engine vendors, monopolistic or otherwise, and you have the conditions where compounding frenzy, absurdity and confusion, rather than relevance, reign supreme.

I like to think my own view transcends this storm and sets an important development principle that I established when I produced the first concept search technology back in 1987. The subjects of the search may be different but the freedom to search for objects, for answers, or for theories or explanations of unknown phenomena is the right of inquiry.

This right of intellectual inquiry is as important and as basic as the freedom of speech. This is what ignites my passion for search technology. And I cannot stand to have my right of inquiry blocked, limited, biased, restricted, arrested or constrained, whether by others, or by unwarranted procedure (algorithm) or formality, or by mechanical devices.

I wear my passion on my sleeve and it frequently manifests as a rant against the “IT” leaders or so-called experts that Carmen-Maria wrote about:

Many consider themselves experts in this arena and think that information retrieval is this new thing that is being invented and that is being created from scratch. The debate often revolves around casual observations, remarks, and opinions come mostly from an “IT” perspective.

To be fair, not all those with “IT” perspectives are down with all this “new thing” in online search engines. Over at the Beyond Search blog, Stephen Arnold wrote about the problem with the thinking about search technology:

… fancy technology is neither new nor fancy. Google has some rocket science in its bakery. The flour and the yeast date from 1993. Most of the zippy “new” search systems are built on “algorithms”. Some of Autonomy reaches back to the 18th century. Other companies just recycle functions that appear in books of algorithms. What makes something “new” is putting pieces together in a delightful way. Fresh, yes. New, no.

I also think Stephen understands the history of search technology pretty well. He demonstrates this when he writes:

Software lags algorithms and hardware. With fast and cheap processors, some “old” algorithms can be used in the types of systems Ms. Hane identifies; for example, Hakia, Powerset, etc. Google is not inventing “new” things; Google is cleverly assembling bits and pieces that are often well known to college juniors taking a third year math class.

Like Carmen-Maria Hetrea, Stephen Arnold sounds biased against algorithms, “old” algorithms in particular, though I don’t think he intended any bias, as many of the best algorithms we have are “old”. There are really not many “new” algorithms. Augmented, yes. Modified, yes. New, no.

To be involved in IT and biased against algorithms is absurd as long as technology is the application of the scientific method and scientific search methods are understood as collections of investigative steps systematically combined into useful search procedures or algorithms. So there you have my definition of search technology.

The algorithms for most search technology are not rocket science and can be boiled down to simple procedures. At the very least there is an indexing algorithm and a search algorithm:

Pre-coordination per-text/document/record/field procedure:

  1. Computerize an original text by reading the entire text or chunks of it into computer memory.
  2. Parse the text into the smallest retrievable atomic components: usually patterns (trigrams, sentences, part-of-speech (POS) tags, noun phrases, etc.), keywords, or a bag (an alphabetical list) of infrequent words.
  3. Store the original text with a unique key and store the parsing results as alternate keys in an index.
  4. Repeat for each new text added to a database or collection.
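Step 2 above mentions trigrams as one kind of pattern. A minimal Python sketch, assuming character trigrams with space padding at word boundaries (one common variant; word trigrams are another), shows how a text reduces to a set of pattern keys and how a related word shares most of them.

```python
def char_trigrams(text: str) -> set[str]:
    """Reduce a text to its set of character trigrams (pattern keys)."""
    grams: set[str] = set()
    for word in text.lower().split():
        # Pad with spaces so that word starts and ends become part of a pattern.
        padded = f" {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


if __name__ == "__main__":
    print(sorted(char_trigrams("search")))
    # Keys shared by "search" and "searching": the basis of pattern matching.
    print(len(char_trigrams("search") & char_trigrams("searching")))  # 5
```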

Post-coordination per-query procedure:

  1. Read a string from input, parse the query into keys in the same way as a text.
  2. Search the index to the selected collection or database with the keys.
  3. Assemble (sort, rank) key hits into lists and display.
  4. Choose hit to effect retrieval of the original text.
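The two procedures can be sketched as a toy inverted index in Python. This is an illustration under stated assumptions, not any vendor's implementation: the hypothetical `Collection` class parses texts into lowercase keywords, and it ranks hits by the number of query keys each text matches (simple coordination-level ranking), chosen here only for brevity.

```python
import re
from collections import defaultdict


def parse(text: str) -> list[str]:
    """Step 2 of both procedures: split text into lowercase keyword keys."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Collection:
    """Hypothetical toy collection: stored texts plus an inverted index."""

    def __init__(self) -> None:
        self.texts: dict[int, str] = {}                     # unique key -> original text
        self.index: dict[str, set[int]] = defaultdict(set)  # alternate key -> text ids

    def add(self, text: str) -> int:
        """Pre-coordination: store the text under a unique key and index its keys."""
        doc_id = len(self.texts) + 1
        self.texts[doc_id] = text
        for key in parse(text):
            self.index[key].add(doc_id)
        return doc_id

    def search(self, query: str) -> list[tuple[int, int]]:
        """Post-coordination: parse the query, look up its keys, rank the hits."""
        hits: dict[int, int] = defaultdict(int)
        for key in set(parse(query)):
            for doc_id in self.index.get(key, ()):
                hits[doc_id] += 1  # count matching query keys per text
        # Rank by number of matching keys, then by id for a stable order.
        return sorted(hits.items(), key=lambda h: (-h[1], h[0]))

    def retrieve(self, doc_id: int) -> str:
        """Step 4: choose a hit and retrieve the original text."""
        return self.texts[doc_id]


if __name__ == "__main__":
    c = Collection()
    c.add("Search technology and the logic of inquiry")
    c.add("Semantic search on the web")
    c.add("Pizza places near the station")
    print(c.search("semantic search"))  # [(2, 2), (1, 1)]
    print(c.retrieve(2))
```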

These basic algorithms are fulfilled differently by different vendors but vendors do not generally bring new algorithms to the table. They bring their methods of fulfilling these algorithms; they may modify or augment regular methods employed in steps 2 and 3 of these procedures as Google does with link analysis.

In addition, vendors fold search technology into a search engine. Most online search engines (those integrated “software systems” or search appliances that process text, data and user queries) are composed of the following components:

  1. A crawler for crawling URIs or files on disk or both.
  2. An indexer that takes input from the crawler and recognizes key patterns or words.
  3. A database to store crawler results and key indexing (parsing) results.
  4. A query language (usually SQL or keyword-Boolean) to use the index and access keys in the database.
  5. An internet server and/or graphical user interface (GUI) components for getting queries from, and presenting results to, users.

Most search engine wizards, as they are called, are working on one or more of these software components of online search engines. You can look at what a representative sample of these so-called wizards have to say about most of these components at the ArnoldIT blog. If you read through the articles, you won’t find one of them (and I have not read them all) that is working on new indexing methods or new algorithms for mapping the meaning of the query to the universe of text, for example.

Many of the “new search engines,” popping up everywhere, are not rightly called new search technology even though they frequently bear the moniker. They are more rightly named new applications of search technology. But even vendors are confused and confusing about this. Let’s see what Riza Berkin of Hakia is saying in his most recent article where he writes:

But let’s not blind ourselves by the narrowness of algorithmic advances. If we look closely, the last decade has produced specialist search engines in health, law, finance, travel etc. More than that, search engines in different countries started to take over (like Naver, Baidu, Yandex, ect.)…

He had been writing that Search 1.0 began with Alta Vista (circa 1996), Search 2.0 is Google-like, and Search 3.0 is semantic search “where the search algorithms will understand the query and text”. I guess all those search engines from Fulcrum, Lexis-Nexis, OpenText, Thunderstone, Verity and Westlaw, and search products from AskSam to Readware ConSearch to ZyIndex, were Search 0.0, or at least P.B. … You know, like B.C., but Pre-Berkin.

And so this last paragraph (above) makes me think he is confusing search applications with search technology. His so-called specialist search engines are applications of search technology to the field or domain of law, to the field or domain of health, and so on.

Then he confuses me even more, when he writes about “conversational search”:

Make no mistake about it, a conversational search engine is not an avatar, although avatars represent the idea to some extent. Imagine virtual persons on the Web providing search assistance in chat rooms and on messengers in a humanly, conversational tone. Imagine more advanced forms of it combined with speech recognition systems, and finding yourself talking to a machine on the phone and actually enjoying the conversation! That is Search 2.0 to me.

Now I can sympathize with Riza, because I used the phrase “conversational search” to describe the kind of conceptual search engine I was designing in 1986. I am not confused about that. I am confused that he calls that Search 2.0 when earlier he described statistically augmenting the inverted index as Search 2.0.

He doesn’t stop there. He goes on to describe Search 3.0, which “will be the ‘Thinking Search’ where search systems will start to solve problems by inferencing.” Earlier he wrote that semantic search was Search 3.0. Semantics requires inferencing, so I began to reckon that maybe thinking and semantics are equal in his mind, until he writes: “I do not fool myself with the idea that I will see that happening in my life time.” So now I am confused again. I think that is what vendors want; they want the public to remain confused about the semantics of search and what you get with it.

And that brings me to the semantics of search.

There are only two words that matter here: Thoroughness and Explanatory.

When I started tinkering with text processing, search and retrieval software in the early 1980s, I was captivated by the promise of searching and reading texts on computers. The very first thing that I noticed about the semantics of search, before my imagination became involved in configuring computational search technology, was thoroughness. The word /search/ implies thoroughness, if not completeness, in its definition. Thoroughness is a part of the definition of search. Look at the definition of search for yourself.

You need only look at one or two hit lists from major search engines to see that this is not what we get from commercial search engines, or from most search technology. Search is not a process that is completed by delivering some hints of where to look, but that is what the technological leaders in the field have fashioned it into. Millions of people have accepted it.

Yet, in our hearts we know that search must be complete, and it must be explanatory to be satisfying; we must learn from it, and we expect to learn from conducting a search. Whether we are learning the address of the nearest pizza place or learning how to install solar heating, it is not about computational power; it is about explanatory power. The technological leaders forgot that words are part of the technique of communicating interpersonal meaning; let’s hope search vendors don’t forget that words have explanatory power too.

Tell me what you think.


Peter Mika recently wrote an article about the semantic web and NLP-style semantic search. I should just ignore his claim that there are only two roads to semantic search, because he is plainly mistaken on that count. As Peter works for Yahoo, he was mainly discussing data processing with RDF and Yahoo’s SearchMonkey. He obviously knows that subject well.

He constructed an example of how to use representational data (such as an address) according to semantic web standards and how to integrate the RDF triples with search results. His claim is that one cannot do “semantics” without some data manipulation, and for that the data must be encoded with metadata, essentially data about the data. In this case, that is the metadata needed to pick out and show the data at the keyword “address”.
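The kind of integration described here can be sketched without any RDF library by treating triples as plain `(subject, predicate, object)` tuples. This is a hypothetical illustration, not Peter's example or SearchMonkey's actual data format: the `example.org` URLs, the business, and the `vcard:`-prefixed predicate names are assumptions chosen to resemble vCard-style address vocabulary.

```python
# Triples as (subject, predicate, object). The "vcard:" names and example.org
# URLs are illustrative assumptions, not SearchMonkey's actual input format.
TRIPLES = [
    ("http://example.org/luigis", "rdf:type", "vcard:Organization"),
    ("http://example.org/luigis", "vcard:fn", "Luigi's Pizza"),
    ("http://example.org/luigis", "vcard:street-address", "12 Main St"),
    ("http://example.org/luigis", "vcard:locality", "Springfield"),
]


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def enrich(result, triples):
    """Attach an address to a search result if its URL carries address metadata."""
    url = result["url"]
    street = objects(triples, url, "vcard:street-address")
    city = objects(triples, url, "vcard:locality")
    if street and city:
        return {**result, "address": f"{street[0]}, {city[0]}"}
    return result


results = [
    {"url": "http://example.org/luigis", "title": "Luigi's Pizza"},
    {"url": "http://example.org/blog", "title": "A pizza blog"},
]
for result in results:
    enriched = enrich(result, TRIPLES)
    print(enriched["title"], "|", enriched.get("address", "(no address metadata)"))
```

Note that the address appears only because someone tagged the page with the right predicates; the blog page, untagged, gets nothing. That dependence on explicit metadata is the point at issue in the rest of this post.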

At the end of his article, Peter talks about the way forward and, in particular, about the need to foster agreements around vocabularies. I suppose that he means to normalize the relationships between words by having publishers direct how words are to be used. He calls this a social process and calls on the community of publishers to play their role. Interesting.

About the time Peter was beginning his PhD candidacy, industry luminary John Sowa wrote in Ontology, Metadata and Semiotics that:

Ontologies contain categories, lexicons contain word senses, terminologies contain terms, directories contain addresses, catalogs contain part numbers, and databases contain numbers, character strings, and BLOBs (Binary Large OBjects). All these lists, hierarchies, and networks are tightly interconnected collections of signs. But the primary connections are not in the bits and bytes that encode the signs, but in the minds of the people who interpret them.

This is the case in the trivial example offered by Peter. The reason one is motivated to list an address in the result of a search for pizza is that it is relevant to people who are searching for a pizza place close to them. In his paper, John Sowa writes:

The goal of various metadata proposals is to make those mental connections explicit by tagging the data with more signs.

This is the essential nature of the use case and proposal offered by Yahoo with SearchMonkey. It seems a good idea, doesn’t it? Yahoo is giving developers the means to tag such data with more signs. Besides, it has people using Yahoo’s index, which exposes them to Yahoo’s advertisers. Sowa cautions that:

The ultimate source of meaning is the physical world and the agents who use signs to represent entities in the world and their intentions concerning them.

Which resources do investigators or developers use to learn about agents and their intentions when using signs? The resource most developers turn to is language and they begin by defining the words of language in each context in which they appear.

Peter says it is common for IR systems to focus on words or n-grams and syntax. Some officials may object, but although NLP systems such as Powerset, Hakia and Cognition use dictionaries and “knowledge bases” to obtain sense data, they each focus mainly on sentence syntax and (perhaps with the exception of Powerset) use keyword indexes for retrieval, just like traditional IR systems.

Hakia gets keyword search results from Yahoo, as a matter of fact. All of these folks treat words, and even sentences, as the smallest units of meaning in a text. Perhaps these are the most noticeable elements of a language that are capable of conveying a distinction in meaning, though they certainly are not the only ones. There are other signs of meaning obtainable from textual discourse.

Believe it or not, the signs people use most regularly are known as phonemes. They are the least salient because we use them so often, and largely subconsciously. Yet we have found that these particular sounds are instantiations, or concrete signs, of the smallest elements of abstract thought: distinctive elements of meaning that are sewn and strung together to produce words and form sentences. When they take form in a written text they are also called morphemes.

Some folks may not remember that they learned to read words and texts by stringing phonemes together, sounding them out to evoke, apprehend and aggregate their abstract meanings. I mention this because if a more natural or organic semantic model were standardized, the text on the World Wide Web could become more tractable and Internet use might become more efficient.

This would happen because we could rid ourselves of the clutter of so many levels of metalevel signs and of the need for controlled vocabularies for parsing web pages, blogs and many kinds of unstructured texts. An unstructured text is any free-flowing textual discourse that cannot easily be organized in the field or record structure of a database. Nor is it advantageous to annotate the entirety of an unstructured text with metalevel signs, because, as John Sowa wrote:

Those metalevel signs themselves have further interconnections, which can be tagged with metametalevel signs. But meaningless data cannot acquire meaning by being tagged with meaningless metadata.

So now it raises the question of whether or not words and their definitions are just meaningless signs to begin with. The common view of words, as signs, is that they are arbitrarily assigned to objects. I am unsure whether linguists could reach consensus that the sounds of words evoke meaning, as it seems many believe that a horse could have been called an egg without any consequence to its meaning or use in a conversation.

Within the computer industry it becomes even more black and white: a word is used to reference objects by way of general agreement or convention, where the objects are things and entities existing in the world. Some linguists and most philosophers recognize abstract objects as existing in the world as well. This has not, however, changed the conventional view, which is a kind of de facto standard among search software vendors today.

This view implies that the meaning of a word or phrase (its interpretation) adheres only to conventional and explicit agreements on definitions. The trouble is that it overlooks or ignores the fact that meaning is independently processed and generated (implicitly) in each individual agent’s mind. This causes little trouble if the context is narrow and well-defined, as in most database and trivial semantic web applications on the scene now.

The problems begin to multiply exponentially when the computer application is purported to be a broker of information (like any search engine), where there is a verbal interchange of typically human ideas in query and text form. This is partly why there is confusion about meaning and about search engine relevance. Relevance is explicit inasmuch as you know it when you see it; otherwise, relevance is an implicit matter.

Implicit are the dynamic processes by which information is recognized, organized, acted on, used, changed, etc. The implicit processes in cognitive space are those required to recognize, store and recall information. Normally functioning, rational, though implicit and abstract, thought processes organize information so that we may begin to understand it.

It is obvious that there are several methods and techniques of organizing, storing and retrieving information in cyberspace as well. While there are IR processes running both in cyberspace and in cognitive space, it is not the same abstract space and the processes are not at all the same. In cyberspace and in particular in the semantic web, only certain forms of logical deduction have been implemented.

Cognitive processes for organizing information induce the harmonious and coherent integration of perceptions and knowledge with experience, desires, the physical self, and so on. Computational processes typically organize data by adding structure that arranges the information in desired patterns.

Neither the semantic web standards, nor microformats, nor NLP, seek the harmony or coherence of knowledge. Oh, yes, they talk about knowledge and about semantics, yet what they deliver is little more than directives, suitable only for data manipulation in well-understood and isolated contexts.

Neither NLP nor semantic web metadata or tools presently have sufficient faculty for abstracting the knowledge that dynamically integrates sense data or external information with the conditions of human experience. The so-called semantic search officials start with names and addresses because these data have conventionally assigned roles that are rather regular.

When it comes down to it, not many words have such regular and conventional interpretations. It would actually be quite alright if we were just talking about a simple database application, but proponents of the semantic web want to incorporate everything into one giant database and controlled vocabulary. Impossible!

While it appears not to be recognized, it should be apparent that adherence to convention is a necessary yet insufficient condition to hold relevant meaning. An interpretation must cohere with its representation and its existence (as an entity or agent in the world) in order to hold. Consider the case of Iraq and weapons of mass destruction. Adhere, cohere, what’s the difference? It’s just semantics, right? Nonetheless, neither harmony nor coherence can be achieved by directive.

A consequence of the conventional view is that such fully and clearly defined directives leave no room for interpretation, even though some strive for underspecification. The concepts and ideas being represented cannot be questioned because, being explicit directives, they go without question. This is why I believe the common view of words and meaning held by many linguists and computer and information experts, like Peter, is mistaken.

If the conventional view were correct, the interpretation of words would neither generate meaning nor provide grounds for creating new concepts and ideas. If it were truly the case, as my friend Tom Adi said, natural language semantics would degenerate into taking an inventory of people’s choices regarding the use of vocabulary.

So, I do not subscribe to the common view. These are the reasons that I debate semantic technologies, even though end-users probably could not care less about the techniques being deployed: if we are not careful, we will end up learning and acting by directive too. That is not the route I would take to semantic search. How about you?


I would like to address the few questions I received on the three parts (1, 2 and 3) of the semantics of interpersonal relations. The first and most obvious question was:

I don’t get it. What are the semantics?

This question is about the actual semantic rules that I did not state fully or formally in any of the three parts. I only referred to Dr. Adi’s semantic theory and related how the elements and relations of language (sounds and signs) correspond with natural and interpersonal elements and relations relevant to an embodied human being.

Alright, so a correspondence can be understood as an agreement or similarity and as a mathematical and conceptual mapping (a mapping on inner thoughts). What we have here, essentially, is a conceptual mapping. Language apparently maps to thought and action and vice-versa. So the idea here is to understand the semantic mechanism underlying these mappings and implement and apply it in computer automations.

Our semantic objects and rules are not like those of NLP or AI or OWL or those defined by the semantic web. These semantic elements do not derive from the parts of speech of a language and the semantic rules are not taken from propositional logic. And so that these semantic rules will make more sense, let me first better define the conceptual space where these semantic rules operate.

Conceptually, this can be imagined as a kind of intersubjective space. It is a space encompassing interpersonal relationships and personal and social interactions. This space constitutes a substantial part of what might be called our “semantic space” where life lived, what the Germans call Erlebnis, and ordinary perception and interpretation (Erfahrung) intersect, and where actions in our self-embodied proximity move us to intuit and ascribe meaning.

Here in this place is the intersection where intention and sensation collide, where sensibilities provoke the imagination and thought begets action. It is where ideas are conceived. This is where language finds expression. It is where we formulate plans and proposals, build multidimensional models and run simulations. It is the semantic space where things become mutually intelligible. Unfortunately, natural language research and developments of “semantic search” and the “Semantic-Web” do not address this semantic space or any underlying mechanisms at all.

In general, when someone talks about “semantics” in the computer industry, they are talking either about English grammar or RDF triples in general, or about propositional logic in a natural or artificial language, e.g., a data definition language, web services language, description logic, Aristotelian logic, etc. There is something linguists call semantics, though its rules are mainly syntactic rules that have limited interpretative and predictive value. Those rules are usually applied objectively, to objectively defined objects, according to objectively approved vocabulary defined by objectively-minded people. Of course, it is no better to subjectively define things. Yet, there is no need to remain in a quandary over what to do about this.

We do not live in a completely objective, observable or knowable reality, or in a me-centric or I-centric society; it is a we-centric society. The interpersonal and social experience that every person develops from birth is intersubjective: each of us experiences the we-centric reality of ourselves and others entirely through our own selves and our entirely personal world view.

Perhaps it is because we do not know and cannot know– through first-hand experience at least– what any others know, or are presently thinking, that there is this sort of dichotomy that sets in between ourselves and others. This dichotomy is pervasive and even takes control of some lives. In any case, conceptually, there is a continuum between the state of self-realization and the alterity of others. This is what I am calling the continuum of intersubjective space.

A continuum, of course, is a space that can only be divided arbitrarily. Each culture has its own language for dividing this space. Each subculture in a society has its own language for dividing this space. Every technical field has its own language for dividing the space. And it follows, of course, that each person has their own language, not only for dividing this space, but for interacting within the boundaries of this space. The continuum, though, remains untouched and unchanged by interactions or exchanges in storied or present acts.

The semantics we have adopted for this intersubjective space include precedence rules formulated by Tom Adi. Adi’s semiotic axioms govern the abstract objects and interprocess control structures operating in this space. Cognitively, this can be seen as a sort of combination functional mechanism, used not only for imagining or visualizing, but also for simulating the actions of others. I might add that while most people can call on and use this cognitive faculty at will, its use is not usually a deliberate act; it is mainly used subconsciously and self-reflexively.

We can say that the quality of these semantics determines the fidelity of the sound, visualization, imitation or simulation to the real thing. So when we access and use these semantics in computer software, as we do with Readware technology, we are accessing a measure of the fidelity between two or more objects (among other features). This may sound simplistic, though it is a basic-level cognitive faculty. Consider how we learn through imitation. Note to self: Don’t leave out the cognitive load to switch roles and consider how easily we can take the opposite or other position on almost any matter.

We all must admit, after careful introspection, that we are able to “decode” the witnessed behavior of others without the need to exert any conscious cognitive effort of the sort required for describing or expressing the features of such behavior using language, for example. It may be only because we must translate sensory information into sets of mutually intelligible and meaningful representations in order to use language to ascribe intentions, order or beliefs, to self or others, that the functional mechanism must also share an interface with language. It may also be because language affords people a modicum of command and control over their environment.

Consider the necessity of situational control in the face of large, complex and often unsolvable problems. I do not know about you, but I need situational control in my environment and I must often fight to retain it in the face of seemingly insurmountable problems and daily ordeals.

Now try and recognize how the functional aspects of writing systems fill a semiotic role in this regard. Our theoretical claim is that these mutually intelligible signs instantiate discrete abstract clusters of multidimensional concepts relative to the control and contextualizing of situated intersubjective processes.

Just as the particles and waves of quantum mechanics are to physics, these discrete intersubjective objects and processes are the weft and the warp in the weaving of the literary arts and anthropological sciences on the loom of human culture. We exploited this functional mechanism in the indexing, concept-analysis, search and retrieval software we call Readware.

We derived a set of precedence rules that determine interprocess control structures and gave us root interpretation mappings. These mappings were applied to the word roots of an ancient language, which were selected because modern words derived from these word roots are used today. These few thousand root interpretations (formulas) were organized into a library of concepts, a ConceptBase, used for mapping expressions in the same language and from different languages. It was a very successful approach, for which we designed a pair of REST-style servers with an API to access all the functionality.
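The general shape of a concept library used for mapping expressions within and across languages can be sketched as two lookup tables: per-language lexicons that map words to root identifiers, and a library that maps each root to its interpretation. Everything in this sketch is a hypothetical placeholder: the root ids, the interpretations, the lexicon entries and the Jaccard overlap score are illustrative assumptions, not Readware's ConceptBase, Dr. Adi's root interpretations, or any etymological claim.

```python
import re

# Hypothetical library: root id -> interpretation. Placeholders only; these are
# not Readware's ConceptBase entries or Dr. Adi's root interpretations.
CONCEPT_BASE = {
    "R1": "placeholder interpretation 1",
    "R2": "placeholder interpretation 2",
    "R3": "placeholder interpretation 3",
}

# Hypothetical per-language lexicons: surface word -> root id.
# The root ids are arbitrary labels for illustration, not etymologies.
LEXICONS = {
    "en": {"write": "R1", "writing": "R1", "read": "R2", "book": "R3"},
    "de": {"schreiben": "R1", "lesen": "R2", "buch": "R3"},
}


def roots(text, lang):
    """Map an expression to the set of root ids its known words point to."""
    words = re.findall(r"\w+", text.lower())
    return {LEXICONS[lang][w] for w in words if w in LEXICONS[lang]}


def concept_overlap(a, lang_a, b, lang_b):
    """Jaccard overlap of two expressions' root sets (0.0 when either is empty)."""
    ra, rb = roots(a, lang_a), roots(b, lang_b)
    if not ra or not rb:
        return 0.0
    return len(ra & rb) / len(ra | rb)


def explain(text, lang):
    """List the interpretations behind an expression's roots."""
    return sorted(CONCEPT_BASE[r] for r in roots(text, lang))


print(concept_overlap("read a book", "en", "ein Buch lesen", "de"))   # 1.0
print(concept_overlap("write a book", "en", "ein Buch lesen", "de"))
print(explain("ein Buch lesen", "de"))
```

Because the comparison happens in the shared root space rather than between surface words, the same function serves for two English expressions or for an English and a German one.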

To make this multi-part presentation more complete, I have posted a page with several tables drawn up by Tom Adi, along with the formal theory and axioms. There are no proofs here as they were published elsewhere by Dr. Adi. These tables and axioms identify all the key abstract objects, the concepts and their interrelationships. Tom describes the mappings from the base set (sounds) and the axioms that pertain to compositions and word-root interpretations, together with the semantic rules determining inheritance and precedence within the control structures. You can find that page here.

And that brings me to the next question, which was: How can you map concepts between languages with centuries of language change and arbitrary signs? The short answer is that we don’t. We map the elements of language to and from the elements of what we believe are interlinked thought processes that form mirror-like abstract and conceptual images (snapshots) of perceptive and sensory interactions in a situated intersubjective space.

That is to say that there is a natural correspondence between what is actually happening in an arbitrary situation and the generative yet arbitrary language about that situation. This brings me to the last question that I consider relevant no matter how flippant it may appear to be:

So what?

The benefits of a shared semantic space should not be underestimated, particularly in this medium of computing, where scaling of computing resources and applications is necessary.

Establishing identity relations is important because it affords the self-capacity to better predict the consequences of the ongoing and future behavior of others. In social settings, the attribution of identity status to other individuals automatically contextualizes their behavior. By contextualizing content, for example, knowing that others are acting as we would effectively reduces the cognitive complexity and the amount of information we have to process.

It is the same sort of thing in automated text processing and computerized content discovery processes. By contextualizing content in this way (e.g., with Readware), we dramatically and effectively reduce the amount of information we must process from text, more directly access and cluster relevant topical and conceptual structure, and support further discovery processes. We have found that a side-effect of this kind of automated text analysis is that it clarifies data sources by catching unnatural patterns (e.g., auto-generated spam), and it also helps identify duplication and errors in data feeds and collections.


If “literature” is “the imaginative and creative writing of a language, period or culture”, then the blogs, new media and web pages of (so-called) social media qualify as the new literature. One could say that online news, health and finance sites, the “e-zines” and the several hundred million web logs or blogs comprise a growing part of the classic literature of our time.

According to Nobel Prize winner Isaac Bashevis Singer, emotion and intellect are essential poles of literature.

“The very essence of literature is the war between emotion and intellect, between life and death. When literature becomes too intellectual — when it begins to ignore the passions, the motions — it becomes sterile, silly, and actually without substance.” –in the New York Times Magazine, November 26, 1978

It is the reason why many so-called “expert systems” and intelligent or “semantic” information systems are the way they are: they haven’t much substance. Emotional and interpersonal relevance is illogical, of course … unscientific … It explains why programmers, software developers and product engineers cannot fathom the interpersonal and social relations within the media. Search engine developers cannot capture and index human emotion.

Neither software developers nor product engineers can quantify or computationally relate the affairs of the human intellect. Logicians are not able to assign a truth value to passion, let alone determine what relations essence may enjoy or which of a myriad of ephemeral forms of matter have substance worth computing. Creating a system for indexing, searching and relating social media is not like creating an accounting system. There are no general rules of the affairs of the human imagination. It is not a membership or inventory management system and human psychology is not easily captured with traditional software engineering principles.

Therefore: No big names are researching the emotional and interpersonal structure (the powerful foundation) on which all sorts of human affairs and institutions rest. There are no big computational engineering projects designing or building parsers that can get at the emotional, perceptual and cognitive base of human reason, for that would disrupt the present oligarchy.

It is the reason for the resurgence of artificial intelligence techniques in the garb of the semantic web. It is caused by an industry and academic failure: a failure to contemplate what is happening. This widespread failure is nowhere more evident today than in the Internet-based discussions about more intelligent and so-called “semantic search” engines.

Most of these discussions are highly charged and the noise and rampant neologism is symptomatic of the social confusion and disorder. So it is that the promise of an intelligent web and higher access to knowledge stands in binary opposition to the binds and perils of navigating the social spaces of the Internet. I even read a blog that talked about hardwiring nouns and verbs to make a semantic web.

This is the very reason why people complain about the results they get from search engines as in this recent article, and also in this article. And this video makes us think about the outcome of the superficial, artificial and logical path we are on.

Instead of supporting research and discovery we get the big search engines and major shopping destinations who form alliances to snag our attention. They are not trying to understand what we should want to know; they only want to know what we want to purchase. They vie with one another and a vast host of marketers and hucksters to be the first to offer the right price.

You can find the quote that addresses the search for “the essence of literature” only incidentally, because a Nobel Prize winner was quoted in a New York Times article. The only reason it shows up in the first page of a search result is that it appears on a large advertiser-supported web property that is seen as an authority by the major search engines. Google supports and extends the oligarchy to all those ready and willing to pay for the use of words while claiming to do no evil.

I am pretty sure that there are plenty of authorities on “social media”, if only because billions of dollars are being spent. Try searching for “the essence of social media” on Google and Yahoo. That is a kind of research-type search that could benefit from semantics. Try it on Hakia, Lexxe and Powerset; the results are better at Hakia but still not satisfying. You can go through the entire list of alternate search engines and you will not get much better results. Speaking of Hakia, let’s look at semantic search.

While the authority at Wikipedia seems to recognize the difference between research and navigational searching, incredibly, the entry for ‘semantic search’ defines it as a way of leveraging XML and RDF data from semantic networks. The ‘bark’ disambiguation example is a case in point. Do you think they might be barking up the wrong tree? It makes me wonder whether the author of this entry actually lives in the real world, having never been barked at by a boss or spouse.
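The weakness of inventory-based disambiguation is easy to demonstrate. The sketch below uses a simplified Lesk-style overlap, a generic technique swapped in for illustration and not the method described in the Wikipedia entry or used by any vendor: each sense of “bark” carries a small, hypothetical signature of related words, and the sense whose signature shares the most words with the sentence wins.

```python
import re

# Hypothetical sense inventory for "bark": sense -> signature words.
SENSES = {
    "dog-sound": {"dog", "dogs", "loud", "noise", "growl", "howl"},
    "tree-covering": {"tree", "trees", "trunk", "wood", "outer", "layer"},
    "ship": {"sail", "sailing", "vessel", "boat", "mast"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose signature overlaps most with the sentence's words.

    Returns None when no signature word appears in the sentence.
    """
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(signature & context) for sense, signature in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("The dog let out a loud bark"))
print(disambiguate("Are the democrats barking up the wrong tree?"))
print(disambiguate("My boss barked at me"))
```

The idiom is resolved to the tree sense merely because the word “tree” appears, and the boss who barks matches no sense at all. A larger sense inventory or more metadata narrows such gaps but does not change the principle: the system can only pick among meanings someone listed in advance.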

I thought: Hakia obviously has a more complete semantic sense of the term “bark” than the systems cited in the ill-informed semantic search entry at Wikipedia. At Hakia I asked: “Are the democrats barking up the wrong tree?”, thinking about their plans to override the president’s veto of the bill extending the children’s health program. It turned out I was wrong. I invite you to go see for yourself. The usefulness and “semantic” relevance of ad-supported search pages was characterized as well, as I captured in this clip:

Sponsored Result

There was an article that I was unable to find on any search engine. Neither the results nor the sponsored links were satisfying, whether from the big search engines or from the so-called “semantic” alternates.

So that no one thinks I am just complaining and have opinions without solutions, I am beginning another article, where I will spell out a solution to the problem. In the meantime, let me know what you think by leaving a comment below.


Alex Iskold recently authored a couple of articles over at the Read/Write web about the Semantic Web. I caught the one labeled Top-Down: A New Approach to the Semantic Web that refers to an earlier post on the same subject. I commented at the site that I thought it was misinformed and not very helpful at all. When I say it is not helpful, I mean that I do not believe it is helpful to the community, to progress or understanding.

I must say upfront that I have never met Alex Iskold, but from his own bio and the articles I have read, he appears intelligent and accomplished. I am not picking on him and I am not attacking him personally. Any interpretation of what follows to the contrary is just wrong.

I will admit also that I am holding the Read/Write Web to an editorial and perhaps even an academic standard, while I recognize the hosts are not obligated to any such standard. Nonetheless, I am compelled to point out that neither the author nor the hosts are authorities on the Semantic Web.

This leads directly to what I take issue with: authoritative posturing that is both misleading and detracting. So this post is my rant against all those who hype semantics without much knowledge of semantics proper, with no experience with ontologies or graphs, and certainly with no semantic methods or algorithms of their own. I am only using Alex’s article as an example, but the SEO scene is overflowing with this kind of noise, which spreads confusion and ambiguity and leads to misunderstanding.

First, let us define what I called semantics proper. For our purposes we can use this definition from Wikipedia: Semantics (Greek sēmantikos, giving signs, significant, symptomatic, from sēma (σῆμα), sign) refers to aspects of meaning, as expressed in language or other systems of signs.

A sign may be expressed in a single word. A signifier of explicit command and control is often expressed that way: stop, go, turn. The word ‘stop’ is a linguistic sign. It is used to indicate or point out a specified meaning: that meaning is to halt further motion. There is a clear and unambiguous relationship between the giving of the sign and the effect to be “understood”. Its purpose is also clear: to command and control. It is clear, in this case, that the aspect of meaning being signified or being given as a sign is not the same thing as the linguistic sign used to express it. Please try to keep this in mind.

Semantics are measures used by people to interpret significance in the real world. In this sense, signs are the guideposts of thinking about, communicating (collaborating or corresponding) and relating things. Another way of saying it is that meaningful or significant aspects of interaction are expressed as words, as nouns, verbs, and as subjects (AKA linguistic signs) in any language, but they are not the same thing as the linguistic signs used to express them.

In Alex Iskold’s post, he uses words in ways that appear significant but, upon deeper examination or under a critical eye, are not. They are, as I have charged, just noise. Why? Because of the way he chose to characterize the Semantic Web, and because he offers no new approach at all.

In language and in human discourse, names, used as indicators of something, must be characteristic at least and obviously symptomatic at best. When they are not, the semantics (semantic measures) must be called into question. I will come back to this in a future article; for this post I just want to substantiate my charge that Alex Iskold’s two posts at the Read/Write Web are just noise. Critical thinking is all we need to achieve a clear understanding of the significance of discourse about cultural and social institutions (such as the Semantic Web).

The Semantic Web of Sir Tim Berners-Lee (TBL) is not about enabling “computers to ‘understand’ semantics the way humans do”, as Alex Iskold claimed in his article about the difficulties with the classic approach. This statement is misleading because it runs completely counter to what the Semantic Web is really about.

I would first suggest that most people do not understand all this discourse about semantics, and are quite confused when the discussion turns to semantics. Yet people use semantics proficiently in all of their interpersonal relationships, and groups of people, indeed entire nations, rely on their understanding of semantics. Such understanding was necessary, for example, to realize that the threat of the use of WMD by Iraq was a ruse rather than a reality.

The objective should be for us people to understand and articulate these semantics well enough that our machines may process (use) these semantics to better relate and collaborate with us, and to help us identify and discover relations in discourse and between languages, cultures and society at large. But that is another story that I will cover in another post.

When people agree that something under discussion is a matter of semantics, they tend to drop the subject. The reason: there is a disagreement over the semantics (of naming), and it is understood that nothing comes of that kind of disagreement. So in fact, humans understand semantics very well.

Since people are not confused about using their sense of semantics in interpersonal situations, I am beginning to wonder whether this purported lack of understanding is some sort of occupational hazard. Software architects and computer engineers, it seems, tend to be unsocial if not altogether inhuman in the face of the computational complexities of sense-making. So the idea has some merit on the face of it.

Because some of them assume authoritative roles, the noise they proffer muddies the waters and creates barriers to understanding. When popular publications are driven to monetize every story they post, they seem more willing to compromise their editorial integrity and let this noise into the blogosphere and cyberspace. This leads to a more serious problem, where “information” becomes so diffuse and ambiguous that it is no longer possible to communicate meaningfully except in small, close-knit groups.

Fortunately, a little exegetical examination can help people spot those problems. Let me demonstrate how one can apply critical examination to the first paragraph of Alex’s article:

“Earlier this week we wrote about the classic approach to the semantic web and the difficulties with that approach. While the original vision of the layer on top of the current web, which annotates information in a way that is “understandable” by computers, is compelling; there are technical, scientific and business issues that have been difficult to address.”

The first sentence requires one to read the previous article to substantiate the so-called difficulties of the approach. I will come back to this first sentence after looking at the second. Let’s just take the phrase ‘which annotates information in a way that is “understandable” by computers’. Setting aside whether that was really the “original vision” of Sir TBL, think about it and ask yourself: Are there computer programs that annotate their information in ways that are not understood by the programs in those computers? Does that make sense?

This is what semioticians call a sign, a significant hint, a semantic indicator that, in this case, signifies something is amiss. I do not really care what linguists call it or whether they agree, because we are not talking about parts of speech or sentence construction here. We are talking about the significance of the words found in a specific text and discourse and how those words capture, divide and signify a specific state of being. Is the article really about the Semantic Web, an alternative proposal, or is it just a ruse?

Alex might rightly protest that this is not what he meant at all. Yet this example explains why we need exegesis if we are to benefit from “understanding” what is really happening. Because exegesis is tedious and requires close examination, it would be a good task for computers. And because people are not always inclined toward critical examination and tend to be far more superficial, it answers the question of why we need computers to understand semantics. Okay, now let me correct Alex’s first incorrect premise by pointing you to some evidence.

Alex Iskold’s claims about the Semantic Web of the W3C are empty because they bear no relation to Sir TBL’s vision and are not in the least representative of it. No matter how sophisticated or insightful Alex’s top-down approach is, the premise he gives does not support it. It is also nothing new; rather, it describes the status quo and the way designers architect their systems today. Alex even gave examples of a few companies structuring data in one way or another.

Sir Tim Berners-Lee’s vision was for a universal means for computers to store, link and exchange data. This vision has nothing to do with human knowledge except as a by-product of organizing data that may (in turn) represent human knowledge in more uniform (and interchangeable) ways. It is certainly not about enabling “computers to ‘understand’ semantics the way humans do”. Anyone in doubt can view this rather recent video of Sir TBL’s testimony before the U.S. House of Representatives, Committee on Energy and Commerce, Subcommittee on Telecommunications and the Internet.

So that you do not have to watch the entire video, which runs over two hours, here is a time index of the relevant parts:

  1. Sir TBL begins his testimony about 30:00 minutes into the video and discusses his vision (the universal nature of the World Wide Web) until about 37:00.
  2. He begins discussing the importance of data integration at about 41:50.
  3. He wraps up with why it is all necessary from about 44:00 through 49:00, and continues his testimony until the 52:05 mark.

After that, Sir TBL takes questions, addressing each issue as the great gentleman that he is, while the U.S. representatives jockey for platforms of their own. I would encourage anyone to listen to the entire session.

Not that I relish supporting the Semantic Web, which lacks a semantic theory, but those who take the time to listen to Sir TBL will agree that Alex Iskold cannot possibly be an authority if he is this confused about the vision for the Semantic Web in the first place. It is clear, for example, that the foundation (the internet) and the infrastructure (the weaving together of standards, protocols and programs that underlie the web) make way for the ground-breaking applications that will free data from inaccessible information silos.

Alex suggests we maintain the status quo: store data in ways only a few can use, and lock ourselves into one structure and one approach, just as RDF locks us into a single and very specific semantic form. Let’s see… Alex’s top-down tree, or the standards of the W3C? Which authority would you choose?

The Romans used to say that all roads lead to Rome. We citizens of modern civilization can only hope to free our authorities from the trappings of top-down and bottom-up ideas and approaches to the publicly funded citadels and depositories of the knowledge of human culture. What to do about those who try to pass noise off as authoritative opinion and commercialize the affair is another matter entirely.

Please let me know what you think.


As a follow-on to my post from yesterday, another interview confirms that Tim Berners-Lee would really rather refer to the semantic web as the data web. Sir Tim Berners-Lee said as much at the beginning of this interview by ZDNet Executive Editor David Berlind.

Even though it may not sound like much, it really clarifies the status quo and should go a long way toward insulating the W3C from criticism. By that I mean that “Semantic Web” proponents opened themselves up to criticism by adhering to the preposterous idea that their work was about meaning, or about interpreting meaning from natural language.

It is really about linking data and not about meaning at all, other than the indirect link to how one gets meaning out of combining different sorts of data.
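To make the point concrete that the Data Web is about linking data rather than interpreting meaning, here is a minimal sketch in plain Python. It assumes only the subject-predicate-object (triple) shape that RDF uses; it does not use an actual RDF library, and every URI and predicate name in it is a hypothetical example. Two separately maintained sources become linked simply because they use the same identifier for the same person; no interpretation of meaning takes place.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement, in the shape RDF uses."""
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary namespace; not a real, published vocabulary.
VOCAB = "http://example.org/vocab/"

# Source A: a hypothetical publications database.
source_a = {
    Triple("http://example.org/doc/42", VOCAB + "author", "http://example.org/person/alice"),
    Triple("http://example.org/doc/42", VOCAB + "title", "Notes on Linking Data"),
}

# Source B: a hypothetical staff directory, maintained separately.
source_b = {
    Triple("http://example.org/person/alice", VOCAB + "name", "Alice Example"),
    Triple("http://example.org/person/alice", VOCAB + "affiliation", "Example University"),
}


def merge(*graphs):
    """Union of triple sets: shared URIs join the sources with no schema mapping."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(t.obj for t in graph if t.subject == subject and t.predicate == predicate)


graph = merge(source_a, source_b)

# Follow the link: document -> author URI (source A) -> author's name (source B).
for author in objects(graph, "http://example.org/doc/42", VOCAB + "author"):
    print(objects(graph, author, VOCAB + "name"))  # ['Alice Example']
```

The design choice here is deliberately naive: the “linking” is nothing more than set union plus lookup on equal identifiers, which is the narrow sense in which combining different sorts of data can yield something neither source holds alone.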

The interview is interesting because TBL talks a little about the long road from the humble beginnings until now. In a lot of ways, we had a similar development path beginning with how to clean, clarify and verify the source pages, how to capture relations and links in human discourse, how to transform phrases and expressions into executable query structures, and then how to use the results through automata (rules, programs).

When people learn that Tom Adi and I have been working on Readware for more than twenty years, they wonder why it takes so long. It is really not so unusual. Dr. Raskin over at Purdue and Hakia has also been working for decades on semantics and theoretical and computational models. It is not something that develops overnight.

The dream of Tim Berners-Lee is to link data from disparate data sources and to have uniform means for storing such data on computers.

The Data Web will make it much easier to get data out of databases.

I am all for that. How about you?

