Archive for July, 2007

Things do heat up in the summertime, and some say competition is brewing among natural language vendors that offer search services.

Over at the Conceptualist, Sahar Sarid comments on whether 30 years of research is enough to beat Google. Citing Michael Reisman for MIT Technology, he thinks semantic search is important, but he believes that digging relationships out of text is not as useful as personalization and understanding the user’s intent.

I cannot argue against the importance of understanding the user’s intent, but personally, I don’t think any search technology, with or without a personalization feature, is “enough” to beat Google. Google is so much more than a search engine at this stage that its business will be hard to upset.

On the other hand, there certainly seems to be a competition brewing, at least in the views of some bloggers and in the opinions of the technology press. And the competition is about semantic search offerings, or so they say. Over at the Read/Write web, Bernard Lunn claimed that the money seems to be riding on the NLP systems. It does not feel right to him, nor to me.

These NLP systems, along with the AI of the Semantic Web and lexical resources such as WordNet, are each great and powerful systems in themselves. Yet each is like the old Roman numeral system: these modern linguistic systems have an effect on people using the Internet similar to the one Roman numerals had on the ancient Romans.

Roman numerals were a numbering system that prevented an entire civilization from doing any higher math. You can read what Thomas Frey has to say about it here. In much the same way, the proponents of NLP and AI systems from 30 years ago have tried to prevent research into other viable semantic methods.

They have blocked the widespread development of semantic techniques that are capable of processing real and conceptual relationships between words, names, and topics or subjects of interest, in favor of extracting part-of-speech relations. This matters a great deal. Because language deals with everything, and human semantics are universal, getting the fundamentals wrong here mucks up the entire works. It makes things more complex than they need to be, and more expensive. That is the state of affairs today.

The ways and means of NLP systems and functional grammars, and all their adherents and proponents, are preventing semantic search from surfacing. This goes unnoticed by everyone until someone shouts loud enough to rise above the din of the crowd. There is even greater pressure than the burden of unwieldy systems and better cover than market confusion.

After pumping giga-tax and industrial dollars into the research labs of the prominent schools and the works of their scientists and students, governments need the venture capitalists to cough up the giga-bucks needed to actually produce something and capture some kind of market. I am not saying that this in itself is good or bad; it is just the way of capitalism after all, and it meets the objective of the industrio-academia-government partnerships that dominate the field.

Yet, by focusing research and market development funding on NLP- and AI-based systems, “gatekeepers” have nearly prevented independent theories and very creative developers from getting funding and from “going commercial”, just by playing their role as gatekeepers. By such actions, they continue to stymie and hobble viable research directions and other quite feasible possibilities for semantic search. Thomas Frey wrote an essay about that too; you can find it here.

So I predict that although these companies are making inroads, and they are making NLP systems more adaptable and usable, they will fail as “semantic search” systems because they are not doing semantic search at all. Or perhaps the public is as fickle as it seems and can be fooled, in which case I could be wrong.

While Hakia and Lexxe have excellent implementations, and I have no doubt that Powerset’s offering will also have strengths, not one of them qualifies as semantic search in my book. With regard to Powerset, Michael Reisman reported Barney Pell’s claim that Powerset has innovations that make the system more adaptable so that it can extract deep relationships from text. No one is saying what that “deep relationship” is, mainly because it is not deep at all; it is a surface-level linguistic feature.

Not one of these so-called NLP wonders can answer a third-grade question, as I previously wrote here. Neither can they pass a simple test for semantic search capabilities, the most revealing of which is the capability to construe the meaning of a query given in another language, like this.

Commercial NLP-based systems such as Hakia, Lexxe and Powerset can only do this with regard to English grammar, and how well they handle all forms of grammar is highly questionable and often debatable.

People should remember that the relationships they deliver are grammatical relationships. These relations cannot even be classed as semantic except as they relate terms to parts of speech. Having and knowing the concept of noun or verb and extracting the relation between the subject and object in a sentence reveals little about the possible associations and relevance between words and structures and concepts of the mind.
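To make that concrete, here is a minimal sketch in Python. It is a toy of my own, with a hand-written verb list and a naive subject-verb-object splitter (`VERBS`, `STOPWORDS` and `extract_svo` are all made up for illustration), and it is not how Hakia, Lexxe or Powerset actually work. It only shows what a purely grammatical relation does and does not carry.

```python
# Toy illustration only: a hand-written lexicon and a naive
# subject-verb-object splitter. Not the method of any named product.
VERBS = {"treated", "cured"}
STOPWORDS = {"the", "a", "an"}

def extract_svo(sentence):
    """Return (subject, verb, object), using the first known verb as the pivot."""
    words = [w for w in sentence.lower().rstrip(".").split() if w not in STOPWORDS]
    for i, word in enumerate(words):
        if word in VERBS:
            subject = " ".join(words[:i])
            obj = " ".join(words[i + 1:])
            return (subject, word, obj)
    return None  # no known verb, so no relation is extracted

first = extract_svo("The physician treated the patient.")
second = extract_svo("A doctor cured the sick man.")

# Terms the two extracted relations have in common.
shared_terms = set(" ".join(first).split()) & set(" ".join(second).split())

print(first)         # ('physician', 'treated', 'patient')
print(second)        # ('doctor', 'cured', 'sick man')
print(shared_terms)  # set()
```

Both sentences describe much the same situation, yet the two extracted relations share no terms at all. Knowing which word is the subject and which is the object says nothing, by itself, about how a physician relates to a doctor or a patient to a sick man.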


How important do you think that it is to recognize when you are about to make an error? If you rate it as pretty important to you, then you will agree that that sort of recognition would be something very meaningful. The very act of distinguishing the error is of perceptual significance and personally meaningful.

Wouldn’t it be nice if a semantic search algorithm could distinguish a bad or false hit (an error) from a good or positive one, just as we can?

Recent research (you can read about it over at Science Daily: Why We Learn From Our Mistakes) shows that our brains are built for recognizing errors.

Science Daily — Psychologists from the University of Exeter have identified an ‘early warning signal’ in the brain that helps us avoid repeating previous mistakes. Published in the Journal of Cognitive Neuroscience, their research identifies, for the first time, a mechanism in the brain that reacts in just 0.1 seconds to things that have resulted in us making errors in the past.

This is so universally applicable to human nature that human language has a built-in semantic domain to identify, distinguish, communicate and control sense data such as an error or a deviation.

Some people may be surprised to learn that there are also inheritance rules that require consistency of the semiotic signals as they propagate the properties, characteristics and all variant interpretations of the sense data through all possible symbolic compositions and permutations. The symbols “e r r o r” spell out the property or characteristic to convey, linking the sense datum to the lexical element of the English language in this case. This may seem preposterous without knowing the theoretical basis for that link, but that does not make it so.

Think about it this way. Failures, errors, faults and deviations in body, mind, health, work, opinion, belief and character: does the interpretation of these sorts of things occupy any part of your thoughts? Do you talk about them and communicate them to others: your spouse, your preacher, your friend, your dog? Do we debate them socially, nationally, culturally? Do they go away if you do nothing, or do they stay with you, even grow and propagate? If you don’t get that, consider this: it is clearly an innate function of interpersonal communication, among all species, to observe and alert others to error, deviation, or danger.

It is what we talk about, if you take some time to consider it. Language rests on the ability to interpret such sense data and we have highly refined symbols, semiotics and communications systems to propagate such signals.

The syntactic mechanics and semantics for distinguishing and interpreting sense data make a fine basis for a parsimonious and scalable computational model of human language. Perhaps they also play a significant role in the psychological and social basis of meaning and interpersonal communications. Because this particular semantic mechanism has been developed into a functional computational model, it presents the possibility of a new direction for research on functional models of language and cognition.

Unfortunately, the research scene is tenured, and business and government research and development are incestuously imitative. It is almost unheard of to break from the recent past, in computation, in linguistics, and in philosophy and logic. I started blogging about semantic search mainly because I have been developing and fielding semantic search applications since before the Internet was much more than a figment of people’s imaginations. That is a long time, not as long as John Sowa perhaps, but a very long time nonetheless.

For as long as I have been in the business, the semantics of personal meaning and perception have always been a kind of nebulous subject in the computing sciences where they invent and use formal programming languages with formal logic and accompanying semantics.

Nevertheless, without the slightest consideration of interpersonal meaning and human perception, it seems every young computer science graduate thinks they have the stuff to process natural language statements, messages and texts simply by building a few arrays and parsers and doing some table lookups. Those who try, fail.

They fail for many varied reasons, but mostly they fail for lack of a well-grounded and universal semantic theory of natural language and human cognition. It has been that way since before computer science began being offered as a course at major universities. It is that way today.

I guess there is hope and we can have some faith that people are built with the faculties to eventually realize they have made a mistake and orient in a new direction. What do you think?
