I have heard it said that “a rose by any other name would smell as sweet”. The original line comes from Shakespeare’s famous play about Romeo and Juliet:

“What’s in a name? That which we call a rose
By any other name would smell as sweet.”

According to scholars, Juliet, prevented from marrying Romeo by the feud between their families, complains that Romeo’s name is all that keeps him from her. Her lines just before the most often remembered quotation are:

“’Tis but thy name that is my enemy;
Thou art thyself, though not a Montague.
What’s Montague? it is nor hand, nor foot,
Nor arm, nor face, nor any other part
Belonging to a man. O, be some other name!”

She is certainly complaining that a name is no “part belonging to a man.” It is not part of the substance of being a man. Juliet is trying to make sense of her conflicted being, inflamed with her heart’s desire. Perhaps as part of her thinking about ways out of the conflict in which she is engaged, she reasons there is no particular cause that her man could not be called by another name. She pleads for another judgment, supported by her famous argument that the “essence”, or “bare and particular substance”, of a rose is its sweet smell, which would remain if it were called by any other name. She continues in the following lines:

“So Romeo would, were he not Romeo call’d,
Retain that dear perfection which he owes
Without that title. Romeo, doff thy name,
And for that name which is no part of thee
Take all myself.”

She concludes Romeo would still have that “dear perfection which he owes” were he not called Romeo. Finally she asks that he forsake his name for her. This is not an easy thing she asks. Forsaking one’s name means forsaking one’s family, one’s ancestors and heritage and perhaps one’s fortune and inheritance. In these lines she does not offer to forsake her own name. That demonstrates a small, unavoidable and undeniable part of the subjective obfuscation of the “dear perfection” which is the bare and particular substance of everything in existence.

Human culture is unique in its capability to name all the things that touch our existence and impinge on our experience, whether or not such things have or show tangible form. Such is the distinctive character and aroma of the rose, or the distinctive character and dear perfection of Romeo, for example. Other kinds that have brains, intentional states and awareness do not confer, give or grant names to things and record them for their posterity.

This suggests that one owes all that adds to one’s own knowledge and character to that “dear perfection”, that Wisdom to which each of us may often appeal, that which exists, remains and endures beyond individual and subjective existence and experience, and certainly beyond the names of things. For what might become of that presence we call a rose if no one were ever around to experience its sweet aroma, to call it a rose, to cultivate and appreciate it; no one with any senses to thrill, existing or thinking?

The terms “thought,” “idea” and “belief” are just names for the “stuff” of that dear perfection that seems to flow into and out of each of us, just as the name of the rose is a handy moniker for the apparent salience of that unmistakable aroma and perfection sensed upon its appearance or recollection. These names are terminology we invent to order, “slice up” and talk about the presence of “stuff” going on or happening in our heads, in our hearts and all around us, only because that same stuff is particular to everything that is going on or happening and it cannot otherwise be distinguished.

Therefore it is a distinguishing process, this conceptual, reflective thinking in which each of us engages. Thinking is something that each of us happens to do, though the stuff or substance that we take in as input to such awareness and cognition is nebulous and it is regularly deemed unclassifiable as it is channeled, consumed, recognized and altered into a product observed or otherwise output.

It is for one’s own self to distinguish, for individual perspectives of that stuff can only be accorded a spontaneously occurring designation suited to the moment, to an aspect or to a function. It is wise to know thyself. Yet, if we are to know it, however sophisticated the designation that is born of that special ideal, we ought not to mistake the cognition nor the designation of it for the substantive power of it: this power by which each of us is gently impelled and often rigidly compelled to register, resolve and to reason.

Shakespeare wrote:  “We are such stuff / As dreams are made on”

Dreams are a force on their own. The mind is made on that same sort of stuff. Though we cannot quite touch this powerful substance, we cannot deny the forceful and influential effect the dreams of all humanity have had on our collective and individual awareness. In order that we may dream a dream and think a thought, indeed, that we may enjoy the inherent capacity to know a rose, we have this power to use, to do work: to try to extract or separate some of that essence we call the rose from its existence in that dear perfection in which it resides and from which all the world gets its share.

The Egyptians attributed such power and rudiments to Thoth; the early Greeks attributed the power to the Logos. Today, some call this Providence and many call this the power of knowledge. The rose is not a lotus nor any other kind except that it is. The existence of names and terms, and the pervasive use of language throughout the time of human civilization, support the view that there is present, common, and enduring value to that dear perfection empowering and diffusing every idea: the cumulative and dynamically incremental heritage of creaturely sharing in which each of us persistently partakes and delights.

That there are rudiments derivable from such distinctly human qualities and that these are representative sign functions for the wisdom and thought also obtainable from the gamut of human languages, seems incredible…and but for Adi’s Semantic Theory we are indeed clueless, rudderless.  Many scientists, those skeptics called relativists, even professional linguists, resist the idea that the stuff of dear Perfection, the Logos behind all speech and every human word is indeed amenable, if not to definition, at least to utilitarian indication; and there is meaning enough in that for everyone’s pleasure.

A New Theory of Cognition

I am happy to announce that “A New Theory of Cognition and Software Implementations in Information Technology” will be published in the April-June issue of the Journal of Information Technology Research, Vol. 2, Issue 2, 2009.

Abstract

“The Scientific Method means that theories are developed to explain observed phenomena— similar to the task of text analysis—or to search for unobserved phenomena—similar to text retrieval. Theory development means testing a large number of proposed theories (hypotheses) until one is corroborated. To make theory development efficient, a method is needed to construct promising theories—ones more likely to succeed. Such a method is part of a new theory of cognition that is introduced. The theory is implemented in the software Readware. Readware uses theory development methods for text analysis and retrieval. Readware’s development, features and large-scale performance are reviewed. This includes a fast ontology-building system, the cross-lingual word-root theory base, a language to code theories, algorithms and ontology implementations, and software applications and servers that perform text analysis and retrieval using Readware API functions.”

Article copies are available for purchase from InfoSci-on-Demand.com.  You can also look for the publication in your local university library.

The Keys to Relevance

A key is a fundamental or central operative of harmony. The connexion of relevance is recognized concordantly.

A quick read of popular technology news and review sites gives one the impression that the trouble people have with search engines, those called semantic search engines and all other search engines too, is the relevance of the results. This, of course, is aside from any trouble people have with the actions of the companies fielding the search technology, e.g. corporate entities such as Google, Yahoo, Microsoft, Powerset, Hakia and Cognition, among about a hundred others.

The problem I want to address is the problem with the relevance of the results, because even with the new crop of semantic search engines using sophisticated Natural Language Processing (NLP) technologies, the problem remains. In fact, NLP technology hardly addresses ambiguity, let alone relevance. There is a good reason for this that I can now summarize, after dealing with it for more than twenty-five years. The problem actually stems from the abstract ideas about relevance.

By ideas, I just mean people’s thoughts and deliberations over exactly what is relevant. For corporations, the answer is very obvious: what is relevant is nothing but that substance that increases corporate equities. That substance may be abstract to some, but it is very real to corporate shareholders and the business managers they employ. It makes perfect sense that money and wealth, and any investment of the same in assets that generate that substance, are the things that are relevant to business. Any assets that do not perform or have little or no uptake in that way are dumped. This is why industry, and the corporations and businesses within it, have overlooked the keys to relevance and have instead held steadfast to their own values of relevance to their own institutions.

However, a quick scan of current events shows that this substance they value so highly is not very real. It can evaporate and disappear before your very eyes. This is because money has a closer connection to fuel than it does to the fundamental keys of relevance. Anyway, that shows that that substance (money, wealth, etc.) is not a fundamental key to relevance. The same thing is true of the objects of Natural Language Processing technologies: they are not the substance or keys to relevance; some of the objects they use are the signs of such substance: the semiotic keys.

A word is such a key: a semiotic key, not a harmonic key. A harmonic key is the substance of relevance and meaning of which the word is a semiotic key symbol. A symbol is a sign of something invisible or abstract. A word is a sign of abstract harmonic keys that are the substance and essence of relevance to judgment. As I pointed out in my previous post, these keys are marked by sounds, phonemes, letters.

Harmonic keys would behave just as they sound, and they would not leave anyone reeling in discord and conflicted in such a way as money markets are conflicted today. This is because the keys to relevance are concordant to the essence of relevance in one’s own mind. By that I mean, what is relevant to one’s own judgment. Just what is that?

Those are, of course, the premises required by the rational mind. For the premises of an argument for your judgment are often left unstated or hidden: left for you to figure out, left abstract.

Would anyone like to know more? For example, would you like to know the premise of fear? Why do people feel fearful about the economy today? The premises for this are factual. The signs appear in the outlook or horizon and in the word fear: the semiotic symbol itself. If you know this premise, you already know the reason for your fearfulness. It is like asking what is the fundamental quantity of fearfulness? The fundamental quantity of a physical substance is mass, length or time. What is the fundamental quantity of a conceptual substance like fear? Wouldn’t you want to know this measure?

Leave a comment if you would.

The notions of meaning, semantic mapping and relevance are nebulous not because they are fanciful, though that could be argued in many cases. The conceptions or mental representations of these notions, formed in people’s minds, are not entirely clear nor plainly understood.

Many, particularly those involved in computer programming and computational linguistics, would argue that one cannot know the concepts in people’s minds. They consider thoughts to be ephemeral, and what people may have in mind to be relevant but murky at best. Because meaning is the effect that interpreting a sign (something that refers to an object) has on the interpreter’s mind, we must attempt to understand these signs. In this attempt, we must clarify not only our language but the interrelations of its terms to our thoughts, our emotions and other personal motivations.

In this post we will take flight, despite the forecast, through the storms of controversy and the clouds of confusion. I will take you to a place where the meaning is sound and discovery is a realization. It is a long and difficult exploration so get comfortable before you begin.

Because people think with words and communicate with language, because we learn by reading and writing, and because laws are written with words, it makes sense to understand what it is about words and language that is connected to thinking. We know it is meaning that connects it all together.

What every sense-maker wants is to abstract the significant. In the face of incompleteness we may settle for the explanatory power of the elements and dimensions of that meaning. We do this so that a course may be predicted with greater confidence and so that we may reach a more calculated response. As any sense-maker knows, impressions evoke more perceptive thought and such thoughts provoke action.

Because human behavior is the cause of severe problems in society, it would be a great assistance if computers were able to help clarify to people, to us, the elements and dimensions of thought processes and patterns underlying languages, logics, laws, computing, and all inventions of the mind. If only so we can be more certain we have the same things in mind.

It is like that in the case of the moniker semantic search engine.  What is a semantic search engine? It is obvious that we agree about the search engine part. Agreement about the word semantic is less certain; we do not have the same thing in mind at all.

Computational linguists, generally speaking, do not study meaning in the general sense outlined above. Instead, linguists, under the influence of leaders such as Bloomfield and Chomsky, left the study of meaning and mental representations to psychologists and to others. The study of meaning, mental representations or semantics was deemed unscientific. Chomsky called it “mental gymnastics” and “pseudosemantics”. Whoever has a linguistics degree now was trained in that tradition.

In this article I will write about semantic maps, and the semantic map employed by Readware technology in particular, mainly because that is the one I know. I will also emphasize where Readware’s approach and semantic mapping depart from the traditional approaches that combine natural language processing (grammar-based NLP) with artificial intelligence and database techniques.

(Full Disclosure – I developed Readware technology along with Dr. Tom Adi.)

A semantic map is related to the schema used in relational database technologies with a data dictionary. It is used to model the data in the system. In semantic web products like Powerset, and some NLP-based products, a semantic map is used to model the data in the system according to the ontology being utilized. Those that don’t use an ontology specifically consider graphs of triples (e.g. a, r, b, or “a relates to b”) to be semantic mappings that map language or some other item into a logical assertion, also called a concept. Let me digress to define the term ontology.

In AI technology, and on the semantic web, an “ontology” is a set of classes, attributes and relationships (essentially a set of assertions) that are used to model a domain of knowledge in a language that is close to that of formal logic (according to Tom Gruber). In addition to use in mechanical inference, the main purpose of such ontologies is not to study natural phenomena but to integrate “heterogeneous databases, enabling interoperability among disparate systems,” Gruber says. This is different from the long-standing use of the term in philosophy, where it refers to the study of being (in the world) and where its purpose is the study of natural phenomena.
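To make the triple and ontology vocabulary above concrete, here is a minimal sketch in Python. It is not taken from any of the products mentioned; the class names `Triple` and `TripleGraph`, the relation names and the facts about roses are all assumptions invented for illustration. It shows only that a graph of (a, r, b) triples is a set of assertions that can be matched mechanically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One assertion of the form (a, r, b): a stands in relation r to b."""
    subject: str
    relation: str
    obj: str


class TripleGraph:
    """A toy store of assertions (hypothetical, for illustration only)."""

    def __init__(self):
        self._triples = set()

    def assert_fact(self, subject, relation, obj):
        self._triples.add(Triple(subject, relation, obj))

    def match(self, subject=None, relation=None, obj=None):
        """Return stored assertions matching the given fields; None is a wildcard."""
        hits = (
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (relation is None or t.relation == relation)
            and (obj is None or t.obj == obj)
        )
        return sorted(hits, key=lambda t: (t.subject, t.relation, t.obj))


# Invented example assertions: classes, attributes and relationships.
graph = TripleGraph()
graph.assert_fact("rose", "is_a", "flower")
graph.assert_fact("lotus", "is_a", "flower")
graph.assert_fact("rose", "has_attribute", "sweet smell")

flowers = [t.subject for t in graph.match(relation="is_a", obj="flower")]
print(flowers)  # ['lotus', 'rose']
```

Note what the sketch can and cannot do: it retrieves exactly what was asserted and nothing more, which is the sense in which such a map models a coded domain of knowledge.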

An ontology of information assets (e.g. the Dublin Core) can help me sort and identify media but it cannot help me understand why some people of some nationalities want to kill everyone of another nationality. What is the meaning of that? Isn’t that phenomenon worth studying so we can avoid that sort of behavior in people where we live? Talk about a use case for disambiguation! But you won’t find any NLP technology up to the task in this case. They say it is not possible with current technology but what they mean is: it is not possible with their technology.

I bring this up because some researchers, computer scientists, developers and even product managers who talk about a “semantic map” are not mapping words onto psychological or cultural meaning at all. It is not even about natural phenomena or “meaningful” experience. They are mapping words onto the artificial classes and/or attributes and relationships (assertions) that are used to model a domain of knowledge coded in their computational specification.

According to their announcement, Cognition is mapping words by word forms and word senses (attributes of words) and by synonyms and antonyms (relationships between words), among other mappings. They call that meaning and semantics. Besides the significant difference in our understanding of meaning, this is also where there is a significant departure between the traditional logic and techniques of artificial intelligence and those of Readware technology.

You see, the objects of NLP-driven search engines (and of AI programs and RDF files) are assertions: millions of assertions, if you heard what the folks at Cognition said. Each vendor is trying to prove they have or make more assertions than the others. It is as if they reason that if you can catalog all the possible choices, you can sort out which ones apply at specific instances.

It is not so easy. Because information is generally incomplete and inconclusive, just like it is in the case of language and semantics, one must study the subjects to get to the truth. And, in cases like this, the truth is often hard to recognize and may be hard to pin down.

Every natural phenomenon we sense has to be studied to some extent. It is how we learn. The best way to study any natural phenomenon, such as language and meaning, is with the scientific method. For literate people brought up in the western traditions of the world, thinking is less rigorous and formal, but it is patterned on this method, and it is how we learn.

So let me say how I think and learn. I think with conjecture, guesses… theories… about the nature of the real things and principles I recognize. This leads me to choices and to assertions. I conclude with what is true or not and what assertions might be made; I do not start with them. I start, usually, with a conjecture and begin to refine a simple idea or mental representation of the reality of the situation. Descartes called these initial thoughts innate ideas and Kant referred to them as a priori judgments.

Let me ask you seriously:  Do we want computers to check our logic or do we want them to help us create more useful and concrete theories for solving the problems we face?  Do we want computers to be spectators of the human condition or do we want computers to be useful participants assisting us in thinking about how we can improve the human condition?

I think restriction to assertions and traditional logic limits the help computer programs could provide in clarifying theories, evaluating choices and making predictions. In the words of the eminent mathematician Dr. Vaughan Pratt:

Traditional logic, like classical mechanics, is a spectator sport: there is an apparatus and a separate observer. Information flows from the apparatus into and around the observer, whose measurements are assumed not to disturb the apparatus. The observer is therefore an information processing system, the essence of which is a graph with nodes A,B,… along whose edges f:A->B (measurement f with source A and result B) information flows. The apparatus itself does not see these edges (but constitutes the sources of some of them) and is not disturbed by the observer. The graph of an idealized observer is a Heyting or even Boolean algebra in the case of nonconstructive logic and a cartesian closed category in the case of constructive logic. Considerations of computational complexity and relevance may call for weaker observers, but not so weak that they disturb the apparatus.

The essence of traditional logic then is an intelligent graph reaching its edges into an unsuspecting structure and contemplating its behavior.

This is useful for static structures and well-known procedures, but language and the world itself are made up of dynamically changing structures and interrelated processes. Nothing is really static. So a key difference between Readware technology and the AI technology used by NLP approaches is the difference between being a spectator and being a participant. The nature of Readware logic is to order and interrelate the elements of a structure and thereby determine the essence of its controlling processes.

The Readware apparatus or semantic map is an information processing device (a regular sign system) and the observer (the mind) is a controller. The observer receives partially processed (ordered) information from the apparatus and responds with decisions (and even assertions). The abstract objects of Readware are not assertions, and Readware algorithms generate theories, not assertions. There is a big difference between these two notions and the consequences of their common use and deployment.

Notwithstanding the search relevance, and as for the rest of the results NLP products achieve, let me say they show an incredible amount of sophistication.  The parsing and recognition of word forms and word senses is world class in all the systems on the market.  The products that implement the complex parsing and indexing of documents into word forms and senses and entity classes and relations are world class products.  I would love to have a stack comprised of Readware and any one of these language processors.

What this crop of vendors does not do (even though their claims imply they do) is pattern experience well enough to induce meaning (in computational memory) from the word forms themselves, as I described above and as literate people do. Let’s look at an example.

Psychologists who study emotional traumas consider language to be some of the best evidence available. If someone says they hurt, they should also know that they are hurt, and therefore the self-report of being hurt is valid evidence along with other behavioral and physiological evidence, or the lack thereof. It is important for the care provider to recognize, or create a mental representation of, what it means for the patient to be hurt, particularly in the instance where there is a lack of physiological or other behavioral evidence.

In order to interpret self-reports (and testimony, text reports, etc.), one needs to study language. This is best done using natural language semantics. To study the semantics of language one needs universal elements or objects. Linguists, psychologists and others who do study semantics look for such semantic universals: concepts that are cross-lingual and cross-cultural. Being in psychological pain is not an English or even a Chinese language object; it is a meaningful impression of biological or psychological activity.

Such universal concepts are the major defining characteristic of a “semantic map” for a computer program whose vendor makes the incredible claim that:

We have taught the computer virtually all the meanings of words and phrases in the English language,

as Cognition chief executive Scott Jarus told AFP.

He could have claimed that Cognition had cross-referenced all word forms and senses known to the English language with the definitions of all the words and phrases in their lexicon. I believe they may have done that much. However, if the “meanings” of the words were really taught to the computer, then the computer ought to be able to look up a word and use its definition and any other entailments to perform a search on the subject and report the result.

The Cognition search engine clearly cannot do this. It can only search on the word or words a user provides.  Pick any word and try it for yourself.  Being able to look up any word and cross reference all its details is a skill that goes a long way.  And reading, interrelating and using those definitions to explain reality or experience is something else entirely; this is where a semantic map becomes necessary.
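The difference between searching on the word a user provides and looking up a word to use its definition in the search can be sketched in a few lines of Python. This is only a rough illustration under loud assumptions: the lexicon entry, the three documents and the function names are all invented here, and the code does not describe how any real search engine works. It also stops far short of interrelating definitions to explain experience.

```python
# Hypothetical toy lexicon: a word mapped to terms drawn from its definition.
LEXICON = {
    "fear": ["afraid", "dread", "danger"],
}

# Hypothetical toy document collection.
DOCUMENTS = {
    "doc1": "Investors dread another week of losses.",
    "doc2": "The rose garden opens in June.",
    "doc3": "Many people fear the economy will shrink.",
}


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return {word.strip(".,;:!?").lower() for word in text.split()}


def keyword_search(word, documents):
    """Return ids of documents that contain the literal word."""
    return sorted(
        doc_id for doc_id, text in documents.items()
        if word.lower() in tokenize(text)
    )


def expanded_search(word, documents, lexicon):
    """Look the word up first, then search on the word plus its definition terms."""
    terms = {word.lower(), *lexicon.get(word.lower(), [])}
    return sorted(
        doc_id for doc_id, text in documents.items()
        if terms & tokenize(text)
    )


print(keyword_search("fear", DOCUMENTS))            # ['doc3']
print(expanded_search("fear", DOCUMENTS, LEXICON))  # ['doc1', 'doc3']
```

Even the expanded search remains a keyword search over a longer list of words, which is why the look-up skill alone does not amount to a semantic map.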

In psychology, a semantic map is a pattern imposed on reality or experience to assist in explaining it, mediating perception, or guiding response. That is the conception of a semantic map I want the reader to have in mind as we continue.  By understanding a semantic map in this way, one has an intimate way of evaluating the efficacy of a proposed semantic map without buying into the computer products first. Because evaluation of semantic maps is the critical and necessary step before adopting them, let me be redundant.

  1. A semantic map is a pattern imposed on reality or experience.
  2. The purpose of a semantic map is to assist in explaining experience.
  3. The purpose of a semantic map is to mediate perception or guide response.

Those announcing semantic maps should meet these criteria and explain what part or how much of reality they have successfully mapped. My own participation in the research and development of a semantic map (we called it a semantic matrix in the original work) began in 1982.

Getting through the online noise and storm around the concepts of semantics and relevance, to the actual elements and dimensions of “meaning” and “human understanding”, is a long-term, often frustrating and sometimes harrowing experience. Now that I am thinking about it, it puts me in mind of spiraling down to the ground through thunderclouds in severe weather. The problem is that things can get out of hand quickly.

For those that maintain control and make it to the ground, there are ways to understand these concepts and all concepts of the mind. This is simply because, in order to be shared and to persist, a) any favorable concept must become less abstract and nebulous and form into a more concrete idea, scheme or plan; and b) there are abstract, specific and recognizable elements and dimensions to every well-formed plan, idea and understanding. By favorable, I mean a concept likely to survive for whatever purpose: good or bad.

Ultimately language is intimately involved and plays a great role in all forms of human understanding. And so, many researchers accept that there is a mapping between the abstract elements and dimensions of meaning and the signs of language. Readware technology maps the signs of language onto abstract elements and dimensions of emotional and physical control, using a matrix of sound symbols as the semantic map according to the Adi Theory of Semantics (ATS).

Some may ask: why choose the elements and dimensions of emotional and physical control? In truth, we did not choose them. They were derived from a semantic study. But in hindsight, one wonders why others did not already recognize them. Going back to the psychologist interpreting a patient’s report of psychological pain: caregivers want to know how to control that ‘pain’ so it can be mediated in the constitution of their patients. To get control of pain means we must wrest that control from someone or something else. Physicians have tactics and best practices for this case.

I cannot think of many things in the world that are not directly interrelated to or affected by some form of emotional, physical or environmental control.  Because emotional and physiological control is a large part of the human condition, a shared and interpersonal semantic space is readily patterned by its elements and dimensions.

The Readware semantic matrix has a small number of elements and dimensions for mapping a large number of interpretations.  This is why I read with some amazement the announcement by Cognition that they have the world’s largest semantic map. It motivated me to write this post.  One reader commented that those of us that disagree with Cognition’s claim should just hold our objections and let them tout their wares.

The problem is that the claim they make gives the reader the wrong impression. Here is the impression Anthony C. shared in his own words:

The academics can discuss the Olde English and definitive dictionaries that have a set number of words, but I’d prefer an NLP system that understands all the meanings of those dictionary entries. That one sounds like it can build a business by licensing “the bit about them that’s unique.”

Anthony has the (wrong) impressions that the OED is a dictionary of a fixed number of words in the Olde English language and that the NLP system (Cognition) understands all the meanings of its dictionary entries. And he reaches a dubious conclusion because of those impressions.

It does not take much to show that there is no NLP logic capable of interpreting the meaning of simple word forms like feel, fear, hope and love and using those interpretations for locating instantiations in natural language expressions (text). There is only keyword search. People speaking other languages also have words that refer to the same meanings indicated by the words feel, fear, hope and love, because these human emotions are experienced irrespective of the language spoken. Keywords don’t work cross-lingually on texts.

I expect most sense-makers will hold, as I do, that there is no possibility of achieving “a more accurate or relevant understanding” without understanding the universal elements and dimensions of the meaning of such signs.

An Introduction to Semantic Mapping with Readware Technology

Some researchers believe semantic universals can be found in simple terms like feel, fear, hope and love that are shared across cultures and languages. Many believe that more complex concepts are not shared by many languages. Along with my colleague Dr. Tom Adi, I believe that certain sounds are symbols: material objects used to represent something invisible. These sounds are shared by all languages. The symbols represent abstract semantic universals that are used by the mind for symbolic processing tasks such as thinking, reading and writing. The modern Roman alphabet represents these symbols thus:

a b c d e f g h i j k l m n o p q r s t u v w x y z

All the sounds of every word of the English language are mapped into the writing system using these symbols.  The Adi Theory of Semantics (ATS) maps these symbols onto 11 dimensions of emotional and physical process control. The Roman alphabet and phonetic symbols are arbitrary, of course, as are the conventions for combining the sounds of the English language and any natural language for that matter. Therefore the symbols of any language can be mapped to the universal elements and dimensions of these elementary, compound and interrelated processes without changing or disturbing them in any way.

Every symbol maps onto one of seven abstract processes (assignment, manifestation, containment, assignment of manifestation, assignment of containment, manifestation of containment, and assignment and manifestation of containment), each with one of four abstract polarities (closed-self, open-self, closed-others, and open-others). This is visualized in the table I have included below.

You may notice that some cells are empty and some have multiple symbols.  There are reasons for this though listing them here would take us away from the present discussion. You may also notice the abstract objects closed-self and open-self are opposites as are closed-others and open-others. These pairings of polarities can be taken to represent abstract interpersonal engagement conditions (self and others) with abstract interpersonal boundary conditions (closed and open).
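To make the shape of that grid concrete, here is a minimal sketch in Python. It assumes nothing beyond the seven processes and four polarities named above. The only symbol placed in the grid is "f", which this post later assigns to open-self and manifestation. All other cells are left empty rather than guessed, and the names `Process`, `Polarity` and `GRID` are illustrative, not part of Readware.

```python
from enum import Enum
from itertools import product


class Process(Enum):
    ASSIGNMENT = "assignment"
    MANIFESTATION = "manifestation"
    CONTAINMENT = "containment"
    ASSIGNMENT_OF_MANIFESTATION = "assignment of manifestation"
    ASSIGNMENT_OF_CONTAINMENT = "assignment of containment"
    MANIFESTATION_OF_CONTAINMENT = "manifestation of containment"
    ASSIGNMENT_AND_MANIFESTATION_OF_CONTAINMENT = (
        "assignment and manifestation of containment"
    )


class Polarity(Enum):
    CLOSED_SELF = "closed-self"
    OPEN_SELF = "open-self"
    CLOSED_OTHERS = "closed-others"
    OPEN_OTHERS = "open-others"


# One cell per (process, polarity) pair. A cell holds a list because,
# as noted above, a cell may be empty or hold several symbols.
GRID = {cell: [] for cell in product(Process, Polarity)}

# The only placement stated in this post: "f" is open-self + manifestation.
GRID[(Process.MANIFESTATION, Polarity.OPEN_SELF)].append("f")

if __name__ == "__main__":
    print(len(GRID), "cells")
    for (process, polarity), symbols in GRID.items():
        if symbols:
            print(process.value, "/", polarity.value, "->", symbols)
```

Seven processes and four polarities give the eleven dimensions mentioned earlier, and their pairings give the cells of the table.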

So, in Readware technology, intuitions about the words of a language, such as feel, fear, hope and love, are obtained by mapping the phonemes of these words (considered by most linguists to be the smallest elements of meaning) onto these abstract semantic universals. This is done with a simple algorithm used for transforming a word into the abstract objects indicated by the structure of the (word’s-root) phonemes (an abstract word theory) representing the intuitive meanings of the word.

These abstract theories (sign functions) produce impressions that have many possible interpretations or realizations. We would not want to put a number on this and rather believe that the number of realizations is open-ended. Such realizations have explanatory power that can be studied outright; Readware technology quantifies them for use in computational algorithms. Now let’s get into the practical implementation so we see how it works.

Most words are ambiguous because sounds produce ambiguity (multiple meanings) when combined in a word root. This is at least partly because each phoneme symbolizes compound abstract objects that convey different aspects and characteristics of the natural phenomenon referenced by the word.  Polysemy is a linguistic term that means that a word root may refer to different objects in different contexts.  All phonemes are polysemic.  All words appear to be polysemic too.

According to ATS, every word can be transformed into one of several forms of quantifiable functions defined over the discrete domains and ranges (of control) dimensioned by these abstract semantic universals. Thus, even emotional word roots that are ambiguous can be included in the evidence studied to understand their nature.

Adi’s theory of semantics has rules to convert any word root into an abstract mathematical mapping such as f: X->Y or f(X), etc. Consonants that refer to processes with higher precedence play the role of the function f of the mapping f: X->Y and the remaining consonants represent the domain X or range Y of the mapping. The mapping f(X) is a mapping with an unspecified range. The words feel, fear, hope and love are mapped by this formula, implying that the context for these words can range across anything at all.

For some, the formulas and the abstract mappings of semantic universals may be too abstract to be of much practical use, yet each abstract mapping can be interpreted into more concrete terms suitable not only as a definition, but also as a knowledge representation with extraordinary explanatory power. In other words, these abstract impressions induce more concrete interpretations. Such interpretations can be corroborated with personal experience.

Consider, for example, that the sound /fe/ symbolized by the letter “f” represents the abstract semantic universals open-self and manifestation. Remember that open-self is a polarity (think charge, inclination, valence) and that manifestation is a process.  Action and activity are effects of processes of manifestation, i.e., one manifests behavior and actions, and activity is manifested.  So this is how a phoneme from a language is mapped to a compound abstract object that evokes multiple interpretations –meanings– from the sound-symbol.

Both words, feel and fear are used to refer to sorts of emotional activity. That is not a definition from a dictionary though it is defining.  In the word fear, the emotional activity applies to the domain of self –is assigned inward– (by the polarity) indicated with the consonant “r” in the formula.  In the word feel, the “l” has outward polarity and assigns the manifestation outward– the domain is open to the outside.  These abstractions give us some impressions of what it instinctively means to fear or feel–yet they are even more than that. They are theories. We can use them to generate more concrete theories about what the words fear and feel mean.
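The f(X) reading of fear and feel can be sketched in a few lines of Python. This is only an illustration of the idea, not the Readware algorithm. The precedence ranks are invented for the example, since this post does not give the actual precedence table, and the names `HYPOTHETICAL_PRECEDENCE`, `AbstractMapping` and `to_mapping` are hypothetical. The one thing taken from the text is that "f" plays the role of the function while "r" or "l" stands for the domain.

```python
from dataclasses import dataclass

VOWELS = set("aeiou")

# Assumption: illustrative ranks only. The post says the consonant tied to
# the higher-precedence process acts as the function f, but lists no ranks.
HYPOTHETICAL_PRECEDENCE = {"f": 2, "r": 1, "l": 1}


@dataclass(frozen=True)
class AbstractMapping:
    function: str   # consonant playing the role of f
    domain: tuple   # remaining consonants, the domain X (range unspecified)


def consonants(root):
    """Distinct consonants of a root, in order of first appearance."""
    seen = []
    for ch in root.lower():
        if ch.isalpha() and ch not in VOWELS and ch not in seen:
            seen.append(ch)
    return seen


def to_mapping(root, precedence):
    """Pick the highest-precedence consonant as f; the rest form X."""
    cons = consonants(root)
    fn = max(cons, key=lambda c: precedence.get(c, 0))
    return AbstractMapping(fn, tuple(c for c in cons if c != fn))


if __name__ == "__main__":
    for word in ("fear", "feel"):
        print(word, "->", to_mapping(word, HYPOTHETICAL_PRECEDENCE))
```

With these assumed ranks, fear becomes f applied to the domain "r" and feel becomes f applied to the domain "l", which matches the inward and outward readings described above.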

For the “f” in feel, these abstract universals can be realized simply as “opening oneself to outside manifestations”. For the sound “f” in fear, the universals induce the more concrete realization: vulnerable state. Vulnerable is a realization of the open-self. A state is a realization of a particular manifestation.  Of course, any realization is conjunct to the situation, circumstances or context of use.

An event is also an interpretation of manifestation and the open-self is the universal negative, so a concrete realization of fear is a negative event. The open-self polarity is also realized as unfamiliar emotion and both the words feel and fear are used to communicate a sense of unfamiliar emotional activity. The unfamiliar can induce fear and result in feelings of anxiety and agitation. And a feeling is often initially unfamiliar enough to get our attention. Do you feel me?

The explanatory power of the semantic universals of this mapping enables us to make predictions such as:

–the advent of uncertain or unfamiliar circumstances can evoke fear in the minds of people.

This prediction can be applied to events concurrent with the uncertainty of the political future in America. Given a collection of American news articles covering this year (2008) from January until September, Readware algorithms can distinguish the instances that evoke fear and contribute to increasing uncertainty, anxiety and agitation of popular opinion from articles that do not. This can be done with a query of the form: fear, because. The entire process for this case would take less than a few hours, including installing software, parsing and indexing documents and achieving the results. It would cost pennies per article given a few million articles exist in that range.

By finding instances, textual evidence from passages in press reports, Readware technology can be used to inform political strategists, for example– to locate and track relevant issues that produce fear in the populace and deserve focused attention. Such information can be used to great advantage, to damage an opponent, or not at all.

So, if you are really thinking about using semantic technology, you should know about the limitations of products based on traditional AI-style logic and mechanical inference. Learn to recognize the difference between what they claim to do and what they actually do. And be aware of the availability of alternative methods that exploit the explanatory power of text.

Search. I suppose there is no denying that the word “search” ascended to significance in the consciousness of more people since the birth of Information Science than perhaps at any other time in history. This supposition is supported by a recent Pew Foundation internet study stating that:

The percentage of internet users who use search engines on a typical day has been steadily rising from about one-third of all users in 2002, to a new high of just under one-half (49%).

While it may not be obvious, it becomes apparent on closer examination of the phenomena, that the spread and growth in the numbers of words and texts and more formal forms of knowledge, along with the modern development of search technology, had a lot to do with that.

Since people adopted the technology of writing systems, civilizations and societies have flourished. Human knowledge and culture, and technological achievement, have blossomed. No doubt.

Since computers, and ways of linking them over the internet, came along, the numbers of words and the numbers of writers have increased substantially. It was inevitable that search technology would be needed to search through all those words from all those writers. That is what Vannevar Bush was telling his contemporaries in 1945 when he said the perfection of new instruments “call for a new relationship between thinking man and the sum of our knowledge.”

But somewhere along the line things went wrong; some things went very, very wrong. Previous knowledge and the sum of human experience were swept aside. Search technology became superficial, and consequently, writing with words is not considered as any kind of technology at all. That superficiality violates the integrity of the meaning of search, and the classification of words merely as arbitrary strings is also wrong, in my view.

Some scientists I know would argue that the invention of writing is right up there at the top of human technological achievement. I guess we just take that for granted these days, and I am nearly certain that scientists that were embarking into the new field of information technology in the 1940’s and 1950’s were not thinking of writing with words as the world’s first interpersonal memory– the original technology of the human mind and its thoughts and interactions.

Most information scientists have not yet fully appreciated words as technical expressions of human experience but treat them as labels instead. By technical, I mean of or relating to the practical knowledge and techniques (of being an experienced human).

Very early in the development of search technology, information scientists and engineers worked out assumptions that continue to influence the outcome, that is, how search technology is produced and applied today. The first time I wrote about this was in 1991 in the proceedings of the Annual Meeting of the American Society of Information Science. There is a copy in most libraries if anyone is interested.

And here we are in 2008, in what some call a state of frenzy and others might call disinformed and confused– looking at the prospects of the Semantic Web. I will get to all that in this post. I will divide this piece into the topics of the passion for search technology, the state of confusion about search technology, and the semantics of search technology.

The term disinformed is my word for characterizing how people are under-served if not totally misled by search engines. A more encompassing view of this sentiment was expressed by Nick Carr in an article appearing in the latest issue of the Atlantic Monthly where he asks: Is Google making us stupid?

I am going to start off with the passion of search.

Writing about the on-line search experience in general, Carmen-Maria Hetrea of Britannica wrote:

… the computing power of statistical text analysis, pattern-matching, and stopwords has distracted many from focusing on (should I say remembering?) what actually makes the world tick. There are benefits and dangers in a world where the information that is served to the masses is reduced to simple character strings, pattern matches, co-location, word frequency, popularity based on interlinking, etc.

( … ) It has been sold to us as “the trend” or as “the way of the future” to be pursued without question or compromise

That sentiment pretty much echoes what I wrote in my last post. You see, computing power was substituted for explanatory power and the superficiality of computational search was given credibility because it was needed to scale to the size of the world wide web.

This is how “good enough” became state of the art. Because search has become such a lucrative business and “good enough” has become the status quo, it has also become nearly impossible for “better” search technology to be recognized, unless it is adopted and backed by one of the market leaders such as Google or Microsoft or Yahoo.

I have argued in dozens of forums and for more than twenty years that search technology has to address the broader logic of inquiry and the use of language in the pursuit of knowledge, learning and enhancing the human experience. It has to accommodate established indexing and search techniques and it has to have explanatory power to fulfill the search role.

Most that know me know that I am not showing up at this party empty-handed. I have software that does all that and while my small corporate concern is no market or search engine giant my passion for search technology is not unique.

In her Britannica Blog post about search and online findability, Carmen-Maria Hetrea summed up her passion for search:

Some of us dared to differ by returning to the pursuit of search as something absolutely basic to the foundations of our human existence: the simple word in all of its complexity — in its semantics and in its findability and its futuristic promise.

You have to ask yourself what you are really searching for before you can find that it is not for keywords or patterns at all. Out in the real world almost everyone is searching for happiness. Some are also searching for truth or relevance. And many search for knowledge and to learn. If your searching doesn’t involve such notions, maybe you don’t mind the tedium of thorough, e.g., exegetical, searching. Or maybe you are someone who doesn’t search at all, but depends on others for information.

How is the search for happiness addressed by online search technology? Should it be a requirement of search technology to find truth or relevance? Should a search be thorough or superficial? Is it about computing power or explanatory power? I am going to try and address each of these questions below as I wade through the causes of confusion, expose the roots of my passion and maybe shed some light on search technology and its applications.

Some people have said in the online world you have both the transactional search and the research search, which are not the same. They imply that these search objectives require different instruments or plumbing. I don’t think so. I think it is just a crutch vendors use to justify superficial search. Let’s look at an example transactional search, say, searching for a new car. There are so many places where you can carry out that transaction, being thorough and complete is not an issue. Here is a search vendor quiz:

Happiness is a ___________ search experience.

Besides searching for objects of information that we know but don’t have at hand, in cyberspace and on the web, we might search for a pizza place in a new destination. Many search for cheap air fares or computer or car parts, or deals on eBay, while others search for news, music, pictures and many other types of media and information. A few others search for knowledge and for explanation. Happiness in the universe of online search is definitely a satisfying search experience irrespective of what you are searching for.

Relevance is paramount to happiness and satisfaction whether searching for pizza in close proximity or doing research with online resources. Search vendors are delivering hit lists from their search engines, where users are expecting relevance and to be happy with the results. Satisfaction, in this sense, has turned out to be a tall order and nonetheless a necessary benefit of search technology that people still yearn for.

Let’s now turn to the state of confusion.

Carmen-Maria mentions that new search technology has to be backward compatible and she also complains that bad search technology is like the wheel that just keeps getting reinvented:

The wheel is being reinvented in a deplorable manner since search technology is deceptive in its manifestation. It appears simple from the outside, just a query and a hitlist, but that’s just the tip of the iceberg. In its execution, good search is quite complex, almost like rocket science.

… The wealth of knowledge gained by experts in various fields – from linguists to classifiers and catalogers, to indexers and information scientists – has been virtually swept off the radar screen in the algorithm-driven search frenzy.

The wheel is certainly being re-invented; that’s part of the business. I am uncertain what Carmen-Maria means by algorithm-driven search frenzy. Algorithms are the stuff of search technology. I believe that some of the problems with search stem from the use of algorithms that are made fast by being superficial, by cutting corners and by other artificial means. The cutting of corners begins with the statistical indexing algorithms or pre-coordination of text– so retrieval is consequently hobbled by weaknesses in the indexing algorithms. But algorithms are not the cause of the problem.

Old and incorrect assumptions are the real problem.

Modern state-of-the-art search technology (algorithms), invented in the 1960’s and 1970’s, strips text of its dependence on human experience under something information science (IS) professionals justify as the independence assumption. Information retrieval (IR) professionals, those that design software methods for search engine algorithms, are driven by the independence assumption to treat each text as a bag of words without connection to other words in other texts or other words in the human consciousness.
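A small Python sketch shows what the bag-of-words treatment throws away. The two sentences and the `bag_of_words` helper are made up for illustration and the tokenizer is deliberately naive. Once each text is reduced to word counts, the connection between the words is gone and two texts that say opposite things become indistinguishable.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding order and connection."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


first = bag_of_words("The dog fears the man")
second = bag_of_words("The man fears the dog")

if __name__ == "__main__":
    print(first)
    print(second)
    print("Same bag:", first == second)
```

Who fears whom is exactly the kind of information a reader cares about, and it is exactly what this representation cannot hold.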

I don’t think Rich Skrenta was thinking about this assumption when he wrote:

… – the idea that the current state-of-the-art in search is what we’ll all be using, essentially unchanged, in 5 or 10 years, is absurd to me.

Odds are that he intends to sweep a lot of knowledge out of the garage too, and I would place the same odds that any “new” algorithm Rich brings to the table will implicitly apply that old independence assumption too.

So this illustrates a kind of tug of war between modern experts in search technology and the knowledge of ages of experience. There is also a kind of frenzy or storm over so-called “new” technologies and just what constitutes “semantic” search technology. While some old natural language processing (NLP) technology has debuted on the online search scene, it has not brought any new search algorithms to light. They have only muddied the waters in my opinion. I have written about this in previous posts.

The underlying current is stirred up by the imbalance existing in the (significant) history of search technology contrasted with the nascence of online search and other modern applications of search technology. Add to that disturbance the dichotomy exacerbated by good (satisfying) and bad (deceptive) search results, multiplied by the number of search engine vendors, monopolistic or otherwise, and you have the conditions where compounding frenzy, absurdity and confusion, rather than relevance, reign supreme.

I like to think my own view transcends this storm and sets an important development principle that I established when I produced the first concept search technology back in 1987. The subjects of the search may be different but the freedom to search for objects, for answers, or for theories or explanations of unknown phenomena is the right of inquiry.

This right of intellectual inquiry is as important and as basic as the freedom of speech. This is what ignites my passion for search technology. And I cannot stand to have my right of inquiry blocked, limited, biased, restricted, arrested or constrained, whether by others, or by unwarranted procedure (algorithm) or formality, or by mechanical devices.

I wear my passion on my sleeve and it frequently manifests as a rant against the “IT” leaders or so-called experts that Carmen-Maria wrote about:

Many consider themselves experts in this arena and think that information retrieval is this new thing that is being invented and that is being created from scratch. The debate often revolves around casual observations, remarks, and opinions come mostly from an “IT” perspective.

To be fair, not all those with “IT” perspectives are down with all this “new thing” in online search engines. Over at the Beyond Search blog, Stephen Arnold wrote about the problem with the thinking about search technology:

… fancy technology is neither new nor fancy. Google has some rocket science in its bakery. The flour and the yeast date from 1993. Most of the zippy “new” search systems are built on “algorithms”. Some of Autonomy reaches back to the 18th century. Other companies just recycle functions that appear in books of algorithms. What makes something “new” is putting pieces together in a delightful way. Fresh, yes. New, no.

I also think Stephen understands the history of search technology pretty well. He demonstrates this when he writes:

Software lags algorithms and hardware. With fast and cheap processors, some “old” algorithms can be used in the types of systems Ms. Hane identifies; for example, Hakia, Powerset, etc. Google is not inventing “new” things; Google is cleverly assembling bits and pieces that are often well known to college juniors taking a third year math class.

Like Carmen-Maria Hetrea, Stephen Arnold sounds biased against algorithms, “old” algorithms in particular, though I don’t think he intended any bias, as many of the best algorithms we have are “old”. There are really not many “new” algorithms. Augmented, yes. Modified, yes. New, no.

To be involved in IT and biased against algorithms is absurd as long as technology is the application of the scientific method and scientific search methods are understood as collections of investigative steps systematically combined into useful search procedures or algorithms. So there you have my definition of search technology.

The algorithms for most search technology are not rocket science and can be boiled down to simple procedures. At the very least there is an indexing algorithm and a search algorithm:

Pre-coordination per-text/document/record/field procedure:

  1. Computerize an original text by reading the entire text or chunks of it into computer memory.
  2. Parse the text into the smallest retrievable atomic components (usually patterns (trigrams, sentences, POS, noun-phrases, etc.) or keywords or a bag (alphabetical list) of infrequent words).
  3. Store the original text with a unique key and store the parsing results as alternate keys in an index.
  4. Repeat for each new text added to a database or collection.
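The four pre-coordination steps above can be sketched in Python as a plain keyword index. This is a minimal sketch of the generic procedure, not any vendor’s implementation. The `Collection` class, the `parse` helper and the choice of lowercase keywords as the atomic components are all assumptions made for the example.

```python
import re
from collections import defaultdict


def parse(text):
    """Step 2: parse a text into its smallest retrievable keys (keywords)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Collection:
    """A toy collection: original texts plus a keyword index over them."""

    def __init__(self):
        self.texts = {}                # unique key -> original text
        self.index = defaultdict(set)  # keyword -> keys of texts containing it

    def add(self, text):
        """Steps 1 to 3 for one text; call again for each new text (step 4)."""
        doc_id = len(self.texts) + 1   # unique key for the original text
        self.texts[doc_id] = text      # store the original text
        for key in parse(text):        # store parse results as alternate keys
            self.index[key].add(doc_id)
        return doc_id


if __name__ == "__main__":
    collection = Collection()
    collection.add("Fear of the unknown")
    collection.add("I feel no fear")
    print(dict(collection.index))
```

Everything a later search can find is fixed by what `parse` decides to keep, which is why weaknesses in the indexing step hobble retrieval.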

Post-coordination per-query procedure:

  1. Read a string from input, parse the query into keys in the same way as a text.
  2. Search the index to the selected collection or database with the keys.
  3. Assemble (sort, rank) key hits into lists and display.
  4. Choose hit to effect retrieval of the original text.
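The per-query steps above can be sketched the same way. To keep the example self-contained, the index is written out by hand for three made-up texts. The names `TEXTS`, `INDEX`, `search` and `retrieve` are illustrative, and ranking by the number of matching keys is just one simple choice among many.

```python
import re

# Assumption: a tiny hand-built collection and keyword index for the example.
TEXTS = {
    1: "Fear of the unknown",
    2: "I feel no fear",
    3: "Hope and love",
}
INDEX = {
    "fear": {1, 2}, "of": {1}, "the": {1}, "unknown": {1},
    "i": {2}, "feel": {2}, "no": {2},
    "hope": {3}, "and": {3}, "love": {3},
}


def search(query):
    """Steps 1 to 3: parse the query into keys, look them up, rank the hits."""
    keys = re.findall(r"[a-z0-9]+", query.lower())
    hits = {}
    for key in keys:
        for doc_id in INDEX.get(key, ()):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    # Rank by number of matching keys, then by document key for stability.
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


def retrieve(doc_id):
    """Step 4: choosing a hit retrieves the original text."""
    return TEXTS[doc_id]


if __name__ == "__main__":
    for doc_id in search("feel fear"):
        print(doc_id, retrieve(doc_id))
```

Note that a query for a word that is not in the index returns nothing at all, however closely related its meaning may be to the stored texts.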

These basic algorithms are fulfilled differently by different vendors but vendors do not generally bring new algorithms to the table. They bring their methods of fulfilling these algorithms; they may modify or augment regular methods employed in steps 2 and 3 of these procedures as Google does with link analysis.

In addition, vendors fold search technology into a search engine. Most online search engines– those integrated “software systems” or search appliances that process text, data and user-queries, are composed of the following components:

  1. A crawler for crawling URI’s or files on disk or both.
  2. An indexer that takes input from the crawler and recognizes key patterns or words.
  3. A database to store crawler results and key indexing (parsing) results.
  4. A query language (usually SQL, Keyword-Boolean) to use the index and access keys in the database.
  5. An internet server and/or graphical user interface (GUI) components for getting queries from, and presenting results to, users.

Most search engine wizards, as they are called, are working on one or more of these software components of online search engines. You can look at what a representative sample of these so-called wizards have to say about most of these components at the ArnoldIT blog here. If you read through the articles, you won’t find one of them (and I have not read them all) that is working on new indexing methods or new mapping algorithms for mapping the meaning of the query to the universe of text, for example.

Many of the “new search engines,” popping up everywhere, are not rightly called new search technology even though they frequently bear the moniker. They are more rightly named new applications of search technology. But even vendors are confused and confusing about this. Let’s see what Riza Berkin of Hakia is saying in his most recent article where he writes:

But let’s not blind ourselves by the narrowness of algorithmic advances. If we look closely, the last decade has produced specialist search engines in health, law, finance, travel etc. More than that, search engines in different countries started to take over (like Naver, Baidu, Yandex, ect.)…

He had been writing that Search 1.0 began with Alta Vista (circa 1996), Search 2.0 is Google-like and Search 3.0 is semantic search “where the search algorithms will understand the query and text”. I guess all those search engines from Fulcrum, Lexis-Nexis, OpenText, Thunderstone, Verity, Westlaw, and search products from AskSam to Readware ConSearch to ZyIndex, were Search 0.0 or at least P.B. …. You know, like B.C. but Pre-Berkin.

And so this last paragraph (above) makes me think he is confusing search applications with search technology. His so-called specialist search engines are applications of search technology to the field or domain of law, to the field or domain of health, and so on.

Then he confuses me even more, when he writes about “conversational search”:

Make no mistake about it, a conversational search engine is not an avatar, although avatars represent the idea to some extent. Imagine virtual persons on the Web providing search assistance in chat rooms and on messengers in a humanly, conversational tone. Imagine more advanced forms of it combined with speech recognition systems, and finding yourself talking to a machine on the phone and actually enjoying the conversation! That is Search 2.0 to me.

Now I can sympathize with Riza because I used the phrase “conversational search” to describe the kind of conceptual search engine I was designing in 1986. I am not confused about that. I am confused that he calls that Search 2.0 when, earlier, statistically augmenting the inverted index was described as Search 2.0.

He doesn’t stop there. He continues describing Search 3.0 that “will be the ‘Thinking Search’ where search systems will start to solve problems by inferencing. ” Earlier he wrote that semantic search was Search 3.0. Semantics requires inferencing, so I began to reckon maybe thinking and semantics are equal in his mind, until he writes: “I do not fool myself with the idea that I will see that happening in my life time” — so now I am confused again. I think it is what vendors want; they want the public to remain confused about the semantics of search and what you get with it.

And that brings me to the semantics of search.

There are only two words that matter here: Thoroughness and Explanatory.

When I started tinkering with text processing, search and retrieval software in the early 1980’s, I was captivated by the promise of searching and reading texts on computers. The very first thing that I noticed about the semantics of search, before my imagination became involved in configuring computational search technology, was thoroughness. The word /search/ implies thoroughness if not completeness in its definition. Thoroughness is a part of the definition of search. Look at the definition of search for yourself.

You need only look at one or two hit lists from major search engines and you can see that is not what we get from commercial search engines, or from most search technology. Search is not a process that is completed by delivering some hints of where to look, but that is what it has been fashioned into by the technological leaders in the field. Millions of people have accepted it.

Yet, in our hearts we know that search must be complete and it must be explanatory to be satisfying; we must learn from it, and we expect to learn from conducting a search. Whether we are learning of the address to the nearest pizza place or we are learning how to install solar heating, it is not about computational power, it is about explanatory power. They forgot that words are part of the technique of communicating interpersonal meaning. Let’s hope search vendors don’t forget that words have explanatory power too.

Tell me what you think.

Peter Mika recently wrote an article about the semantic web and NLP-style semantic search. I should just ignore his claim that there are only two roads to semantic search because he is plainly mistaken on that count. As Peter works for Yahoo, he was mainly discussing data processing with RDF and Yahoo’s Search Monkey. He obviously knows that subject well.

He constructed an example of how to use representational data (such as an address) according to semantic web standards and how to integrate the RDF triples with search results. His claim is that one cannot do “semantics” without some data manipulation and for that the data must be encoded with metadata; essentially data about the data. In this case, the metadata necessary to pick out and show the data at the keyword: address.
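The idea can be sketched in Python with plain tuples standing in for RDF triples. This is not Peter’s example and not the SearchMonkey API. The `ex:` vocabulary, the pizzeria data and the `objects` and `enrich` helpers are all hypothetical, made up only to show metadata being used to pick an address out for display alongside a search hit.

```python
# Assumption: hypothetical triples (subject, predicate, object) with a
# made-up "ex:" vocabulary. Real RDF would use full URIs and a parser.
TRIPLES = [
    ("ex:pizzeria-1", "rdf:type", "ex:Restaurant"),
    ("ex:pizzeria-1", "ex:name", "Example Pizza"),
    ("ex:pizzeria-1", "ex:address", "1 Example Street"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object stated for a subject under a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def enrich(hit_subject, snippet):
    """Attach the address metadata, if any, to a plain search result."""
    addresses = objects(hit_subject, "ex:address")
    if addresses:
        return f"{snippet} | Address: {addresses[0]}"
    return snippet


if __name__ == "__main__":
    print(enrich("ex:pizzeria-1", "Example Pizza, wood-fired since 1999"))
```

The program only matches the string "ex:address". Knowing that an address is what a hungry searcher wants is left to whoever wrote the tags.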

At the end of his article, Peter talks about the way going forward and, in particular, about the need for fostering agreements around vocabularies. I suppose that he means to normalize the relationships between words by having publishers direct how words are to be used. He calls this a social process while calling on the community of publishers to play their role. Interesting.

About the time Peter was beginning his PhD candidacy, industry luminary John Sowa wrote in Ontology, Metadata and Semiotics that:

Ontologies contain categories, lexicons contain word senses, terminologies contain terms, directories contain addresses, catalogs contain part numbers, and databases contain numbers, character strings, and BLOBs (Binary Large OBjects). All these lists, hierarchies, and networks are tightly interconnected collections of signs. But the primary connections are not in the bits and bytes that encode the signs, but in the minds of the people who interpret them.

This is the case in the trivial example offered by Peter. The reason one is motivated to list an address in the search result of a search for Pizza is because it is relevant to people who are searching for a pizza place close to them. In his paper, John Sowa writes:

The goal of various metadata proposals is to make those mental connections explicit by tagging the data with more signs.

This is the essential nature of the use case and proposal offered by Yahoo with SearchMonkey. It seems a good idea, doesn’t it? Yahoo is giving developers the means to tag such data with more signs. Besides, it has people using Yahoo’s index, exposing Yahoo’s advertisers. Sowa cautions that:

The ultimate source of meaning is the physical world and the agents who use signs to represent entities in the world and their intentions concerning them.

Which resources do investigators or developers use to learn about agents and their intentions when using signs? The resource most developers turn to is language and they begin by defining the words of language in each context in which they appear.

Peter says it is common for IR systems to focus on words or grams and syntax. Some officials may object, but although NLP systems such as Powerset, Hakia and Cognition use dictionaries and “knowledge bases” to obtain sense data, they each focus mainly on sentence syntax and (perhaps with the exception of Powerset) use keyword indexes for retrieval just like traditional IR systems.

Hakia gets keyword search results from Yahoo as a matter of fact. All of these folks treat words, and even sentences, as the smallest units of meaning of a text. Perhaps these are the most noticeable elements of a language that are capable of conveying a distinction in meaning though they certainly are not the only ones. There are other signs of meaning obtainable from textual discourse.

Believe it or not, the signs people use most regularly are known as phonemes. They are the least salient because we use them so often, and frequently they are also largely used subconsciously. Yet, we have found that these particular sounds are instantiations, or concrete signs, of the smallest elements of abstract thought– distinctive elements of meaning that are sewn and strung together to produce words and form sentences. When they take form in a written text they are also called morphemes.

Some folks may not remember that they learned to read words and texts by stringing phonemes together, sounding them out to evoke, apprehend and aggregate their abstract meanings. I mention this because if a more natural or organic semantic model were standardized, the text on the world wide web could become more tractable and internet use might become more efficient.

This would happen because we could rid ourselves of the clutter of so many levels of metalevel signs and the necessity of controlled vocabularies for parsing web pages, blogs and many kinds of unstructured texts. An unstructured text is any free flowing textual discourse that cannot easily be organized in the field or record structure of a database. Neither is it advantageous to annotate the entirety of unstructured text with metalevel signs. Because as John Sowa wrote:

Those metalevel signs themselves have further interconnections, which can be tagged with metametalevel signs. But meaningless data cannot acquire meaning by being tagged with meaningless metadata.

So now this raises the question of whether or not words and their definitions are just meaningless signs to begin with. The common view of words, as signs, is that they are arbitrarily assigned to objects. I am unsure whether linguists could reach consensus that the sounds of words evoke meaning, as it seems many believe that a horse could have been called an egg without any consequence to its meaning or use in a conversation.

Within the computer industry it becomes even more black and white: A word is used to reference objects by way of general agreement or convention, where the objects are things and entities existing in the world. Some linguists and most philosophers recognize abstract objects as existing in the world as well, though this has not changed the conventional view that is a kind of de facto standard among search software vendors today.

This view implies that the meaning of a word or phrase (its interpretation) adheres only to conventional and explicit agreements on definitions. The trouble is that it overlooks or ignores the fact that meaning is independently processed and generated (implicitly) in each individual agent’s mind. This is generally very little trouble if the context is narrow and well-defined as in most database and trivial semantic web applications on the scene now.

The problems begin to multiply exponentially when the computer application is purported to be a broker of information (like any search engine) where there is a verbal interchange of typically human ideas in query and text form. This is partly why there is confusion about meaning and about search engine relevance. Relevance is explicit inasmuch as you know it when you see it; otherwise, relevance is an implicit matter.

Implicit are the dynamic processes by which information is recognized, organized, acted on, used, changed, etc. The implicit processes in cognitive space are those required to recognize, store and recall information. Normally functioning, rational, though implicit and abstract thought processes organize information so that we may begin to understand it.

It is obvious that there are several methods and techniques of organizing, storing and retrieving information in cyberspace as well. While there are IR processes running both in cyberspace and in cognitive space, it is not the same abstract space and the processes are not at all the same. In cyberspace and in particular in the semantic web, only certain forms of logical deduction have been implemented.

Cognitive processes for organizing information induce the harmonious and coherent integration of perceptions and knowledge with experience, desires, the physical self, and so on. Computational processes typically organize data by adding structure that arranges the information in desired patterns.

Neither the semantic web standards, nor microformats, nor NLP, seek the harmony or coherence of knowledge. Oh, yes, they talk about knowledge and about semantics yet what they deliver are little more than directives; suitable only for data manipulation in well-understood and isolated contexts.

Neither NLP nor semantic web metadata or tools presently have sufficient faculty for abstracting the knowledge that dynamically integrates sense data or external information with the conditions of human experience. The so-called semantic search officials start with names and addresses because these data have conventionally assigned roles that are rather regular.

When it comes down to it, not many words have such regular and conventional interpretations. It would actually be quite alright if we were just talking about a simple database application, but proponents of the semantic web want to incorporate everything into one giant database and controlled vocabulary. Impossible!

While it appears not to be recognized, it should be apparent that adherence to convention is a necessary yet insufficient condition to hold relevant meaning. An interpretation must cohere with its representation and its existence (as an entity or agent in the world) in order to hold. Consider the case of Iraq and weapons of mass destruction. Adhere, cohere, what’s the difference –it’s just semantics– right? Nonetheless, neither harmony nor coherence can be achieved by directive.

A consequence of the conventional view is that such fully and clearly defined directives leave no room for interpretation even though some strive for underspecification. The concepts and ideas being represented cannot be questioned because, being explicit directives, they go without question. This is why I believe the common view of words and meaning, held by many linguists and computer and information experts like Peter, is mistaken.

If the conventional view were correct, the interpretation of words would neither generate meaning nor provide grounds for creating new concepts and ideas. If it were truly the case, as my friend Tom Adi said, natural language semantics would degenerate into taking an inventory of people’s choices regarding the use of vocabulary.

So, I do not subscribe to the common view. And these are the reasons that I debate semantic technologies even though end-users probably could not care less about the techniques being deployed. If we are not careful we will end up learning and acting by directive too. That is not the route I would take to semantic search. How about you?

In looking at the comments of the last post, The Search for Semantic Search, I see there appear to be some interesting interpretations. Let me explain my motives, address any perceived bias and clarify my position.

Alex Iskold wrote about semantic search that we were asking the wrong questions; that this was essentially the root of the problem with semantic search engines; and that they were only capable of handling a narrow range of questions, namely those requiring inference. Among other things, he also wrote that his question about vocation was unsolvable; impossible was the term he used. These ideas, and the fact that Alex implied Google was a semantic search engine and suggested that vendors must dethrone Google to be successful, motivated me to blog about it myself.

I was criticized, in the comments, for implying that the so-called “semantic search” capability of these NLP-driven search engines is weak, and due to this they do not really qualify as “semantic search” engines. Actually Kathleen Dahlgren introduced a new name in her comments: “Semantic NLP”. I was also criticized for asking a silly question and for posting my brief analysis of this one single question that Alex said was unsolvable without massively parallel computers.

Of course you cannot judge a search engine by the way it mishandles one or even a few queries. But in this case one natural language question reveals a lot about the semantic acuity of NLP, and the multiple query idea is a kind of strawman argument intended to distract us. It almost proves Alex right as it is misleading.

I do not believe people are motivated to ask wrong questions and I do not believe people ask silly questions to computer search engines while expecting a satisfactory set of results or answers. Nevertheless, when any case fails, the problem or fault does not lie with people. The search engine is supposed to inform them. The fault lies with the computer software for failing to inform. You can try and dismiss it with a lot of hand waving but just like that pesky deer fly — it is going to keep coming back.

While NLP front ends and semantic search engines are the result of millions of dollars in funding and the long work of brainiacs, and while they may be capable enough to parse sentences, build an index or other representation of terms and use some rules of grammar, they are not always accommodating or satisfying. In fact they can be quite brittle or breakable. This means they do not always work. But they do work under the right circumstances in narrowly defined situations. One of the questions here is whether they work well enough to qualify them as “semantic search” engines for English language questions.

Any vendor who comes out in public and claims they are doing “semantic search” should prove it by inferring the significance of the input with sufficient quality and acuity such that the result, or search solution or conclusion, satisfies the evidence and circumstance. This is a minimum level of performance. There are tests for this. Many people use a relevance judgment as a measure of that satisfaction as far as any type of search and retrieval method or software is concerned.
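One common way to turn relevance judgments into a number is precision at k: the share of the first k results that a person judged relevant. The following is only a minimal sketch of that measure. The function name, the document ids and the judgments are hypothetical and are not taken from any of the engines discussed here.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that a human judge marked as relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    # Count the returned documents that appear in the judged-relevant set.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical example: four results came back, a judge marked two as relevant.
ranked = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d4"}
score = precision_at_k(ranked, judged_relevant, k=4)
```

A measure like this says nothing about why a result was relevant. It only records whether the person asking was satisfied with what came back.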

With that said, my last post was about debunking the so-called complex query myth not about “testing” the capabilities of any search engine. It was about semantic search and how any search engine solves this single so-called impossible question. There were results, and they were not completely “useless” as I see, on review, that I wrote. I apologize for calling them useless.

Both Cognition and Powerset produced relevant results (with one word) that were more comprehensive than the results Google provides, in my opinion. That is not a natural language process of understanding a sentence though. Having a capacity to look up a word in a dictionary is not the same as the capacity to referentially or inferentially use the concept. In this case, to make some judgments (distinguish the significant relationships, at least) and inform the search process.

This capability to distinguish significant relationships is a key criterion of “semantic” search engines, meaning they should have a capacity to infer something significant from the input and use it. The results of this query tell a different story. You cannot just profess linguistic knowledge, call the question silly and make the reality it represents go away. This kind of problem is systemic.

As far as the so-called “semantic” search engines inferring anything significant from this (full sentence case) question (evidence) or circumstance of searching, I treated all the results with equal disaffirmation. What is more, I stand by that as it is supported on its face. If you look at the results of the full sentence case query at Cognition, you will notice that they are essentially the same as those from Powerset.

I reckon this could be because both engines map the parts of speech and terms from the query onto the already prepared terms and sentences from Wikipedia. This “mapping strategy” clearly fails –in this case– for some pretty obvious reasons. Without pointing out all the evidence I collected, I summed those reasons up as a lack of semantic acuity. That seems to have touched a nerve.

So I will get into the details of this below. Let me first take a moment to address the fact that one inquiry reveals all this information. Really it is not just one inquiry. It is one conceptualization. Dozens of questions can be derived by mapping from the concepts associated to these terms of this single question. For example: Where are the best careers today?; Who has the better jobs this year?; Where can I work best for now?; What occupation should I choose given the times?; etc. I tried them all and more with varying degrees of success.

One problem is that NLP practitioners are concerned with sentence structure and search engineers are concerned with indexing terms and term patterns. Either way, the methods lack a conceptual dimension and there is no apparent form of any semantic space for solving the problem. The engines have no sense of time points or coordinated space or other real contexts in which things take place. The absence of semantic acuity is not something that only affects a single inquiry. It will infect many inquiries just as a disease infects its host.

Now that I recognize the problem, if I were challenged to a wager, I would wager that I could easily produce 101 regular English language questions that would demonstrate this affliction. The search engines may produce a solution, except that the results would be mostly nonsense and not satisfying. It would prove nothing more and nothing less than I have already stated. What say you Semantic Cop?

I should mention that I have long suspected that there was a problem mapping NLP onto a search process and I could not put my finger on it. A literature search on evaluations of text retrieval methods will show, in fact, that the value of part of speech processing (in text search and retrieval) has long been regarded as unproven. By taking the time to investigate Alex Iskold’s complex query theory I gained more insight into the nature and extent of this problem. It is not just a problem of finding a definition or synonyms for a given term as some readers may infer. Let me explain further.

While Powerset, Cognition and Hakia each had the information that a vocation was a kind of altruistic occupation, and the search circumstance (a hint) that the information seeker could be looking for an occupational specialty or career, they did not really utilize that information. The failure, though, really wasn’t with their understanding of the terms occupation or vocation. Their failure was specifically related to the NLP approach to the search process. That is supported by the fact that these different search products employing NLP fail in the same way.

That should not be taken to mean that the products are bad or useless. Quite to the contrary, the product implementations are really first class productions and they appear to improve the user experience as they introduce new interface techniques. I think NLP technology will continue to improve and will eventually be very useful, particularly at the interface as Alex noted in his post. But does that make them semantic search engines?

Lest I have been ambiguous, let me sum up and clarify by referring back to the original question: Whether you are looking at Powerset, Cognition or Hakia results, they clearly did not understand the subordinate functionality of the terms /best/, /vocation/, /me/ and /now/ in the sentence.

They clearly could not conceptualize ‘best vocation’ or ‘now’; they could only search for those keyword patterns in the index or data structures created from the original sentences. That is not just ‘weak’ semantics; that is not semantic search at all. Maybe they “understood” the parts of speech but they did not infer the topic of inquiry nor did they properly map the terms into the search space. Google did not fare any better in this case, but Google does not claim to be a semantic search engine. So where’s the semantics?

By that I mean (for example) that interpreting /now/ from the natural language question ‘what is the best vocation for me now’ as an adverb, does not improve the search result. Treating it as a keyword or arbitrary pattern does not improve the search result. And it demonstrates a clear and present lack of acuity and understanding of the circumstance.

Finding the wrong sense of /now/ and showing it is of dubious value. An inference from /now/ to ( –> ) today, at present, this moment in time, or to this year or age, and using that as evidence leading to an informed conclusion would demonstrate some semantic acuity in this case. Most people have this acumen; these NLP search engines obviously do not, according to the evidence in this case.
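To illustrate what using /now/ as evidence could look like, here is a minimal sketch that maps a temporal cue in the question to a date window a search could then use to prefer recent pages. Everything in it is an assumption made for illustration: the cue table, the choice to read "now" as "this year to date", and the function name. It is not how any of the engines above work.

```python
from datetime import date

# Hypothetical cue table: each temporal word maps to a function that turns
# "today" into an inclusive (start, end) date window.
TEMPORAL_CUES = {
    "now": lambda today: (date(today.year, 1, 1), today),  # assumed: this year to date
    "today": lambda today: (today, today),
}


def infer_time_window(question, today):
    """Return a (start, end) window for the first temporal cue found, else None."""
    tokens = [token.strip("?.,!").lower() for token in question.split()]
    for token in tokens:
        if token in TEMPORAL_CUES:
            return TEMPORAL_CUES[token](today)
    return None


window = infer_time_window("What is the best vocation for me now?", date(2008, 6, 15))
```

The point of the sketch is only that the word is interpreted and then used to inform the search, rather than being matched as a keyword or labeled as an adverb and set aside.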

The NLP vendors defend this defect by accusing people of not asking the right question in the right way or not asking enough questions. That is like me saying to my wife:

If you want some satisfying information from me you better use a lot of words in your question and it better not be silly. Don’t be too precise and confuse me and don’t use an idiom and expect me to satisfy you. I’ll still claim to understand you. It is you that asks silly questions. That not being enough, you also have to nag me with more long and hard questions before you say my responses are rubbish.

If I should either desire or dare to do that at all, what do you think her response would be? More importantly what do prospects say when you tell them their questions are silly?

I do not need to proceed with a hundred questions when with a dozen or so I have enough evidence to deduce that these NLP-driven search engines are limited when it comes to inferring the topic of inquiry. In some cases they are simply unable to draw on the “significant structures or patterns of input, evidence or circumstance” and produce a suitable solution.

What bothers me is that some of these so-called “semantic search engines” claim to “understand” in general. I did that too, a very long time ago. Yes, I was there in the back of the room at DARPA and NIST meetings and I have been at PARC and the CSLI for presentations. I was challenged then. And it enlightened me. If such claims go unchallenged it will only serve to demean the cultural arts and independent critical thinking and confuse prospects about the capabilities regular people expect of semantic products. I do not wish to lower the bar.

In this instance, and there are many similar cases that could be derived from the semiotic characteristics of this instance, the NLP-driven engines do not show the slightest acumen for inferring the topic of inquiry. I hope the discerning reader sees that it is not just about some synonyms. If they could infer the topic of inquiry, that would demonstrate a little understanding… at least a capacity to learn.

The result, in all such cases, is that these so-called “reasoning-engines” and semantic search engines do not lead us to a satisfying consequence or conclusion at all. They have technical explanations such as synonymy and hyponymy for any word, yet, if the software cannot infer the sense of everyday terms, is it even sensible to call the methods “semantic”? Just because the vendors profess linguistic knowledge does not mean their semantics are any more than just another marketing neologism.

It may be called semantic NLP but that does not qualify as semantic search in my opinion.

The Search for Semantic Search

In a recent Read Write Web article that was much more myth than reality, Alex Iskold posits that a semantic search engine must dethrone Google (myth 1). Fortunately, by the end of his article he concludes that he was misled into thinking that. I do not think he was misled at all. I just think he is confused about it all.

He posits a few trivial queries (myth 2) to show that there is “not much difference” between Powerset and Hakia and Freebase (myth 3). And that semantic search “is no better than Google search ” (myth 4). After that Alex writes that there is a set of problems suitable to semantic search. He says these problems are wonderfully solved (already) by relational databases (myth 5).

It makes one wonder why we should mess with semantic search if the problems are already solved. It is not true. That is why. Neither was any of the talk about query complexity.

It is not all these myths, exactly, but unclear thinking that leads to false expectations as well as false conclusions. Alex seems to be confused about the semantic web and semantic search. These are two different things but somehow Alex morphs them into one big database. Because I do want this post to impart some information instead of just being critical of a poorly informed post, let me start by debunking the myths.

Myth 1: Semantic Search should dethrone Google

For many search problems, semantics plays no role at all. Semantics plays a very limited role when the query is of a transactional nature, e.g., a search problem of the type: find all x.

Google is a search engine that solves search problems of this type. Yet the Google kingdom is based on being a publisher. Google uses super-fast and superficial keyword search to aggregate dynamic content from the internet for information seekers and advertisers alike. Google’s business does not even lend itself to semantic search for some very obvious reasons having to do with speed and scalability. Google’s best customers know exactly what they want and they certainly do not want any “intellectual” interpretations.

None of the so-called semantic search engine companies, that I know of, are pursuing a business strategy to dethrone Google as an information-seeker’s destination of choice. Powerset, for example, is not aggregating dynamic content like Google. Its business model does not seem to be based on a publishing or advertising model.

Powerset is using its understanding of semantics to assist the user (of Wikipedia) in relating to that relatively static content, from several different mental or rational and conceptual perspectives. This is meant to assist the information-seeker with interpreting the content. That is a good and valid application of semantics.

This is not the position a company seeking to unseat Google would take. A company seeking to unseat Google would be better positioned by producing technology to assist advertisers in classifying, segmenting and targeting buyers.

Myth 2: Trivial and Complex Queries

Unfortunately Alex did not supply any complex examples in his post. He tried to imply that his trivial queries were complex and the most complex was impossible to solve. This query was the one labeled impossible: “What’s the best vocation for me now?” I will use Alex’s query to debunk his misguiding assumptions. First, let’s clarify by looking at the search problems represented by Alex’s natural language queries.

Note 1: Alex offers the first query as impossible to solve. It must be because Alex is expecting a machine and some software to divine his calling based presumably on his mood now and some mind-reading algorithm. I should hope most people would seek a human counselor rather than rely on the counsel of a semantic search engine for addressing their calling. It is fair to use a search engine to find a career or occupation and it is valid to expect a semantic search engine to “understand” the equivalence relationship between the terms occupation and vocation, in this context.

As I suggested, best + vocation, or just vocation alone, is a simple solution that should be easy to satisfy. However, this simple search solution fails on all search engines. Even so-called semantic search engines have a problem with this query (see comparative search results under myth 4 below). It is not because it is a complex query. It is because Alex used the word vocation. This word is not frequent and search engines do not know its synonyms. This is a complex concept as it takes semantic acuity to “understand” it. No one talks about semantics in terms of acuity though.

Nonetheless, a search for vocation + best, with the results sorted by most recent date, will create a valid search context in which one can reasonably expect a solution from their semantic search engine. Most people, I am assuming, would have a more reasonable expectation than Alex; one that may be fulfilled by this internet page suggested by Readware:

A semantic search engine needs semantic acuity to “understand” that the concept of a vocation and the concept of an occupation are related. Obviously none of the search engines mentioned in Alex’s article have such acuity of understanding. Some of the search engines tried to process the pronoun me and the word now. Instead of being a solution, it created a problem as can be seen in the search results (under myth 4) below.
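As a rough illustration of the minimum being asked for here, the sketch below expands a query term with related terms before matching it against documents, so that a search for vocation can also reach pages about occupations. The related-terms table, the function names and the sample documents are all hypothetical. A hand-made lookup like this is far short of real semantic acuity and is not how Readware or any other engine named here is implemented.

```python
# Hypothetical hand-built table of related terms (an assumption for illustration).
RELATED_TERMS = {
    "vocation": {"occupation", "career", "calling"},
    "occupation": {"vocation", "career"},
}


def expand_query(terms):
    """Return the query terms plus any related terms known for them."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= RELATED_TERMS.get(key, set())
    return expanded


def search(docs, terms):
    """Return ids of documents that share at least one word with the expanded query."""
    wanted = expand_query(terms)
    return [doc_id for doc_id, text in docs.items()
            if wanted & set(text.lower().split())]


docs = {
    "a": "Occupation outlook for nurses",
    "b": "Vacation deals this summer",
}
results = search(docs, ["vocation"])
```

Without the expansion step, a plain keyword match on vocation finds neither page, which is the behavior described in the search results under myth 4.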

Note 2: This query needs a search engine with some more exotic search operators than a simple keyword search engine might provide. The query, however, is not complex. Some search engine may index US Senator as a single item to facilitate such a search. A search engine would need extended Boolean logic to process phrases using a logical AND between them. A more seasoned search engine, such as Google, would parse and logically process the query from a search box, without any specifying logic, and return an acceptable result. NLP-based engines (like Hakia and Powerset) try to do this too. They use propositional logic instead of Boolean logic. The effects are not very satisfying as can be seen below (in the search results listed under myth 4).
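The extended Boolean processing mentioned in this note can be sketched in a few lines: treat each phrase as a unit, and keep only the documents that contain every phrase. This is a minimal sketch under simplifying assumptions (whitespace tokenization, no punctuation handling, no index). The function names and sample documents are hypothetical.

```python
def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text` (case-insensitive)."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def boolean_and(docs, phrases):
    """Return ids of documents that contain every phrase (logical AND between phrases)."""
    return [doc_id for doc_id, text in docs.items()
            if all(contains_phrase(text, phrase) for phrase in phrases)]


docs = {
    "d1": "a US senator met a foreign entity",
    "d2": "a US senator spoke at length",
    "d3": "senator US foreign entity",
}
matches = boolean_and(docs, ["US Senator", "foreign entity"])
```

Only the first document qualifies: the second lacks one phrase and the third has the words of the first phrase in the wrong order. Nothing in this logic interprets what counts as a foreign entity, which is the harder question taken up next.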

A more sophisticated and indeed “semantic” search engine may interpret foreign entity according to a list of “foreign entities”. It would take some sophisticated semantics to algorithmically interpret what type of labels may be subordinate to foreign entity. For example: A German businessman, a Russian bureaucrat, a Japanese citizen, an American taxpayer. Which is the foreign entity?

Yet, it is also clear that an inventory of labels can be assembled under certain contexts. Building such an inventory constitutes the building of knowledge. A semantic search engine should help inventory and utilize knowledge for future researches. None of the semantic search engines that Alex mentioned do anything like this. Readware does do this.

Note 3: This search would benefit from a search engine that recognizes names. I think Hakia has an onomastic component. I am not sure about Powerset. However, this search works on nearly any search engine because there are plenty of pages on the web that contain the necessary names and terms. Otherwise there is nothing complex about this query.

The reality, as you can see, is that every query Alex offered is trivial. Yet it demonstrates what is wrong with so-called “semantic search”. That is, today’s semantic search products, including the NLP-powered search engines masquerading as “semantic” search, fail at real tests of semantic acuity. Before I get into the evidence, though, let me just say something about semantic search technologies in general.

Semantic Search Technologies

There are no public semantic search engines today. There are search engines and there are search engines with Natural Language Processors (NLP) that work on the indexing and query side of the search equation. Whether or not databases are used to store processed indexes or search results, databases and database technology like RDBMS and SQL have nothing to do with it.

The search engines that have the capacity for natural language processing usually claim to “understand” word and/or sentence semantics in a linguistic sense. This usually means that they understand words by their parts of speech, or they can look up definitions in a resource. Hakia and Powerset fall into this class, as do Cognition and several other search engines both in the U.S. and abroad. These are called semantic search engines and they claim to understand word sense and do disambiguation and so forth and so on, but, as I will show below, at questionable acuity.

Google is not a semantic search engine at all. While Hakia and Powerset may represent some small part of the spectrum of semantic search engines they are hardly representative of semantic search. Along with Freebase and Powerset, more representative of “semantic web” search are SWSE, Swoogle and SHOE.

Besides these semantic web search engines, there are semantic search engines akin to Hakia, such as Cognition, as mentioned in this article at GigaOM, along with my own favorite, Readware. So, in summary, Alex’s comparison is not representative and is really poor evidence.

Myth 3: No difference between Powerset and Hakia and Freebase.

Well, this is just ridiculous. It is not only a myth, it is pure misinformation. Nothing could be further from the truth. While Powerset and Hakia use NLP technology that could be construed as similar, Freebase is essentially an open database that can be queried in flexible ways. Freebase and Powerset happen to be somewhat comparable because Powerset works on Wikipedia and uses RDF to store data, and semantic triples (similar to POS) to perform some reasoning over the data. Freebase also stores Wikipedia-like data in RDF tuples.
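For readers unfamiliar with triples, the idea is that each fact is stored as a (subject, predicate, object) statement, and a query is a pattern with some positions left open. The sketch below is a toy illustration of that idea only. The predicate names, the sample facts and the `match` function are hypothetical, and it does not reflect the actual data model or query interface of Freebase or Powerset.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
TRIPLES = [
    ("vocation", "also_known_as", "occupation"),
    ("vocation", "also_known_as", "career"),
    ("nurse", "is_a", "occupation"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None in a position matches anything."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]


# What is a vocation also known as?
aliases = [t[2] for t in match(TRIPLES, subject="vocation", predicate="also_known_as")]
```

Holding a fact such as "vocation is also known as occupation" in this form is one thing. Using it to inform a search is another, and that gap is what the results under myth 4 show.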

It is probably also worthwhile to mention that Hakia’s NLP comes from the long time work and mind of the eminent professor Victor Raskin and his students. Powerset’s NLP comes from the work of Ronald Kaplan, Martin Kay and others connected with Palo Alto Research Center, Stanford University and the Center for the Study of Language and Information (CSLI). Cognition’s technology is based on NLP work done by Kathleen Dahlgren.

While Hakia, Powerset and Cognition represent these notable NLP approaches, their search methods and results show they do not know a great deal about search tactics and solutions. They do not seem to be successful in mapping the sentence semantics into more relevant and satisfying search results. It seems, from the evidence of these queries, they only know how to parse a sentence for its subject, object and verb and, a lot like Google, find keywords.

Myth 4: Semantic Search is No Better than Google.

Hakia and Powerset are like neophytes in Google’s world of search. That alone makes these engines no better than Google. Yet, that does not apply to semantic search in general. The truth is that the semantics of the search engines we are talking about (Hakia, Powerset, Freebase and SearchMonkey) do not appear to make the results any worse than those from Google. Let’s take a look at the Google search results for ‘What is the best vocation for me now’:

As may be predicted, the results are not very good (because the keyword vocation is not popular). Google also wants to be sure we do not mean ‘vacation’ instead of vocation. Hakia, on the other hand, strictly interpreted the query:

Just like the results from Google, these are not very satisfying. You might think that because Hakia is a semantic search engine, it would have the semantic acuity to “understand” that vocation and occupation are related. As you can see in the following search result, this could not be farther from the truth:

Not one of Hakia’s results had to do with occupational specialties or opportunities for career training and employment. Powerset did not produce any results when the term vocation is used and it really had nothing on occupation so it searched for best + me + now. There is nothing semantic about that and it is a pretty bad search decision as well. The results are useless; I will post them so you can judge for yourself:

When you have results like this, it really does not matter what kind of user interface you have to use. If it is a bad or poor user interface, it only makes the experience that much worse. Even if it is a good, fancy, slick or useful interface, it won’t improve irrelevant results.

Another so-called semantic search engine, Cognition, did not fare any better:

The search result above is useless, as is the result of the search for occupation:

I actually was mildly surprised that Cognition related the term occupation to the acronym MOS, which means Military Occupational Specialty. Then I saw that they did not distinguish the acronym from other uses of the three letter keyword combination. Again not a very satisfying experience. I did not leave Freebase out, I just left them until last. All Freebase results do is confirm that vocation is an occupation or a career:

It was not possible for Freebase to accept and process the full query. As this result shows, the data indicates that a vocation is also known as an occupation, but none of these engines realize that fact.

Myth 5: Already solved by RDBMS.

If the search problem or the “semantic” problem could be solved by the RDBMS, Oracle would be ten times the size of Google and Google might be using Oracle’s technology, if it existed at all. None of these problems (aggregated search, semantic parsing of the query and text, attention, relevance) are solved by any RDBMS. But Alex brushed over the real problems to make the claim that it is all up to the User Interface and the semantics only matter there. I suppose that was the point he was trying to make by including Search Monkey in his comments. This is just hogwash though. By that I mean that it is not true and it is in fact misleading.

Conclusions

It is plain to see that a semantic search engine needs acuity to discern the differences and potential relations that can form between existing terms, names and acronyms. It is also plain to see that none of the commercial crop of search engines has it. The Natural Language search engines, which have dictionaries as resources, do not associate vocation with occupation (for example) and therefore cannot offer any satisfying search results.
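To make the vocation/occupation point concrete, here is a minimal sketch of dictionary-based term association. The synonym table and the function names are hypothetical and exist only for illustration; nothing here reflects how Hakia, Powerset, Cognition or Readware actually work.

```python
# Toy synonym table: hypothetical data for illustration only.
SYNONYMS = {
    "vocation": {"occupation", "career"},
    "occupation": {"vocation", "career"},
    "career": {"vocation", "occupation"},
}


def related_terms(term):
    """Return the term itself plus any terms the table associates with it."""
    return {term} | SYNONYMS.get(term, set())


def matches(document, term):
    """True if the document contains the term or one of its associated terms."""
    words = set(document.lower().split())
    return bool(words & related_terms(term.lower()))


doc = "training programs for a new occupation"
print(matches(doc, "vocation"))        # association lets the query reach the document
print("vocation" in doc.split())       # a strict keyword test does not
```

Even this crude lookup relates the two terms, which is the kind of association the engines tested above did not make.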

There are 350,000 words in the English language. How many do you suppose are synonymous and present a case just like this example? Parsing a sentence for its subject, object and verb, is fine. It does not mean it will be useful or helpful in producing satisfying search results.

It is foolish to think that NLP will be all that is needed to obtain more relevant search results. The fact is that search logic and search tactics are arts that are largely unrelated to linguistics or NLP or database technology. While language has semantics, testing of the semantics of so-called semantic search engines has demonstrated that the semantics, if they are semantics, are pretty weak. I have demonstrated that semantic acuity plays a large role in producing more relevant and satisfying search results. A semantic search engine should also help inventory and utilize knowledge for future research. An informed search produces more satisfying results.

I would like to address the few questions I received on parts 1, 2 and 3 of the semantics of interpersonal relations. The first and most obvious question was:

I don’t get it. What are the semantics?

This question is about the actual semantic rules that I did not state fully or formally in any of the three parts. I only referred to Dr. Adi’s semantic theory and related how the elements and relations of language (sounds and signs) correspond with natural and interpersonal elements and relations relevant to an embodied human being.

Alright, so a correspondence can be understood as an agreement or similarity and as a mathematical and conceptual mapping (a mapping on inner thoughts). What we have here, essentially, is a conceptual mapping. Language apparently maps to thought and action and vice-versa. So the idea here is to understand the semantic mechanism underlying these mappings and implement and apply it in computer automations.

Our semantic objects and rules are not like those of NLP or AI or OWL or those defined by the semantic web. These semantic elements do not derive from the parts of speech of a language and the semantic rules are not taken from propositional logic. And so that these semantic rules will make more sense, let me first better define the conceptual space where these semantic rules operate.

Conceptually, this can be imagined as a kind of intersubjective space. It is a space encompassing interpersonal relationships and personal and social interactions. This space constitutes a substantial part of what might be called our “semantic space” where life lived, what the Germans call Erlebnis, and ordinary perception and interpretation (Erfahrung) intersect, and where actions in our self-embodied proximity move us to intuit and ascribe meaning.

Here in this place is the intersection where intention and sensation collide, where sensibilities provoke the imagination and thought begets action. It is where ideas are conceived. This is where language finds expression. It is where we formulate plans and proposals, build multidimensional models and run simulations. It is the semantic space where things become mutually intelligible. Unfortunately, natural language research and developments of “semantic search” and the “Semantic-Web” do not address this semantic space or any underlying mechanisms at all.

In general when someone talks about “semantics” in the computer industry, they are talking about English grammar, about RDF triples in general, or about propositional logic in a natural or artificial language, e.g., a data definition language, web services language, description logic, Aristotelian logic, etc. There is something linguists call semantics, though the rules are mainly syntactic rules that have limited interpretative and predictive value. Those rules are usually applied objectively, to objectively defined objects, according to objectively approved vocabulary defined by objectively-minded people. Of course, it is no better to subjectively define things. Yet, there is no need to remain in a quandary over what to do about this.

We do not live in a completely objective, observable or knowable reality, or in a me-centric or I-centric society; it is a we-centric society. The interpersonal and social experience that every person develops from birth is intersubjective: each of us experiences the we-centric reality of ourselves and others entirely through our own selves and our entirely personal world view.

Perhaps it is because we do not know and cannot know– through first-hand experience at least– what any others know, or are presently thinking, that there is this sort of dichotomy that sets in between ourselves and others. This dichotomy is pervasive and even takes control of some lives. In any case, conceptually, there is a continuum between the state of self-realization and the alterity of others. This is what I am calling the continuum of intersubjective space.

A continuum, of course, is a space that can only be divided arbitrarily. Each culture has its own language for dividing this space. Each subculture in a society has its own language for dividing this space. Every technical field has its own language for dividing the space. And it follows, of course, that each person has their own language, not only for dividing this space, but for interacting within the boundaries of this space. The continuum, though, remains untouched and unchanged by interactions or exchanges in storied or present acts.

The semantics we have adopted for this intersubjective space include precedence rules formulated by Tom Adi. Adi’s semiotic axioms govern the abstract objects and interprocess control structures operating in this space. Cognitively, this can be seen as a sort of combination functional mechanism, used not only for imagining or visualizing, but also for simulating the actions of others. I might add that while most people can call on and use this cognitive faculty at will, its use is not usually a deliberate act; it is mainly used subconsciously and self-reflexively.

We can say that the quality of these semantics determines the fidelity of the sound, visualization, imitation or simulation to the real thing. So when we access and use these semantics in computer software, as we do with Readware technology, we are accessing a measure of the fidelity between two or more objects (among other features). This may sound simplistic, though it is a basic-level cognitive faculty. Consider how we learn through imitation. Note to self: Don’t leave out the cognitive load to switch roles and consider how easily we can take the opposite or other position on almost any matter.

We all must admit, after careful introspection, that we are able to “decode” the witnessed behavior of others without the need to exert any conscious cognitive effort of the sort required for describing or expressing the features of such behavior using language, for example. It may be only because we must translate sensory information into sets of mutually intelligible and meaningful representations in order to use language to ascribe intentions, order or beliefs, to self or others, that the functional mechanism must also share an interface with language. It may also be because language affords people a modicum of command and control over their environment.

Consider the necessity of situational control in the face of large, complex and often unsolvable problems. I do not know about you, but I need situational control in my environment and I must often fight to retain it in the face of seemingly insurmountable problems and daily ordeals.

Now try and recognize how the functional aspects of writing systems fill a semiotic role in this regard. Our theoretical claim is that these mutually intelligible signs instantiate discrete abstract clusters of multidimensional concepts relative to the control and contextualizing of situated intersubjective processes.

Like the particles and waves of quantum mechanics are to physics, these discrete intersubjective objects and processes are the weft and the warp of the weaving of the literary arts and anthropological sciences on the loom of human culture. We exploited this functional mechanism in the indexing, concept-analysis, search and retrieval software we call Readware.

We derived a set of precedence rules that determine interprocess control structures and give us root interpretation mappings. These mappings were applied to the word roots of an ancient language that were selected because modern words derived from these word roots are used today. These few thousand root interpretations (formulas) were organized into a library of concepts, a ConceptBase, used for mapping expressions in the same language and from different languages. It was a very successful approach for which we designed a pair of REST-type servers with an API to access all the functionality.

To make this multi-part presentation more complete, I have posted a page with several tables drawn up by Tom Adi, along with the formal theory and axioms. There are no proofs here as they were published elsewhere by Dr. Adi. These tables and axioms identify all the key abstract objects, the concepts and their interrelationships. Tom describes the mappings from the base set (sounds) and the axioms that pertain to compositions and word-root interpretations, together with the semantic rules determining inheritance and precedence within the control structures. You can find that page here.

And that brings me to the next question, which was: How can you map concepts between languages with centuries of language change and arbitrary signs? The short answer is that we don’t. We map the elements of language to and from the elements of what we believe are interlinked thought processes that form mirror-like abstract and conceptual images (snapshots) of perceptive and sensory interactions in a situated intersubjective space.

That is to say that there is a natural correspondence between what is actually happening in an arbitrary situation and the generative yet arbitrary language about that situation. This brings me to the last question that I consider relevant no matter how flippant it may appear to be:

So what?

The benefits of a shared semantic space should not be underestimated. Particularly in this medium of computing where scaling of computing resources and applications is necessary.

Establishing identity relations is important because it affords the self-capacity to better predict the consequences of the ongoing and future behavior of others. In social settings, the attribution of identity status to other individuals automatically contextualizes their behavior. By contextualizing content, for example, knowing that others are acting as we would effectively reduces the cognitive complexity and the amount of information we have to process.

It is the same sort of thing in automated text processing and computerized content discovery processes. By contextualizing content in this way (e.g., with Readware) we dramatically and effectively reduce the amount of information we must process from text, so that we can more directly access and cluster relevant topical and conceptual structure and support further discovery processes. We have found that a side effect of this kind of automated text analysis is that it clarifies data sources by catching unnatural patterns (e.g., auto-generated spam), and it also helps identify duplication and error in data feeds and collections.

For all those just joining me on this multi-part post, in this part I will write about how we derived computational objects by abstracting them from the significant or semantic properties of being a human in this world.

I will also introduce you to the notion that the sounds of natural language indicate and interpret compound objects representing all possible actions or (human) processes, and the boundary and engagement conditions of personal action and interaction. Besides that, throughout this piece, I will also note where this sort of theoretical model fits and compares with modern computer technologies. So that I can give all these aspects proper consideration, this part will be slightly longer than the previous two parts.

We want to model interpersonal relations on computers because computer software does not relate to the environment of people or the world that we inhabit very well if at all. Now that the search engine is a common appliance, more and more people are realizing that despite decades of costly research, computers are nearly clueless of the implications of words and common expressions.

It goes beyond search engines to computational semiotics. A lot of people have trouble interfacing with computers. I believe this is because there is no framework for computers to understand people. People have empathy towards each other. What do computers have? Maybe if the computer were more understanding, we would have better computer interfaces.

If computers are ever going to “understand” people better it makes sense to start by identifying with people and their environment. Understanding something or someone always begins by identifying with the situation. Most computer software does not do that; data is generally a dry and often drastic reduction of the situation. Natural language, for instance, is reduced to its terms and their parts of speech.

Term vectors in text processing systems are statistics of the occurrence of terms in documents and in collections of documents. They are a representation of text that the computer simply and blindly records and accounts for without any consideration for any other relationships beyond the needs of the application. This is not very intelligent. The term “computational semiotics” intuitively suggests that we use a science of meaning to create useful and intelligent computer programs.
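As a rough illustration of what such a term vector records, here is a minimal sketch using whitespace tokenization and raw counts. The tokenizer, the function names and the sample strings are assumptions made for this example and do not describe any particular product.

```python
import math
from collections import Counter


def term_vector(text):
    """Count how often each term occurs in the text (a bag of words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term vectors; 0.0 when they share no terms."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = term_vector("vocation")
document = term_vector("occupation")
print(cosine(query, document))  # the two vectors have no terms in common
```

Because the vector holds only occurrence counts, two texts that use related but different words look entirely unrelated to it.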

One place computers are trying to achieve more intelligent processing is in text and document analysis. Here, in this and related fields of Information Retrieval (IR), a “better understanding” is measured according to recall, precision and relevance. Partly due to the growth of web pages on the Internet, a major focus of advanced systems development is on Natural Language Processing (NLP) and text search and retrieval, or the search engine.

In modern indexing and computer classification systems, terms are extracted and term vectors are considered using local information from a text (its terms and sometimes its structure) and “global” information from collections of texts or documents. Considering that most texts are about interrelationships between people and things or principles in the world, it seems modern computer systems may be missing something.
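One common way of combining local and global information is TF-IDF weighting. The text above does not name a scheme, so treating it as TF-IDF is an assumption, and the function name and sample documents below are hypothetical. A minimal sketch:

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight each term by its count in a document (local information)
    scaled by how rare the term is across the collection (global information)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Global information: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # local information: counts within this document
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights


docs = ["the fox and the grapes", "the crow and the pitcher"]
for weighted in tf_idf(docs):
    print(weighted)
```

Terms that occur in every document receive a weight of zero, while terms unique to one document stand out. Nothing in the weighting relates a term to the everyday affairs the text is about.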

Consider the interrelationships of any text to everyday affairs. Consider Aesop’s fables, and then consider how most of the text on the Internet is, at its root, about the interrelationships between people and their environment. This is the problem with term vector schemes. The signs are not interrelated to everyday affairs in the world and this is why there are limits to these computer methods and systems.

If I know what to look for on the Internet, I can look up documents, articles and references, about any person or event. Yet even though I am an expert, the experience is not always satisfying. I can ask my wife to watch out for print articles and news that will interest me as she is an avid reader. I don’t have to specify the structural relationship or the Boolean logic or look up any vocabulary for her to use. Obviously computer software cannot do the kind of intelligent processing that my wife can. This is why I am producing Readware: a semantic framework for software engineering of text analysis, classification, search and retrieval applications.

Readware can identify the interrelationships in a text using a call to its API:

rwAnalyze Aesops fable

In the image above, you can see the input string on the left and you can see that the output is a ranked listing of the interrelated topics. The top five categories and topics characterize the categorical implications of this text. The topics are representative of everyday affairs. Readware categories and topics can be used as filing, filtering and routing options in a computer application. Some of them repeat because the topics come from different perspectives.

My company has theory and logic for interrelating expressions on web pages and in messages. We developed our theory and logic into a platform with a well-tested API for use on various sorts of content. We developed our computational solution at about the same time people were thinking of WordNet and the Internet and before the Semantic Web.

Theory is harder to adopt than standards in the modern engineering communities of software developers and computer programmers. The Semantic Web has enjoyed tremendous support. Their philosophy also follows in the traditional practices of AI where you identify all the special properties and relations of everything in your content. The only difference with the Semantic Web is that those folks want you to use web formats and standards for your data that put it in a form that is provable using the various implements and standards endorsed by the W3C (OWL, RDF, XML, etc.).

So that I am clear, there is no theory of semantics, interpersonal or otherwise, in the confines of the Semantic Web. Nowhere in any of the published standards of the W3C will you find any hint of the properties and relations of interpersonal semantics. Most AI theory has been focused on NLP methods from mainstream linguistics along with functional grammars that have been around since the 1960’s. Not that it is bad; they have been at it a little longer than Tom Adi and I have. I just mean that none of these well-funded systems have enjoyed ultimate success; otherwise this topic would be moot.

Google and several other companies have risen to prominence using search engine technology that indexes the full text of documents, articles and web pages. They showed that you don’t need expensive language processing and AI methods. Google supplemented the text with a quickly calculated popularity measure. This measure made it possible for Google’s search engine to provide the most popular links in their results. However, if your query is the least bit ambiguous it will not matter how popular the sites in the results are because they will be more or less irrelevant.

Besides Google, the recently arrived alternatives to the Semantic Web are highly refined (and very expensive) NLP systems and “semantic search engines”, represented by great organizations such as Hakia and Powerset. You can see for yourself that these NLP systems do not “understand” much in the way of the world. While they are called “natural language systems” you cannot have a conversation with them. If you could, you would find that they have very limited knowledge of the way things really fit together.

This effect can be seen at the new Semantic Web based online search service for news and current events called Silobreaker, where they claim:

It recognises people, companies, topics, places and keywords; understands how they relate to each other in the news flow, and puts them in context for the user.

I tested that out on Tuesday, 12 February 2008. I clicked on a link to Presidential candidate Barack Obama, from the Silobreaker front page. I was expecting to be shown something about Barack Obama. Instead I was presented with a network graph:

Valid Relationships ??
It reportedly drew a relationship between Barack Obama and Justin Timberlake in the context they identified as a story from the Denver Post entitled “West, Winehouse take early Grammy lead”. It did the same for the other artists listed in the graph.

Frankly, it is just too much information for me. In this instance, it is adding to information overload because there really isn’t any relationship there. The relationship, if any, is between the Grammy Awards and the people mentioned. There is no relationship between Barack Obama and Justin Timberlake. It is wrong to infer otherwise from the context.

Part of the problem here is that the Grammy Awards are not a known entity in this static entity relationship model. The problem stems from the lack of comprehension of the interpersonal relationships involved in the context.

Neither the Semantic Web, nor AI, nor natural language processing can solve this problem. The problem is that natural language understanding is too naive and the truth-based semantics are too superficial. Yet, scientists have been reluctant to delve too deeply into the abyss of personal psychology and interpersonal beliefs on the basis that such beliefs are illogical and unscientific.

We did not buy that argument then or now. The basis or ground rules of interpersonal relationships are often characterized as psychotic and pathological, religious zealotry or astrological nonsense, although more concrete grounds have existed. No one thought of abstracting from the possibility of personal action devoid of any other notions.

That is how we are born, devoid of belief, character, nationality, philosophy or religion. It is only after grasping sounds and a language that people start filling their minds with these notions. While I have been subject to these dubious notions, Tom Adi drew us both to elements abstracted from existence and action. We found them right there, literally in front of our face, and in the smallest elements of meaning in natural language.

In part 1, I began by presenting my experience with computer systems and how we began looking for natural systems and their semantics. In part 2, I identified two indispensable sets of universal properties represented by being human: 1) a body-centered reference, and 2) power. I trust my reader perceives these properties as self-evident.

My claim is that these properties are an affordance of individual influence and worldview. Recall in parts 1 and 2, I explained that a major premise of this work is that the elements and relations of natural languages should correspond with (the elements and relations of) other systems of natural phenomena at all times.

In the original research that I cited in part 2, Dr. Tom Adi found a natural correspondence between elements of natural language systems (sounds/phonemes) and the abstract objects used in individual cognition or recognition. I am going to expand on his finding and show how the semantics he proposed correspond with the actions and interactions of people.

The fundamental elements of language are its signs. Linguists have told us that words are signs; so are names. These signs are linguistic signs that reflect elements of social conventions and elements of design and even some modicum of chance. The same signs also reflect the power and influence of the individual voice and imagination. The fact that this power is latent in words and in text means that it is present and accessible in the unconscious mind but not consciously expressed.

This supports my charge that something is missing when all that we are left with is an alphabetical index of keywords. Using NLP on sentences of a text to decompose it into nouns, verbs and other parts of speech does not capture or address the power and influence afforded by the author. Therefore, computer-processed text loses its luster, it loses its possibility for action and it loses its capacity to influence the reader.

This is another reason the linguistic, NLP or AI-like approaches to the problem of understanding the functional role of language do not work very well. When they parse a text into sentences and a bag of words, they leave out the import of words. By that I mean that modern text processing methods miss the part that carries or holds the meaning– the influential part — the significant part.

Hidden in the smallest particles of meaning, the phonemes, are elements of action (power) and interaction. These are abstract signs of the conditions existing in social interactions. They are abstractions of the boundary and engagement conditions afforded by our body-centered reference. The interpretation of meaning appears to flow from the axis of the abstractions of power and those of the perceived boundary and engagement conditions.

In psychological studies of the interpersonal relationships between people, psychologists agree that people interact implicitly and explicitly. People can be focused on the interaction, as in a sales situation, or unfocused and interacting, as people together on a bus going to the same ball game, for example. I do not want to take you from here all the way to social psychology and symbolic interactionism though both of these fields are related to what we are talking about here because language is our main tool for socialization.

I will borrow terms and polarities from social psychology and interpersonal relations, such as implicit/explicit, focused/unfocused, and open/closed, to explain the boundary and engagement conditions. In “Society as Symbolic Interaction” (1962), Herbert Blumer claimed that people interact with each other by interpreting or defining each other’s actions instead of merely reacting to each other’s actions.

Blumer wrote that the response of people is not made directly to the actions of one another but instead is based on the meaning which they attach to such actions. We do not disagree that human interaction is mediated by the use of symbols and signification, by interpretation, or by ascertaining the meaning of one another’s actions.

Interaction is always (at the very least) bipolar. Because interaction is a fundamental pillar of group dynamics it should come as no surprise that polarity is a feature of the language used by the group. We found that polarity is represented by compound abstract objects related to boundary and engagement conditions. In the book Semiotics and Intelligent Systems Development, Tom Adi wrote:

We believe that the phonemes of a word are signs that refer to abstract objects that are somehow related to the properties of the object to which the word refers:

    word X refers to object A
    each phoneme P of word X refers to an abstract object B
    abstract object B is related to property T of object A

Moreover, we believe that the human mind constantly interprets such abstract objects and that the resulting interpretations also can be abstract objects that may in turn be reinterpreted. Both the original abstract objects and their successive interpretations are related to the properties of the object to which the word refers.

    abstract object B is interpreted as abstract object B’
    abstract object B’ is related to property T’ of object A

In addition, we believe that the morphology of a word, its structure, is also a sign that refers to an abstract object structure that is somehow related to the structure of the object to which the word refers. The human mind also constantly interprets and reinterprets this abstract object structure.

    structure of word X refers to an abstract object structure S
    abstract object structure S is related to structural property T of object A
    abstract object structure S is interpreted as abstract object structure S’
    abstract object structure S’ is related to structural property T of object A

The repeated interpretation of the abstract objects to which the phonemes of a word refer, in light of the repeated interpretation of the abstract structure to which the morphology of that word refers, will establish more and more relationships in the human mind to the properties of the object to which that word refers. We call this principle cognitive growth by reinterpretation.

A similar growth by reinterpretation is found in biosemiotics, the study of DNA as signs of life processes. In the book “Signs of Meaning in the Universe”, Jesper Hoffmeyer wrote that repeated DNA interpretation produces biological growth along a path called the ontogenetic trajectory. This parallel is not surprising since human cognition is born of human life processes. We expected to find relations of symmetry between the abstract objects to which phonemes refer since language is a natural phenomenon and there usually is symmetry in nature.

In Tom Adi’s study of the Arabic language, he found that certain sounds interpret a compound abstract object he named closed and self, as an interpretation of the influence on his mind. Certain other sounds interpreted the compound object open and self to his mind. He arranged these sounds in symmetrical columns and found that the sounds in these two columns expressed a kind of polarity he called inward and outward.

He expected to find similar abstract objects interpreted by the remaining sounds, and he found the abstract objects closed, others and open, others. He arranged those sounds according to the polarity they expressed: that of focused interface and its inverse, unfocused interface.

In English, sounds are far more ambiguous, yet they can be arranged in a similar fashion. By examining small words with a single consonant or sound we can see how the sounds of English interpret the same abstract objects. The personal pronoun “I” indicates the bipolar object: closed, self. This expression (I) directs attention inward with the focus on self. Its counterpart, the personal pronoun “You” indicates the symmetrical compound object: open, others. Our attention is directed outward and the other is targeted.

How about the pronouns “We” and “Us”? These interpret the compound object: closed, interface. Our attention is directed to a collective of objects or entities as in ‘bringing us together’. Its symmetrical counterpart: open, interface, is indicated by the personal pronoun “He”. It is neither of us but a third person, him. It is the symmetrical opposite of we and us; it is they and them.
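The pronoun readings above can be laid out as a small lookup table. This is only a sketch of the four examples given here, not Adi's formal tables: the class name, the field names `boundary` and `engagement`, and the `counterpart` rule are labels chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundObject:
    boundary: str    # "closed" or "open" (hypothetical field name)
    engagement: str  # "self", "others" or "interface" (hypothetical field name)


# The readings of the English pronouns as described in the text.
PRONOUNS = {
    "I": CompoundObject("closed", "self"),
    "you": CompoundObject("open", "others"),
    "we": CompoundObject("closed", "interface"),
    "us": CompoundObject("closed", "interface"),
    "he": CompoundObject("open", "interface"),
}


def counterpart(obj):
    """Symmetrical counterpart as used in the examples: closed and open swap,
    self and others swap, and interface stays interface."""
    boundary = {"closed": "open", "open": "closed"}[obj.boundary]
    engagement = {"self": "others", "others": "self",
                  "interface": "interface"}[obj.engagement]
    return CompoundObject(boundary, engagement)


print(counterpart(PRONOUNS["I"]) == PRONOUNS["you"])
print(counterpart(PRONOUNS["we"]) == PRONOUNS["he"])
```

Laid out this way, the symmetry between “I” and “You”, and between “We” and “He”, becomes a simple operation on the pair of conditions.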

Now, with the principle of cognitive growth by reinterpretation in mind, as I mentioned above, try to imagine yourself a small child before acquiring language, if you can. Imagine how often you might hear just these sounds and the situations in which they are used. Imagine how often you would interpret these sounds, at first tentatively to be sure, and then with more confidence, as the learning became ingrained in your mind and in the neurons of the brain.

It seems that every phoneme, every element of meaning, in a natural language interprets a single compound abstract object, in the same way as I have shown for the personal pronouns of English. These abstract objects are called compound because each is composed of the abstraction of a definite set of boundary and engagement conditions.

Most consonantal sounds of a natural language also indicate an abstraction of power. This is an action predicate defining specific sorts of processes. Action may appear to just be, while it is really built up; it is a construction. According to Blumer and his teacher Mead, action is built up step-by-step through self-indication. Tom Adi recognized the objects he found as basic categories of abstraction. This is a set of elementary categories for all types of identification, all types of manifestation, and all types of ordering.

Identification deals with identities. The sounds of words dealing with identities are who, which, I, you, we, he, it, and, for units and elements, one, a, an (the bold letters emphasize the chief sound). Those are static interpretations. Dynamic interpretations include assignments (at, and, a-) and existence (is, are, on, off, at).

Manifestation is the way things present or manifest themselves. The sounds of these words indicate this abstraction: Matter, mass, medium, field, pool, form, domain and theme are examples of static manifestation. Motion, formation, phase, application and doing exemplify dynamic manifestation.

Ordering is expressed by the sounds of the English words numbers, names, quality, quantity, quick, quadrate and quarter, as well as by words for energy and force (knock, quake, quench, quell), awareness, perception and feeling (notice, qualm, numb), sound (ring, quiet) and cognition (know). The negative meaning of no, non-, un-, in-, etc. comes from the bipole attached to n. In English, negation is an organizing act.

These elementary categories expand into a power set of eight categories that are represented by all the sounds/phonemes in the language. Many consonantal sounds form bonds (inside a word) that interpret combined actions or processes.
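The count of eight follows from simple arithmetic: a set of three elements has 2^3 = 8 subsets. The short Python sketch below enumerates them for the three elementary categories named above. It is an illustration only. Whether the empty combination counts as a category in its own right, and how each combination maps to particular sounds, are not settled by this post, so the code treats every subset alike, and the names ELEMENTARY and power_set are hypothetical.

```python
from itertools import combinations

# The three elementary categories named in the text.
ELEMENTARY = ("identification", "manifestation", "ordering")


def power_set(items):
    """Return every subset of items, from the empty set to the full set."""
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    categories = power_set(ELEMENTARY)
    print(len(categories))  # 8
    for category in categories:
        print(sorted(category))
```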

I see this piece has gotten rather long. My aim was not to make a complete presentation but to highlight some of the features of the approach and of the intelligent computer software I am producing. I hope that I have clarified that personal action, and the dynamic boundary and engagement conditions relevant to every situation, give rise to the interpretation of meaning from language. If I get to another post on this subject, it will explore the bonds of abstract objects inside words and how meaning is composed in ways similar to how action is constructed.

As always, feel free to leave me a comment by clicking the link to comments below.
