Archive for May, 2007

With the rise of social and interpersonal uses of computers on the Internet, it may be time to re-examine text processing models along with assumptions about semantic search engines. The question that comes to my mind contrasts semantic indexing and search against the data-oriented semantic web. I’ll start by setting the context to semantics for text analysis and comprehension.

My conception of semantics for text analysis has to do with filtering, comparing and contrasting the principal concepts and topical categories that are often implicit in texts, and in conversations between regular people. An interpersonal semantics can be seen when someone makes a statement like “let’s not argue over the semantics”. In such a situation, it usually means one of the parties objects to the way some term or language is being used as a characterization or representation for a particular object or state of affairs.

This can happen when you are reading a text as well. You may not agree with the author’s choice of words or, more likely, you may not agree that a particular word is the proper sign in a certain case. Likewise, when examining hits from a search engine, you may not agree that they are relevant. While processing text, a semantic search engine must deal with an interconnected system of meanings, including: ideation; interpersonal semantics, having to do with control and exchange; and the textual semantics concerning how a text may be constructed. A semantic search engine is concerned with valid connections.

Words interconnect in various contexts and there is a network of text on the Internet exhibiting a wealth of interpersonal and social relations for a semantic search engine to exploit. Access to these relations need not involve adding meta-data to every word.

A semantic search engine must be capable of interrelating words with personal ideas.

There are all kinds of inferences one can draw in any case, including authoritative or technical implications, for example: turtle -> tortoise -> chelonian -> reptile. People may or may not realize that many words have conceptual and social implications that can link them, for example: job -> employment -> work -> chores -> knowledge -> skill -> craft -> … And of course there are false or even mistaken implications. More significantly, words often have implications that are hidden from or unknown to any given speaker of the language.
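To make the idea of linked implications concrete, here is a minimal Python sketch that stores the two example chains above as a small directed graph and follows them with a breadth-first search. The `IMPLIES` table and the `implication_path` function are illustrative assumptions, not a description of how any particular semantic search engine stores its links.

```python
from collections import deque

# Hypothetical implication links, copied from the two chains in the text.
# Each entry reads "word -> words it implies".
IMPLIES = {
    "turtle": ["tortoise"],
    "tortoise": ["chelonian"],
    "chelonian": ["reptile"],
    "job": ["employment"],
    "employment": ["work"],
    "work": ["chores"],
    "chores": ["knowledge"],
    "knowledge": ["skill"],
    "skill": ["craft"],
}


def implication_path(start, goal, links=IMPLIES):
    """Return the shortest chain of implications from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of implications connects the two words


if __name__ == "__main__":
    print(implication_path("turtle", "reptile"))
    print(implication_path("job", "craft"))
    print(implication_path("job", "reptile"))
```

In this toy table the links are one-directional, so a chain can be followed from "job" to "craft" but not from "job" to "reptile", because nothing connects the two families of words.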

A semantic search engine must be capable of reading these implications and making the correct connections in order to orient and carry out a deep and meaningful search that turns up substantive results. Just as our personal semantics help us infer and contrast the implications in the face of uncertainty, a semantic search engine that maps expressions onto interpersonal functions uses the functionality to link individual, social and cultural ideas. In such a case, the computer is not just retrieving, it is assisting our understanding.

This does not mean we must redefine or add definitions for every element of text on the World Wide Web. The semantic search engine maps the inherent symbolic structures to the domain of interpersonal relations as readily as we read and write their representations.

The construction of social reality aside, ideation is certainly not bound to social institutions. Ideation is individually grounded. It functions in the built-in domain of interpersonal relations and it is bounded by physical processes.

There is a direct mapping from the expressions of a language onto a set of semantic functions in the associated domain of interpersonal relations. There is no intermediate step of obtaining truth conditions. It is more a matter of what is fitting than what may be ascertained true or false. When I am looking for my keys: it is only after exhausting all the likely places they might be found that the search is really on. Likewise, I have a need to become informed just to realize the likely relevant questions I should be asking. A semantic search uncovers equivalent and relevant mappings between statements. Whether they are logically sound and true statements is not a consideration.

What do you do (every time) before you start looking for your keys or some other object? I will bet that you do the same thing I do or that anyone else does. I have to orient myself, determine where to look, choose a starting point and a path. The likelihood (or similarity, equivalence, etc.) and distance between point A and point B have a lot to do with the actions I take.

A meaningful inquiry orients a meaningful or semantic search.

A semantic model with mapping functions that are grounded in interpersonal affairs effortlessly compares and contrasts abstract ideas. Expressions formed from words and names are mapped onto interpersonal ideas in a model where ideas and abstracted things occupy a reflective space. Consider the physical relationship between a ball and a person where the relationship between the ball and that person becomes more significant should the ball be thrown at the person. This is the stuff of meaning.

Meaning, for me, emerges from my location and orientation in relation to all those things that come into my field of vision, hearing, or my field of perception. It is a process. To arrive at the meaning of a thing, fundamentally I am comparing what is in my field of perception [ a ] with what is actually happening [ b ] in the physical field. Of course the points of significance in my perception should line up with what is actually happening. The same principle drives text searching on the internet. In a semantic search engine, the composition of expressions from an inquiry should line up with pertinent parts of texts.

Most people think linguistics and NLP and reasoning (AI) when they think of semantic modeling. It should be rather obvious that in the type of modeling I have described, one is working with points in an abstract space rather than the conventions of language. Some folks may be surprised to learn that there are other equally valid conceptions of semantic and cognitive models for computers. Some of them have advantages worth considering.

Adding a semantic model to full text search techniques.

I want to expand a little on the general full-text indexing processes I mentioned above. Most people are familiar with search engines and how they are applied. Take the Google Desktop Search or MSN Search for your personal computer. When you install these packages, you do not have to re-write your files, or even mark up your files. Most full-text indexing algorithms have built-in procedures that do not require tagging, markup and the like. Text search engines can even deal with formatted files.
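As a rough illustration of why plain full-text indexing needs no tagging or markup, the following Python sketch builds a simple inverted index straight from raw text. It is a generic textbook technique, not the algorithm used by any of the products named above; the function names and the sample file names are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map every word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical untagged files, used only to exercise the sketch.
    docs = {
        "a.txt": "Notes on job listings",
        "b.txt": "Learning a new craft",
        "c.txt": "Job and craft fair",
    }
    index = build_index(docs)
    print(search(index, "job"))
    print(search(index, "job craft"))
```

Nothing in the files had to be rewritten or annotated; the index is derived entirely from the words as they stand.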

Just imagine what the full-text search engine would be capable of, if it were capable of semantic reasoning with the plain words that are being indexed. A semantic search engine should be able to produce meta-data and tags for people to use for browsing.

If you have a semantic model of this sort (not other worlds, NLP, AI oriented, FOL bound, etc.) which measures those relationships between topic a and topic b, you can take some real-world concepts like law and rule, nursing and ranching, job and work (really anything at all) and obtain a measure of the relevance and affinity between them. Semantics involves inheritance rules, and expressions inherit semantic properties from single word concepts such as those mentioned.

The more related the concepts and/or topical structures being modeled, the more pertinent and relevant the actual words and phrases are to each other. The concepts for job and for work are related, while the job concept and the concept of sob are not. A search engine algorithm can use that in order to enrich the user experience by locating conceptually related matter in the “discovery inventory”, that is, the words and pages accessible and addressable in the network. There is no need for a new web and more data glut.
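The job/work versus job/sob contrast can be sketched in a few lines of Python. The example below is only a stand-in for a real semantic model: it assumes a tiny hand-made table of related concepts (`RELATED`, including an invented sob/cry link) and scores affinity as 1 / (1 + hops) along the shortest path between two words. For comparison it also computes a purely literal spelling similarity with the standard library's `difflib.SequenceMatcher`.

```python
from collections import deque
from difflib import SequenceMatcher

# Hypothetical concept links; a real model would derive these, not list them.
RELATED = [
    ("job", "employment"),
    ("employment", "work"),
    ("work", "chores"),
    ("law", "rule"),
    ("sob", "cry"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from pairs of related concepts."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def affinity(a, b, graph):
    """Return 1 / (1 + hops) on the shortest path, or 0.0 if unconnected."""
    if a == b:
        return 1.0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return 1.0 / (1 + hops + 1)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return 0.0


def spelling_similarity(a, b):
    """Literal character-level similarity, with no notion of meaning."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    graph = build_graph(RELATED)
    for pair in [("job", "work"), ("job", "sob")]:
        print(pair, affinity(*pair, graph), spelling_similarity(*pair))
```

By spelling alone, "job" is closer to "sob" than to "work"; by the toy concept graph, the ordering is reversed, which is the distinction the paragraph above draws.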

Where you exercise this kind of semantic search algorithm you obtain a notion of the relevance between topics of inquiry and expressions in texts. For example, a search using ‘job’ as the query should be capable of locating a document with the title: “Earn money while learning a new craft”. In a good semantic search algorithm, that document would only be offered up should the locations of the literal equivalent (job) and of the relatives that may be more pertinent (e.g., career, employment, work, apprenticeship) come up empty or be exhausted. That can make for a richer experience for the user.
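The ordering described here (literal matches first, close relatives next, more distant concepts only after those are exhausted) can be sketched as a tiered search. In this Python example the `TIERS` table and the sample titles other than the one quoted above are assumptions made for illustration; a real engine would compute the tiers from its semantic model.

```python
import re

# Hypothetical tiers for the query "job": literal term, close relatives,
# then more distant concepts. Only the relatives come from the text above.
TIERS = {
    "job": [
        ["job"],
        ["career", "employment", "work", "apprenticeship"],
        ["craft", "skill"],
    ],
}


def tokens(text):
    """Lower-case a title and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tiered_search(query, titles, tiers=TIERS):
    """Return (tier, title) pairs, nearest tier first; each title appears once."""
    ranked = []
    remaining = list(titles)
    for level, terms in enumerate(tiers.get(query, [[query]])):
        wanted = set(terms)
        hits = [t for t in remaining if tokens(t) & wanted]
        ranked.extend((level, t) for t in hits)
        remaining = [t for t in remaining if t not in hits]
    return ranked


if __name__ == "__main__":
    titles = [
        "Earn money while learning a new craft",
        "Job openings in Gainesville",
        "Work from home",
    ]
    for level, title in tiered_search("job", titles):
        print(level, title)
```

The craft document is still found by a query for "job", but it is listed only after the literal and closely related matches.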

A good semantic search engine would not force the user to drill into and select additional search terms or senses unless they so choose. It must offer controls on how a query entry may be treated due to the expansive array of options available for processing the queries. Every query item can have a literal or a conceptual as well as a logical role to play.

I should be able to choose to use job as a keyword, as in “Job lists”; so that rather than interpreting job -> craft and extending the search there, the semantic search engine will go in the direction of job listings. This may be what Google already does but that does not mean a semantic search engine can dispense with doing it. I think searching in that direction with a semantic search engine would also turn up To-Do Lists. After all, aren’t they job listings too? Google would not do that.
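One way to picture such a control is a per-term switch between literal and conceptual treatment. The Python sketch below is an assumption about how that switch might look; `Mode`, `QueryTerm`, `expand` and the `RELATIVES` table are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LITERAL = "literal"        # match the word exactly as typed
    CONCEPTUAL = "conceptual"  # also match conceptually related words


# Hypothetical relatives; "task" is what would let To-Do lists surface.
RELATIVES = {
    "job": ["task", "work", "employment"],
    "lists": ["listings"],
}


@dataclass(frozen=True)
class QueryTerm:
    text: str
    mode: Mode = Mode.CONCEPTUAL


def expand(term, relatives=RELATIVES):
    """Return the words to search for, honoring the user's chosen mode."""
    word = term.text.lower()
    if term.mode is Mode.LITERAL:
        return [word]
    return [word] + relatives.get(word, [])


if __name__ == "__main__":
    print(expand(QueryTerm("Job", Mode.LITERAL)))
    print(expand(QueryTerm("Job")))
```

With the term marked literal, the search stays on job listings; left conceptual, it can widen to tasks and so to To-Do lists.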

Semantically correct results may also be irrelevant. In fact, in recognition of the fact that a semantic search must be personally meaningful, a semantic search engine needs controls on sorting by conceptual relevance or timeliness, expanding on concepts or being literal, and in sizing the locale or abstract space and context for different search circumstances and other implicit conditions not easily inferred. That is a pretty sophisticated semantic search engine and I don’t think there is a development on the Semantic Web or within the W3C that is anything like the semantic search engine I have described here.

Now let’s take a look at the Semantic Web.

Answers.com defines the semantic web as the “defined” web. That is the most concise and most meaningful definition I have seen until now. To most people, it should imply that things, pages, etc., on the web, are defined. Unfortunately it imparts the impression that semantics are definitions. Many people I have spoken with naturally think that means such definitions can be used to link and relate words and names, in ways similar to the linked words in the examples above. NLP systems have shown this can be done very nicely in limited circumstances. Semantic web techniques follow in the AI and NLP tradition and build on principles long established in the field. Yet, these models seem to fall short of achieving success for lack of one critical feature or another.

I found another definition of the semantic web along with some of the best examples I have seen that demonstrate the semantic concepts of the semantic web.

The Semantic Web is a framework that rigidly defines a means for creating statements of the form “Subject, Predicate, Object” or “triples,” in a machine-readable format, where each of Subject, Predicate, Object is a URI.

The means of “rigid definition” is a series of standards published by the World Wide Web Consortium, namely RDF, RDF Schema, and OWL.

Paul Ford, the author of the article “A Response to Clay Shirky’s ‘The Semantic Web, Syllogism, and Worldview’”, published in November 2003, goes on to explain the kind of deductive reasoning that is the foundation of the semantic web. This reasoning is added to the source data as metadata in the form of RDF triples. Paul Ford explains the deductive reasoning and the triples in a very elegant and understandable way:

Let’s say you’re reading the book Defenders of the Truth, which is sociologist Ullica Segerstråle’s intellectual history of the debate over sociobiology. Interested in finding what the book has to say about Stephen Jay Gould, you turn to the index, and find:

Gould, S.J.
  and adaptationism  117-18
  and Darwinian Fundamentalists  328
  and Dawkins  129-31
  Ever Since Darwin (1978)  118
  and IQ testing  229-31
  Marxism  195, 226
  unit of selection dispute  129

(p 482, example much abridged)

If you are an experienced reader with some knowledge of the field of sociobiology, you can make a variety of deductions using the index. Take the third item, “and Dawkins 129-31.” Looking at this statement, and drawing on your memory, you could deduce:




Subject | Predicate | Object
Dawkins | Is a synonym for | Richard Dawkins
Stephen Jay Gould | Interacted in some way with | Richard Dawkins
Information on Stephen Jay Gould’s interactions with Richard Dawkins | Can be found on | Pages 129, 130, and 131
Pages 129, 130, and 131 | Are found in | The book Defenders of the Truth

And so forth. Internally, you wouldn’t go to so much effort; if you had to think using predicate logic, it’d be hard to get out of the house in the mornings. As Shirky writes:

When we have to make a decision based on this information, we guess, extrapolate, intuit, we do what we did last time, we do what we think our friends would do or what Jesus or Joan Jett would have done, we do all of those things and more, but we almost never use actual deductive logic.
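For readers who have not seen triples written down, the four deductions Ford lists above can be sketched as plain Python tuples of (subject, predicate, object). This is only an illustration of the data shape: the `http://example.org/` namespace and every identifier under it are made-up placeholders, and real RDF would be serialized in a format such as RDF/XML or Turtle rather than held in a Python list.

```python
# Hypothetical namespace; none of these URIs refer to real resources.
EX = "http://example.org/"

TRIPLES = [
    (EX + "Dawkins", EX + "isSynonymFor", EX + "RichardDawkins"),
    (EX + "StephenJayGould", EX + "interactedWith", EX + "RichardDawkins"),
    (EX + "GouldDawkinsInteractions", EX + "canBeFoundOn", EX + "Pages129-131"),
    (EX + "Pages129-131", EX + "areFoundIn", EX + "DefendersOfTheTruth"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # Everything the store says about Richard Dawkins as an object.
    for triple in match(TRIPLES, obj=EX + "RichardDawkins"):
        print(triple)
```

The point to notice is that nothing can be retrieved from such a store unless someone has recorded it as a triple beforehand.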

Let’s examine what a semantic search engine can do without the use of the book index, and without an RDF file, stored “triples” or an OWL ontology made specifically for the purpose. The semantic search engine using semantic reasoning can parse the book, i.e., read the book Defenders of the Truth. Then any reader, irrespective of experience and whether or not they are familiar with science or sociobiology, might assume upon seeing the title that someone is fighting about truth if it needs a defender, select it, and begin:

Processing a query for the subject | A semantic search engine dynamically makes this link | and retrieves the instances
Who is fighting | Gould can be co-located with Dawkins and the fight concept in the TOC of Defenders of the Truth | Shooting past each other: Gould’s and Dawkins’ drawn out duel.
Dawkins | can be equivalent to | Richard Dawkins, Dawkins’
Gould | can be equivalent to | Stephen J. Gould, Gould’s
Gould, Dawkins | Co-locations in the book (more co-locations than mentioned in the index) | pp. 8, 23, 144, 129-131, 133-137, 178, 244, 259, 260-262, 312, 320-329, 399-400, and backmatter
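The co-location step in the last row can be approximated with very little code. The Python sketch below scans page texts for pages that mention both names; matching on the surname with word boundaries also catches variants such as “Gould’s” or a full name. It is a simplification of what a semantic search engine would do, and the `PAGES` sample is invented placeholder text with made-up page numbers, not content from the book.

```python
import re

# Invented placeholder pages; NOT quotations from Defenders of the Truth.
PAGES = {
    1: "Gould argued against adaptationism.",
    2: "Dawkins replied to Gould’s critique.",
    3: "Richard Dawkins and Stephen Jay Gould disagreed again.",
    4: "Wilson wrote about insects.",
}


def colocations(pages, name_a, name_b):
    """Return the sorted page numbers on which both names occur."""
    pat_a = re.compile(rf"\b{re.escape(name_a)}\b")
    pat_b = re.compile(rf"\b{re.escape(name_b)}\b")
    return sorted(
        number for number, text in pages.items()
        if pat_a.search(text) and pat_b.search(text)
    )


if __name__ == "__main__":
    print(colocations(PAGES, "Gould", "Dawkins"))
```

Because the pages are read directly, co-locations turn up wherever the two names meet in the text, not only where an indexer chose to record them.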

The information obtained from the semantic search engine is rich enough to lead to many other assumptions and deductions about the views of Gould, Dawkins, Emlen, Barash, Wilson, and others mentioned in the book; and about the range of conflicts in the science of animal sociobiology in general. This is the substance of a rich user experience.

Neither the bibliographic entry nor the RDF file is needed to obtain this information. This seems to prove my point that the search engine using semantic reasoning does not need or require pre-recording such inferences as RDF triples and having them processed as rigid data structures under explicitly defined semantics before you or I can realize them.

The web resource Answers.com goes on to say that the goal of the “defined” web is “to identify more Web-based data and their ‘interrelationships’ for more effective search.” This could be quite true; a query language specification is being drafted and more RDF structured data will certainly become available. Yet, just as many things do not fit into typical database field and record structures, there will still be orders of magnitude more words and text and messages, and social and cultural discourse, that will not fit into an RDF triple than will fit.

I have demonstrated that a semantic search engine is not the same thing as the semantic web. The semantic search engine informs its user using a rich framework for conceptual interpretation married with the best of modern indexing and search techniques. This does not mean that a semantic search engine is a replacement for the semantic web. In fact they are two different things. The semantic web is for data processing, not conceptual processing.

In my estimate, people should not expect the Semantic Web to “conceptualize” or “organize” or be “capable of searching” much more than, say, 5% of the web pages and resources already indexed by Google. I believe they could work together. The logical inferences stored in a triple store (an RDF file) could be produced from the outputs of a semantic search engine. There is probably a promising line of research there.
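As a small sketch of how the two might work together, the Python function below turns one kind of search output, a list of pages where two names are co-located, into subject-predicate-object statements. It is speculative: the function name and predicate wording are assumptions, and the statements are plain strings, whereas real RDF triples would require each part to be a URI.

```python
def colocations_to_triples(name_a, name_b, pages, book):
    """Turn a co-location result into (subject, predicate, object) statements."""
    topic = f"{name_a} and {name_b}"
    triples = [(name_a, "is co-located with", name_b)]
    for page in pages:
        triples.append((topic, "can be found on", f"page {page}"))
        triples.append((f"page {page}", "is found in", book))
    return triples


if __name__ == "__main__":
    # Pages 129-131 are the ones listed in the index entry quoted earlier.
    for triple in colocations_to_triples(
        "Gould", "Dawkins", [129, 130, 131], "Defenders of the Truth"
    ):
        print(triple)
```

In this arrangement the triples are a by-product of reading the text, rather than a precondition for searching it.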

Let me know what you think about whether semantic search and the semantic web are the same thing.

Ken Ewell, Gainesville, Florida, May 2007

Ken Ewell is the founder and CEO of Management Information Technologies, Inc. (MITi), an independent software research and development company founded in 1984. Since 1987 Ken has been involved with the development, production and application of Readware technology, a semantic classification framework for document indexing, analysis, classification, search and retrieval.

