Archive for June, 2007

The world of semantic searching just got a little bigger thanks to some pretty savvy investors and talented scientists and computational engineers over at Feedster. My hat is off to them.

Applying semantic search to raw RSS feeds is very challenging and is no small undertaking. Feedster has more than three hundred million posts in its archives and, using Readware technology, “semantically” indexes from ten to thirty thousand posts, from a collection of more than 70 million feeds, every ten minutes.

As a semantic search framework, Readware, the underlying search engine at Feedster, is primarily an indexing, search and retrieval engine built on a theoretical semantic foundation. That means that it parses, tokenizes and indexes words from texts as search engines tend to do, but it treats tokens not only as keywords but also as semantic objects that have interconnecting relations with similar objects.
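To make the idea of a token being both a keyword and a semantic object more concrete, here is a minimal sketch in Python. It is not how Readware works internally, which this post does not describe; the `CONCEPTS` table, the concept labels and the sample documents are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical concept table: each surface word maps to a concept label.
# These groupings are illustrative assumptions, not Readware's data.
CONCEPTS = {
    "trend": "CHANGE", "increase": "CHANGE", "boom": "CHANGE", "decrease": "CHANGE",
    "economy": "ECONOMY", "savings": "ECONOMY",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return cleaned.split()

def build_index(docs):
    """Return (keyword_index, concept_index), each mapping a key to a set of doc ids."""
    keyword_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            # Every token is indexed as a plain keyword...
            keyword_index[token].add(doc_id)
            # ...and, when the table knows it, also under its concept.
            concept = CONCEPTS.get(token)
            if concept is not None:
                concept_index[concept].add(doc_id)
    return keyword_index, concept_index

docs = {
    1: "Fuel economy increase reported for new cars",
    2: "Housing boom lifts the economy",
    3: "Recipe for lemon cake",
}
keyword_index, concept_index = build_index(docs)
```

In this toy index the keyword "trend" matches nothing, while the concept it belongs to matches documents 1 and 2, which is the kind of difference a purely keyword-based index cannot show.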

Let me give you an example. I guess we can begin with the example Marti Hearst brought up about compound noun phrases. I think this is a good example because many search queries take this form. I would rather take some nouns where the search engine results can be a little more revealing of the power of the underlying algorithms. So I will choose trends and economy as my nouns.

The noun “economy” is ambiguous because, of course, it could refer to macroeconomic ideas or it could refer to a state of parsimony. The other noun, “trend”, is much less ambiguous and has multiple and even metaphoric means of indication. By that I mean that the idea of a state of affairs that someone might correctly call a “trend” can be represented with many other words and word phrases, such as increase, decrease, boom, etc.

To that we add the pragmatic aspect, in the realm of search. To what ends do we search for trends in the economy, say? What is it that we should desire to obtain from the exercise? For that is what drives the algorithms that take these signs and foster a search for valid and relevant responses.

My internal, or intellectual, algorithm tells me that what the phrase “trends in the economy” means depends on the subject. Without the subject there is no way to distinguish whether the phrase refers to trends in fuel economy or trends in macroeconomic indicators such as jobs, housing, trade and a country’s GDP.

The easy way out is to take the more popular position on the axiom that you can please some of the people some of the time. It takes more time, and an interaction, to find out what the user really means. In person, if someone asked “trends in the economy?”, I might respond: Do you mean my car’s economy or the general economic situation?

I do not want my search engine to do what is popular; I want it to try to help me. If all search engines pragmatically assume the category or subject is macroeconomics, I will not be able to use the search engine effectively because all it will do is give me pointers to information fitting its narrow interpretation.

In the worst case, the search engine algorithm should take that search phrase and return a representative and relevant set of results. A representative set of semantic results should not consist only of macroeconomic hits; the set should include some things relevant to saving and economy, particularly since the search term is economy and not economic. That is a hint that the popular search engines do not seem to pick up on, although it is obvious and easily distinguished. Let me show you what I mean.

Below are the results I obtained from Google, Hakia and the semantic search engine at Feedster on the search “trends in the economy”:

First, Google:

“trends in the economy”

As you can see, Google seems to have interpreted the phrase as “economic trends” and these first five hits are representative of those that followed. It seems that Google’s pragmatics are claiming to know better than us what our own words may mean. By changing around my search phrase, they altered the meaning and gave me keyword hits as authority for that meaning. That is a little too 1984ish for me, but it did bring hits with both noun forms (economy or economics and trends).

Because there is nothing about savings or fuel economy in the hits, and because both nouns are highlighted, I can definitively claim there is nothing “semantic” in Google’s search. Fixing the meaning of the noun economy to denote economics is not meaningful. In my opinion, it is destructive of meaning.

Hakia interpreted the phrase as the macroeconomy too. On the other hand, Hakia tried to stick to the form trends in the economy, and their top five results show this quite clearly:

trends in the economy

Still, there is nothing semantic about Hakia’s results. This is mainly because Hakia’s sort of semantics are not useful for distinguishing semantic objects from nouns, verbs or any other forms of words. Hakia distinguishes sentence meaning using natural language grammar and literal semantics of sentences.

Their search pragmatics seem to be that they fall back on keywords when no sentence exists or they are unable to decipher the semantics of a phrase. This is evidenced by the top five hits, shown above and by the rest that followed.

Now here are the results from Feedster using Readware as the semantic search engine:

trends in the economy

It is telling, I think, that the search term “trend” is not among the titles and summaries of the top five results from Feedster, yet each hit is a relevant hit about trends or a trend in the economy. One hit even refers to a fuel economy increase. Neither Google nor Hakia nor, I dare say, any other search engine has a semantic algorithm quite so powerful as the one here.

The pragmatics of the Feedster search are to choose semantically relevant results from the most recent posts collected from the Internet. Because perception plays a huge role in the appreciation of a search result, Readware searches for passages with the same words, just as Google and Hakia do, yet for Readware, the concept of economics is not the same as the concept for economy. Also, in these results, the concept of trend is interpreted as growth, increase, boom, etc.
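The behavior described above, where “trend” is read as growth, increase or boom while “economy” stays distinct from “economics”, can be approximated with a small concept-matching sketch. This is only an illustration under my own assumptions: the `CONCEPT_OF` table, the naive plural handling and the sample post titles are hypothetical, and none of it is taken from Readware.

```python
# Hypothetical concept table (an assumption for illustration only).
# Note that "economy" and "economics" deliberately map to different concepts.
CONCEPT_OF = {
    "trend": "CHANGE", "growth": "CHANGE", "increase": "CHANGE", "boom": "CHANGE",
    "economy": "ECONOMY",
    "economics": "ECONOMICS", "economic": "ECONOMICS",
}

def concepts(text):
    """Map a text to the set of concept labels its words belong to."""
    found = set()
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")
        # Naive plural handling: try the word, then the word minus a final "s".
        for form in (word, word[:-1] if word.endswith("s") else word):
            if form in CONCEPT_OF:
                found.add(CONCEPT_OF[form])
                break
    return found

def search(query, posts):
    """Return the posts whose concepts cover every concept found in the query."""
    wanted = concepts(query)
    return [p for p in posts if wanted and wanted <= concepts(p)]

posts = [
    "Fuel economy increase for hybrid cars",
    "Housing boom drives growth in the local economy",
    "New economics textbook reviewed",
    "Trend report on summer fashion",
]
results = search("trends in the economy", posts)
```

With these made-up titles, the first two posts are returned even though neither contains the word “trend”, and the economics textbook is left out because its concept differs from that of “economy”.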

Obviously, Readware distinguishes meaning in much more sophisticated ways than either Google or Hakia can. Feedster is using Readware as their search engine because they are committed to the RSS and Blog community and they are determined to bring better tools to everyone interested.

Many people would think that it is right to interpret economy as the macroeconomy, and that a search engine should not bother collecting a hit about any other sense of the meaning or reference for the term economy. I would like to know what you think. Feel free to comment.


For every software vendor that claims they have semantics, the semantic web or semantic search, or tells you that their product “understands” human or natural language, or that they can answer questions instead of (merely) retrieving a relevant set of possible answers, ask the software:

What is the product of two-thirds and forty-eight?

It is a question that can be interpreted and rather easily answered by many 3rd graders and certainly by a majority of 5th graders. I can guarantee you though, there is not one search engine on the list at alt search engines or anywhere on the web, that can answer it. There is a program (not a search engine) on the Internet, though, that can answer this and other questions put to it in English natural language. That is my only hint to you.

Can anyone tell me what the web address is?
Go ahead! Type the question into your favorite search engine.

Share the answers you get from the alt search engine list, and find the web site that can answer the question most directly and correctly.
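For reference, the arithmetic itself is tiny. The sketch below is not the program hinted at above, and it handles only this one question pattern with a small, made-up vocabulary of number words; it simply shows that interpreting the phrase and computing the product are both mechanical steps.

```python
from fractions import Fraction
import re

# Small illustrative vocabularies; anything outside them raises KeyError.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
DENOMINATORS = {"halves": 2, "thirds": 3, "fourths": 4, "quarters": 4, "fifths": 5}

def parse_number(phrase):
    """Parse phrases such as 'forty-eight' or 'two-thirds' into a Fraction."""
    parts = phrase.replace("-", " ").split()
    if len(parts) == 2 and parts[1] in DENOMINATORS:
        return Fraction(UNITS[parts[0]], DENOMINATORS[parts[1]])
    total = 0
    for part in parts:
        total += TENS.get(part) or UNITS[part]
    return Fraction(total)

def answer(question):
    """Answer 'What is the product of X and Y?'; return None for anything else."""
    pattern = r"what is the product of (.+) and (.+?)\??"
    match = re.fullmatch(pattern, question.strip().lower())
    if match is None:
        return None
    return parse_number(match.group(1)) * parse_number(match.group(2))

print(answer("What is the product of two-thirds and forty-eight?"))  # prints 32
```

Two-thirds of forty-eight is 48 × 2 ÷ 3 = 32, which is the answer any program claiming to understand the question should give.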


As a follow-on to my post from yesterday, another interview confirms that Tim Berners-Lee would really rather refer to the semantic web as the data web instead. Sir Tim Berners-Lee said as much at the beginning of this interview by ZDNet Executive Editor David Berlind.

Even though it may not sound like much, it really clarifies the status quo and should go a long way towards insulating the W3C from criticism. By that I mean that “Semantic Web” proponents opened themselves up to criticism by adhering to the preposterous idea that their work was about meaning or interpretation of meaning from natural language.

It is really about linking data and not about meaning at all, other than the indirect link to how one gets meaning out of combining different sorts of data.

The interview is interesting because TBL talks a little about the long road from the humble beginnings until now. In a lot of ways, we had a similar development path beginning with how to clean, clarify and verify the source pages, how to capture relations and links in human discourse, how to transform phrases and expressions into executable query structures, and then how to use the results through automata (rules, programs).

When people learn that Tom Adi and I have been working on Readware for more than twenty years, they wonder why it takes so long. It is really not so unusual. Dr. Raskin over at Purdue and Hakia has also been working for decades on semantics and theoretical and computational models. It is not something that develops overnight.

The dream of Tim Berners-Lee is to link data from disparate data sources and to have uniform means for storing such data on computers.

The Data Web will make it much easier to get data out of databases.

I am all for that. How about you?


OWL + RDF = Semantic Web

I thought I heard Tim Berners-Lee say that if the ontology is not built with OWL and the data are not in RDF, it is not a “semantic web” application. That is pretty defining. So the definition of the “Semantic Web” is a web of data converted to URIs stored in RDF triples using an ontology specified with OWL.
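As a rough picture of what that definition amounts to, the sketch below models RDF statements as plain Python tuples of URIs. The `rdf:type` and `owl:Class` URIs are the standard W3C ones; the `http://example.org/` namespace, the `uses` property and the `match` helper are assumptions I made up for illustration, and a real application would use an RDF library and a proper OWL ontology rather than a set of tuples.

```python
# Standard W3C vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Hypothetical namespace for the example's own terms.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "SearchEngine", RDF_TYPE, OWL_CLASS),         # ontology: declare a class
    (EX + "Feedster", RDF_TYPE, EX + "SearchEngine"),   # data: an instance of it
    (EX + "Feedster", EX + "uses", EX + "Readware"),    # data: a link between things
}

def match(pattern, store):
    """Return the triples matching a (s, p, o) pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

# Find everything declared to be a SearchEngine.
instances = [s for s, _, _ in match((None, RDF_TYPE, EX + "SearchEngine"), triples)]
```

Nothing here interprets natural language; it only links identifiers to other identifiers, which is the sense in which the result is a web of data.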

Check it out over at eWeek: Chief Technology Analyst Jim Rapoza speaks to Tim Berners-Lee, creator of the Web and head of the World Wide Web Consortium, about the current status of the Semantic Web, the challenges it faces and its future. Jim also speaks to Eric Miller and Stephen Downes, a researcher at the National Research Council’s Institute for Information Technology in Canada.

About the Semantic Web buzzword “interoperability” I heard Stephen Downes say: “That is what you do when you can’t achieve ubiquity.” I think I agree with that.

I also heard TBL or Eric Miller, I cannot remember who, saying that a key feature is the resilience of semantic web applications. I think Larry Ellison said the same about the RDBMS in the late 70s. I wonder what people think about the “resilience” of semantic web programs?
