The world of semantic searching just got a little bigger thanks to some pretty savvy investors and talented scientists and computational engineers over at Feedster. My hat is off to them.
Applying semantic search to raw RSS feeds is very challenging and is no small undertaking. Feedster has more than three hundred million posts in their archives, and every ten minutes they “semantically” index from ten to thirty thousand posts, drawn from a collection of more than 70 million feeds, using Readware technology.
As a semantic search framework, Readware, the underlying search engine at Feedster, is primarily an indexing, search and retrieval engine built on a theoretical semantic foundation. That means that it parses, tokenizes and indexes words from texts as search engines tend to do; it treats tokens as keywords and also as semantic objects that have interconnecting relations with similar objects.
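To make the keyword-versus-semantic-object distinction a little more concrete, here is a minimal sketch of an index that files each token twice: once under the word itself and once under a concept it stands for. The concept table, the class name DualIndex and the sample posts are all hypothetical; the post does not describe Readware's actual lexicon or data structures, so treat this only as an illustration of the general idea.

```python
from collections import defaultdict

# Hypothetical concept table (an assumption for illustration only):
# several different words point at the same semantic object.
CONCEPTS = {
    "trend": "CHANGE_OVER_TIME",
    "increase": "CHANGE_OVER_TIME",
    "boom": "CHANGE_OVER_TIME",
    "economy": "ECONOMY",
}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Split on whitespace, strip punctuation, lowercase."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


class DualIndex:
    """Indexes every token as a keyword and, when known, as a concept."""

    def __init__(self):
        self.by_keyword = defaultdict(set)
        self.by_concept = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.by_keyword[token].add(doc_id)
            concept = CONCEPTS.get(token)
            if concept is not None:
                self.by_concept[concept].add(doc_id)

    def search(self, word):
        """Return documents matching the word or any word sharing its concept."""
        word = word.lower()
        hits = set(self.by_keyword.get(word, set()))
        concept = CONCEPTS.get(word)
        if concept is not None:
            hits |= self.by_concept.get(concept, set())
        return hits


index = DualIndex()
index.add(1, "Fuel economy increase reported")
index.add(2, "A boom in housing")
index.add(3, "Weather report")

# Neither post 1 nor post 2 contains the word "trend",
# yet both are found through the shared concept.
print(index.search("trend"))
```

A pure keyword index would return nothing for “trend” on these three posts; the concept layer is what connects the query to “increase” and “boom”.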
Let me give you an example. I guess we can begin with the example Marti Hearst brought up about compound noun phrases. I think this is a good example because many search queries take this form. I would rather take some nouns where the search engine results can be a little more revealing of the power of the underlying algorithms. So I will choose trends and economy as my nouns.
The noun “economy” is ambiguous because, of course, it could refer to macroeconomic ideas or it could refer to a state of parsimony. The other noun, “trend”, is much less ambiguous, but it has multiple and even metaphoric means of indication. By that I mean that the idea of a state of affairs that someone might correctly call a “trend” can be represented with many other words and phrases, such as increase, decrease, boom, etc.
To that we add the pragmatic aspect, in the realm of search. To what ends do we search for trends in the economy, say? What is it that we should desire to obtain from the exercise? For that is what drives the algorithms that take these signs and foster a search for valid and relevant responses.
My internal, or intellectual, algorithm tells me that what the phrase “trends in the economy” means depends on the subject. Without the subject there is no way to distinguish whether the phrase refers to trends in fuel economy or trends in macroeconomic indicators such as jobs, housing, trade and a country’s GDP.
The easy way out is to take the more popular position on the axiom that you can please some of the people some of the time. It takes more time, and an interaction, to find out what the user really means. In person, if someone asked about “trends in the economy”, I might respond: “Do you mean my car’s economy or the general economic situation?”
I do not want my search engine to do what is popular; I want it to try to help me. If all search engines pragmatically assume the category or subject is macroeconomics, I will not be able to use the search engine effectively because all it will do is give me pointers to information fitting its narrow interpretation.
In the worst case, the search engine algorithm should take that search phrase and return a representative and relevant set of results. A representative set of semantic results should not consist only of macroeconomic hits; the set should also include some things relevant to saving and economy, particularly since the search term is “economy” and not “economic”. That is a hint that the popular search engines do not seem to pick up on, although it is obvious and easily distinguished. Let me show you what I mean.
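Before the screenshots, here is one way a “representative” result set could be assembled, as a minimal sketch: take the ranked hits for each sense of the ambiguous term and interleave them, so that no single sense fills the whole first page. The function name, the sense labels and the hit titles are all made up for illustration; this is not a description of how any of the engines discussed here actually rank results.

```python
from itertools import zip_longest


def representative(results_by_sense, limit=5):
    """Interleave ranked hits from each sense, round-robin, up to `limit`."""
    mixed = []
    for tier in zip_longest(*results_by_sense.values()):
        for hit in tier:
            # zip_longest pads the shorter lists with None; skip the padding.
            if hit is not None and len(mixed) < limit:
                mixed.append(hit)
    return mixed


# Hypothetical ranked hits for two senses of "economy".
hits = {
    "macroeconomy": ["GDP outlook", "Jobs report", "Housing starts", "Trade balance"],
    "thrift and fuel economy": ["Fuel economy gains", "Household saving tips"],
}

top_five = representative(hits)
for title in top_five:
    print(title)
```

With these sample lists, the top five contain hits from both senses instead of five macroeconomic hits in a row.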
Below are the results I obtained from Google, Hakia and the semantic search engine at Feedster on the search “trends in the economy”:
First, Google:
As you can see, Google seems to have interpreted the phrase as “economic trends” and these first five hits are representative of those that followed. It seems that Google’s pragmatics are claiming to know better than us what our own words may mean. By changing around my search phrase, they altered the meaning and gave me keyword hits as authority for that meaning, which is a little too 1984ish for me, but it did bring hits with both noun forms (economy or economics and trends).
Because there is nothing about savings or fuel economy in the hits, and because both nouns are highlighted, I can definitively claim there is nothing “semantic” in Google’s search. Fixing the meaning of the noun economy to denote economics is not meaningful. In my opinion, it is destructive of meaning.
Hakia interpreted the phrase as the macro economy too. On the other hand, Hakia tried to stick to the form “trends in the economy”, and their top five results show this quite clearly:
Still, there is nothing semantic about Hakia’s results. This is mainly because Hakia’s sort of semantics are not useful for distinguishing semantic objects from nouns, verbs or any other forms of words. Hakia distinguishes sentence meaning using natural language grammar and literal semantics of sentences.
Their search pragmatics seem to be that they fall back on keywords when no sentence exists or they are unable to decipher the semantics of a phrase. This is evidenced by the top five hits, shown above and by the rest that followed.
Now here are the results from Feedster using Readware as the semantic search engine:
It is telling, I think, that the search term “trend” is not among the titles and summaries of the top five results from Feedster, yet each hit is a relevant hit about trends or a trend in the economy. One hit even refers to a fuel economy increase. Neither Google nor Hakia, nor, I dare say, any other search engine has a semantic algorithm quite so powerful as the one here.
The pragmatics of the Feedster search are to choose semantically relevant results from the most recent posts collected from the Internet. Because perception plays a huge role in the appreciation of a search result, Readware searches for passages with the same words, just as Google and Hakia do, yet for Readware, the concept of economics is not the same as the concept for economy. Also, in these results, the concept of trend is interpreted as growth, increase, boom, etc.
Obviously, Readware distinguishes meaning in much more sophisticated ways than either Google or Hakia can. Feedster is using Readware as their search engine because they are committed to the RSS and Blog community and they are determined to bring better tools to everyone interested.
Many people would think that it is right to interpret economy as the macro economy, and that a search engine should not bother collecting a hit about any other sense of the meaning or reference for the term economy. I would like to know what you think. Feel free to comment.



Hi Ken, I would love to see this analysis made with other terms as well. After all, as a blog/feed search engine, Feedster always looks up “trends”. And they have never claimed that they are a semantic search engine; please let me know what other results take you to this conclusion.
Thanks for reading, Emre.
It was reported elsewhere that Feedster has implemented a semantic search engine. They report it also in their blog entry on the roll-out of the new version. You can find that on Feedster’s home page.
Be my guest and go over to http://www.feedster.com and try your own word combinations and then try the same query at other search engines. Some other queries that I tried were:
food prices
government spending
alternative energy
In each case, I got a more representative response from Feedster than from the other engines. I am not saying they are more relevant nor am I claiming that the Feedster results are better or worse than others. I am just saying, it is easy to see whether or not any sense information is in play. I think the user experience is enhanced when the search engine is smart enough to use the sense of the word to interpret relevant relations.
Also, another test can be done because the semantic system indexing Feedster’s archives is fluent in German and has passable French. Both Google and Hakia, along with most others, claim to be “language independent”.
Try the queries above in German and see the responses. There is no translation going on; the system is interpreting the meaning of the terms and determining conceptual relations.
Use L:G to tell the system the query is in German and L:F to tell it the query is in French. For example, government spending in German is:
L:G Regierung Ausgabe
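For anyone curious how such a prefix might be handled, here is a minimal sketch of splitting the language tag from the query terms. Only the L:G and L:F tags come from the description above; the function name, the English default for untagged queries and the case-insensitive matching are my own assumptions, not a description of Feedster's or Readware's actual parser.

```python
# Only these two tags are mentioned in the comment above.
LANGUAGE_TAGS = {"G": "German", "F": "French"}


def parse_query(raw, default_language="English"):
    """Return (language, terms) for a query with an optional L:<code> prefix.

    The default language for untagged queries is an assumption.
    """
    parts = raw.split()
    if parts and parts[0].upper().startswith("L:"):
        code = parts[0][2:].upper()
        if code in LANGUAGE_TAGS:
            return LANGUAGE_TAGS[code], parts[1:]
    # No recognized tag: keep every word as a search term.
    return default_language, parts


language, terms = parse_query("L:G Regierung Ausgabe")
print(language, terms)
```

An unrecognized tag is left in the term list rather than silently dropped, so a typo in the prefix stays visible in the query.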
Hakia claims that language understanding is essential to semantic search, yet it is clear in my tests that they are not interpreting the meanings of the words themselves.
You know, well, if you do not I will tell you: business is crazy.
I have to tell you that we had to terminate our relationship with Feedster in late July 2007. Since we began working with the Feedster group, we knew they were undergoing change.
We were successful in raising their statistics (as measured by Alexa) since we began working with them in October 2006. But it became a business issue in May, and the relationship was finally terminated in July.
During our engagement, we (the Readware developers) believe that we directly contributed to a large increase in page views and other statistics of web presence at Feedster.
We also believe we demonstrated that semantic search can scale to the levels required for Internet access because we indexed more than 100 million feed posts from more than 30 million feeds at Feedster. This is the first semantic search engine in the world to scale to this level.
[...] feeds while we hosted Feedster’s service for a short while. I wrote about this experience in an earlier post. Internet statistics show that Readware had a substantial impact on page views and reach for [...]