The underdog throws you the bone of LSI

October 23, 2008

(or where I howl on about Latent Semantic Indexing)

You know what?  I’m a bit surprised.  I haven’t seen a lot of blog posts about Latent Semantic Indexing.  It’s pretty good stuff, LSI.  I’m telling you.  Not quite Aldous Huxley, Doors of Perception good (as an aside, I’m not suggesting any young pups go about trying to open the old perception Huxley style... drugs are bad, K? Good, glad we cleared that up).  But unlike Huxley, who opened his mind and allowed his senses to become fully aware of everything around him that he was previously filtering out, search engines are historically very bad at filtering out unwanted or irrelevant results.  LSI will act like a natural filter for your favourite search engine’s mind.  Kind of like an anti-door of perception.  Or a door of anti-perception.  Erm.  Something like that.

Basically, Latent Semantic Indexing is all about associating language and terminology, creating lexemes.  I had a geography teacher who used to say “if you don’t know the jargon associated with a subject, you can never discuss it”.  He was a pain in the arse, but he was right, and Google knows he was right.  So LSI attempts to understand the jargon around a subject area in much the same way as a human would.  A search engine using LSI will rank a page higher if it seems to contain more of these associated words, because the search engine will assume that whoever wrote the page knows about the subject matter.
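If you want to see the idea in miniature, here is a rough Python sketch of "more associated words means a higher rank". It is a toy, not how any real search engine does it: the jargon list, the function names and the two sample pages are all made up for illustration.

```python
import re

# Hypothetical, hand-picked jargon for the subject "coffee" (an assumption for this sketch).
ASSOCIATED_TERMS = {"espresso", "roast", "barista", "grind", "crema", "arabica"}

def jargon_score(page_text, associated_terms=ASSOCIATED_TERMS):
    """Count how many distinct associated terms appear in the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(words & associated_terms)

def rank_pages(pages):
    """Return page names sorted from most to least jargon-rich."""
    return sorted(pages, key=lambda name: jargon_score(pages[name]), reverse=True)

# Two made-up pages: one keyword-stuffed, one written by someone who knows the subject.
pages = {
    "stuffed": "coffee coffee coffee buy coffee cheap coffee",
    "expert": "A good barista checks the grind and the roast before pulling an espresso.",
}

print(rank_pages(pages))
```

The stuffed page repeats the keyword five times and still scores nothing, because repeating one word adds no associated terms.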

Hmmmmm, you say.  That’s gonna really fuck up keyword stuffing.  Probably.  Unless you start to understand your subject area, you are going to be screwed when it comes to ranking if latent semantic indexing becomes more important.

How will the search engines associate words, you ask?  Good question, young pup!  I don’t know is the answer, but a couple of methods spring to mind.  The first is the use of a lexical database.  Essentially a lexical database groups words into cognitive synonyms.  What?  Basically the words mean similar things.  So finish and complete: both words mean the same thing, though there can be subtle differences in meaning and usage... we generally “finish a meal”, we don’t really “complete a meal”... in this sense finish has a subtle side meaning of consume, but at the same time it still means we have come to the end.  The other thing I suppose the search engines could do is take currently well ranking pages on a subject, analyse them and use the language in them as benchmarks for constructing an LSI database.  I hope they don’t go down this route, because it would really need a lot of clean up, simply because there are so many of “us” around who are trying to dominate the search rankings by hook or by crook.  Just because a page is number 1 at the moment doesn’t mean it is well written, or semantically correct... it usually means it has the most backlinks.
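For the lexical database idea, a minimal Python sketch might look like the one below. The synonym groups are hand-made and hypothetical (a real lexical database would hold many thousands of them), and the function names are my own. The point is just to show how one word can sit in more than one group, which is where the "finish a meal" subtlety comes from.

```python
# Hypothetical, hand-made synonym groups; a real lexical database would be far larger.
SYNONYM_GROUPS = [
    {"finish", "complete", "end", "conclude"},  # "come to the end" sense
    {"finish", "consume", "eat"},               # "finish a meal" sense
]

def synonyms(word):
    """Return every word sharing at least one group with `word`, excluding `word` itself."""
    found = set()
    for group in SYNONYM_GROUPS:
        if word in group:
            found |= group
    found.discard(word)
    return found

def expand_query(query):
    """Map each query word to a sorted list of the word plus its synonyms."""
    return {w: sorted({w} | synonyms(w)) for w in query.lower().split()}

print(sorted(synonyms("finish")))
print(sorted(synonyms("complete")))
print(expand_query("complete meal"))
```

Note that "finish" picks up "consume" and "eat", while "complete" does not. That is the subtle difference in usage, expressed as group membership.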

Now the astute among you will be wondering what the bone is that I’m throwing.  You’ve been patient, and probably read a bunch of stuff about language that you couldn’t care less about, so here you go... surf over to your favourite search engine (Google) and type in the following: ~weight.  Now consider the search results.  Pretty cool eh? (If you can’t figure out what is going on, don’t be afraid to ask).

I don’t know how much LSI Google uses at the moment, but one thing is for sure: when latent semantic indexing becomes more and more important, we will get a big shuffle in the search results, and we will all have to start becoming experts in subjects we currently know nothing about, like, for example, the lexemes of language.

Now that was worth the price of my blog alone.  You all owe me a drink!