(or where I howl on about Latent Semantic Indexing)

You know what?  I’m a bit surprised.  I haven’t seen a lot of blog posts about Latent Semantic Indexing.  It’s pretty good stuff, LSI.  I’m telling you.  Not quite Aldous Huxley, Doors of Perception good (as an aside, I’m not suggesting any young pups go about trying to open the old perception Huxley style…..drugs are bad, K? Good, glad we cleared that up).  But where Huxley opened his mind and let his senses become fully aware of everything around him that he had previously been filtering out, search engines have the opposite problem: they are historically very bad at filtering out unwanted or irrelevant results.  LSI will act like a natural filter for your favourite search engine’s mind.  Kind of like an anti-door of perception.  Or a door of anti-perception.  Erm.  Something like that.

Basically, Latent Semantic Indexing is all about associating language and terminology, creating lexemes.  I had a geography teacher who used to say “if you don’t know the jargon associated with a subject, you can never discuss it”.  He was a pain in the arse, but he was right, and Google knows he was right.  So LSI attempts to understand the jargon around a subject area in much the same way as a human would.  A search engine using LSI will rank a page higher if it seems to contain more of these associated words, because the search engine will assume that whoever wrote the page knows about the subject matter.
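For the curious pups, here is a minimal sketch of the textbook version of LSI: build a term-document matrix, run a truncated SVD over it, and compare a query to pages in the reduced "topic" space. The four toy pages, the choice of k = 2 and the helper names are all my own assumptions for illustration, and numpy needs to be installed. This is not a description of what any real search engine does.

```python
import numpy as np

# Toy corpus (made up for illustration): two pages about weight loss,
# two about search rankings.
docs = [
    "weight loss diet exercise",
    "mass loss diet gym",
    "search engine ranking backlinks",
    "search ranking keyword backlinks",
]

vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}


def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    vec = np.zeros(len(vocab))
    for word in text.split():
        if word in index:
            vec[index[word]] += 1.0
    return vec


# Term-document matrix: one row per term, one column per page.
A = np.column_stack([to_vector(doc) for doc in docs])

# Truncated SVD: keep only the k strongest "topics".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed number of topics for this toy corpus
U_k, s_k, docs_k = U[:, :k], s[:k], Vt[:k, :].T


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 1e-12 else 0.0


def raw_similarities(query):
    """Plain keyword matching: cosine in the original term space."""
    q = to_vector(query)
    return [cosine(q, A[:, j]) for j in range(A.shape[1])]


def lsi_similarities(query):
    """Fold the query into the k-dimensional space, then compare."""
    q_k = (to_vector(query) @ U_k) / s_k
    return [cosine(q_k, doc) for doc in docs_k]


print("raw:", raw_similarities("weight"))
print("lsi:", lsi_similarities("weight"))
```

The second page never mentions "weight", so plain keyword matching gives it nothing for that query. In the reduced space it lands right next to the first page, because the two share "loss" and "diet", and that is the associated-jargon effect described above.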

Hmmmmm, you say.  That’s gonna really fuck up keyword stuffing.  Probably.  Unless you start to understand your subject area, you are going to be screwed when it comes to ranking if latent semantic indexing becomes more important.

How will the search engines associate words, you ask? Good question, young pup!  I don’t know is the answer, but a couple of methods spring to mind……the first is the use of a lexical database.  Essentially a lexical database groups words into cognitive synonyms.  What?  Basically the words mean similar things.  So finish and complete – both words mean the same thing, though there can be subtle differences in meaning and usage….we generally “finish a meal”, we don’t really “complete a meal”…in this sense finish has a subtle side meaning of consume, but at the same time it still means we have come to the end.  The other thing I suppose the search engines could do is take pages that currently rank well on a subject, analyse them and use the language in them as benchmarks for constructing an LSI database.  I hope they don’t go down this route, because it would really need a lot of clean up, simply because there are so many of “us” around who are trying to dominate the search rankings by hook or by crook.  Just because a page is number 1 at the moment doesn’t mean it is well written, or semantically correct……it usually means it has the most backlinks.
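To make the lexical database idea a bit more concrete, here is a rough sketch of how a page could be scored against groups of cognitive synonyms. The two hand-made synonym groups and the function names are hypothetical, invented purely for this example. A real lexical database is vastly bigger, and I am not claiming any search engine scores pages this way.

```python
import re

# Hypothetical, hand-made synonym groups standing in for a real
# lexical database.
SYNONYM_GROUPS = [
    {"finish", "complete", "end", "conclude"},
    {"weight", "mass", "heaviness", "ponderousness"},
]


def related_words(word):
    """Return every word that shares a synonym group with `word`."""
    related = set()
    for group in SYNONYM_GROUPS:
        if word in group:
            related |= group
    related.discard(word)
    return related


def association_score(page_text, topic_word):
    """Count the distinct related words that appear on the page."""
    tokens = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(tokens & related_words(topic_word))


print(related_words("finish"))
print(association_score("Losing mass and heaviness fast", "weight"))
print(association_score("weight weight weight", "weight"))
```

Note that repeating the keyword itself earns nothing here, only the related vocabulary counts, which is the point about keyword stuffing made earlier.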

Now the astute among you will be wondering what the bone is that I’m throwing.  You’ve been patient, and probably read a bunch of stuff about language that you couldn’t care less about, so here you go……surf over to your favourite search engine – Google – and type in the following: ~weight.  Now consider the search results.  Pretty cool, eh? (If you can’t figure out what is going on, don’t be afraid to ask).

I don’t know how much LSI Google uses at the moment, but one thing is for sure: when latent semantic indexing becomes more and more important, we will get a big shuffle in the search results and we will all have to start becoming experts in subjects we currently know nothing about, like for example the lexemes of language.

Now that was worth the price of my blog alone.  You all owe me a drink!

  1. October 23, 2008 at 10:56

    Very interesting read, bloody good post in fact. I know Google have been experimenting with this technology for a while. They use it for their AdWords keyword suggest tool. I’m also guessing they use this idea to provide ad-to-keyword matching, meaning that the advert doesn’t have to match the search terms exactly. Of course this is annoying if you have paid for a specific keyword and Google matches it against another (take weight and mass for instance), but it is very good for Google, who can show your advert to many more people.

  2. underdogblogger
    October 23, 2008 at 11:20

    Good point Clog Money – in theory, you can use the sandbox in AdWords to test out your pages to see what sort of LSI the big G does on a particular body of text, but as you say – annoying to get your advert slapped on stuff you don’t want to target just because G says it’s related…..weight and mass are excellent examples – especially in the hotly contested weight loss niche….be funny if we set up a bunch of weight loss pages with a bunch of “mass” like keywords and ended up dominating the niche….hmmmmm

  3. Dan
    October 23, 2008 at 11:58

    Very interesting post there, I hadn’t really considered anything about LSI but what you’re saying makes sense.

    The thing to type into Google is quite handy – I tried out a couple of thesauruses (is that the plural?) and didn’t get nearly as nice results (how you would fit ‘ponderousness’ in your text relating to weight I’d never know). I guess that if advertising on Google, then best to use their thesaurus anyway!

  4. underdogblogger
    October 23, 2008 at 15:10

    lol – ponderousness…..hmmmmm lemme try. “eatLessMoveMore ™ miracle weight loss is guaranteed to rid you of the preposterous ponderousness of a sloth fed on a diet of big macs and full fat coke until she has reached the mass something akin to that of a very very heavy thing. Weighing tons can be tough, but eatLessMoveMore tastes just like a lettuce leaf infused with the subtle jus-de-sweat from one too many gym towels. Try it and you will no longer be imbued with the ponderousness of a hippo. Soon, the new svelte you could be poncing down the high street with the carefree attitude of a drunken Kerry Katona on daytime TV”

    erm…how was that? 🙂

  5. Dan
    October 23, 2008 at 16:30

    Haha very well done – am loving the Kerry Katona reference, just watched that clip – what a mess she is!!

  6. October 24, 2008 at 19:09

    Man.. lol

    Aldous Huxley, Brave New World good? I absolutely love Huxley, he was one fucked up dude indeed. Brilliant, but fucked.

    @ Gym towel jus-de-sweat.. mmmm…

    I could have sworn LSI was something you put under your tongue an hour before decorating your Christmas tree?!?

