Custom meta tags in search results and full stops

May 4, 2006 in Google Mini, GSA, XML API

When you’re using custom meta tags on your pages so you can serve up or search very specific information in your Google Mini or Search Appliance, it’s important to choose a meta name that will not conflict with other meta tags that might exist on your site or on the public sites you are spidering. (This was pointed out to me yesterday by Nathan, the host of two Minis I’m working on currently.)

You can put a little code of your own before or after your meta tag’s name to make it unique for your project. This is like ‘namespaces’ in programming, where you try to keep your variables separate from anything that might conflict with them and overwrite them with different data. For instance, the Dublin Core project puts ‘DC.’ in front of their names, so you know what standard it’s related to. So instead of…

<meta name="Publisher" content="Web Positioning Centre" />

you have:

<meta name="DC.Publisher" content="Web Positioning Centre" />

This lets you know they are working within Dublin Core standards, and it’s unlikely any page is already using a tag called ‘DC.Publisher’, whereas it could be using ‘Publisher’ on its own.

If you’re setting up your own meta tags for use with a Mini or GSA, do not use full stops, ‘.’, to separate your code from the general name. When you pull back the results, the appliance uses full stops to separate the different tags that you want to bring back using the ‘getfields’ flag.

So if you wanted to bring back the information in ‘DC.Publisher’ with the rest of the search results data, the appliance will actually try to bring back information from a meta tag named ‘DC’ and another tag named ‘Publisher’.
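
To see why a dotted name goes wrong, here is a rough Python model of the behaviour described above. It is only an illustration: `split_getfields` is a made-up helper, not anything the appliance exposes, and it assumes the ‘getfields’ value is simply split on full stops.

```python
def split_getfields(value):
    """Model (assumption): the getfields value is split on full stops."""
    return value.split(".")

# A Dublin Core style name is read as two separate tags.
print(split_getfields("DC.Publisher"))   # ['DC', 'Publisher']
# A hyphenated name survives as a single tag.
print(split_getfields("gsad-author"))    # ['gsad-author']
```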

To avoid this happening, use something else to separate your namespace code (your ‘DC’) from the rest of the name. It would be a good idea not to use anything that needs to be ‘URL escaped’, which pretty much limits you to the following: $-_+!*'(). Personally I tend to use a hyphen, ‘-’, as it’s quite readable and is unlikely to cause problems in programming, unlike $ or ().
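
If you have a lot of tag names to check, a small script can flag the risky ones. This is a minimal sketch, assuming letters and digits are acceptable alongside the characters listed above; `SAFE_CHARS` and `is_safe_meta_name` are names invented for this example.

```python
import string

# Letters and digits (assumed acceptable) plus the characters listed in the post.
# The full stop is deliberately left out because getfields uses it as a separator.
SAFE_CHARS = set(string.ascii_letters + string.digits + "$-_+!*'()")

def is_safe_meta_name(name):
    """Return True if the name is non-empty and every character is in SAFE_CHARS."""
    return bool(name) and all(ch in SAFE_CHARS for ch in name)

print(is_safe_meta_name("gsad-author"))   # True
print(is_safe_meta_name("DC.Publisher"))  # False
```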

Your meta information with your namespace code could look something like this:

<meta name="gsad-site" content="Spidertest" />
<meta name="gsad-author" content="Web Positioning Centre" />
<meta name="gsad-image" content="http://www.spidertest.com/images/wpc-logo.gif" />

Then you can get back these fields in your search results by using:

&getfields=gsad-site.gsad-author.gsad-image
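
If you build your search URLs in code, the parameter can be assembled from a list of tag names. The sketch below is one way to do it in Python; `build_getfields` is a hypothetical helper, and it refuses any name containing a full stop rather than attempting to encode it.

```python
from urllib.parse import quote

def build_getfields(names):
    """Join meta tag names with full stops, rejecting names that contain one."""
    for name in names:
        if "." in name:
            raise ValueError(f"meta name contains a full stop: {name!r}")
    # quote() leaves letters, digits and hyphens untouched,
    # so names like these pass through unchanged.
    return "&getfields=" + ".".join(quote(name, safe="") for name in names)

print(build_getfields(["gsad-site", "gsad-author", "gsad-image"]))
# &getfields=gsad-site.gsad-author.gsad-image
```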

In the XML of your results, you will get these additional fields:

<MT N="gsad-site" V="Spidertest"/>
<MT N="gsad-author" V="Web Positioning Centre"/>
<MT N="gsad-image" V="http://www.spidertest.com/images/wpc-logo.gif"/>
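
Reading these fields back out is straightforward with any XML parser. Here is a minimal Python sketch using the standard library; only the MT elements come from the results above, while the wrapping `<R>` element and the `meta_fields` helper are assumptions made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <R> wrapper is assumed so the sample parses;
# the MT elements are the ones shown in the post.
SAMPLE = """
<R N="1">
  <MT N="gsad-site" V="Spidertest"/>
  <MT N="gsad-author" V="Web Positioning Centre"/>
  <MT N="gsad-image" V="http://www.spidertest.com/images/wpc-logo.gif"/>
</R>
"""

def meta_fields(result_xml):
    """Return a dict mapping each MT element's N attribute to its V attribute."""
    root = ET.fromstring(result_xml.strip())
    return {mt.get("N"): mt.get("V") for mt in root.iter("MT")}

fields = meta_fields(SAMPLE)
print(fields["gsad-author"])  # Web Positioning Centre
```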

Comments (4)

  1. Comment by Dave Lemen — May 6, 2006 @ 2:01 pm

    Excellent advice! Unfortunately, one of my customers had previously specified a whole list of meta tags, all prefixed with the period (.). The appliance requires you to *double* URL encode the field names when constructing your query which further aggravates the problem. In the Dublin Core DC.Publisher example, it becomes DC%252EPublisher.

  2. Comment by Paul — May 6, 2006 @ 2:23 pm

    Thanks Dave, that’s very useful to know. I couldn’t find a way of URL encoding them at all, I obviously wasn’t persistent enough!

  3. Comment by Nate Baxley — May 19, 2006 @ 8:35 pm

    Paul,
    Since you mention using the meta tags with filtering, I’m wondering if you have found a way to link meta tags to non-HTML pages. I have a situation where I’ve loaded documents into a content management system and would like to have the GM search the content of those documents, but still be able to limit the search results based on the meta information that is collected when the documents were uploaded. Any ideas?

    Thanks,
    Nate Baxley

  4. Comment by Eric — May 2, 2011 @ 11:51 pm

    Paul,

    was wondering, do you know if you can specify an inmeta search… something like so:

    q=conservation+inmeta:organization=foo

    and also specify that all pages that are missing the organization meta tag be included in the search?

    Thanks
