Tag formats: Can’t we all just get along? Matt 05 Dec 2005

First, there was tagging with spaces between keywords and all seemed good:

deli_tags.png
Tagging at del.icio.us.

But there are some problems with this method. What about compound phrases? What if I want to tag something “white house” instead of using the separate tags “white” and “house” (which changes the meaning) or “whitehouse”/“white+house” (unintuitive)? Enter comma-delimited tagging.

43things_tags.png
Tagging at 43things.

myweb_tags.png
Tagging at Yahoo’s My Web.

And then there’s Flickr’s version, which offers space-delimited tagging but allows compound phrases within quotes.

flickr_tags.png
Tagging at Flickr.

Now Amazon joins the fray and adds some Ajax freshness to the process: One tag per textbox with suggested phrases appearing as you start typing. Here’s a look:

amazon_tags.png

amazon_tags2.png
Tagging at Amazon.

When you’ve got a new technology, inconsistency is to be expected. Will all these different formats still be around a year from now or will a standard emerge?

58 comments so far

Joshua Blankenship 05 Dec 05

I tried quotes on a non-quote supporting site and it didn’t work and I got annoyed. So I left. Without some standardization, tags are going to become annoying reeeeally quickly.

But then we get the corporate pissing match between sites… whose method gets to be the standard? Because then there are winners and losers, and nobody likes being a loser.

(I vote Flickr should win, but only because i’m completely indoctrinated from using that site so much.)

Darrel 05 Dec 05

Seeing as how comma delimited files have been around forever, that seems to be the logical ‘standard’. If anything, Flickr broke from the established method. But, that said, none of these implementations seem terribly complex to grasp as they’re clearly labeled.

brad 05 Dec 05

I would vote for the quotes, because that’s how you search for specific phrases in Google and I reflexively put quotes around key phrases in other programs/search engines just as a force of habit.

Jan 05 Dec 05

I guess you could just do both. If the user uses quotes, then use the flickr style to parse it. If the user uses commas, then use 43things style to parse the string. How often do you use commas or quotes *inside* a tag anyway?

Dan Boland 05 Dec 05

Ditto what Brad said.

Andie 05 Dec 05

Question is, when do tags lose meaning as tags? Is “ski trip to Chamonix” still a tag, and how would it help others find meaning in my tagged content unless they are specifically looking for this phrase?

When tags grow beyond keywords, can’t we just as well use a normal search engine? Maybe I’m too focused on links and text content like del.icio.us has, for photos there is a need for textual metadata that could take almost any form. But my main question remains: are tags useful when they are no longer tags but whole sentences, and what does that do to search behaviour and usefulness?

Zzypt 05 Dec 05

Smart parsing is the answer. I want quotes to work when I use them, so that I can include commas and spaces. Then commas should separate tags, and if there are no commas then use spaces. Ever used a credit card validation that expected dashes or spaces to separate numbers into groups of four, then the next site requires no separation? I wrote a credit card validation routine once that got a higher sales rate when I accepted spaces, dashes or even no separators. I checked the validation error file regularly and if there were new separation styles being tried, then I added those. When money is at stake you want to accept every user’s idea of a credit card number.
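
The precedence described above (quotes first, then commas, then spaces) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name smart_parse_tags is made up, only double quotes are treated as quotes, an unbalanced quote is simply ignored, and duplicate tags are dropped.

```python
import re

# Either a double-quoted phrase (group 1) or a run of unquoted text (group 2).
_TOKEN = re.compile(r'"([^"]*)"|([^"]+)')


def smart_parse_tags(raw: str) -> list[str]:
    """Hypothetical parser: quoted phrases stay whole; the rest is split
    on commas if any unquoted comma is present, otherwise on whitespace."""
    matches = list(_TOKEN.finditer(raw))

    # Choose the delimiter by looking at the unquoted text only, so a comma
    # inside quotes ("Smith, John") does not switch the mode.
    unquoted = "".join(m.group(2) or "" for m in matches)
    use_commas = "," in unquoted

    tags = []
    for m in matches:
        if m.group(1) is not None:
            tags.append(m.group(1).strip())
        else:
            chunk = m.group(2)
            parts = chunk.split(",") if use_commas else chunk.split()
            tags.extend(p.strip() for p in parts)

    # Drop empty strings and duplicates while keeping the typed order.
    return list(dict.fromkeys(t for t in tags if t))


print(smart_parse_tags('"white house" president dc'))
print(smart_parse_tags('white house, president'))
```

With this sketch, 'white house, president' gives two tags and 'white house president' gives three, which is the behaviour asked for; whether silently guessing the delimiter is a good idea is a separate design question.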

Jules 05 Dec 05

Proposal:

cameras photographs

(white house)

space means list, (…) means group:

(one (two.one two.two (two.three.one two.three.two)) two)

and there are implicit ( and ) around tags:

one two three = (one two three)

This is better than the-comma-way, because you can’t do:

(one (two.one two.two (two.three.one two.three.two)) two)

with commas.

And all old tags:

camera photograph

are still valid. They aren’t if you use commas:

camera photograph

would mean:

(camera photograph)

and not the intended:

camera, photograph
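
For anyone curious how the parenthesised syntax could be read by a program, here is a rough sketch. It assumes that the returned outer list stands in for the implicit parentheses around the whole input, and that a group is just a nested list; what a group should mean to a search engine is left open. The name parse_groups is hypothetical.

```python
def parse_groups(raw: str) -> list:
    """Parse space-separated tags with (...) groups into nested lists.

    The outer list plays the role of the implicit parentheses around the
    whole tag string; each explicit (...) becomes a nested list.
    """
    # Pad parentheses with spaces so a plain split() yields clean tokens.
    tokens = raw.replace("(", " ( ").replace(")", " ) ").split()

    root: list = []
    stack = [root]  # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "(":
            group: list = []
            stack[-1].append(group)
            stack.append(group)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(tok)

    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


print(parse_groups("cameras photographs (white house)"))
```

Old-style input such as "camera photograph" still parses to a flat list of two tags, which matches the backward-compatibility point made above.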

Tomas Jogin 05 Dec 05

I like Flickr’s approach; compound phrases are joined together in a similar way in search queries at Google, etc.

GWG 05 Dec 05

It seems to me that the problem lies in the searching of tags, not in the formatting of tag entry. Would it be wrong to accommodate all types of entry?

The flaw in the Delicious-style system is that the software forgets the order of the original tagging and simply files each word away somewhere. Storing the full string of tags in the original order could solve this problem.

Of course from a software development POV, that’s a pain, but that’s outside the scope. I’m concerned with the users.

Let’s pretend that a photographer has posted a picture of 1600 Penn. and tagged it “Deli style”:

white house president home washington dc

The Deli system breaks it down as:
- dc
- home
- house
- president
- washington
- white

Because the tags are filed into tiny buckets of individual words, the meaning of “white” and “house” being entered side by side has been lost, and a user search for “white house” is pretty much useless.

However, if we can store the original string of tags in order (quotes and commas be damned) a search for “white house” would immediately place an image with the terms “white” and “house” at the top.

If I as a user want to find all images of white houses (a search for: white house), the building at 1600 Penn still qualifies.

Granted, a search for “white house” would find all images that have tags of white and house immediately adjacent in that order. However, as a user, I should not be surprised that my search for “White House” returns more than the US president’s home. In fact, I would be a bit annoyed by technology that assumes so much.

The person posting an image of a regular house painted white could have as easily tagged it as:

house white paint

and been excluded from a search for “White House” but included in a search for white house. Two words that do not rely on each other to derive a specific meaning will be placed in any random order by the tagger (house paint white, white house paint, paint house white etc), but two words that have a specific meaning will always be placed in that order (white house). A smart system would recognize this.

The best system allows for flexibility on the tag entry side and the user search side.
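
A small sketch of the two kinds of search described above, assuming each item keeps its tags as an ordered list of single words. The function names matches_all_words and matches_phrase are hypothetical, and matching is case-insensitive by assumption.

```python
def matches_all_words(tags: list[str], query: str) -> bool:
    """Unquoted search: every query word is among the tags, in any order."""
    return set(query.lower().split()) <= {t.lower() for t in tags}


def matches_phrase(tags: list[str], phrase: str) -> bool:
    """Quoted search: the phrase words appear as adjacent tags, in order."""
    words = phrase.lower().split()
    lowered = [t.lower() for t in tags]
    n = len(words)
    # Slide a window of n tags over the stored order and compare.
    return n > 0 and any(
        lowered[i:i + n] == words for i in range(len(lowered) - n + 1)
    )


residence = "white house president home washington dc".split()
painted = "house white paint".split()

print(matches_phrase(residence, "White House"))   # adjacent and in order
print(matches_phrase(painted, "White House"))     # words present, wrong order
print(matches_all_words(painted, "white house"))  # order ignored
```

The phrase search only works if the stored order is the order the tagger typed, which is exactly the information a bucket-of-words store throws away.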

Jim Jeffers 05 Dec 05

Why not just build your tagging system to accommodate spaces, commas, or quotes? I don’t see the methods as mutually exclusive: just run a regular expression to first check whether the user used commas. I think an adaptive solution would be the most elegant, allowing users to intuitively type in tags whichever way they want. The Ajax method used on Amazon, though clear, is a lot less elegant than just being able to type quickly, in my opinion.

Don Wilson 05 Dec 05

Write a web app that will accommodate everything, just like Google Maps does with addresses.

Katie Dixon 05 Dec 05

I have grown fond of Flickr’s method for tagging. The same method of tagging is used in a plugin for Technorati tags in WordPress, so I am now accustomed to the single word with spaces or “quoted phrase.”

I do agree with Don Wilson, though. It would be nice if everything supported all methods like Google Maps does with addresses. Of course that would add a level of complexity for the developer.

Regardless, I find the Flickr method both simple and effective.

Great article by the way :)

Rahul 05 Dec 05

Tags aren’t a new technology, they’re a shift from 1:n relationships between content and categories to n:n with folksonomy based names. Fundamental structurally, but not a technology.

As a structural change, it doesn’t matter how you input them (comma delimited, space delimited, compound support, one input field for each, Ajax or not) — as long as the user knows what to do with tags and how to manage them on content items. The rest is pretty irrelevant and/or nitpicky.

patrick h. lauke 05 Dec 05

“When you’ve got a new technology, inconsistency is to be expected. Will all these different formats still be around a year from now or will a standard emerge?”

it’s not different formats…it’s different input mechanisms. quite a different matter, i’d posit.

NathanB 05 Dec 05

Tags are nothing… ten million versions of wiki syntax, now THAT’S annoying.

Jacob 05 Dec 05

I vote for commas. Any normal person who’s asked to write a list of things and has a basic understanding of grammar will use commas. One, two, three. It’s instinctive. Using spaces is unnatural, then enforcing quotes is just obscure. How many ordinary people actually put quotes around their search phrases? Isn’t this about empowering users? Who are the users? Amazon’s solution (effectively click delimited, pity there’s no tab/enter detection), it has to be said, leaves no room for confusion.

Dan Boland 05 Dec 05

“How many ordinary people actually put quotes around their search phrases?”

Me. But I agree with some of the other folks who say that a tagging system should accommodate commas and quotes.

Marcelo Calbucci 05 Dec 05

SEMI-COLON!

People who ask for commas (because CSV has been around for eons), or quotes (because Google uses it), or a complex syntax have a point if they think tags are only going to be used by developers or power-users.

But what about the rest of the world?

Semi-colon is the more natural way to separate tags. It is used on many popular applications, like e-mail, to separate entities. Semi-colon is also very safe because it doesn’t interfere with any natural language.

If you want to tag something with a person’s name, as in “Smith, John”, you can’t use commas. There are many other examples like this where a comma is used in natural language.

The semicolon, however, is never used in that sense.

pwb 05 Dec 05

Commas or semi-colons. Quotes are just extremely unnatural. As long as there is an example near the box, I can’t imagine anyone having a great deal of trouble.

Tags: [ ]
Example: president, white house

Richard 05 Dec 05

I like the del.icio.us way best, because it’s the _simplest_ thing that works :)

Everything that’s more complex can be easily entered into the description field.

Jonathan Fenocchi 05 Dec 05

My personal favorite is the space-separated tags, but I find comma-delimited tags the most intuitive.

Tim 05 Dec 05

I like to keep words separate. Bundling words creates two problems. First, to second Andie’s point above, it makes it harder for someone searching on “bird flu” to find things tagged as “avian flu” even though they share the word flu. This is of course assuming the parsing app isn’t smart enough to also include the words separately.

Second, an individual’s own terminology changes over time and what I may have tagged as “social networking” 2 years ago I may be tagging as “social software” now. Do I have to go back and unbundle? It’d be easier to leave them separate and let the search app parse them out.

For this reason I also like to use as many descriptive tags as possible. Rather than using the 2 or 3 best words, I’ll add in as many other appropriate words as possible. Even after years of tagging this way, I don’t have so many tags under a given term that I can’t easily find the relevant results I’m looking for; the signal-to-noise ratio is still very high. This also aids other users who may use different terminology.

Google is very good at finding relevant results from paragraphs of plain text without the aid of quotes or commas. On the other hand it does benefit from tracking adjacency of words, so GWG’s idea of remembering the order is a good one.

The more cumbersome you make the saving of tags the less likely a user will do it. When searching through tags a user will take much more time to try different search combinations (adding quotes, wildcards, etc) because the rewards are immediate. But with saving tags it’s some vague future reward in which the user isn’t quite sure which method will produce the best results, as they don’t yet know what they’ll be looking for.

Erik Weibust 05 Dec 05

I doubt there will be a standard for tag definition in the near future. See rss, rdf, and atom. They’re still “duking” it out.

ErikWeibust
Erik_Weibust
Erik Weibust
“Erik Weibust”

chris h. 05 Dec 05

talk about your can of worms. the only delimiter should be white-space. all we’re doing is adding unusual keywords to the index.

PJ Hyett 05 Dec 05

The software should figure it out, constraining visitors on how to tag instead of just letting them use it is a mistake.

Tyler 05 Dec 05

Well from a programming stand-point, I can create an array from these keywords very easily if they use commas. If it is the Flickr style, I would have a problem parsing because I would have to split the array at every space NOT within a quote. I think allowing quotes is okay, but it does promote “keyphrases” and not “keywords”.
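
For what it is worth, splitting on every space that is not inside quotes does not have to be written by hand in every language. As one hedged example, Python's standard shlex module applies shell-style quoting rules; the wrapper name split_flickr_style and the fallback behaviour are assumptions for illustration, and this is not a claim about how Flickr does it.

```python
import shlex


def split_flickr_style(raw: str) -> list[str]:
    """Split on whitespace, keeping quoted phrases together.

    shlex.split follows POSIX shell rules, so single quotes and
    backslashes are special too; an unbalanced quote or a stray
    apostrophe raises ValueError, in which case this sketch falls
    back to a plain whitespace split.
    """
    try:
        return shlex.split(raw)
    except ValueError:
        return raw.split()


print(split_flickr_style('president "white house" dc'))
```

The comma case remains a one-liner (split on "," and strip each piece), so the quoted style costs a little more code but not a great deal.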

Darrel 05 Dec 05

So, in summary, accept all methods.

Ben Bleything 05 Dec 05

Heh, I wrote about this a couple of months ago.

The real fun begins when you use an inappropriate tagging method for the site and then the application breaks when you try to delete the malformed tag.

For instance, on a space-separated app, try enclosing a string in quotes and watch in amazement as you get two tags, one that starts with a quote, and one that ends with it. If the app is poorly (or hastily) written, it might break when you try to access those tags with the quotes in them. I’ve seen it happen!
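
The two-tag result is easy to reproduce. The sketch below shows a naive whitespace split, followed by one possible defensive step (an assumption, not something any particular site is known to do) that strips stray double quotes before tags are stored. Both function names are hypothetical.

```python
def naive_space_split(raw: str) -> list[str]:
    """What a space-only tag parser effectively does."""
    return raw.split()


def strip_stray_quotes(tags: list[str]) -> list[str]:
    """Remove leading/trailing double quotes so no stored tag contains them."""
    cleaned = (t.strip('"') for t in tags)
    return [t for t in cleaned if t]


tags = naive_space_split('president "white house"')
print(tags)                      # one tag starts with a quote, one ends with it
print(strip_stray_quotes(tags))  # quotes removed, but the phrase is still split
```

Stripping the quotes avoids the malformed tags, though the phrase still ends up as two separate words.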

Boris Anthony 05 Dec 05

“Nooo! Standards are evil! Tags are free! Keep your dirty structural standards hands off our beautiful free tags!” ;)

SKOS. If we get the foundations right, the UI will follow. (This is why we see all these differing UIs… )

Web 2.0 is to the Semantic Web what kindergarten is to junior high school: a (difficult) stage (of playful learning) we must pass through… and eventually unlearn again… ;)

Jared White 05 Dec 05

Great discussion. I like the very simple tags with spaces approach of delicious. But I also would like to have tags with spaces. Quotes seem like a decent method, but I also like the idea of brackets []. So a tag entry might look like:

president usa [white house] capital

And the visual presentation of that kind of tag would always be in brackets as well so there’s no confusion.

As for searching, ideally a smart system would sort multi-word tag matches first when someone enters “white house”, but that’s a whole ‘nuther ballgame.

Robert Gremillion 05 Dec 05

Using commas seems more natural to me. The del.icio.us method of cramming words together just looks wrong on the screen.

You could get real fancy and create new tag fields on the fly.

Start off with two fields: Tag1 [ ] Tag2 [ ] as you type into the second tag field, Tag3 is generated…

timb 05 Dec 05

GWG: The flaw in the Delicious-style system is that the software forgets the order of the original tagging and simply files each word away somewhere. Storing the full string of tags in the original order could solve this problem.

actually, we do save and display tags in the order people typed them.

this is a good defense of the single word tag limit (but we should fix things so “quoted tags” don’t break).

Robert Gremillion 05 Dec 05

Will del.icio.us ever add a ‘copy all suggested tags’ or ‘copy [username’s] tags’ feature?

Jesse Hattabaugh 05 Dec 05

I’m working on a tag-based site, and I grappled with this issue. My main concern was that allowing spaces, commas, and other less-significant characters like underscores and dashes in the tags only caused tag junk. For example, when I took a vacation to St Louis last year and uploaded the pictures to Flickr, I had to tag them all: “Saint Louis” “St. Louis” “St Louis” SaintLouis St.Louis StLouis. I could have gone on to use underscores and dashes in place of the spaces, and what if they were case sensitive (are they case sensitive?). And yet all of those tags mean the same thing.

So my conclusion was to only allow lowercase letters and numbers, and no spaces. If Flickr enforced that policy there would only be two tags I could possibly need to tag my vacation photos; saintlouis and stlouis.

In addition to making the tags in my system more useful because of less junk, I hope that imposing this system will cause people to use shorter tags as well. I wouldn’t want to tag a photo “mytriptosaintlouisin2004”, and I shouldn’t because who else would use such a tag? The more things that share a tag, the more useful the tag is in my opinion.
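
The lowercase-letters-and-numbers rule can be sketched as a one-line normalizer. It assumes "letters" means ASCII a-z, so accented characters would be dropped as well; the name normalize_tag is hypothetical.

```python
import re


def normalize_tag(raw: str) -> str:
    """Lowercase the tag and remove everything except a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


variants = ["Saint Louis", "St. Louis", "St Louis",
            "SaintLouis", "St.Louis", "StLouis"]
print(sorted({normalize_tag(v) for v in variants}))
```

All six spellings from the vacation example collapse to the two tags mentioned above, saintlouis and stlouis.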

Sherwin Techico 05 Dec 05

Seems that it’s a split between loyal users of respective apps. But as a developer, I do agree that CSV would probably be best as Tyler said above. Meanwhile, being consumed w/ flickritis ain’t good at all if this is the point that you are trying to go for. So like a handful of you, I also second the quotes+spaces combo of tagging as in Flickr/Google.

Ryan 05 Dec 05

I think commas are most intuitive. Your average person is probably most used to commas as a delimiter. For example, they have been separating email addresses in their mail clients since 1995.

When are you going to need to include a comma in a tag? Commas are easier to explain than ‘wrap your compound words up in double quotes’ IMHO.

So with commas you can say to Aunt Marie when you write her Tag-Based CMS: “You separate your tags with commas, just like you do in email when you want to send an email to Uncle Bob and Nephew Johnny at the same time.”

Tyler 05 Dec 05

Well I think each of us has found a flaw in each of the listed tagging methods. I work for a couple of newspapers, and we don’t use tagging so much anymore (we’ve been doing it for about 10 years now); we’ve been moving everything over to a taxonomy system. That (from our angle) has been way more effective. But without a standard taxonomy tree, I can understand it would be hard to incorporate.

I still think that we need to separate “keywords” from “keyphrases”. The Flickr way really is about keyphrases, which makes a programmer’s life hell.

Perhaps some of you Web 2.0 folks can come up with a way to use the auto-suggest Ajax amazon method on a single textbox instead of doing keyword by keyword as they do now. Then, when somebody starts to type a keyword, suggested words come up. When they hit the comma, the suggestions go away until the next word is typed. Possible?

victor 06 Dec 05

commas are faster than quotes.

as i see it (in my own experience) tags can be annoying if you don’t really care about them when you have to enter them. usually you care about them later on, when you cannot find what you’re looking for. but they’re still a(nother) time-consuming task.

i’d use fast, thus i’d use commas.

Mike 06 Dec 05

I would vote for comma separated tags. That appears to work very well for users on www.blinklist.com. I like the comment about using semi-colon instead since that would interfere even less in foreign languages that make heavy use of the comma.

Noss 06 Dec 05

How is it an issue if I mark up my pictures of a house music festival outside the white house with

house music white house

just allow me to search for all pictures that intersect having the tags ‘house’ and ‘music’.

The only problem I can see is if someone wants a picture of the white house, but doesn’t want to see all the house music pictures. The obvious search: white house -music would exclude my pic. But hey.
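
The intersection search with a minus sign for exclusion is simple to sketch, assuming each picture stores its tags as a set of single words. The function name search, the photos dictionary, and the two picture names are all made up for the example.

```python
def search(photos: dict[str, set[str]], query: str) -> list[str]:
    """Return names of photos that have every plain term and no -term."""
    required, excluded = set(), set()
    for term in query.lower().split():
        if term.startswith("-") and len(term) > 1:
            excluded.add(term[1:])
        else:
            required.add(term)
    return sorted(
        name for name, tags in photos.items()
        if required <= tags and not (excluded & tags)
    )


photos = {
    # "house music white house" collapses to three distinct tags in a set
    "festival": set("house music white house".split()),
    "residence": {"white", "house", "president"},
}

print(search(photos, "house music"))
print(search(photos, "white house"))
print(search(photos, "white house -music"))
```

The last query shows the trade-off mentioned above: excluding music also hides the festival picture, even though it was taken outside the white house.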

Richard Rodger 06 Dec 05

Oh, I wouldn’t be *at all* biased or anything, but why don’t we just use the good old CSV format as suggested above and be done with it:
http://www.rfc-editor.org/rfc/rfc4180.txt

Or, if you really really want to be user-friendly, add adjacency tracking. It’s always the same: the implementation gets harder as the usage gets easier.

notany 06 Dec 05

The right answer is to use fast-company, get-real etc. for tag names. Just like in written text we combine words into one tag.

Jon 06 Dec 05

What’s wrong with tagging it white and house? searching for white house will produce this item, no? I’m not sure I understand what’s wrong with this system.

Bill Brown 06 Dec 05

On a new blog product I’m developing, we first check for the presence of semicolons. If there are none, then we check for commas. If there are none of either, then we assume that it’s one big category. It’s not perfect, but we grappled with this very problem over many variations.

Semicolons let you use commas and spaces if you so desire.
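
The fallback order described here (semicolons, then commas, then one big category) might look roughly like the sketch below. The function name split_categories is hypothetical, and trimming whitespace and dropping empty pieces are assumptions on top of what the comment states.

```python
def split_categories(raw: str) -> list[str]:
    """Split on semicolons if present, else on commas, else keep as one."""
    if ";" in raw:
        parts = raw.split(";")
    elif "," in raw:
        parts = raw.split(",")
    else:
        parts = [raw]
    # Trim surrounding whitespace and discard empty pieces.
    return [p.strip() for p in parts if p.strip()]


print(split_categories("Smith, John; white house"))
print(split_categories("president, white house"))
print(split_categories("white house"))
```

Because semicolons win whenever they are present, a name like "Smith, John" survives intact as long as the user separates tags with semicolons.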

Ken 06 Dec 05

Well, since we’re on the subject of blogs, I’d LOVE for someone with skills to make a Firefox extension that allowed users to select from a local/cookie-based list of frequent tags. Perhaps as a right-click option within the BlogThis popup.

Just sayin’.

zhenggz 07 Dec 05

If we are tagging an HTML page, why not obtain the meta info of keywords from the HTML page and recommend these words as tags by listing them under the input box. And the listed keywords can be accessed with javascript programs.

Smackfu 07 Dec 05

LiveJournal uses commas. Every time I put in multiple tags, I do it wrong, and end up with tags like “mp3 music”. I’m not sure why they felt the need to be different; they were way behind the curve on tags.

Matt 08 Dec 05

Another vote for commas. A note to users to “separate tags with commas” (cf. Urbantic) is clear, concise, and allows for all the flexibility most users need.

But I’m not sure if I agree with the idea that there should be a one-size-fits-all approach, either. With e-mail, it makes sense to use semicolons as delimiters, because names include commas. With, say, events, tags with commas are less likely to be necessary.

Patrick H. Lauke 09 Dec 05

i’m amazed that the discussion is still going on, and looking at some of the comments it appears that people still didn’t get that this is a data entry issue, and that no matter how the tags are originally entered, once the system processes them it makes absolutely no difference whether you entered them with a space, comma, colon, whatever…
it’s not about incompatible formats, but simply different ways to enter information into different systems. from the end users’ perspective it’s irrelevant once the system accepted the data and broke the string down into individual tags.

Matt 09 Dec 05

It seems to me that the confusion of slightly different tags which mean the same thing is the largest problem, not the input format.
If, while getting at tagged info I could build relationships between tags on the fly (while it was important to me), that would bypass some of the problem of having to enter a zillion equal yet different tags for something. maybe somehow this could be used to reduce repeat tags to only the few that were most used for a certain subject…

bo 09 Dec 05

tags = keywords .. (‘member that ‘meta’ tag before smarter indexing?) the photography industry has been using this for EONS .. i’d wish ‘tags’ and ‘ajax’ and ‘{insert buzz word here}’ would realize they are just another noun (be it a shorter noun) for something that has really existed for a _very_ long time already .. picture Photoshop .. you enter keywords with a comma, due to the ‘space station’ problem .. spaces are evil for such activity

Ben 10 Dec 05

What about a client-side solution? A Firefox extension that knows about popular tagging implementations and formats what you write into the proper format for the specific site. Then the little guys could use a standard and be instantly supported. All in all everybody could have their own “tag engine” through presets and a good extension preference panel. Maybe then I could type WD instead of webdesign.

FataL 11 Dec 05

Computers are now smart enough to parse them all:
south asia, africa = [south asia] [africa]
“south asia” africa = [south asia] [africa]
‘south asia’ africa = [south asia] [africa]
(south asia) africa = [south asia] [africa]
south asia - africa = [south asia] [africa]
It’s not so hard to program all this I believe.

James E. Lee 20 Mar 06

I’m glad to hear that others see this as something that needs to be fixed!

I wrote about the problem here:
http://jameselee.alwaysaskwhy.com/blog/2006/02/standard_tag_delimiter.html

James Hicks 20 Mar 06

“Perhaps some of you Web 2.0 folks can come up with a way to use the auto-suggest Ajax amazon method on a single textbox instead of doing keyword by keyword as they do now. Then, when somebody starts to type a keyword, suggested words come up. When they hit the comma, the suggestions go away until the next word is typed. Possible?”

That is the solution I implemented for my tag based message board I’m currently working on
http://www.flickr.com/photos/45078456@N00/115394214/

note 27 Mar 06

“Proposal:

cameras photographs

(white house)

space means list, (…) means group:

(one (two.one two.two (two.three.one two.three.two)) two)

and there are implicit ( and ) around tags:

one two three = (one two three)

This is better than the-comma-way, because you can’t do:

(one (two.one two.two (two.three.one two.three.two)) two)

with commas.

And all old tags:

camera photograph

are still valid. They aren’t if you use commas:

camera photograph

would mean:

(camera photograph)

and not the intended:

camera, photograph”

thanks, jules, just learn more.

Richard Smallwood 09 Jul 06

I would be interested to know if you have looked at tagging for Basecamp. We have started to look at it for the next release of Hot Project, but we are yet to be convinced that our users will take advantage of the approach.
