giganews blog

Corporate culture, personal experiences, and unique observations about Giganews, Usenet, Newsgroups, and Usenet related technologies.

Thursday, February 01, 2007

Accurately Measuring Usenet Retention

newsgroups, usenet, retention
As you may have seen, Giganews recently announced a storage upgrade which will raise our binary retention to 100 days over the next two weeks. This got me thinking about how retention is measured and reported by various Usenet servers.

Articles on a news server are commonly stored "first in / first out". This means that once the server's storage is full, every new article posted to the Usenet system pushes out the oldest one. The age of the oldest available article on a news server is generally what defines that server's retention.

Some Usenet systems also apply this "first in / first out" rule separately for each hierarchy.
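
To make the expiration rule concrete, here is a minimal sketch of a fixed-size article store that drops its oldest article whenever a new one arrives at full capacity. The `Spool` class, its capacities, and the hierarchy names are hypothetical, chosen only for illustration; real news servers size their storage in bytes rather than article counts.

```python
from collections import deque


class Spool:
    """Toy article store (hypothetical): fixed capacity, oldest article expires first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.articles = deque()  # oldest article sits on the left
        self.next_number = 1     # article numbers are handed out in arrival order

    def post(self, subject):
        """Store an article; return the number of the article expired to make room, if any."""
        expired = None
        if len(self.articles) == self.capacity:
            expired = self.articles.popleft()[0]
        self.articles.append((self.next_number, subject))
        self.next_number += 1
        return expired

    def oldest(self):
        """Article number of the oldest article still available, or None if empty."""
        return self.articles[0][0] if self.articles else None


# One spool per hierarchy, so each hierarchy expires independently (assumed sizes).
spools = {"binary": Spool(capacity=3), "text": Spool(capacity=1000)}

expired = [spools["binary"].post(f"binary article {i}") for i in range(1, 6)]
for i in range(1, 6):
    spools["text"].post(f"text article {i}")
```

After five posts, the small binary spool has already expired its first two articles, while the much larger text spool still holds everything. That is the same reason text retention can look far longer than binary retention on the same server.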

For example, Giganews does not expire any text articles, so our text retention is 1,300+ days. Our binary retention (based on available storage) is 100 days. This means that it takes 100 days for an article to drop off our servers in the binary hierarchies.

When you're discussing a news server's retention, make sure you understand exactly which hierarchy is being referenced. If you see people quote a news server's retention based on text hierarchies, chances are they're embellishing to make the news server seem better. In reality, its retention in the more challenging binary hierarchies is probably much lower.

In addition to people using text retention to embellish the quality of a news server, you'll also see some Usenet systems carry long retention in just a handful of newsgroups. If we use our simple definition of retention ("the oldest available article on a news server"), then this would be an accurate description of that news server's retention. Of course, most people aren't going to want long retention in just a handful of newsgroups, so you could consider this misleading. Many people sign up for Giganews after using other Usenet servers that advertise long retention but provide it in just a couple of newsgroups.

The final thing to look out for when trying to measure retention is invalid date headers. In some newsgroups, the headers of certain articles contain the wrong date. At the beginning of this post, I said that most news servers delete the oldest article first and that the oldest article on a news server defines its retention. What I didn't mention is that "oldest" is determined by article number (a number assigned to each article based on when it is posted), not by the date displayed in the headers. This means that an article whose header date is older than the news server's retention may still appear in the newsgroup, because it hasn't yet been purged based on its article number.
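
A small sketch shows how a single bad date header can inflate the apparent retention. The article numbers, dates, and the `expire_by_number` helper are made up for illustration; the only assumption carried over from the text is that expiration looks at article numbers and ignores the date header.

```python
from datetime import date, timedelta

today = date(2007, 2, 1)

# (article_number, date_header) pairs; all values are hypothetical.
articles = [
    (1001, today - timedelta(days=100)),
    (1002, today - timedelta(days=99)),
    (1003, date(2001, 1, 1)),            # wrong Date header on a recently posted article
    (1004, today - timedelta(days=1)),
]


def expire_by_number(articles, keep):
    """Keep only the `keep` highest-numbered articles; Date headers are never consulted."""
    return sorted(articles)[-keep:]


remaining = expire_by_number(articles, keep=3)

# Measuring retention naively from the oldest Date header still on the server.
oldest_header = min(header for _, header in remaining)
apparent_retention_days = (today - oldest_header).days

# The same measurement with the known-bad article (number 1003) left out.
honest_oldest = min(header for number, header in remaining if number != 1003)
honest_retention_days = (today - honest_oldest).days
```

Article 1003 survives because its number is recent, so the naive measurement reports years of retention while the correctly dated articles only go back 99 days.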

The best measure of a news server's retention is the oldest article date across *many* popular binary newsgroups. If you notice a few groups with longer than normal retention, either the news server is hand-picking certain newsgroups to misrepresent its overall retention or there is an article with an invalid date header.
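
One way to put this into practice is to collect the oldest article date from a number of groups, take the median age as the typical retention, and flag any group that sits far above it. The sketch below assumes you have already gathered those dates by hand or with a newsreader. The group names, the ages, and the 1.5x outlier threshold are all hypothetical choices, not anything a news server reports.

```python
from datetime import date, timedelta
from statistics import median


def estimate_retention(oldest_dates, today, outlier_factor=1.5):
    """Return (typical_days, outlier_groups) from a mapping of group -> oldest article date.

    The median resists distortion from a few groups with unusually old articles.
    The outlier_factor threshold is an arbitrary assumption for this sketch.
    """
    ages = {group: (today - oldest).days for group, oldest in oldest_dates.items()}
    typical = median(ages.values())
    outliers = sorted(g for g, age in ages.items() if age > outlier_factor * typical)
    return typical, outliers


today = date(2007, 2, 1)

# Hypothetical sample: oldest article date observed in each group.
sample = {
    "alt.binaries.example.one": today - timedelta(days=98),
    "alt.binaries.example.two": today - timedelta(days=100),
    "alt.binaries.example.three": today - timedelta(days=101),
    "alt.binaries.example.four": today - timedelta(days=99),
    "alt.binaries.example.five": today - timedelta(days=400),
}

typical_days, suspicious_groups = estimate_retention(sample, today)
```

Here the typical retention comes out at 100 days, and the one group showing 400 days is flagged for a closer look. It could be a hand-picked group or just an article with a bad date header.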
