giganews blog

Corporate culture, personal experiences, and unique observations about Giganews, Usenet, Newsgroups, and Usenet related technologies.

Thursday, February 01, 2007

Accurately Measuring Usenet Retention

newsgroups, usenet, retention
As you may have seen, Giganews recently announced a storage upgrade that will raise our binary retention to 100 days over the next two weeks. This got me thinking about how retention is measured and reported by various Usenet servers.

Articles on a news server are commonly stored "first in / first out". What this means is that once the server's storage is full, every time a new article is posted to a Usenet system the oldest article is deleted. The age of the oldest available article on a news server is generally what defines a news server's retention.
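To make the rule concrete, here is a minimal Python sketch of a fixed-size store in which each new article pushes out the oldest one (a first-in, first-out queue). It is only an illustration, not how any particular news server is implemented; the function names, the capacity of 300 articles, and the posting rate of 3 articles per day are all made-up values chosen so the arithmetic is easy to follow.

```python
from collections import deque
from datetime import date, timedelta

def simulate_spool(capacity, posts_per_day, days, start):
    """Post articles into a fixed-size store for `days` days and return the store.

    Each stored item is just the date the article was posted.
    """
    # A deque with maxlen silently drops its oldest entry when a new one
    # is appended to a full deque: the oldest article goes first.
    spool = deque(maxlen=capacity)
    for day in range(days):
        posted = start + timedelta(days=day)
        for _ in range(posts_per_day):
            spool.append(posted)
    return spool

def retention_days(spool, today):
    """Retention is the age, in days, of the oldest article still stored."""
    return (today - spool[0]).days

# Hypothetical numbers: room for 300 articles, 3 new articles per day.
start = date(2006, 1, 1)
spool = simulate_spool(capacity=300, posts_per_day=3, days=365, start=start)
last_posting_day = start + timedelta(days=364)
print(retention_days(spool, last_posting_day))
```

With these numbers the store holds 100 days' worth of postings, so on the last posting day the oldest surviving article is 99 days old. Doubling the capacity, or halving the posting rate, doubles the retention.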

Some Usenet systems will also apply this "first in / first out" rule separately to each hierarchy.

For example, Giganews does not expire any text articles, so our text retention is 1300+ days. Our binary retention (based on available storage) is 100 days. This means that it takes 100 days for a newsgroup article to drop off our servers in the binary hierarchies.

When you're discussing a news server's retention, make sure you understand exactly which hierarchy is being referenced. If you see people quote a news server's retention based on text hierarchies, chances are they're embellishing to make the news server seem better. In reality, its retention in the more challenging binary hierarchies is probably much lower.

In addition to people using text retention to embellish the quality of a news server, you'll also see some Usenet systems carry long retention in just a handful of newsgroups. If we use our simple definition of retention, "the oldest available article on a news server", then this would be an accurate description of that news server's retention. Of course, most people aren't going to want long retention on just a handful of newsgroups, so you could consider this misleading. Many people sign up for Giganews after using other Usenet servers that advertise long retention but provide it in just a couple of newsgroups.

The final thing to look out for when trying to measure retention is "invalid date headers". In some newsgroups the headers of certain articles will contain the wrong date. In the beginning of this post, I said that most news servers apply a "first in / first out" rule to newsgroups and that the oldest article on a news server defines its retention. What I didn't mention is that the "first in / first out" rule is based on article numbers (a number assigned to each article in the order it arrives on the server) and not the date displayed in the headers. This means that an article whose date header is older than the news server's retention may still appear in the newsgroup, because it hasn't yet been purged based on its article number.
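A small Python sketch shows how one bad date header can inflate the apparent retention of a group. The `Article` class, the `expire_by_number` and `apparent_retention` helpers, the article numbers, and the dates are all hypothetical and exist only for this illustration; real servers have their own expiry logic.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Article:
    number: int        # assigned by the server in arrival order
    date_header: date  # supplied by the poster, so it may be wrong

def expire_by_number(articles, keep):
    """Keep only the `keep` highest-numbered articles, ignoring Date headers."""
    return sorted(articles, key=lambda a: a.number)[-keep:]

def apparent_retention(articles, today):
    """Age in days of the oldest Date header among the stored articles."""
    return max((today - a.date_header).days for a in articles)

today = date(2007, 2, 1)
group = [
    Article(1001, date(2006, 12, 30)),
    Article(1002, date(2005, 3, 14)),   # invalid header: arrived recently, dated 2005
    Article(1003, date(2007, 1, 2)),
    Article(1004, date(2007, 1, 31)),
]

kept = expire_by_number(group, keep=3)
print([a.number for a in kept])         # article 1001 is purged, 1002 survives
print(apparent_retention(kept, today))  # inflated by the 2005 header
```

Article 1002 stays on the server because its number is recent, even though its date header claims it is almost two years old. Anyone judging retention by the oldest date in this group would come away with a figure far above the roughly 30 days the other articles actually span.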

The best measure of a news server's retention is to look at the oldest article date in *many* popular binary newsgroups. This will generally give you the best idea of the news server's retention. If you notice a few groups with longer than normal retention, either the news server is hand-picking certain newsgroups to misrepresent its overall retention, or there is an article with an invalid date header.
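One way to put this advice into practice, sketched in Python: collect the age of the oldest article in each of many binary groups, take the median as the typical retention, and flag any group that sits far above it. The group names and ages below are invented for the example, and the `estimate_retention` helper and its 1.5x `tolerance` threshold are assumptions of this sketch rather than any standard.

```python
from statistics import median

def estimate_retention(oldest_age_by_group, tolerance=1.5):
    """Return (typical_days, suspicious_groups).

    oldest_age_by_group maps a newsgroup name to the age in days of its
    oldest available article. A group is flagged as suspicious when its
    age exceeds the median by more than the given factor.
    """
    typical = median(oldest_age_by_group.values())
    suspicious = sorted(
        group
        for group, age in oldest_age_by_group.items()
        if age > typical * tolerance
    )
    return typical, suspicious

# Made-up survey of five binary groups (names and ages are hypothetical).
sample = {
    "alt.binaries.example.one": 98,
    "alt.binaries.example.two": 100,
    "alt.binaries.example.three": 99,
    "alt.binaries.example.four": 100,
    "alt.binaries.example.five": 689,
}

typical, suspicious = estimate_retention(sample)
print(typical)     # typical retention across the surveyed groups
print(suspicious)  # groups worth a second look
```

The median is used instead of the maximum or the mean so that a single outlier, whether a hand-picked group or an article with an invalid date header, does not move the estimate.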


4 Comments

Anonymous said...

Hi there,

I bet and predict that by end of the year 2007, Giganews will offer 180+ days of binary retention :)

That's 6+ months, really cool ^^

best regards,

iNsuRRecTiON

6:20 AM 
Anonymous said...

This comment has been removed by a blog administrator.

12:11 PM 
Anonymous said...

Many companies are able to claim higher retention rates by restricting the number of posts made to certain binary groups and by basically censoring the groups, deciding which posts are allowed to be retained at all. (Is it really that important to have headers older than 90 days?! If you haven't found what you need in 90 days, odds are you don't really need it!) Hopefully this isn't happening at Giganews, and the sudden reduction of headers in certain binary groups since the end of January is only coincidental and caused by some other anomalous action...

12:37 PM 
Anonymous said...

I don't think there is any upper limit where retention becomes excessive. If I want to find something, I have double the chances if the retention is 180 days, compared to 90 days.

This may not be the correct assumption in all cases, but it seems quite logical if what I'm trying to find is quite old.

11:39 PM