I need to get a new stats program that accurately reports successful downloads of binaries.
The default webalizer is fine for pages and images, but because large files are byte-served, it's almost impossible to determine from the logs exactly how many times a large PDF was downloaded. For a given user session/IP, one can see as many as 20-30 GET requests for the same document. One could write an analyzer (and I have, but I don't trust it at all) to aggregate those requests and correlate them to one successful download. But then we also have the well-known issue of browsers hanging when someone is still silly enough to use the PDF web plug-in (installed by default with most IE setups), which tries to open the PDF in Acrobat inside a web browser window. This has typically been known to fail when the PDF is over 5 MB. And who knows whether the browser keeps issuing GET requests for the whole document even while, from the user's point of view, the system has hung? I am unable to see any failed downloads in the logs, which is "insane" because there have to be at least one or two.
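A minimal sketch of the aggregation idea, assuming the byte offsets of each request are available. The default combined log records only a status and a byte count per request, not which range was sent, so this would require logging the Range request header as well (on Apache, for example, by adding `%{Range}i` to a custom LogFormat). Grouping by IP plus user agent is a heuristic (proxies and NAT can merge or split clients), and the input tuples, `file_sizes` mapping, and function names are hypothetical. The point it shows: a download counts as complete only when the union of ranges served to one client covers every byte of the file, so overlapping re-requests are not double-counted.

```python
from collections import defaultdict


def covered_bytes(ranges):
    """Return how many distinct bytes a list of inclusive (start, end) ranges covers."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(ranges):
        if cur_end is not None and start <= cur_end + 1:
            # Overlapping or adjacent range: extend the current run.
            cur_end = max(cur_end, end)
        else:
            # Gap (or first range): close the previous run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def count_complete_downloads(requests, file_sizes):
    """Count clients whose byte ranges together cover an entire file.

    requests:   iterable of (ip, user_agent, path, start, end), inclusive offsets
    file_sizes: dict mapping path -> size in bytes (hypothetical input)
    Returns a dict mapping path -> number of (ip, user_agent) pairs that
    received every byte of that file. There is no time window: repeat
    downloads by the same pair in the analysed period count once.
    """
    groups = defaultdict(list)
    for ip, agent, path, start, end in requests:
        groups[(ip, agent, path)].append((start, end))

    complete = defaultdict(int)
    for (_ip, _agent, path), ranges in groups.items():
        size = file_sizes.get(path)
        if size is not None and covered_bytes(ranges) >= size:
            complete[path] += 1
    return dict(complete)


if __name__ == "__main__":
    sizes = {"/docs/report.pdf": 1000}
    reqs = [
        # Browser A fetches the file in two overlapping chunks: complete.
        ("192.0.2.10", "MSIE 8.0", "/docs/report.pdf", 0, 499),
        ("192.0.2.10", "MSIE 8.0", "/docs/report.pdf", 400, 999),
        # Browser B only gets the first half: not counted.
        ("198.51.100.7", "Firefox", "/docs/report.pdf", 0, 499),
    ]
    print(count_complete_downloads(reqs, sizes))  # {'/docs/report.pdf': 1}
```

A real analyzer would also need a session timeout, so that the same client downloading the file twice in a month counts twice.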
We typically serve large PDFs, ranging from 18 MB to 48 MB in size. I have some desktop client agents that download the file in one go, and I can see these in the webalizer logs as a single GET request. But all browser requests are byte-served in parts. The XOOPS framework counts downloads, but only when the user actually gets the PDF through that GUI. The PDFs are linked from many other locations, pages, and sites, so a large number of successful hits never appear in the XOOPS download count. Google Urchin does not do binaries... So, bottom line: we are completely in the dark when someone asks "How many successful downloads of MyIncredibleDocISpentMonthsMaking.pdf were there this month?" Soon we will also be delivering free .epub (=.zip) files, and I expect to have the same issue (I don't know whether .zip files are served in parts or not).
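The difference between clients that fetch the file in one go and browsers that byte-serve it usually shows up in the status code: a whole-file response is logged as 200, a byte-range response as 206 (Partial Content). A minimal sketch that classifies one line, assuming the server writes the Apache/NCSA "combined" log format; the `file_sizes` mapping and the category names are hypothetical. The "incomplete" label assumes the logged byte count reflects what was actually sent, which may not hold on every server configuration.

```python
import re

# Apache/NCSA "combined" format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify(line, file_sizes):
    """Classify one log line for a tracked file.

    Returns 'full' for a 200 response whose byte count reaches the file size,
    'incomplete' for a 200 response that sent fewer bytes, 'partial' for a
    206 (byte-range) response, and None for anything else.
    """
    m = COMBINED.match(line)
    if not m or m.group("method") != "GET":
        return None
    size = file_sizes.get(m.group("path"))
    if size is None:
        return None  # not one of the files we are tracking
    sent = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
    if m.group("status") == "200":
        return "full" if sent >= size else "incomplete"
    if m.group("status") == "206":
        return "partial"
    return None


if __name__ == "__main__":
    sizes = {"/docs/report.pdf": 1000}
    line = ('192.0.2.10 - - [10/Oct/2010:13:55:36 -0700] '
            '"GET /docs/report.pdf HTTP/1.1" 206 500 "-" '
            '"Mozilla/4.0 (compatible; MSIE 8.0)"')
    print(classify(line, sizes))  # partial
```

Only the "full" lines can be counted directly; "partial" lines still have to be grouped per client and session before they say anything about a completed download.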
Are there any stats packages or solutions for this already available that we can install?
Any insights will be much appreciated.
Thanks!
Perhaps you can use the extra section in AwStats?
http://awstats.sourceforge.net/docs/awstats_extra.html
I also saw this in a forum
http://forums.webtrends.com/eve/forums/a/tpc/f/666103057/m/821105057
Ronald, thanks for the links:
The second link, a JavaScript "ping counter", does not cut it... What if someone downloads the PDF through, for example, a link in an HTML email? The JavaScript will never run. PDF aggregators who build web sites from "OPC" (Other People's Content) will have links to our PDFs on their pages... again, the JS will not be triggered, since they are hotlinking to the file on our server (which is OK with us in this context). The GET request will appear in the logs, but it never touches any GUI framework on our own server.
AwStats extras look interesting, but from what I read I don't see an obvious way to treat X GET requests as together comprising the download of a single PDF. Or possibly it's simply over my head... For all the code required to make that extras feature work, I would rather write my own analyzer in RevTalk. But I'm looking for an already "professional" method, as I find it hard to believe I need to reinvent this wheel.
I read in the Adobe forum that it is not possible to track a PDF file.
http://forums.adobe.com/thread/614522
OK, I'm taking this over to my Experts' Exchange account to see what the wizards there come up with....