For continued discussion on this topic: Perl Distinct User Counter



jtown
08-10-2000, 05:04 PM
Does anyone have a Perl counter that doesn't count refreshes??? I want to keep a count of users that are distinct....I know some people pay money for these types of things, but I don't even know how to start, which was sort of the basis for my previous post about logging IPs....

Could I just check to see if the IPs are the same and, if they are, not update the hit count??....

Pondering.....

James

John Pollock
08-10-2000, 06:33 PM
It should work provided you store the unique IP addresses in a file someplace. Then, you would need to make sure the incoming IP does not match any of the IPs that have been stored before updating the counter. Of course, if this counter will be really busy then you may want to look into other methods.
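
A minimal sketch of that idea, assuming two hypothetical files (seen_ips.txt and count.txt) in the script's working directory. The sub name count_visit is made up for illustration, and there is no file locking, so this is only suitable for a quiet site:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file names; change them to suit your own site.
my $ip_file    = "seen_ips.txt";
my $count_file = "count.txt";

# Returns the current count, bumping it only if $ip has not been seen before.
sub count_visit {
    my ($ip) = @_;

    # Load every IP stored so far into a hash for quick lookup.
    my %seen;
    if (open(my $in, '<', $ip_file)) {
        while (my $line = <$in>) {
            chomp $line;
            $seen{$line} = 1;
        }
        close($in);
    }

    # Read the current count (0 if the file does not exist yet).
    my $count = 0;
    if (open(my $in, '<', $count_file)) {
        my $line = <$in>;
        if (defined $line) {
            chomp $line;
            $count = $line + 0;
        }
        close($in);
    }

    # Only a new IP updates both files.
    unless ($seen{$ip}) {
        open(my $out, '>>', $ip_file) or die "Cannot append to $ip_file: $!";
        print $out "$ip\n";
        close($out);

        $count++;
        open($out, '>', $count_file) or die "Cannot write $count_file: $!";
        print $out "$count\n";
        close($out);
    }
    return $count;
}
```

From a CGI script you would call count_visit($ENV{REMOTE_ADDR}) and print the number it returns. The list of IPs is re-read on every hit, which is one reason a busy counter would want a different method.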

If your server allows you access to your log files, they should already contain the unique visitor and page view info. You would just need something to analyze the logs and give you a report.

:)

jtown
08-10-2000, 06:40 PM
I do have access to a log file, but I'm not sure what it means or how I could program a counter from it.....here's a sample line...any dissection would be wonderful

216.35.116.93 - - [10/Aug/2000:09:52:32 -0400] "GET / HTTP/1.0" 200 327 "-" "Slurp/si (slurp@inktomi.com; http://www.inktomi.com/slurp.html)"

what is this? I'm not sure whether I grabbed a single line or if I'm missing some data.....thanks....

John Pollock
08-10-2000, 06:53 PM
Unless you really want to go through the trouble of coding it on your own, I'd suggest using a log analysis program already made. There are some here:

http://cgi.resourceindex.com/Programs_and_Scripts/Perl/Logging_Accesses_and_Statistics/

If you do want to do it on your own though, looking at the code in the programs listed there will probably help you figure out what the log file line means. :)

jtown
08-10-2000, 06:54 PM
ok.....thanks john...if I have more ques, I'll post here...

thanks

james

big_wreck
08-10-2000, 07:06 PM
You could read the log file and first parse out the IP address and the time it hit the page, then search the log file back ~24 hours to see if you have a match - incrementing the counter only if you see no IP match in that time frame.
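
A rough sketch of that approach, assuming the log lines are in chronological order and start with the IP followed by a timestamp in square brackets, like the sample line posted earlier. The sub names are made up for illustration, and Time::Local ships with Perl:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Convert a stamp like "10/Aug/2000:09:52:32 -0400" to epoch seconds (UTC).
# Returns undef if the stamp does not match the expected layout.
sub log_time_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $h, $m, $s, $sign, $oh, $om) =
        $stamp =~ m{^(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$}
        or return;
    return unless exists $MONTH{$mon};
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    # The logged time is local time, so subtract the zone offset to get UTC.
    return timegm($s, $m, $h, $day, $MONTH{$mon}, $year) - $offset;
}

# Count a hit only if the same IP has not been seen in the previous $window seconds.
sub count_distinct_visits {
    my ($lines, $window) = @_;
    my %last_seen;
    my $count = 0;
    for my $line (@$lines) {
        my ($ip, $stamp) = $line =~ /^(\S+) \S+ \S+ \[([^\]]+)\]/ or next;
        my $t = log_time_to_epoch($stamp);
        next unless defined $t;
        $count++ if !exists $last_seen{$ip} || $t - $last_seen{$ip} >= $window;
        $last_seen{$ip} = $t;    # remember the most recent hit, counted or not
    }
    return $count;
}
```

Calling count_distinct_visits(\@lines, 24 * 60 * 60) gives the 24-hour version; passing 60 * 60 instead would give a one-hit-per-hour rule. Which window is "right" is a judgment call.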

Let me know if you need more

jtown
08-10-2000, 07:09 PM
a line in that log file is created when????


everytime what....???

someone loads?

big_wreck
08-10-2000, 07:41 PM
a line in that log file is created when????

... when a browser requests a page from the server


OK,

I will assume that the logfile you are referring to and posted a line from is an access_log file on your webserver, yes?

You can modify your Perl script to read the logfile and look for a certain string format.



216.35.116.93 - - [10/Aug/2000:09:52:32 -0400] "GET / HTTP/1.0" 200 327 "-" "Slurp/si (slurp@inktomi.com; http://www.inktomi.com/slurp.html)"


The first number is the IP address of the machine accessing the server's resources, next is the date-time stamp, then the command the browser is sending, and then the page.

So you could look for a square bracket [ and say 'the next 11 characters after it are the date', and that the IP address is at the start of the line before it. Once these are massaged properly you can store their values in variables for comparison.
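
As a rough sketch, and assuming the sample line is in Apache's "combined" log format (which is what it looks like), a single regular expression can pull the fields apart. The sub name and hash keys are made up for illustration. Under that assumption the last two quoted fields are the referring URL ("-" when there is none) and the browser's user-agent string, which would be where the e-mail-looking text in the sample sits:

```perl
use strict;
use warnings;

# host ident user [time] "request" status bytes "referer" "user-agent"
my $LOG_RE = qr/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

# Returns a hash reference of named fields, or nothing if the line does not match.
# Limitation: quoted fields containing escaped quotes (\") will not match.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my @fields = $line =~ $LOG_RE or return;
    my %rec;
    @rec{qw(ip ident user time request status bytes referer agent)} = @fields;
    return \%rec;
}

# The sample line from this thread (single-quoted so @ is not interpolated).
my $sample = '216.35.116.93 - - [10/Aug/2000:09:52:32 -0400] "GET / HTTP/1.0" 200 327 "-" "Slurp/si (slurp@inktomi.com; http://www.inktomi.com/slurp.html)"';

my $rec = parse_log_line($sample);
if ($rec) {
    my $date = substr($rec->{time}, 0, 11);    # the 11 characters after the [
    print "IP: $rec->{ip}\n";
    print "Date: $date\n";
    print "Request: $rec->{request}\n";
    print "Referer: $rec->{referer}\n";
    print "Agent: $rec->{agent}\n";
}
```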

I never said it was easy but it is possible.

If you're not experienced with Perl as a whole, I suggest:
http://www.pageresource.com/cgirec/
http://www.cgi101.com
http://agora.leeds.ac.uk/Perl/basic.html

After reading up you'll understand more of what I'm saying.

jtown
08-10-2000, 08:58 PM
thanks big_wreck

I'm very familiar with perl....I'm programming with it for my summer job....I just didn't/don't know what to do with the info....

what's the part that looks like an email address??...how bout the url??..is that the referring url?

thanks
james

big_wreck
08-10-2000, 09:09 PM
Sorry jtown, didn't mean to come across as condescending or anything ...you probably know more about Perl than I do if it's all you do ...

As far as the log entries, I am familiar with the Apache access_log that's created on one of my servers.

I actually haven't seen an entry with an e-mail address before...if you have access to the same (and I stress same) log file then it shows the originating IP address, date, time, command, and the URL it's trying to access.

If that address / email is unfamiliar to you then we're talking about different logs which changes things drastically.

What platform is your server?

jtown
08-10-2000, 09:15 PM
Originally posted by big_wreck
Sorry jtown, didn't mean to come across as condescending or anything ...you probably know more about Perl than I do if it's all you do ...


Don't worry about it....Perl is just something I learned by myself and just happened to pick up a summer job with it =)

I think the server is a Cobalt server if that's even correct.....I think Linux Cobalt.....I could be totally wrong and sound like an idiot right now, but oh well...mistakes are the best way to learn so make me look dumb.....

haha

james

big_wreck
08-10-2000, 09:20 PM
Startin' to get away from familiar territory now ....

Why don't you do a test where you hit your page from computers with known IP addresses and then look at the logs .. at least you'll have something better to work with ..


Anyone else out there with any enlightening suggestions ??????

jtown
08-10-2000, 09:23 PM
yeah sounds good to me....just see what happens when I view it myself....

hopefully someone out there will know more than us about this....

I suppose the one hit per day thing could work....or one hit per hour or something.....some algorithm designed by me.....

thanks for all your help wreck

james

big_wreck
08-15-2000, 11:30 PM
Did you ever figure out that log file ??

jtown
08-16-2000, 01:13 AM
still workin on it....I guess I don't really need all the info in the log file....I just need something to accurately count hits.....what's the algorithm for all those hit counter programs out there???.....there isn't a "hit" standard?

james

gzazJim
08-16-2000, 02:20 AM
jtown,

Heaven forbid there was a standard for hit counters - then we'd all begin to realize that "Huge" sites weren't REALLY getting as many hits as they thought they were! ;)

Seriously though, hit counters, while being nice, really aren't a terribly important piece of equipment. That being said, I have a script at work (not here with me at the moment) that uses a cookie to track individual visits, so perhaps by combining this script with a bit of CGI wizardry, you'd be able to come up with something more to your liking.

After all, if all you record are IP addresses, most people with a decent ISP would only have to log off and log back in with a different dial-up, and they'd be assigned a new dynamic IP address. I know someone who drove their hit counter thru the roof to get an advertiser interested by having friends log on and off his website with different IP addresses on a weekend.

Anyway, if you're interested, I'll try to rustle up the script I mentioned. Just remember "HITS" all-too-often stands for "How Imbeciles Tally Success" :)

Thanks and good luck!

Jim

jtown
08-16-2000, 03:59 AM
ya sure....I'd definitely like to see the script you have...I haven't read up on cookies yet so maybe I can learn by looking at the code.....thanks

if you wanna email it to me, my mail is jhf7@cornell.edu

thanks

james

ps I guess hits aren't really that important....I was just curious about what was going on with my site.....thanks

gzazJim
08-16-2000, 05:18 PM
Howdy jtown,

Here's a script I'd like you to try out before I get into the long convoluted one (that I'm tweaking for your purposes) that I have. Basically what the script does is print all the environment variables available to you. The one you'll most want to note is (I believe) REMOTE_ADDR - which SHOULD give the IP address that's being used.



#!/usr/local/bin/perl

# Send the CGI header, then start the HTML page.
print "Content-type: text/html\n\n";
print "<HTML><HEAD><TITLE>Environment Variables</TITLE></HEAD><BODY>";

# %ENV is a hash, so each value is looked up with braces: $ENV{name}.
foreach $env_var (keys %ENV) {
    print "<BR><FONT COLOR=RED>$env_var</FONT> is set to <FONT COLOR=BLUE>$ENV{$env_var}</FONT>";
}

print "</BODY></HTML>";


Basically the "foreach" loop will go through the ENV table (or hash) - which is where the environment variables are stored.

Try it out and let me know if there are any surprises!

Good luck,

Jim

big_wreck
08-16-2000, 05:36 PM
Good day,

A slight mod to Jim's script and you can generate a tab-delimited flat file for later import into a spreadsheet / database





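#!/usr/local/bin/perl
$logfile = "some/path/to/file";
# Open for append so each request adds one line to the file.
open(LOGFILE, ">>$logfile") or die "Cannot open $logfile: $!";
# Sort the keys so the values come out in the same order on every run.
foreach $env_var (sort keys %ENV) {
    print LOGFILE "$ENV{$env_var}\t";
}
print LOGFILE "\n";
close(LOGFILE);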





Just watch because the way this simple script is set up it will simply grow continuously ...

Any other thoughts ??

gzazJim
08-16-2000, 05:44 PM
BW,

I should have thought of that - DOH!! Perhaps we should send Wraiden over this way as well, the above seems to have answered his/her question as well...

Thanks!

Jim

jtown
08-16-2000, 06:36 PM
Hey guys, I appreciate all the help...

I have access to my log file so I don't think I need to create a new one unless there's some info in the environment variables that's not in the log file.

The CGI/Perl isn't really the problem, I just need to know a decent algorithm to count hits.

What DO all these sites with counters do??...I'm beginning to think hit counters are a bunch of BS now cause it's how the programmer decides to count hits.

The cookie idea was interesting. Broadly, how would that work? Otherwise I guess it's an IP/time algorithm.

thanks
james

gzazJim
08-16-2000, 06:57 PM
Hi jtown,

In a nutshell (no reference to the O'Reilly books there... :)), the cookie thing can be used to show a user how many times they've visited a certain page, and to display new information that they haven't seen before based on it. This is very similar to how John has set up this very board (the "new posts" indicator, etc...).

It should be fairly simple to then hack thru an algorithm that can read the cookie, set the user-specific information on the page and then write (or not write) data to the counter based on whether or not the user has visited before.

Now I just had a thought - You'll want to set that cookie to expire after maybe a week or a month - especially if the information is updated often. And then that raises another question - has the visitor REALLY visited the NEW page before? It gets to be quite a quandary!
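
To make the idea concrete, here is a minimal sketch of the read-the-cookie-then-decide step. It is not the script mentioned above: the cookie name "visited", the sub names and the one-week lifetime are all assumptions for illustration. It reads the standard CGI variable HTTP_COOKIE and avoids any modules:

```perl
use strict;
use warnings;

# Parse an HTTP Cookie header such as "a=1; b=2" into a hash.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, defined $header ? $header : '') {
        my ($name, $value) = split /=/, $pair, 2;
        $cookies{$name} = $value if defined $name && defined $value;
    }
    return %cookies;
}

# Decide whether this request comes from a new visitor.
# Returns (1, Set-Cookie line) for a new visitor, (0, undef) for a returning one.
sub check_visitor {
    my ($cookie_header, $max_age) = @_;
    my %cookies = parse_cookies($cookie_header);
    return (0, undef) if exists $cookies{visited};
    return (1, "Set-Cookie: visited=1; Max-Age=$max_age; Path=/");
}

# Only run the CGI part when called by a web server.
if ($ENV{GATEWAY_INTERFACE}) {
    my ($is_new, $set_cookie) = check_visitor($ENV{HTTP_COOKIE}, 7 * 24 * 60 * 60);
    print "$set_cookie\n" if $is_new;    # headers must come before the blank line
    print "Content-type: text/html\n\n";
    print $is_new ? "Welcome, new visitor!" : "Welcome back!";
    # This is where the counter would be incremented, only when $is_new is true.
}
```

The expiry time is the knob that decides what counts as a "new" visit: a visitor who comes back after the cookie expires is counted again, and a visitor with cookies turned off is counted on every hit.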

Anyway, enough babbling... The code for creating a cookie is available at:

http://www.webxpertz.net/forums/showthread.php?threadid=521

Have a look at that, get it working, and then we'll tackle the rest of it!

Good luck,

Jim

jtown
08-16-2000, 07:14 PM
Ok....I'll take a look at this cookie idea...

C is for COOKIE.

hmm..so it still comes down to the "what is a count?" question. Delete the cookie after a week??....a day???..an hour??.......hmmm....oh well...all these counters that people are paying for are really dumb...what a scam. =)

James

thanks again.

gzazJim
08-16-2000, 07:34 PM
James,

It all depends on what they want to count! :) Some people really get intricate with the counters, which is what I think you want - which really makes sense. Others just put a hit counter on their page to "show off" how many (or how few...) hits they are getting. As far as it being a scam - heck, that's what I call GREAT marketing! If I thought I could sell something that simple, I'd do it in a second!

Try out the script, and then we'll really dig in and try to get things working the way you want.

Good luck,

Jim

big_wreck
08-16-2000, 08:48 PM
Jtown,

I guess it boils down to what you want to keep stats on ... you're correct, there is no decent bottom-line method for page hits .... remember it's you that will use this data to modify your site ...