HTTPD Log Analysis
OVERVIEW: All Hits are Not Created Equal
- Hits versus pages visited
- Visitors = slightly useful
- Pages visited versus visitors
- Nuances
- People versus Machines
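
A minimal sketch of the distinctions above, assuming the log is in NCSA Common Log Format. The sample lines, the PAGE_SUFFIXES list, and the summarize() name are made up for illustration. Counting one host as one visitor is itself a rough approximation, since proxies and shared machines blur it.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: a request whose path ends in one of these is a "page";
# everything else (images and so on) is only a hit.
PAGE_SUFFIXES = (".html", ".htm", "/")

# Hypothetical sample data, not taken from a real server.
SAMPLE = [
    'a.example.com - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'a.example.com - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 1024',
    'a.example.com - - [10/Oct/2000:13:56:02 -0700] "GET /docs/ HTTP/1.0" 200 512',
    'b.example.com - - [10/Oct/2000:14:01:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
]

def summarize(lines):
    """Count hits, page views, and distinct hosts in CLF lines."""
    hits = pages = 0
    hosts = set()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip lines that do not look like CLF
        hits += 1
        hosts.add(m.group("host"))
        if m.group("path").endswith(PAGE_SUFFIXES):
            pages += 1
    return {"hits": hits, "pages": pages, "visitors": len(hosts)}

print(summarize(SAMPLE))
```

In the sample, the same visit produces more hits than pages because the image request is logged as its own hit.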
STATISTICS: Statistical approaches
- Time sampling -> People
- Depth sampling -> Interest
- Traffic tracking -> Redesign flow
- Noise
- Caches (ruin all statistics)
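
One possible reading of depth sampling, sketched in Python: count how many distinct pages each host requested and treat that as a rough measure of interest. The REQUESTS data and both function names are hypothetical. Because pages served from caches never reach the server log, the numbers are at best lower bounds.

```python
from collections import defaultdict

# Assumption: (host, path) pairs have already been extracted from the log.
REQUESTS = [
    ("a.example.com", "/index.html"),
    ("a.example.com", "/docs/"),
    ("a.example.com", "/docs/install.html"),
    ("b.example.com", "/index.html"),
    ("a.example.com", "/index.html"),  # repeat view, adds no depth
]

def depth_by_host(requests):
    """Map each host to the number of distinct pages it requested."""
    seen = defaultdict(set)
    for host, path in requests:
        seen[host].add(path)
    return {host: len(paths) for host, paths in seen.items()}

def depth_histogram(depths):
    """Map each depth to the number of hosts that reached it."""
    hist = defaultdict(int)
    for depth in depths.values():
        hist[depth] += 1
    return dict(sorted(hist.items()))

depths = depth_by_host(REQUESTS)
print(depths)
print(depth_histogram(depths))
```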
ANALYSIS: Trending
- Trends: what is popular when?
- Time tracking: when do peaks and lulls occur?
- Typos and 404s are useful information!
- Overanalyzing bad data (aka "caches ruin everything")
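
A rough sketch of time tracking and 404 mining, assuming Common Log Format timestamps and an English or C locale for month names. The RECORDS data and function names are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Assumption: (timestamp, path, status) tuples already parsed from the log.
RECORDS = [
    ("10/Oct/2000:13:55:36 -0700", "/index.html", 200),
    ("10/Oct/2000:13:58:01 -0700", "/indx.html", 404),
    ("10/Oct/2000:14:02:10 -0700", "/indx.html", 404),
    ("10/Oct/2000:14:05:44 -0700", "/docs/", 200),
    ("10/Oct/2000:14:30:00 -0700", "/old-page.html", 404),
]

def hits_per_hour(records):
    """Count requests per hour of day, in the time zone the log recorded."""
    counts = Counter()
    for stamp, _path, _status in records:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[when.hour] += 1
    return counts

def top_missing(records, n=5):
    """Return the n most requested paths that produced a 404."""
    missing = Counter(path for _stamp, path, status in records if status == 404)
    return missing.most_common(n)

print(sorted(hits_per_hour(RECORDS).items()))
print(top_missing(RECORDS))
```

A path such as /indx.html showing up repeatedly suggests a typo in a link somewhere, which is why the 404 list is worth reading rather than discarding.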
VISUALIZATION: Presenting the information
- Tabular or Graphical
- Trampled underfoot! Too much data is meaningless
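
One way to combine the tabular and graphical styles while keeping the volume down is a short top-N text table with bars. This is only a sketch: PAGE_COUNTS, text_report(), and the default limits are arbitrary choices made for illustration.

```python
# Hypothetical page counts, as might come out of an earlier tally.
PAGE_COUNTS = {"/index.html": 40, "/docs/": 10, "/faq.html": 20, "/misc.html": 1}

def text_report(counts, top=3, width=20):
    """Render the top entries as 'name count bar' lines, largest first."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    if not ranked:
        return ""
    biggest = max(ranked[0][1], 1)  # avoid dividing by zero
    lines = []
    for name, n in ranked:
        bar = "#" * max(1, round(width * n / biggest))
        lines.append(f"{name:<20} {n:>6} {bar}")
    return "\n".join(lines)

print(text_report(PAGE_COUNTS))
```

Capping the report at a few rows is the point: the long tail is dropped so the reader is not buried in numbers.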
FINAL POINTS: Access and Installation
- Novice "sys admins"
- Style of report
- Web based, text based, image based, or paper?
- Realtime or batched