Log All The Things!

Recently on Hacker News, a new product called Heap was linked and quickly rose to the front page. Heap is an analytics tool that captures everything that happens on your web site. I mean everything: clicks, submissions, even perhaps mouse movements, and it’s all dumped to a database that you can analyze later for marketing etc…

It’s a UI firehose

I’m not knocking the product since they’ve obviously spent a lot of time and energy and I can respect that. I hope they’re successful. But anyone using things like these has to keep one thing in mind…

Data is like a plucked flower; it starts to die and go stale the moment it’s captured. How soon it goes stale depends on the type of data, but you then need to make sure it’s shifted (another firehose) to analytics ASAP. It needs to be looked at for actual usable information and interpreted somehow to monetize, if that’s your intention, or feed further research.

By simply capturing everything, you’re effectively creating a very large feed of data that you may or may not use. You’re facing a conundrum similar to that of the U.S. intelligence agencies, which (we’re told) collect and log virtually every cell transmission and, allegedly, every email passing through computers within and, also allegedly, outside our borders.

Who said or did what, and more importantly, why? Are they really interested? Curious? Can be driven to become interested? Marketed to? Can they get their friends, family and colleagues interested? Can they give me money? These are more important questions to answer first before you go about Logging All The Things. I have a criticism about this further below.

Which brings us to those users who don’t fall under the umbrella of “marketable” for technical reasons.

The problem of backward compatibility

Is it just me or IE8 is just not considered by web startups anymore ? Sorry to deviate a little but every time I try to look at a “Show HN” in IE8 (work computer), it fails for about 85% of the time. I could understand that some startups heavily depend on latest browsers but what about others ? – codegeek

No it’s not your imagination.

The web doesn’t all flock to Firefox or Chrome or Webkit or the soon-to-be formerly independent Opera. By limiting themselves to a relatively smaller subset of web users, Heap and similar products have effectively decided not to bother with the rest. To be fair to them: they’re probably not wrong.

The vast majority of analytics, web apps and shiny new HTML5 things are geared toward the tech-savvy crowd who are connected to everything all the time and, for better or for worse, treat the browser as an OS substitute. Those interested in these things in the first place will likely be running a browser capable of those very things, but this shrinks the pool of people who can actually consume the web.

The web is a product delivery network

It used to be just a means of communication, then trade, then sharing, and now it’s all about content delivery. “Content” as in data is now a product itself, so along with your shoes, fishing rods/reels, computers etc… you now have data being sold as well.

To sell data, you must first gather it, and then you bring yourself back full circle to the aforementioned stagnation problem. Captured data is useless if it cannot be applied in a meaningful way and it becomes harder to apply to anything useful if you have too much of it.

As DevOps Borat put so eloquently :

I don’t want to be marketed to

This isn’t really a secret, but I don’t think I’ve mentioned it here before. Generally, when I browse for leisure (which I sadly don’t get to do as often these days) and it doesn’t involve me watching YouTube videos or playing a game or some form of interaction, I browse with JavaScript and all plugins disabled in Firefox.

I realise this involves killing advertising on sites I enjoy and I don’t completely feel comfortable doing it, but the alternative is more intrusive and objectionable to me and I don’t think I’m alone in this.

As the old adage goes: If you’re using a service for free, you’re the product.

This holds true of YouTube, Facebook, Google+, Gmail and yes, even WordPress. Advertising and value added features are how these services stay afloat and I can appreciate that. I also hope that they can appreciate the sheer volume of crap constantly targeted at me through the use of cookies, JavaScript and of course my IP (I’m sure) no matter where I go and what I do.

I’m old fashioned

I remember a time when web pages were hastily constructed bits of content consisting of tables, poorly contrasting background images, tags and barely functioning CSS that broke in any browser other than IE5 or Netscape. It looks like we may be returning to these bad old days with newer technology.

Governments and universities controlled most of the internet connectivity, for better or for worse, and the few companies that did let you build a site for free on the newly emerging “web” were Tripod, Geocities, AOL Homepages et al… and they too made sure there was ample advertising (or value added as in the case of AOL).

But you know what?

Aside from the odd virus or two, since those were also ubiquitous (and antivirus was/is Snake Oil), and the blistering popup storm that could be managed if you knew how to tweak Netscape or installed the latest popup blocker, it was still manageable.

I could actually consume the web without being consumed

I didn’t mind the ads that sold me dates to college students, mostly cause I was still in junior high, but also cause they didn’t know which site I visited before or where I was looking or what I liked to buy (eBay started in 1995 and I thought it was the best idea since sliced bread).

But these days, web sites that have nothing to do with what I was looking at before suddenly know which ads to show me.

I’m visiting a site on chemistry books that’s showing me ads for fishing reels. How did they know I was looking for fishing reels before?

I’m visiting a site on telescopes and they’re showing me ads for test tubes and beakers.

I’m visiting a site on printing and homemade paper and there’s an ad for star tracking scopes and GPS.

What is this madness?

Now as for the folks at Heap who may be getting an undeserved flogging from me for contributing to this tracking malarkey, I apologize for coming off as somewhat irascible. It’s not your fault since you’re only contributing to the demand.

What worries me is that there is demand.

A proposal for a scientific Datestamp

We’re all accustomed to the BCE or CE suffixes on scientific publications, all of which invariably use the Gregorian Calendar. I believe this to be inherently intellectually dishonest.

Let’s face it, those people know exactly what “Common Era” entails. Either use the BC and AD suffix or don’t use the Gregorian Calendar at all. I have no objections against it, just the notion that we’re stepping on egg shells when we all know what we’re doing by using said calendar.

What’s the alternative?

There are definite benefits to separating Months, Years, and Days, but science, more often than not, isn’t about minor conveniences. Science is about precision and repeatability. Hence the often used scientific notation of large numbers.

E.G. 2.5 x 10^5. It’s probably easier to just say “Two Hundred and Fifty Thousand”, but that’s an awful lot more characters. Besides, not everyone in the scientific world uses English as their first language. The scientific notation is immediately recognizable, understood and, more importantly, precise in any language.

Let’s do the same for dates and times. Days, Months, Years and even the exact time down to the second can be expressed in one concise datestamp.

E.G. While writing this sentence, the exact date is 21333.19.59.2.

This datestamp is very easy to calculate. The number in front, before the first period, is the number of days since January 1st 1950. The second number is the hour on the 24 hour clock: 19, or 7 PM. The third is 59 minutes and the last is 2 seconds. Therefore, tomorrow will be 21334 and yesterday was 21332.
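The calculation above can be sketched in a few lines of Python. This is only an illustration, not the script mentioned later in this post: the names EPOCH and datestamp are made up here, and it assumes the day count starts at 0 on January 1st 1950 and that the time is plain local time with no timezone handling.

```python
from datetime import datetime

# Assumed reference point: January 1st 1950 counts as day 0.
EPOCH = datetime(1950, 1, 1)

def datestamp(moment: datetime) -> str:
    """Return 'days.hours.minutes.seconds' for a naive local datetime."""
    # Whole calendar days between the reference date and the given date.
    days = (moment.date() - EPOCH.date()).days
    # Hours are on the 24 hour clock; no zero padding, matching the example.
    return f"{days}.{moment.hour}.{moment.minute}.{moment.second}"

print(datestamp(datetime(2008, 5, 29, 19, 59, 2)))  # prints 21333.19.59.2
```

Calling datestamp(datetime.now()) would give the stamp for the current moment according to your computer’s clock.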

Fans of Star Trek would notice a similarity to Stardates used in that fictional universe. Well the premise of that timeline and date is sound, however its execution was less than… um… stellar.

The year 1950 was chosen by scientists as an arbitrary measure to reflect timescales in reference to what is considered “Present”. I.E. Any event before 1950 is “Before Present” or “BP”. I think it’s a good idea.

If I were to write this date out in English, it would be May 29th, 2008 7:59:02PM. However this is only “standard” in the United States. Elsewhere the date format is the day first, month second and year last. Some other locales may have the year first, month second and day last. By always having the days in front and only in numbers, we avoid all this confusion and it is far more precise. You don’t even have to remember AM or PM since it’s all on the 24 hour clock.

Calculating the days between events also becomes easier as it’s just a matter of subtracting or adding numbers. Much like the scientific notation. What’s more, you can even express the time to the last second if necessary.
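As a rough sketch of that subtraction, assuming datestamps are written as days.hours.minutes.seconds strings (the helper name days_between is hypothetical):

```python
def days_between(earlier: str, later: str) -> int:
    """Whole-day difference between two datestamps, ignoring the time part."""
    # The day count is everything before the first period.
    first_day = int(earlier.split(".")[0])
    second_day = int(later.split(".")[0])
    return second_day - first_day

print(days_between("21332.0.0.0", "21334.0.0.0"))  # prints 2
```

No calendar logic is needed; month lengths and leap years are already baked into the day count.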

All this in a blurb less than 15 characters long (including the periods and assuming two digit seconds) for today’s date. You could even memorize the whole thing down to the last second as it’s no longer than a long distance phone number.

Though the initial reference point for calculating Before Present was based on the Gregorian Calendar, it still defines an arbitrary starting point. And henceforth it’s on its own reference point.

This is a constructed international datestamp using an existing reference in much the same way Esperanto and Interlingua are constructed international languages using existing reference languages.

To find out the exact date right now, I wrote a little script to help things go forward. It can also calculate a future or past date. It defaults to today’s date and time based on your computer’s time settings (make sure it’s accurate). Try putting in a value before 1950 and you get a negative date value. No more “Before” or “After” nonsense. It’s all purely numeric.
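Going the other way, from a datestamp back to a calendar date, might look something like the Python sketch below. It is not the script described above, just an illustration; the names EPOCH and from_datestamp are invented, and it assumes that for negative day counts the hours, minutes and seconds still count forward within that day.

```python
from datetime import datetime, timedelta

# Assumed reference point: January 1st 1950 counts as day 0.
EPOCH = datetime(1950, 1, 1)

def from_datestamp(stamp: str) -> datetime:
    """Turn 'days.hours.minutes.seconds' back into a calendar date and time."""
    days, hours, minutes, seconds = (int(part) for part in stamp.split("."))
    # A negative day count simply lands before 1950.
    return EPOCH + timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

print(from_datestamp("21333.19.59.2"))  # prints 2008-05-29 19:59:02
print(from_datestamp("-365.0.0.0"))     # prints 1949-01-01 00:00:00
```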