My JavaScript book is out! Don't miss the opportunity to upgrade your beginner or average dev skills.

Tuesday, February 23, 2010

JavaScript Override Patterns

Once we have understood JavaScript Overload Patterns, a good start point to write efficient base classes, it comes natural to wonder about How To Override.
First of all, please let me quote one of my favorite sentences from Mr D.

I have been writing JavaScript for 8 years now, and I have never once found need to use an uber function. The super idea is fairly important in the classical pattern, but it appears to be unnecessary in the prototypal and functional patterns. I now see my early attempts to support the classical model in JavaScript as a mistake.

Douglas Crockford, on Classical Inheritance in JavaScript


Override

In classical OOP, override means that a sbuclass can declare a method already inherited by its super class, making that method, the only one directly callable for each instance of that subclass.

<?php
class A {
function itsAme() {
$this->me = 'WebReflection';
}
}

class B extends A {
function itsAme() {
// B instances can access only
// this method, which could access
// internally to the parent one
parent::itsAme();
echo $this->me;
}
}

$a = new A;
$a->itsAme(); // nothing heppens

$b = new B;
$b->itsAme(); // WebReflection
?>


Override In JavaScript

Since there is not a native extends, we could say that in JavaScript an override is everything able to shadow an inherited property or method.

var o = {}; // new Object;

o.toString(); // [object Object]

o.toString = function () {
return "override: " +
// call the parent method
Object.prototype.toString.call(this)
;
};


Moreover, while in classic OOP override is usually specific for methods, and nothing else, in JavaScript we could even decide that at some point a method is not a method anymore:

// valid assignment
o.toString = "[object Object]";

// NOTE: toStirng is invoked
// every time we try to convert
// implicitly an object
// above code will break if we
// try to alert(o)
// will work if we alert(o.toString)

This introduction shows how much we are free to change rules, reminding us that if these rules are there and we would like to emulate classic inheritance patterns, maybe it's a good idea to deeply analyze if makes sense to change a method into a variable, or vice-versa. Most of the time, this is not what we want, so let's analyze just methods.

Why We Need Override

Specially in a non compilable language as JS is, and generally speaking as good practice for whatever developer, we would like to reuse code we already wrote. If a super method provides basic configuration and we would like to add more, does it make sense to rewrite all super method logic and procedures plus other part we need?

Extends ABC

A must know in JavaScript, is how to inherit a constructor.prototype via another constructor, considering there is not a native way to extends constructors.
In JavaScript, (almost) everything is an object that inherits from other objects.
To create an inheritance chain, all we need is a simple function like this:

var chain = (function () {
// recycled empty callback
// used to avoid constructors execution
// while extending
function __proto__() {}

// chain function
return function ($prototype) {
// associate the object/prototype
// to the __proto__.prototype
__proto__.prototype = $prototype;
// and create a chain
return new __proto__;
};
}());

function A (){}
function B (){}
// create the chain
B.prototype = chain(A.prototype);

new B instanceof A; // true
new B instanceof B; // still true

With this in mind, we can already start to create basic subclasses.

Hard Coded Override

One of the most simple, robust, fast, efficient, explicit, and clean pattern to implement overrides, is the hard coded one.

// base class
function A() {
console.log("A");
}
// enrich native prototype
A.prototype.a = function () {
console.log("A#a");
};
A.prototype.b = function () {
console.log("A#b");
};
A.prototype.c = function () {
console.log("A#c");
}

// subclass, first level
function B() {
// use super constructor
A.call(this);
console.log("B");
}

// create the chain
B.prototype = chain(A.prototype);

// enrich the prototype
B.prototype.a = function () {
// override without recycling
console.log("B#a");
};

// more complex override
B.prototype.b = function () {
// requires two super methods
A.prototype.a.call(this);
A.prototype.b.call(this);
console.log("B#b");
};

// subclass, second level
function C() {
// call the super constructor
// which will automatically call
// its super as well
B.call(this);
console.log("C");
}

// chain the subclass
C.prototype = chain(B.prototype);

// enrich the prototype
// every override will
// recycle the super method
// we don't care what's up there
// we just recycle code and logic
C.prototype.a = function () {
B.prototype.a.call(this);
console.log("C#a");
};
C.prototype.b = function () {
B.prototype.b.call(this);
console.log("C#b");
};
C.prototype.c = function () {
B.prototype.c.call(this);
console.log("C#c");
};

Above example is a simple test case to better understand how the chain works.
To do this, and test all methods, all we need is to create an instanceof the latest class, and invoke a, b, and c methods.

var test = new C;
console.log('-------------------------');
test.a();
console.log('-------------------------');
test.b();
console.log('-------------------------');
test.c();

That's it, if we use Firebug or whatever other browser console, we should read this result:

A
B
C
-------------------------
B#a
C#a
-------------------------
A#a
A#b
B#b
C#b
-------------------------
A#c
C#c

The first block shows how the B constructor invokes automatically the A one, so that the order in the console will be "A", as first executed code against the current instanceof C, "B", as second one, and finally "C".
If A, B, or C, define properties during initialization, this inside those methods, will always point to the instance of C, the "test" variable indeed.
Other logs are to follow logic and order inside other methods.
Please note that while B.prototype.b does not invoke the super method, B.prototype.c does not exist at all so that B.prototype.c, the one invoked by C.prototype.c, will be directly the inherited method: A.prototype.c.
More happens in the B.prototype.b method, where there are two different invocation, A.prototype.a first, and A.prototype.b after.

Pros

With this pattern, it's truly difficult to miss the method that caused troubles, if any. Being this pattern explicit, all we read is exactly what is going on. Method are shareable via mixins, if necessary, and performances are almost the best possible one for each instance method call.

Cons

Bytes speaking, this pattern could bring us to waste lot of bandwidth. Compilers won't be able to optimize or reduce that much the way we access the method, e.g. A.prototype.a, plus we have to write a lot of code and we are not even close to the classical super/parent pattern.
Another side effect, is surely the fact most developer don't even know/understand perfectly the difference between call and apply, or even worst, the concept of injected context, so that this as first argument could cause confusion.
Finally, the constant look up for the constructor, plus its prototype access, plus its method access, could let us think that performances could be somehow improved.

Hard Coded Closure

Specially designed to avoid last Cons, we could think about something able to speed up each execution. In this case, the test is against the B.prototype.b method:

B.prototype.b = (function () {
// closure to cache a, and b, parent access
var $parent_a = A.prototype.a;
var $parent_b = A.prototype.b;
// the method that will be available
// as "b" for each instance
return function () {
// cool, it works!
$parent_a.call(this);
$parent_b.call(this);
console.log("B#b");
};
}());

We still have every other Cons here, we had to write twice and in just two following lines, the A.prototype.methodName boring part.
Considering that properties access, at least the first level, is extremely fast in JavaScript, we could try to maintain performances as much as possible, reducing file size and time to write code:

B.prototype.b = (function () {
// cache just the parent
var $parent = A.prototype;
return function () {
// and use it!
$parent.a.call(this);
$parent.b.call(this);
console.log("B#b");
};
}());

The next natural step to think about, is an outer closure able to persist for the whole prototype so that every method could access at any time to the $parent:

var B = (function () {

// cache once for the whole prototype
var $parent = A.prototype;

function B() {
$parent.constructor.call(this);
console.log("B");
}

// create the chain
B.prototype = chain($parent);

// enrich in this closure the prototype
B.prototype.a = function () {
console.log("B#a");
};

B.prototype.b = function () {
$parent.a.call(this);
$parent.b.call(this);
console.log("B#b");
};

return B;

}());

OK, now we have removed almost every single Cons from this pattern ... but somebody could still argue that call and apply may confuse Junior developers.

Libraries And Frameworks Patterns

I do believe we all agree that frameworks are good when it is possible to do more complex stuff in less and cleaner code and, sometimes, with a better logic.
As example, the base class A could be simply defined like this:

var A = new Class({
a: function () {
console.log("A#a");
},
b: function () {
console.log("A#b");
},
c: function () {
console.log("A#c");
}
});

// please note to avoid a massive post
// I have intentionally skipped the
// constructor part for these cases

Isn't that beautiful? I think it is! Most of the frameworks or libraries we know, somehow implements a similar approach to define an emulated Class and it's prototype. I have written a more complete Class example at the end of this post but please don't rush there and be patience, thanks.

A Common Mistake

When we think about parent/super, we think about a way to access the "inherited stuff", and not something attached to the instance and completely different from classical OOP meaning, the one we are theoretically trying to emulate.
I have already provided a basic example of what I mean with the first piece of code, the one that shows how are things in PHP, and not only (Java, C#, many others).
The parent keyword should be an access point to the whole inherited stack, able to bring the current this reference there.
Unfortunately, 90% of the frameworks out there got it wrong, as I have partially described in one of my precedent posts entitled: The JavaScript _super Bullshit. Please feel free to skip the later lecture since I have done both a mistake against MooTools, still affected with the problem I am going to talk about tho, and I did not probably explain limitations in a proper way.
In any case, this post is about override patterns, so let's see what we could find somewhere in the cloud ;)

Bound $parent

As first point, I have chosen the name $parent to avoid to pollute methods scopes with a variable that could be confused with the original window.parent, while I have not used super since this is both a reserved keyword and not familiar with PHP developers.
This pattern aim is to make things as simple as possible: all we have to do is to invoke when and if necessary the $parent(), directly via the current method.

To be able to test the Class emulator and this pattern, we need to define them. Please note the provided Class is incomplete and not suitable for any production environment, since it has been created simply to support this post and its tests.

function Class(definition) {
// the returned function
function Class() {}
// we would like to extend via
// the extend property, if present
if (definition.extend) {
// attach the prototype
Class.prototype = definition.extend.prototype;
// chain it, recycling the Class itself
Class.prototype = new Class;

// enrich the prototype with other definition properties
for (var key in definition) {
// but only if functions, since
// we would like to add the magic
if (typeof definition[key] === "function") {
Class.prototype[key] = callViaParent(
Class.prototype[key],
definition[key]
);
}
}
} else {
// nothing to extend
// just enrich the prototype
for (var key in definition) {
Class.prototype[key] = definition[key];
}
}
// be sure the constructor is this one
Class.prototype.constructor = Class;
// return the class
return Class;
}
// let's imagine new Class SHOULD BE an instanceof Class
// or remove the new when you create one
Class.prototype = Function.prototype;


// magic callback to add magic, yeah!
function callViaParent(parent, method) {
// create runtime the wrapping method
return function () {
// create runtime the bounded $parent
// note that ONLY $ is local scope
// $parent will defined in the GLOBAL scope
var $ = $parent = function () {
return parent.apply($this, arguments);
};
// trapped reference for the $parent call
var $this = this;
// invoke the current method
// $parent will be the one defined
// few lines before
var result = method.apply(this, arguments);
// since the method could have another $parent call
// we want to be sure that after its execution
// the global $parent will be again the above one
$parent = $;
// return the result
return result;
};
}

We've got all the magic we need to define our classes, ready?

var A = new Class({
a: function () {
console.log("A#a");
},
b: function () {
console.log("A#b");
},
c: function () {
console.log("A#c");
}
});

var B = new Class({
extend: A,
a: function () {
console.log("B#a");
},
b: function () {
// oooops, we have only one
// entry point for the parent!
$parent();
console.log("B#b");
}
});

var C = new Class({
extend: B,
a: function () {
$parent();
console.log("C#a");
},
b: function () {
$parent();
console.log("C#b");
},
c: function () {
$parent();
console.log("C#c");
}
});

The B.prototype.b method cannot emulate what we have tested before via Hard Coded Pattern. The $parent variable can obviously "host" one method to call, the A.prototype.b and nothing else.
Here we can already spot the first limitation about the magic we would like to bring in our daily code. Let's test it in any case:

var test = new C;
console.log('-------------------------');
test.a();
console.log('-------------------------');
test.b();
console.log('-------------------------');
test.c();

The result?

B#a
C#a
-------------------------
A#b
B#b
C#b
-------------------------
A#c
C#c

Cool, at least what we expected, is exactly what happened!

Pros

This pattern is closer to the classic one and simpler to understand. The global reference is something we could ignore if we think how many chars we saved during classes definition.

Cons

Everything is wrong. The parent is a function, not a reference and the real parent is bound for each method call. This is a performances killer, a constant global namespace pollution, and quite illogical, even if pretty.
Finally, we have only one method to call, and zero parent access, since each method will share a runtime created global $parent variable, and it will never be able to recycle any of them since the this reference could be potentially always different for every method invocation.

Slightly Better Alternatives

At least to avoid global scope pollution with a runtime changed $parent, some framework could implement a different strategy: send the $parent as first variable for each method that extends another one. Strategies to speed up this process are different, but one of the most efficient could be:

function callViaParent(parent, method) {
// create once, and sends every time
function $parent($this, arguments) {
return parent.apply($this, arguments);
}
return function () {
// put $parent as first argument
Array.prototype.unshift.call(arguments, $parent);
// invoke the method
return method.apply(this, arguments);
};
}

This could produce something like:

var B = new Class({
extend: A,
a: function ($parent) {
console.log("B#a");
},
b: function ($parent) {
$parent(this);
console.log("B#b");
}
});


Runtime Parent Method

What is the only object that travels around this references? The current instance, isn't it? So what a wonderful place to attach runtime the right parent for the right method?
This pattern seems to be the most adopted one, we still have inconsistencies against the classical OOP parent concept, but somehow it becomes more natural to read or write:

// A is the same we have already

var B = new Class({
extend: A,
a: function () {
console.log("B#a");
},
b: function () {
// here comes the magic
this.$parent();
console.log("B#b");
}
});

var C = new Class({
extend: B,
a: function () {
this.$parent();
console.log("C#a");
},
b: function () {
this.$parent();
console.log("C#b");
},
c: function () {
this.$parent();
console.log("C#c");
}
});

With a runtime attached/switched/twisted property at least we have solved the binding problem. In order to obtain above behavior, we should change just the "magic" callViaParent callback.

function callViaParent(parent, method) {
// cached parent
function $parent() {
return parent.apply(this, arguments);
}
return function () {
// runtime switch
this.$parent = $parent;
// result
var result = method.apply(this, arguments);
// put the $parent as it was
// so if reused later, it's the correct one
this.$parent = $parent;
// return the result
return result;
};
}

Et voila', if we test again the C instance and its methods, we still obtain the expected result via our new elegant, "semantic" way to call a parent:

B#a
C#a
-------------------------
A#b
B#b
C#b
-------------------------
A#c
C#c

Pros

The execution speed is surely better than a constantly bounded reference. We don't need to adopt other strategies and somehow we think this is the more natural way to go, at least in JS (still a nonsense for Java and classic OOP guys).

Cons

Again, the class prototype creation is slower due to all wraps we need for each method that is extending another one.
The meaning of parent is still different from classic OOP, we have a single access point to the inherited stack and nothing else.
This simply means that one more time we cannot emulate the initial behavior we were trying to simplified ... can we define this more powerful? Surely cleaner tho.

Runtime Parent

The single access point for the parent stuck is truly annoying, imho. This is why we could use analogues strategies to obtain full access.
To obtain this, we need again to change everything, starting from the "magic" method:

function callViaParent(parent, method) {
// still a wrap to trap arguments and reuse them
return function () {
// runtime parent
this.$parent = parent;
// result
var result = method.apply(this, arguments);
// parent back as it was before
this.$parent = parent;
// the result
return result;
};
}

We have already improved performances using a simple attachment but this time, parent will not be the method, but the SuperClass.prototype:

function Class(definition) {
function Class() {}
if (definition.extend) {
Class.prototype = definition.extend.prototype;
Class.prototype = new Class;
for (var key in definition) {
if (typeof definition[key] === "function") {
Class.prototype[key] = callViaParent(
// we pass the prototype, not the method
definition.extend.prototype,
definition[key]
);
}
}
} else {
for (var key in definition) {
Class.prototype[key] = definition[key];
}
}
Class.prototype.constructor = Class;
return Class;
}

With above changes our test code will look like this:

// A is still the same

var B = new Class({
extend: A,
a: function () {
console.log("B#a");
},
b: function () {
// HOORRAYYYY, Full Parent Access!!!
this.$parent.a.call(this);
this.$parent.b.call(this);
console.log("B#b");
}
});

var C = new Class({
extend: B,
a: function () {
this.$parent.a.call(this);
console.log("C#a");
},
b: function () {
this.$parent.b.call(this);
console.log("C#b");
},
c: function () {
this.$parent.c.call(this);
console.log("C#c");
}
});

We are back to normality, if we test above code we'll obtain this result:

B#a
C#a
-------------------------
A#a
A#b
B#b
C#b
-------------------------
A#c
C#c

The multiple parent access in method "b" is finally back, which means that now we can emulate the original code.
We have introduced a regression tho! For performances reason, and to avoid crazy steps in the middle of a simple method invocation, call and apply are back in the field!

Pros

Finally we have control over the parent, and even if attached, we can be closer to the classical OOP. Performances are reasonably fast, just one assignment as it was before, but more control.

Cons

We inevitably reintroduce call and apply, hoping our users/developers got the difference, and understood them. We are still parsing methods and wrapping them around, which means overhead for each Class creation, and each extended method invocation.

Lazy Module Pattern

On and on with these patterns that somebody could have already spotted we are back to the initial one:

// last described pattern:
this.$parent.b.call(this);

// hard coded closure
$parent.b.call(this);

The thing now is to understand how heavy could be to place a bloody $pattern variable inside the prototype scope, still using the new Class approach.
Wait a second ... if we simply try to merge the module pattern with our Class provider, how things can be that bad?

function Class(extend, definition) {
function Class() {}
// if we have more than an argument
if (definition != null) {
// it means that extend is the parent
Class.prototype = extend.prototype;
Class.prototype = new Class;
// while definition could be a function
if (typeof definition === "function") {
// and in this case we call it once
// and never again
definition = definition(
// sending the $parent prototype
extend.prototype
);
}
} else {
// otherwise extend is the prototype
// but it could have its own closure
// so it could be a function
// let's execute it
definition = typeof extend === "function" ? extend() : extend;
}
// enrich the prototype
for (var key in definition) {
Class.prototype[key] = definition[key];
}
// be sure about the constructor
Class.prototype.constructor = Class;
// and return the "Class"
return Class;
}

We have lost the "magic" method, less code to maintain ... good! The Class itself seems more slick than before, and easier to maintain: good!
How should our classes look like now?

// A is still the same

// we specify the super class as first argument
// only if necessary
var B = new Class(A, function ($parent) {
// this closure will be executed once
// and never again
// it will receive as argument and
// automatically, the super prototype
return {
a: function () {
console.log("B#a");
},
b: function () {
// Yeah Baby!
$parent.a.call(this);
$parent.b.call(this);
console.log("B#b");
}
};
});

// same is for this class
var C = new Class(B, function ($parent) {
// we could even use this space
// to define private methods, real ones
// those showed in the Overload Patterns
// sounds pretty cool to me
return {
a: function () {
$parent.a.call(this);
console.log("C#a");
},
b: function () {
$parent.b.call(this);
console.log("C#b");
},
c: function () {
$parent.c.call(this);
console.log("C#c");
}
};
});

Did we reach our aim? If we don't care about call and apply, surely we did!
With this refactored Class we are now able to create, without worrying about inline function calls, everything we need.
If the prototype is an object, we can simply use the classic way:

var A = new Class({
a: function () {
console.log("A#a");
},
b: function () {
console.log("A#b");
},
c: function () {
console.log("A#c");
}
});

While if we need a closure to do not share anything outside the prototype, we can still do it!

var D = new Class(function () {
function _doStuff() {
this._stuff = "applied";
}
return {
applyStuff: function () {
_doStuff.call(this);
}
};
});

In few words, we are now able to perform these operations:

new Class(prototype);
new Class(callback);
new Class(parent, prototype);
new Class(parent, callbackWithParent);

I think I gonna change my base Class implementation with this stuff pretty soon :D

Pros

Performances speaking, this pattern is the fastest one in the list. No runtime assignments, no wrappers, no look up for the super, simply a local scope variable to access whenever we need and only if we need, from public, "protected", eventually privileged, and private methods, those we can easily code in the function body. The only microscopic bottleneck compared to native Hard Coded way is provided by the Class and nothing else, but classes are something we define once and never again during a live session, we care about execution speed!

Cons

call or apply ... but "dooode, please learn a bit more about JS, call and apply are essentials for your work!".

Inline Override

This last pattern is all about "do what you need when you need it" approach. In few words, there are several cases where we need to change, maybe temporary, one single method.
This is the way to proceed:

// before ...
var c = new C();

// somewhere else ...
c.doStuff = (function (doStuff) {
return function () {
// some other operation

// back as it was before
this.doStuff = doStuff;
};
}(c.doStuff));




// before ...
var d = new D();

// somewhere else ...
d.doStuff = (function (doStuff) {
return function () {
// some operation with the overridden method
doStuff.call(this);
// something else to do
return this._stuff;
};
}(d.doStuff));

There are several possible combination but the concept is the same: we override inline a method because for a single instance/object there is only one method that does not suite properly with our requirements.
Of course methods to override could be more than one, but if we are shadowing 4 methods or more, we could better think to inherit that constructor prototype and simply create different instances from the subclass.

As Summary

The override emulation via JavaScript is not an "easy to threat" topic. As cited at the beginning, we'll never be able to obtain what we expect in classical OOP since JavaScript is prototypal based.
Somehow we can at least try to understand what kind of similitude with classical OOP we would like to reach, and which pattern could be more suitable for our purpose.
This is what I have tried to explain in this post, in order to complete the other one about overloads.

Saturday, February 20, 2010

Function.prototype.bind

Quick post about a fast Function.prototype.bind implementation.

Function.prototype.bind


function bind(context:Object[, arg1, ..., argN]):Function {
return a callback able to execute this function
passing context as reference via this
}

In few words, if we have a generic object, we don't necessary need to attach a function to create a method:

// a generic function
function name(name) {
if (name == null) return this.name;
this.name = name;
return this;
}

// a generic object
var wr = {name:"WebReflection"};

// a callback with a "trapped" object
var fn = name.bind(wr);

alert([

// wr has not been affected
wr.name, // WebReflection

// fn can always be called
fn(), // WebReflection

fn("test") === wr, // true
fn(), // test
wr.name // test

].join("\n"));

Designed to ensure the same context whatever way we decide to use the callback, included call, apply, setTimeout, DOM events, etc, Function.prototype.bind is one of the most useful/powerful JavaScript concept.
Unfortunately, this will be standard only with newer browser ...

Optimized bind for every browser


Update: as I have said I code what I need, I rarely google for simple tasks like this. Most of this code, in any case, has been readapted and tested to reach best performances, based on this proposal.

if (Function.prototype.bind == null) {

Function.prototype.bind = (function (slice){

// (C) WebReflection - Mit Style License
function bind(context) {

var self = this; // "trapped" function reference

// only if there is more than an argument
// we are interested into more complex operations
// this will speed up common bind creation
// avoiding useless slices over arguments
if (1 < arguments.length) {
// extra arguments to send by default
var $arguments = slice.call(arguments, 1);
return function () {
return self.apply(
context,
// thanks @kangax for this suggestion
arguments.length ?
// concat arguments with those received
$arguments.concat(slice.call(arguments)) :
// send just arguments, no concat, no slice
$arguments
);
};
}
// optimized callback
return function () {
// speed up when function is called without arguments
return arguments.length ? self.apply(context, arguments) : self.call(context);
};
}

// the named function
return bind;

}(Array.prototype.slice));
}


Why Bother

The concatenation optimization is something rarely present in whatever framework/library I have seen. While it makes the code a bit bigger than usual, performances will be the best for most common cases, those where optional arguments are not passed at all.
A simple test case, one of my favorites, able to solve the classic IE problem with extra arguments passed to setInterval/Timeout:

(function(){

var callback = function(time){
if (1000 < new Date - time)
return alert("1 second with a default argument");
setTimeout(callback, 15);
}.bind(null, new Date);

setTimeout(callback, 15);

}());

JavaScript Overload Patterns

Update I have continued with patterns into JavaScript Override Patterns.

We all know JavaScript does not implement a native methods overload concept and what we usually do on daily basis is to emulate somehow this Classic OOP behavior.
There are several ways to do it, all of them with pros and cons, but what is the best way to implement it?

A common situation

Let's imagine we would like to have a method able to accept different arguments types and return something accordingly with what we received.
Usually in Classic OOP an overload cannot redefine the returned type but in JavaScript we can do "whatever we want", trying to be still consistent, so that we can consider us more lucky than C# or Java guys, as more flexible as well.
For this post, the behavior we would like to obtain will be similar to the one we can find in jQuery library, one of the most used, famous, and friendly libraries I know.

// create an instance
var me = new Person();

// set some property
// via chained context
me
.name("Andrea")
.age(31)
;

// retrieve some property
me.name();
// Andrea

While the classic get/set concept is boring, old style, and not optimized, the chainability one is really smart, semantic, easy to understand, and bytes saver.
As example, compare above piece of code with this one, following comments to understand which part is not that convenient:

// create an instance
var me = new Person();

// set some property
me.setName("Andrea");
// unoptimized operation, a setter
// does not usually return anything
// we would expect at least the set parameter
// as if we did
// (me.name = "Andrea")
// we could expect true/false
// as operations flag
// but when do we need a "false"
// able to break our code in any case?
// throws an error if the operation
// was not successful since we cannot
// go on with what we expect

// set something else
me.setAge(31);

// we need to rewrite the var name
// for every single method
// plus we need to write methods with
// everytime a prefix: get or set
// this is redundant and inefficient
me.getName(); // Andrea

If we pass under minifiers and mungers this silly little example, we'll find out that the chain strategy goes from 72 to 56 bytes, while the get/set goes from 76 to 68.
As summary, while overload strategies could be adopted in both cases, the example code will use the first strategy to perform the task: "create a Person class and a method able to behave differently accordingly with received arguments"

Inevitably via "arguments"

As is for every scripting language, the basic strategy to emulate an overload, is the lazy arguments parsing.
Lazy, because we need to understand for each method call what should be the expected behavior.
This is surely a performances problem so even if the topic in this case is overload, we should never forget that sometimes the best way to set or get something, is setting or getting something:

// if we can set something
// the internal property is somehow exposed
// in any case since it is reconfigurable
me.setName("Andrea");
me.setName("WebReflection");
me.getName();

// against
me.name = "Andrea";
me.name = "WebReflection";
me.name;


Avoid pointless overheads


Of course getters and setters could be considered safer, but how many times we have spotted code like this?

// somewhere in some prototype
setStuff: function (_stuff) {
this._stuff = _stuff;
},
getStuff: function () {
return this._stuff;
}

Above code simply adds overhead for each operation performed against the _stuff property, and what's the point to do it in that way?
Unless we don't ensure a type check for each call, above code style over a programming language without type hints could be simply considered an error, since every time we spot a badly implemented design, error is usually the first thought we have.

The Simplest Overload Implementation

Latest code could be already compressed into a single method. We want to perform different operations, accordingly with a generic argument, plus we would like to "kick Java Devs ass" overloading the returned value as well to make chainability possible.

function Person(_name) {
// we would like to have a default name
this._name = _name || "anonymous";
}

Person.prototype = {
constructor: Person,
name: function (_name) {
// what should we do?
// if _name is not undefined or null
if (_name != null) {
// we want to set the name
// (eventually doing checks)
this._name = _name;
// and return the Person instance
return this;
} else {
// nothing to set, it must be a get
return this._name;
}
}
};

A simple test case:

var me = new Person();
alert([

me.name(), // anonymous

me // magic chain!
.name("Andrea")
.name() // Andrea
].join("\n"));


Where Is The Overload

When we talk about overload, we think about multiple methods invoked runtime accordingly with received arguments.
Since as I have said we cannot do this in JavaScript:

// what we are basically doing
// ECMAScript 4th Representation
function name(_name:String):Person {
this._name = _name;
return this;
}
// overload
function name(void):String {
return this._name;
}

// what Java/C# guys should write
// to emulate the current behavior
class Person {
object name(string _name) {
this._name = _name;
return this;
}
// overload
object name() {
return this._name;
}
}

// example
string name = (string)(
(Person)(
new Person()
).name("Andrea")
).name();

all we usually do is to put a lot of if, else if, else, switch statements inside the same method.

Overload Pros

I do believe it's hardly arguable that overloads are able to make code easier to maintain and both more clear and more linear.
While strict typed languages do not need to take care about arguments type, scripting languages are often focused about this matter and the code inside a single method could be messy enough to be hardly maintainable or debuggable.
In few words, with overloads we can split behaviors incrementing focus over single cases delegating arguments parsing, when and if necessary, in the single exposed entry point: the method itself.

Overload Via Polluted Prototype

The first overload implementation to obtain exactly the same behavior is this one:

Person.prototype = {
constructor: Person,

// exposed public method
name: function (_name) {
// exposed entry point
// where we decide which method should be called
return this[_name == null ? "_nameGet" : "_nameSet"](_name);
},

// fake protected methods
_nameSet: function (_name) {
this._name = _name;
return this;
},
_nameGet: function () {
return this._name;
}
};

Pros

We can "cheat" a little bit in the entry point referencing the this variable to decide which "protected" method we need to call.
In my case I don't even mind if _nameGet does not accept an argument, which is still a bad design since the operation is just tricky without a valid reason, but at least we have more flexibility. If we think that every method could have common operations inside, e.g. a User class with a protected this._isValidUser() performed via every other method call, this approach could be considered good enough to split all matters/logics we are interested about.

Cons

Specially for debug, all those faked protected methods could be really annoying. All we want to show/expose are public methods so that a for in loop or a console.log operation, as example, will show just the "public status" of an instance and not every single internal method we are not interested about. Moreover, some "clever" developer could decide to use internals calls but we would like to avoid any kind of critical operation, don't we?

Overload Via Closure

This second approach is able to maintain overload benefits, exposing only what we want:

Person.prototype = {
constructor: Person,
name: (function(){
// method closure
// we can share functions
// or variables if we need it

// the method, named, easier to debug
function name(_name) {
return _name == null ? get.call(this) : set.call(this, _name);
}

// overload
function set(_name) {
this._name = _name;
return this;
}

// overload
function get() {
return this._name;
}

// the exposed method
return name;
}())
};

Pros


A for in loop will pass only the constructor and the public method so that nobody can change, use, or modify its internals. Even other methods cannot access into "name" closure so we are free to use common meaningful name as is as example for both get and set.
If at some point every overload needs to perform the same task, we can simply create another function so that this will be shared in the whole closure.

Cons

Somebody could argue that those internal functions are not Unit Test friendly ... at least this was my guess before I talked with a skilled programmer that said: "dude, you should test what you expose and not what is the exposed internal logic. In other words, it's the name method the one you care about, and not what happens inside which will be implicitly tested via 'name' calls".
Due to the closure, it is not possible to share across the prototype a single function reused for each call.
This is not something truly problematic, since in this case we can always use an outer closure:

var User = (function () {

// private method shared in the whole prototype
function _isValidUser() {
// db operations here
// this is just a silly example
return this._name != null && this._pass != null;
}

// User class
function User() {}

User.prototype = {
constructor: User,
updateAge: function (_age) {
if (!_isValidUser.call(this)) {
throw new Error("Unauthorized User");
}
this._age = _age;
},
verifyCredentials: (function (){
// internal closure
function verifyCredentials(_user, _pass, create) {
if (create === true) {
_create.call(this, _user, _pass);
}
return _isValidUser.call(this);
}

function _create(_user, _pass) {
this._name = _name;
this._pass = _pass;
}

return verifyCredentials;
}())
};

return User;

}());


Optimized Overload via Closure

Since size always matters, and this is valid for performances too, this is an alternative closure example:

Person.prototype = {
constructor: Person,
name: (function(){
function name(_name) {
return _name == null ? get(this) : set(this, _name);
}
function set(self, _name) {
self._name = _name;
return self;
}
function get(self) {
return self._name;
}
return name;
}())
};

Everything else is the same, except we assume that every private/internal method won't require a call/apply, simply the instance as mandatory first argument.

Patterns Size And Performances

All patterns have been passed under YUICompressor with all features enabled. This is the result, in numbers:

  • Simplest Overload (nothing is splitted, everything inside the single method): before 308, after 162, compression ratio: 47%.
  • Polluted Prototype: before 384, after 233, compression ratio: 39%.
  • Overload via Closure: before 464, after 240, compression ratio: 48%.
  • Optimized Overload via Closure: before 464, after 224, compression ratio: 52%.


Simplest Overload


We can easily deduct that the most common approach to behave differently is the classic method with "everything inside". Under the microscope, performances will be slightly better than every other approach since the number of function calls is reduced. But, we should consider that my example does basically nothing and that in most common cases the whole function body is polluted with large amount of variables, sometimes potentially disturbing (e.g. for (var i, ...)) and we can hardly debug long methods with hundreds of checks and if else there. Of course the example I have chosen does not really represent the perfect scenario where it's evident how much cleaner overloads are against common code style, so in that particular case, I would have chosen the first approach, but this is up to us, and "case per case" dependent.

Polluted Prototype

While performances and semantic could appear a bit better, the compression ratio in this case is the worst one. Moreover, this approach implicitly suffers name clashes problem. We should simply think about chained inheritances, and how many times we could have the same name over different mixins or classes.
If we add what we have already understood about Cons, I would define this approach the less convenient one, while it is still probably the most adopted one starting from the classic protected method approach.

Overload via Closure

This approach is already good for its purpose. Not a single compiler can optimize a this reference so I do hope developers will start to get rid of the classic call/apply approach in favor of the self one. THere are no benefits against the polluted prototype, byteswise speaking, but it is still valid all Pros against the latter one.

Optimized Overload via Closure

This pattern is the real winner:

function Person(a){this._name=a||"anonymous"}Person.prototype={constructor:Person,name:(function(){function b(d){return d==null?a(this):c(this,d)}function c(e,d){e._name=d;return e}function a(d){return d._name}return b}())};

As we can see, the number of this references inside the overloaded method is reduced to three, rather than N for each private method call. The compression ratio is best one as is the size, except for the basic case.
Lots of pros, and "just a matter of style" as cons, I already like this latest pattern!

As Summary

We should always analyze different patterns and pros and cons every time we decide to adopt a strategy to emulate something not truly part of the language nature. This is what I have tried to do in this post, hoping some developers will appreciate it, starting to use some suggestion, or giving me more ;)

Tuesday, February 16, 2010

JSLint: The Bad Part

Every programming language has somehow defined its own standard to write code. To be honest, as long as code is readable, clear, and indented when and if necessary, I think we do not need so many "code style guides" and, even worst, sometimes these "code standards" let us learn less about the programming language itself, helping if we are beginners, sometimes simply annoying if we are professionals.

Disclaimer

This post aim is absolutely not the one to blame one of my favorite JS gurus, this post is to inform developers about more possibilities against imperfect automation we'll probably always find in whatever excellent spec we are dealing with. This post is about flexibility and nothing else!

JSLint And Code Quality

Wehn Mr D says something, Mr D is 99.9% right about what he is saying ... he clearly represents what we can define a Guru without maybe or perhaps but he, as everybody else here, is still a programmer, and we all know that every programmer has its own style if we go down into details.
Mr D programming style has been historically summarized in this Code Conventions for the JavaScript Programming Language page.
All those practices are basically what we can find out about our code using a well known and widely adopted parser: JSLint, The JavaScript Code Quality Tool.
I write JavaScript and ActionScript (based over the same standard) since about 2001 and generally speaking, as experienced developer, I trust what I meant to do, and rarely what an automation tool supposes to teach me about the language.
This is like studying remembering rules rather than being able to understand them, something surely academical, nothing able to let us explore and discover new or better solutions ... we are stuck in those rules, but are we machines? Are we Mr D code style clones?
This post is about few inconsistent points regarding JavaScript Code Quality, with all due respect for somebody that simply tried to give us hints!

Indentation

The unit of indentation is four spaces. Use of tabs should be avoided because there still is not a standard for the placement of tabstops
While size always matters, since to move 1Gb or 600Mb in a network still makes a big difference, I wonder what's wrong with just two spaces. The monospace font we all use to write code on daily basis is "large enough", and while 2 spaces rather than 4 are extremely easy to spot, 4 spaces are absolutely ambiguous.
How many times we had to check via cursor if some other editor placed a tab rather than 4 spaces there? Ambiguity is something that does not match at all with the concept of Code Quality, am I wrong?
Finally, while everybody in this world has always used innerHTML due its better performances against the DOM, Mr D. tells us that tabs are not defined as a standard ... have we never heard something like de facto standard?
Specially in a unicode based language as JavaScript is, where tabs are indeed replaced by "\t" even in Mr D JSON specs, how can we think about whatever JavaScript IDE or Engine unable to understand tabs? Let's avoid them, but still, at least let's replace them with something that does NOT occupy exactly the same space!

Variable Declarations

All variables should be declared before used. JavaScript does not require this, but doing so makes the program easier to read and makes it easier to detect undeclared variables that may become implied globals.
OK, got the point, minifier or munger will take care about this so it could make sense. Now, everything is fine to me, as long as I don't read the next immediate point:
The var statements should be the first statements in the function body.
... are we serious here?
So, if var declaration is the first thing I should do, why on earth I should spend 10 times my time to write semiclons and var again?

// first thing to do
var a = 1;
var b = 2;
var c = 3;
var d = {};

// AGAINST
var a = 1, b = 2, c = 3,
d = {}
;

Faster to type, easier to read, I can even group blocks of variable declaration to define different types (primitive first, object after, undefined later if necessary) and thanks to indentation, the precedent point, I must be an idiot to do not understand what the hell I wrote. Is it just me?
I prefer to perfectly know the difference between comma and semicolon and these two buttons are in a different place, there is NO WAY I coul dmake a mistake unless I don't know the difference ... but in that case I guess we have another problem, I know nothing about JS!
Furthermore, something kinda hilarious to me:
Implied global variables should never be used.
OK, assuming for whatever reason we don't consider global functions references/variables, we should forget:
  • undefined
  • window
  • navigator
  • document
  • Math
  • jQuery (dollar $ is global baby!)
  • everything else that supposed to be reached on the global scope
I may have misunderstood this point so I do hope for some clarification, but again, the difference between a local scoped variable and a global one should be clear to everybody since JSLint cannot solve anything, and I'll show you later.

Function Declarations

I agree lmost everything about this chapter, but there are surely a couple of inconsistencies here as well.
There should be no space between the name of a function and the ( (left parenthesis) of its parameter list. There should be one space between the ) (right parenthesis) and the { (left curly brace) that begins the statement body.
...
If a function literal is anonymous, there should be one space between the word function and the ( (left parenthesis). If the space is omited, then it can appear that the function's name is function, which is an incorrect reading.
Let me guess: if there is a name, there must be a space to identify the name, if there is no name, the must a space as well? A function called function which is a reserved word?
I am sure Mr D has much more experience than me, but I wonder if that guy that wrote function function() {} has been fired, is in the Daily WTF website, it is part of the shenanigans group, or if it is still working beside Mr D ... in few words, how can be no space and a bracket ambiguous?

var f = function(){};
I want to honestly know who is that programmer able to confuse above code ... please write a comment with your name and your company, I will send you a Congratulation You Are Doing Wrong card ... I'll pay for it!!!
Same is for the space after the right parenthesis, as if an argument could accept brackets so that we could be confused about the beginning of the function body, isn't it?
Some bad code there in this chapter as well, which let me think truly weird stuff:

walkTheDOM(document.body, function (node) {
var a; // array of class names
var c = node.className; // the node's classname
var i; // loop counter results.
if (c) {
a = c.split(' ');
for (i = 0; i < a.length; i += 1) {
...

So we have to declare every variable at the beginning, included variable used for loops, the i, so the day engines will implement a let statement we'll have to rewrite the whole application?

var collection = (function () {
var keys = [], values = [];

return {
....

wait a second, consistency anybody? How come the next paragraph uses a natural assignment as that one while the Code Quality is to do not use it?

Names

Names should be formed from the 26 upper and lower case letters (A .. Z, a .. z), the 10 digits (0 .. 9), and _ (underbar). Avoid use of international characters because they may not read well or be understood everywhere. Do not use $ (dollar sign) or \ (backslash) in names.
I am not sure if Prototype and jQuery guys shouted something like: epic fail but I kinda laughed when I read it. It must be said that these practices are older than recent framework, and this is why I have talked about do not explore code potentials at the beginning of this post.
Do not use _ (underbar) as the first character of a name. It is sometimes used to indicate privacy, but it does not actually provide privacy. If privacy is important, use the forms that provide private members. Avoid conventions that demonstrate a lack of competence.
So if I got it right, if we identify private variables inside a closure, where these variable actualy are private, we should not identify them as private so we can mess up with arguments, local public variables (remember cached vars?) and everything else ... ambiguous!

Statements

Labels
Statement labels are optional. Only these statements should be labeled: while, do, for, switch.
...
continue Statement
Avoid use of the continue statement. It tends to obscure the control flow of the function.

Labels are fine, continue, which is basically an implicit label similar to goto: nextloopstep should be avoided. But as far as I read the return statement able to break whatever function in whatever point has no hints about being only at the end of the function bosy as is for ANSI-C programmers?

Block Scope

In JavaScript blocks do not have scope. Only functions have scope. Do not use blocks except as required by the compound statements.

with (document) {
body.innerHTML = "is this a scope?";
}
Never mind, somebody in ES5 strict specs decided that with statement is dangerous ...

=== and !== Operators.

It is almost always better to use the === and !== operators. The == and != operators do type coercion. In particular, do not use == to compare against falsy values.
This is the most annoying point ever. As a JavaScript developer, I suppose to perfectly know the difference between == and ===. It's like asking PHP or Ruby people to avoid usage of single quoted strings because double quoted are all they need ... does it make any sense?

// thanks lord null is falsy as undefined is
null == undefined; // true
null == false; // false
null == 0; // false
null == ""; // false
null == []; // false
null === null; // true, this is not NaN

In few words, as I have said already before, null is == only with null and undefined, which means we can avoid completely redundant code such:

// Mr D way
if (v !== null && v !== undefined) {
// v is not null neither undefined
}

// thanks JavaScript
if (v != null) {
// v is not null neither undefined
}

Moreover, since undefined is a variable it can be redefined somewhere else so that the second check could easily fail ... and where is security in this case?

// somewhere else ..
undefined = {what:ever};


// Mr D way
if (v !== null && v !== undefined) {
// this is never gonna happen AS EXPECTED
}

Please Mr D whatever you think about this post, think more about that silly warning in JSLint: it's TOO ANNOYING!!!
One last example about ==, the only way to implicitly call a valueOf if redefined:

if ({valueOf: function(){return !this.falsy;}} == true) {
// hooray, we have more power to deal with!
}


eval is Evil

The eval function is the most misused feature of JavaScript. Avoid it.
eval has aliases. Do not use the Function constructor. Do not pass strings to setTimeout or setInterval.

aliases? It seems to me that eval is misunderstood as well since there is no alias fr eval. This function may be evil specially because it is able to bring the current scope inside the evaluated string.
Function and setWhaatever do not suffer this problem, these global function always ignore external scope.
Moreover, it is thanks to Function that we can create ad-hoc, runtime, extremely performer functions thanks to dynamic access to static resolution:

function createKickAssPerformancesCallback(objname, case, stuff, other) {
return Function(objname, "return " + [objname, case, stuff, other].join("."));
}

Where exactly is the eval here and why we should avoid such powerful feature when we are doing everything correct?

Unit Test IS The Solution

If we think we are safe because of JSLint we are wrong. If we think we cannot do mistakes because of JSLint we are again wrong. JSLint could help, as long as we understand every single warning or error and we are able to ignore them, otherwise it won't make our code any better, neither more performance killer, surely not safe.
There are so many checks in JSLint that somebody could simply rely in this tool and nothing else and this, for the last time, is wrong!
Next part of the post is about a couple of examples, please feel free to ask more or comment if something is not clear, thanks.

JSLint: The Bad Part

All these tests are about possibly correct warnings, often useless on daily basis code. Please note all codes have been tested via The Good Parts options and without them.

Conditional Expressions

I twitted few days ago about this gotcha. An expression to me does not necessary require brackets, specially it does not require brackets when it is already inside brackets.

"use strict";
var lastValidArgument;
function A(b) {
// I mean it!!!
if (lastValidArgument = b) {
return "OK";
}
}

Somebody told me that if we put extra brackets, as JSLint suggests, it measn that we meant that assignment, rather than a possible missed equal.
First of all, if == is discouraged in favor of === how can be possible developer forgot to press the bloody equal sign three times? It does not matter, the inconsistency here is that if I properly compare there two variables leaving there brackets, theoretically there to let me understand I am doing an assignment rather than comparing vars, nothing happens:

"use strict";
var lastValidArgument;
function A(b) {
if ((lastValidArgument === b)) {
return "OK";
}
}

I would have expected a warning such: Doode, what the hell are you doing here? You have superflous brackets around! ... nothing, inconsistent in both cases, imho.

Unused variable

A proper syntax analyzer is truly able to understand if inside a whole scope, we used a variable as we meant to do. Unfortunately, here we have a totally useless warning/error about an extremely silly, but possible, operation. To avoid the error:

"use strict";
(function () {
var u;
// we forgot to assign the "u" var
// how can this if remove such error from this scope?
if (u) {
}
}());

Agreed, u has been used, but why on earth I have no warnings about a completely pointless if? As far as I understand, variables are truly well monitored ... this requires an improvement ... or remove the Unused variable check since in this case it does not change anything.

Pattern to avoid global scope pollution

The new operator is powerful, but it could be omitted and JSLint knows this pretty well! We, as developers, are able to modify behaviors, using functions duality to understand what's going on. Example:

"use strict";
function A() {
if (!(this instanceof A)) {
return new A();
}
}

// always true
A() instanceof A;
new A() instanceof A;

Cool, isn't it?
Problem at line 7 character 9: Missing 'new' prefix when invoking a constructor.
How nice, thanks JSLint!

Look behind without look forward

More about new, who said this requires parenthesis? As soon as I write new I am obviously trying to create an instance. Since the constructor will be exactely the one referenced after, and since after this operation there will be a semicolon, a comma, a bracket (return without semicolon), how can it be possibly a problem for JavaScript?

var a = new A;

If I don't need arguments, I feel kinda an idiot to specify parenthesis ... maybe it's just my PHP background, isn't it?

class A {}
$a = new A; // OK
$b = 'A';
$a = new $b; // still ok
$a = new $b(); // WTF, $b is a string

// oh wait ...
$b = 'str_repeat';
$b('right', 2); // rightright


Strictly Compared

I have already talked about null and == feature, now let's see how dangerous can be JSLint sugestion:

"use strict";
function pow(num, radix) {
return Math.pow(num, radix === null || radix === undefined ? 2 : radix);
}

// somewhere else, since not only eval is evil
[].sort.call(null)["undefined"] = 123;

Above code will both pass JSLint and always fail the check || radix === undefined since undefined will be 123 primitive.
You know what this mean? Unless we do not define an undefined local scope variable for each function (the most boring coding style ever, imho) we should simply avoid === undefined checks, these are both unsafe, these slow down code execution since undefined is global and requires scope resolution, and these are a waste of bytes

function pow(num, radix) {
return Math.pow(num, radix == null ? 2 : radix);
}

That's it, there is no value that could mess up that check: undefined, null, or a number, 0 included!

Nothing bad in this code

This is the best way I know to create a singleton, or a prototype, to have a private scope as well, shadowing the runtime constructor:

"use strict";
var Singleton = new function () {
// private function (or method)
function _setValue(value) {
// lots of stuff here
_value = value;
}
var _value;
this.get = function () {
return _value;
};
this.set = function (value) {
if (!_value) {
_setValue(value);
}
};
this.constructor = Object;
};

Here we have a Jackpot:
Error:
Problem at line 4 character 14: Unexpected dangling '_' in '_setValue'.
Problem at line 6 character 9: Unexpected dangling '_' in '_value'.
Problem at line 6 character 9: '_value' is not defined.
Problem at line 8 character 9: Unexpected dangling '_' in '_value'.
Problem at line 10 character 16: Unexpected dangling '_' in '_value'.
Problem at line 13 character 14: Unexpected dangling '_' in '_value'.
Problem at line 14 character 13: Unexpected dangling '_' in '_setValue'.
Problem at line 2 character 17: Weird construction. Delete 'new'.
Problem at line 18 character 2: Missing '()' invoking a constructor.

We can nullify the Singleton variable itself, but still nobody else can have access to that scope. This is what I call a variable or method private, and I mean it!
Underscore is perfect as first character, since it helps us to understand a "protected method/property", which is never truly protected, and a real private method/property, shared across instances if it's a variable, usable as private method if called via a public one.
The best one ever, in any ase is:Weird construction. Delete 'new'. ... Weird what? A lambda based language cannot create runtime via lambdas instances?

Conclusion

I hope this post will help both Mr D. and developers, the first guru to improve JSLint syntax checks, while latter people to understand errors, being able sometimes to completely ignore them.
I am looking forward for some feedback or, even better, something I did not consider or some example able to demonstrate or underline my points, thanks for reading.

Saturday, February 13, 2010

arguments, callee, call, and apply performances

We have dozens of best practices to improve performances. We also have common practices to accomplish daily tasks. This post is about the most used JavaScript ArrayLike Object, aka arguments, and its performances impact over basic tasks.

Why arguments

While it's natural for JavaScripters to use such "magic" variable, as arguments is, in many other languages everybody knows it does not come for free and it is rarely used.

<?php
function myFunc() {
// function call for each execution
// rarely seen in good PHP scripts
$arguments = func_get_args();
}
?>

One clear advantage in PHP, Python, and many others, is the possibility to define a default value for each argument.

<?php

class UserManager extends MyDAL {
public function exists($user='unknown', $pass='') {
return $this->fetch('SELECT 1 FROM table WHERE user=? AND pass=?', $user, $pass);
}
}
?>

This approach may brings automatically developers to code as if arguments does not exist. It's not an important value, everything has a default ... so, why bother?

Why callee

Specially because of black sheep Internet Explorer and its inconsistent/bugged (it does not exist, imho) concept of named function expression, our laziness frequently bring us to use this "shortcut" to refer the current executed function.

// the classic case ..
setTimeout(function
/*
if named, IE will pollute the current scope
with chose name rather than let it "be" only
inside the function body, as is for every other browser
*/
() {
// do stuff
setTimeout(arguments.callee, 1000);
}, 1000);

Even if we are in a closure so that no risk will occur if we name the callback, we are still lazy and we don't even think about something like:

setTimeout(function $_fn1() {
// do stuff
setTimeout($_fn1, 1000);
}, 1000);

return function $_fn2() {
// if stuff $_fn2() again;
};

Of course, how many $_fn1, $_fn2, $_fn3 we can possibly have in the current scope?
Somebody could even think about silly solutions such:

Function.prototype.withCallee = (function ($) {
// WebReflection Silly Ideas - Mit Style
var toString = $.prototype.toString;
return function withCallee() {
return $("var callee=" + toString.call(this) + ";return callee")();
};
})(Function);

// runtime factorial
alert(function(n){
return n * (1 < n ? callee(--n) : 1);
}.withCallee()(5)); // 120

// other example
setTimeout(function () {
// bit slower creation ... but
// much faster execution for each successive call
if (animationStuff) {
setTimeout(callee, 15);
}
}.withCallee(), 15);


OK, agreed that 2 functions rather than one for each function that would like to use callee could require a bit more memory consumption ... but hey, we are talking about performances, right?

Why call and apply

Well, call and apply are one of the best JavaScript part ever. Everything can be injected into another scope, referenced via this, and while Python, as example, has a clear self as first argument, we, as JavaScripters, don't even think about such solution: we've got call and apply, who needs to optimize a this?
Well, somehow this always remind us that we are dealing with an instance, an object, rather than a primitive or whatever value sent as argument.
This means that even where it is possible to avoid it, we feel cooler using such mechanism:

function A(){};
A.prototype = (function () {
// our private closure to have private methods
function _doStuff() {
this.stuff = "done";
}
return {
constructor:A,
doStuff:function () {
_doStuff.call(this);

// it could have been a simple
_doStuff(this);
// if _doStuff was accepting a self argument
}
};
})();

Furthermore, apply is able to combine both worlds, via lazy arguments discovery, and context injection ... how cool it is ...

The Benchmark

Since we have all these approaches to solve our daily tasks, and since these cannot come for free, I have decided to create a truly simple bench, hopefully compatible with a wide range of browsers. There is nothing there, except lots of executions, defined by times parameter in the query string, and a simple table to compare runtime these results.

Interesting Results

The scenario is apparently totally inconsistent across all major browsers, and this is my personal summary, you can deduct your one as well:
  • in IE call and apply are up to 1.5X slower while as soon as arguments is discovered, we have up to 4X performances gap. There is no consistent difference if we discover callee, since it seems to be attached directly into arguments object.
  • in Chrome call is slower than apply only if there are no arguments sent, otherwise call is 4X faster than apply and, apparently, even faster than a direct call. arguments costs generally up to 2.5X while once discovered, callee seems to come for free giving sometimes similar direct call results.
  • in Firefox things are completely different again. Direct call, as call and apply, do not differ that much but as soon as we try to discover arguments.callee, for one of the first browser that got named function expression right, the execution speed is up to 9X slower.
  • Opera seems to be the most linear one. Direct call is faster than call, and call is faster than callee. To discover arguments we slow down up to 2X while callee does not mean much more.
  • In Safari we have again a linear increment, but callee costs more than Opera and others, surely not that much as is for Firefox


Summarized Results

A direct call is faster, cross browser speaking, and specially for those shared functions without arguments, we could avoid usage of call or apply, a self reference as argument is more than enough.
arguments object should be forbidden, if we talk about extreme performances optimizations. This is the only real constant in the whole bench, as soon as it is present, it
slows down every single function call and most of the time
consistently.

HINTS about arguments

To understand if an argument has not been sent, we can always use this check ignoring JSLint warnings about it:

function A(a, b, c) {
if (c == null) {
// c can be ONLY undefined OR null
}
}

If we compare whatever value with == null, rather than === null, we can be sure this value is null or undefined.
Since generally speaking undefined is not an interesting value and null is used instead, also because undefined is a variable and it costs to compare something against it and it could be redefined as well while null cannot, it does not make sense at all to do tedious checks like this:

function A(a, b, c) {
// JSLint way ...
if (c === undefined || c === null) {
// bye bye performances
// bye bye security, undefined can be reassigned
// hello redundant code, == null does exactly the same
// check in a more secure way since it does not matter
// if undefined has been redefined
}
}

Do we agree? That warning in JSLint is one of the most annoying one, at least this is my opinion.
Let's move forward.
If we would like to know arguments length we have different strategies:

function Ajax(method, uri, async, user, pass) {
if (user == null) {
// we know pass won't be there as well
// received probably 3 arguments
// if user is not null, we expect 5 arguments
// and we use all of them
}
if (async == null) {
// this is a sync call
// received 2 arguments
}
}

function count(a, b, c, d) {
// not null, we consider it as a valid value
var argsLength = (a != null) + (b != null) + (c != null) + (d != null);
alert(argsLength);
// rather than alert(arguments.length);
}

count();
count(1);
count(1, 2);
count(1, 2, 3);
count(1, 2, 3, 4);

About latest suggestion please consider that only Chrome is slower, but Chrome is already the fastest browser so far while in IE, as example, arguments.length rather than null checks costs up to 6X the time.
Every other browser will have better performances than arguments.length, then we need to test case after case since a function, as String.fromCharCode could be, cannot obviously use such strategy due to "infinite" accepted arguments.
In these cases, e.g. runtime push or similar methods, we don't have many options ... but these should be exceptions, not the common approach, as is for many other programming languages with some arguments support.

Conclusion

I do not pretend to change developers code style with a single post and things are definitively not that easy to normalize for each browser.
Unfortunately, we cannot even think about features detection when we talk about performances, we don't want 1 second delay to test all performances cases before we can decide which strategy will speed up more, do we?
At least we are now better aware about how much these common JavaScript practices could slow down our code on daily basis and, when performances do matter, we have basis to avoid some micro bottleneck.

Monday, February 01, 2010

CommonJS - Why Server Side Only?

There is one single thing I don't like about CommonJS Idea, the fact nobody is thinking about the client side!!!

CommonJS Why

Since everybody would like to use JavaScript as Server Side Programing Language, where right now I can count there about 50 implementations, somebody decided that at least basic stuff such IO operations, streams or sockets, and much more, should have a common behavior, namespace, API, across all different implementations.
In few words, these guys are trying to create their own WSW Consortium, and this is absolutely OK.
So what am I complaining about?

Client CommonJS

If we, as developers and libraries authors, would have adopted a similar strategy ages ago, rather than fight with each other about natives prototypes pollutions, web development would be probably even easier for everybody.
We failed, while those server side guys started correctly.
The problem, for both language usage and its environment, is that JavaScript on server has different problems than a bloody "attachEvent VS addEventListener" but while common practices are in any case appreciated and widely adopted world wide, CommonJS is not friendly at all with Client Side JavaScript.
The basic example?

require

This function aim is to retrieve from a namespace, where it is basically represented via dot notation, translated into folders paths, a generic variable, object, function, whatever.
The power a dedicated build could have over ouw miserable secured/sandboxed version of browser JavaScript engines is endless.
As example, the only way I could think about to implement a client side require, is this one:

if (typeof require === "undefined") {
// (C) WebReflection - Mit Style License
var require = (function (context, root, Function) {
function require(namespace) {
if (!hasOwnProperty.call(cache, namespace)) {
var xhr = new XMLHttpRequest;
xhr.open("GET", (root + "." + namespace.replace(/(^.*)(?:\.[0-9A-Za-z$_]+)$/, "$1")).replace(/\./g, "/") + ".js", false);
xhr.send(null);
cache[namespace] = Function(xhr.responseText + ";return function(){return eval(arguments[0])};").call(context);
}
return /(?:^.*\.|^)([0-9A-Za-z$_]+)$/.test(namespace) && cache[namespace](RegExp.$1);
};
var XMLHttpRequest = this.XMLHttpRequest || function () {
return new ActiveXObject("Microsoft.XMLHTTP");
};
var cache = {};
var hasOwnProperty = cache.hasOwnProperty;
return require;
})(this, "", this.Function);
}


How It Works

Let's say we have a file called mylib.js and let's say this file contains our libraries variables.
To obtain the base object, we could simply do something like this:

<script src="require.js"></script>
<script>
var base = require("mylib.base");
base.alert("hello");

var $alert = require("mylib.base").alert;
$alert("world");
</script>

Where mylib.js file is nothing different from:

var base = (function () {
var self = this;
return {
alert:function (msg) {
self.alert(msg);
}
};
}).call(this);

Easy? In few words if a file contains proper variable declarations, rather than global myvar = {} without var prefix, we can imagine we could have all our libraries automatically "sandboxed", at least the global namespace won't be polluted that much and via my implementation of require, each file will be loaded simply once and never again, and we can retrieve step after step just what we need.

P.S. please note that my implementation is just an imperfect proof of concept since require in CommonJS accepts syntax like require("mylib").base; but untile we won't have runtime cross browser resolved get/set, this is not possible to implement synchronously :(