Saturday, April 25, 2009

My first impressions of Python for the second time

I worked a bit in Python many years ago, but have since forgotten almost everything I learned. I think the phrase "Out of sight, out of mind" applies perfectly to my mind.

Over the last few days I have started relearning Python, and this time I am recording my impressions of it, coming from a Java background.

Indentation: Python uses indentation to specify blocks of code, instead of curly braces. I like this, because we indent code anyway to increase readability, so why not achieve two tasks together. Code looks much cleaner without the curly braces. However, there may be a small downside. Everyone in the team will have to set up their IDEs in the same way. Things might fall apart if some people use tabs and others use spaces for indentation.

Access modifiers: Python does not have public, private, and protected keywords. Everything is public. However, by convention, members meant to be private are marked with a single leading underscore. If we use two leading underscores, Python mangles the name, which makes it difficult to access those members directly. Note that I use the word difficult, and not impossible. Personally I like not having access modifiers. Using convention to specify a member as private is fine in my books, because I have never thought of access modifiers as a security feature. It is rather a feature that prevents well meaning programmers from making mistakes. If there is a simple convention like leading underscores signifying a private member, then well meaning programmers will not access it directly. We do not need to treat developers as children and put fences all around the place.
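
A minimal sketch of the two underscore conventions, written in Python 3 syntax. The Account class and its attribute names are made up for illustration.

```python
class Account:
    """Hypothetical class used only to illustrate the naming conventions."""

    def __init__(self, owner, balance, pin):
        self.owner = owner        # public attribute
        self._balance = balance   # single underscore: private by convention only
        self.__pin = pin          # double underscore: stored as _Account__pin


acct = Account("parag", 100, "1234")

# The single underscore is only a signal to other programmers; access still works.
print(acct._balance)

# The double-underscore name is not reachable under its original spelling...
try:
    acct.__pin
except AttributeError as error:
    print("Direct access failed:", error)

# ...but the mangled name is still there, so access is difficult, not impossible.
print(acct._Account__pin)
```

The last line is the reason for saying "difficult" rather than "impossible": the mangled attribute is still an ordinary public attribute under a different name.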

String interpolation: In a previous blog post I wrote about how Python supports string templating, but not interpolation. This is one feature I wish Python supported.
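
For readers who have not seen the templating mentioned here, this is a small sketch using the % operator and the standard library's string.Template class, in Python 3 syntax. The variable names are made up for illustration.

```python
from string import Template

name = "Parag"
language = "Python"

# Templating with the % operator: the values are passed in explicitly.
formatted = "%s is relearning %s" % (name, language)

# Templating with string.Template: placeholders are filled from keyword arguments.
template = Template("$name is relearning $language")
templated = template.substitute(name=name, language=language)

print(formatted)
print(templated)
```

In both cases the values have to be handed to the template explicitly. Neither form picks up name and language from the surrounding scope by itself, which is presumably the kind of interpolation the post is wishing for.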

String and Unicode: Python strings are not Unicode by default. For Unicode strings there is a separate Unicode type. On the other hand in Java all strings are always Unicode. Update: Some friends just informed me that Python 3000 has Unicode strings by default.

Multiple inheritance: Python supports multiple inheritance. Again I am not sure I like this.
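
One common worry with multiple inheritance is the diamond shape, where two parent classes share a base class. The sketch below, in Python 3 syntax, shows how Python picks a method in that situation. The class names A, B, C, and D are made up for illustration.

```python
class A:
    def who(self):
        return "A"


class B(A):
    def who(self):
        return "B"


class C(A):
    def who(self):
        return "C"


class D(B, C):
    """Inherits from both B and C, which share the base class A."""


# Python searches the classes in a fixed, inspectable order
# (the method resolution order), so the lookup is deterministic.
print([cls.__name__ for cls in D.__mro__])
print(D().who())
```

Here D().who() resolves to the version in B, because B is listed first in D's bases. Whether that rule keeps designs readable in practice is a separate question.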

Numbers: I tried this code in Python

max64BitInt = 2**64 - 1
bigNum = max64BitInt + 126
print 'This is a big number ', bigNum

and it prints the answer correctly. So Python does not have the 64 bit limit for integers the way Java does. In Groovy, numbers can be BigInteger types and can be arbitrarily large. I do not know if this is true in Python. However what we have is definitely better than Java's support for working with large numbers.
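
On the open question above: Python integers are arbitrary precision. Python 2 silently switches from int to long when a value outgrows the machine word, and Python 3 has a single unbounded int type. A minimal sketch in Python 3 syntax, with variable names made up for illustration:

```python
# Largest value of Java's signed 64-bit long, for comparison.
java_long_max = 2**63 - 1

# Going past that limit does not overflow or wrap around.
beyond_long = java_long_max + 1

# Much larger values work the same way.
huge = 2**200

print(beyond_long > java_long_max)
print(type(huge).__name__)
print(huge.bit_length(), "bits are needed to store 2**200")
```

So there is no need for a separate BigInteger-style class for whole numbers. The built-in integer type already behaves that way.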

Magic methods: Python has magic methods. These methods begin and end with double underscores, like __str__(). Magic methods have special meaning: they exist to be called back when a certain operation is invoked. For instance, the __len__() method of object o is invoked when we call the len(o) function. It returns the logical length of the object. This is a nice feature because it allows the system to have certain standard functions like len(), repr(), print(), etc., and lets developers add magic methods to their classes if they want those classes to respond to these standard functions. This is also how Python supports operator overloading, as well as iteration and indexing.
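
A minimal sketch of a class that opts in to a few of these hooks, in Python 3 syntax. The Playlist class and its contents are made up for illustration.

```python
class Playlist:
    """Hypothetical container used to show a few magic methods."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # Called by len(playlist).
        return len(self._songs)

    def __getitem__(self, index):
        # Called by playlist[index].
        return self._songs[index]

    def __iter__(self):
        # Called by for-loops and by list(playlist).
        return iter(self._songs)

    def __str__(self):
        # Called by str(playlist) and by print(playlist).
        return "Playlist(" + ", ".join(self._songs) + ")"

    def __add__(self, other):
        # Called by playlist + other: operator overloading.
        return Playlist(self._songs + other._songs)


morning = Playlist(["Intro", "Sunrise"])
evening = Playlist(["Dusk"])

print(len(morning))        # uses __len__
print(morning[0])          # uses __getitem__
print(morning + evening)   # uses __add__, then __str__
for song in evening:       # uses __iter__
    print(song)
```

None of these methods is called by name. The standard functions and operators find them, which is what lets a user-defined class behave like a built-in one.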

Procedural and OO programming styles: Even though everything in Python is an object, Python supports both procedural and object oriented programming styles. I believe it is also possible to write functional code in Python, but I do not know enough about it. That is a topic for a future post. I like the support for procedural programming, because it makes it easy to create small useful scripts. One of the reasons I could never write those in Java was that I would have to create a class and a main method, and also put the script through the compile phase. But then again, Java was never meant to be a scripting language. Python also has globally available functions like len(), print(), repr(), etc. For example, len(o) takes an object o (maybe a list or a string) and returns its length. In Java we would have invoked o.length(). I guess there are pros and cons to both approaches. Java's style is more object oriented, but it does not guarantee uniformity. We use the length attribute to get an array's length and the size() method to get the length of a List. In Python, if an object o has a logical length, we can get it by invoking len(o).

I have some more impressions on Python, which I will publish in the next post.

6 comments:

Anonymous said...

"Personally I like not having access modifiers. Using convention to specify a member as private is fine in my books, because I have never thought of access modifiers as a security feature. It is rather a feature that prevents well meaning programmers from making mistakes. If there is a simple convention like leading underscores signifying a private member, then well meaning programmers will not access it directly. We do not need to treat developers as children and put fences all around the place."

Interesting comment.

For me, access modifiers are the enablers of information hiding. I see them as reducing the potential cost of updates; if that's a fence, then that's ok with me.

The International Organisation for Standardization defines encapsulation as, 'The property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.'

Thus, as some information is accessible via these interfaces, some information must be hidden and inaccessible within the object. The property such information exhibits is called information hiding, which Parnas defined by arguing that modules should be designed to hide both difficult decisions and decisions that are likely to change.

Note that word: change. Information hiding concerns potential events, such as the changing of difficult design decisions in the future.

Consider a class with two methods: method a(), which is hidden within the class, and method b(), which is public and thus directly accessible by other classes. (This information hiding is achieved in, for example, Java, by access modifiers.)

There is a certain probability that a future change to method a() will require changes in methods in other classes. There is also a certain probability that a future change to method b() will require changes in methods in other classes. The probability that such ripple changes will occur for method a(), however, will usually be lower than that for method b() simply because method b() may be depended upon by more classes.

This reduced probability of ripple impacts is a key benefit of encapsulation.

Consider the maximum potential number of source code dependencies - let us define this as the potential coupling - in any program. Extrapolating from the definitions above, we can say that, given two programs delivering identical functionality to users, the program with the lower potential coupling is better encapsulated, and that statistically the better-encapsulated program will be cheaper to maintain and develop, because the cost of the maximum potential change to it will be lower than the maximum potential change to the less well-encapsulated system.

Consider, furthermore, a language with just methods and no classes, and hence no means of hiding methods from one another. Let's say our program has 1000 methods. What is the potential coupling of this program?

Encapsulation theory tells us that, given a system of n public nodes, the potential coupling of this system is n(n-1). Thus the potential coupling of our 1000 public methods is 999,000.

Now let's break that system into two classes, each having 500 methods. As we now have classes, we can choose to have some methods public and some methods private. This will be the case unless every method is actually dependent on every other method (which is unlikely). Let's say that 50 methods in each class are public. What would the potential coupling of the system be?

Encapsulation theory tells us it's: n((n/r) - 1 + (r-1)p), where r is the number of classes and p is the number of public methods per class. This would give our two-class system a potential coupling of 1000 x (500 - 1 + 50) = 549,000. Thus the maximum potential cost of a change in this two-class system is already substantially lower than that of the unencapsulated system.

Let's say you break your system into 3 classes, each having 333 methods (well, one will have 334), and again each with 50 public methods. What's the potential coupling? Using the above equation again, the potential coupling would be approximately 432,000.

If the system is broken into 4 classes of 250 methods each, the potential coupling would be 399,000.

It may seem that increasing the number of classes in our system will always decrease its potential coupling, but this is not so. Encapsulation theory shows that the number of classes into which the system should be decomposed to minimise potential coupling is: r = sqrt(n/p), which for our system is about 4.47, so roughly 4. A system with 6 classes, for example, would have a potential coupling of approximately 415,667.
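
The formulas quoted in this comment can be evaluated directly. Below is a small sketch in Python 3 syntax that takes the formulas exactly as stated; the function names are made up for illustration.

```python
import math


def unencapsulated_coupling(n):
    """Potential coupling of n public methods with no classes: n(n-1)."""
    return n * (n - 1)


def potential_coupling(n, r, p):
    """Potential coupling of n methods split evenly across r classes,
    each class exposing p public methods: n((n/r) - 1 + (r-1)p)."""
    return n * ((n / r) - 1 + (r - 1) * p)


def optimal_class_count(n, p):
    """Number of classes that minimises the formula above: sqrt(n/p)."""
    return math.sqrt(n / p)


n, p = 1000, 50
print("no classes:", unencapsulated_coupling(n))
for r in (2, 3, 4, 5, 6):
    print(r, "classes:", round(potential_coupling(n, r, p)))
print("optimal r:", round(optimal_class_count(n, p), 2))
```

Running it for a range of r values shows the coupling falling and then rising again on either side of sqrt(n/p), which is the point the comment is making.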

The Principle of Burden takes two forms.

The strong form states that the burden of transforming a collection of entities is a function of the number of entities transformed. The weak form states that the maximum potential burden of transforming a collection of entities is a function of the maximum potential number of entities transformed.

In slightly more detail, the burden of creating or modifying any software system is a function of the number of program units created or modified.

Program units that depend on a particular, modified program unit have a higher probability of being impacted than program units that do not depend on the modified program unit.

The maximum potential burden a modified program unit can impose is the impacting of all program units that depend on it.

Reducing the dependencies on a modified program unit therefore reduces the probability that its update will impact other program units, and so reduces the maximum potential burden that that program unit can impose.

Reducing the maximum potential number of dependencies between all program units in a system therefore reduces the probability that an impact to a particular program unit will cause updates to other program units, and thus reduces the maximum potential burden of all updates.

Thus, encapsulation is a foundation stone of object orientation and encapsulation helps us to reduce the maximum potential number of dependencies between all program units, to mitigate the weak form of the Principle of Burden, and to reduce the cost of any potential update to our software.

Regards,

Ed Kirwan.

Parag said...

Hello Ed,

Thank you for the excellent comment. I totally agree about the merit and need for proper encapsulation.

What I was trying to point out was how philosophically different Python and Java are in the way they implement it.

Both of them agree that code should be well modularized and encapsulated. However Java has access modifiers and locks the private members of a class thus disallowing their usage. Python on the other hand relies on a coding standard which simply warns developers that they are about to touch something which may be internal to the class. It's more like putting a "Do Not Disturb" sign and relying on the good taste of programmers to honor it.

--
Regards

Vasudev Ram said...

Hi Parag,

Interesting post overall.

>Multiple inheritance: Python supports multiple inheritance. Again I am not sure I like this.

I'm interested to know why you don't like multiple inheritance.

- Vasudev

Parag said...

Hi Vasudev,

Besides the confusion that ensues from the diamond problem in multiple inheritance, I prefer single inheritance in languages because it keeps the code clean.

Multiple inheritance is something which can be used injudiciously for code reuse.

Even as I write this, I realize that I am contradicting something I said in this blog post... I favored not having access modifiers even though they are something that can enforce encapsulation, but at the same time I am speaking for not having multiple inheritance.

I would like to know if you have any experiences where multiple inheritance helped you achieve a good design.

--
Thanks
Parag

Alstar77 said...

Great post. I'm interested in your points on encapsulation. I always took access modifiers in Java and C# to be useful security tools.

For example, a bank's login may ask for the 2nd, 3rd and 6th letters of the customer's secret word. The secret word property would be set to private to reduce the chance of anyone being able to maliciously read it. Would the single underscore achieve this in Python?

Parag said...

@AllStar77

For a very long time I too thought that access modifiers were for security... not sure, but I think I picked it up when I was in college :-)

I do not think making a property private will prevent malicious code from reading it. Making a property private only prevents code written within our own codebase from reading it. In case some code in our codebase really wants to be malicious it can still use reflection to access private members.

Data hiding is strictly a language feature to promote a certain style of coding, or to make it difficult for people to write bad code.

If malicious code (code which has somehow come into our system from elsewhere) gets a hook into the JVM (which I think is quite difficult), then it can access memory and will be able to read anything it wants to, as long as it understands what it is reading. When something is stored in memory, the fact that it is private will not prevent an external entity from reading that block of memory.

I believe (and I could be wrong) that access modifiers work purely at the compiler level. They tell the compiler that you are trying to access code which you should not be accessing. Once the code is compiled I am not very sure if the JVM does any further access checks.