Wednesday, 5 May 2010

JVM and GC

I attended a training on the JVM. Here are a few words about it.

Every Java programmer knows the Java Virtual Machine (JVM). It translates the bytecode (which is generated by the Java compiler) into machine instructions and executes them.
The JVM makes Java portable, i.e. platform independent.

Java used to be very slow up to JDK 1.2, mainly because the early VMs interpreted bytecode or compiled it with little optimization.
From JDK 1.3, it became faster. Why? The JVM included HotSpot technology.
HotSpot, originally developed at Longview Technologies and Sun Microsystems and now owned by Oracle Corporation, has many optimizing techniques that make Java execute faster. A few of these techniques are -
1) Loop unrolling – e.g. it replaces a for loop with repeated copies of the statements inside the loop. If the number of iterations is small and known in advance, the loop can disappear completely; otherwise each pass handles several iterations at once. Either way the loop counter is read, incremented, compared and written back far less often, so the loop executes faster.
2) Method inlining
3) Flow rearranging
4) Efficient garbage collection algorithms
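
The loop unrolling idea from the list above can be sketched by hand. This is only an illustration: HotSpot applies the transformation to compiled machine code, not to Java source, and the class name, method names and the unroll factor of 4 are assumptions made for this example.

```java
// Illustration only: the JIT compiler unrolls loops on machine code, not on source code.
class LoopUnrollingSketch {

    // Plain loop: one counter compare and one increment per element.
    static int sumRolled(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Hand-unrolled by a factor of 4: one counter compare and one increment per 4 elements.
    static int sumUnrolled(int[] data) {
        int sum = 0;
        int i = 0;
        int limit = data.length - (data.length % 4);
        for (; i < limit; i += 4) {
            sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        }
        // Remainder loop for the 0 to 3 elements that are left over.
        for (; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(sumRolled(data));   // prints 55
        System.out.println(sumUnrolled(data)); // prints 55
    }
}
```

Both methods return the same result; the unrolled one simply does less counter bookkeeping per element.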

Java HotSpot has 2 types of VMs –
1) Java HotSpot Client VM – This VM is tuned for fast start up and a small memory footprint. Its compiler optimizes quickly and lightly, which reduces the start up time of an application but lowers its peak performance.

2) Java HotSpot Server VM – This is the opposite. Its compiler optimizes more aggressively, so start up and warm up take longer, but long running applications perform better. (Both VMs load classes lazily, i.e. when required.)

You can use the option -server or -client when launching your application to select the respective HotSpot VM.
Both the Java HotSpot Client and Java HotSpot Server compilers translate bytecode to machine instructions at run time.
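
To check which VM an application is actually running on, the standard system property java.vm.name can be printed. This is a small sketch; the class name is an assumption, and the exact text returned depends on the installed JDK.

```java
class WhichVm {

    // java.vm.name is a standard system property; on HotSpot its value
    // typically contains the word "Client" or "Server".
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(vmName());
    }
}
```

Running it once with java -client WhichVm and once with java -server WhichVm shows whether the JDK in use honours the option.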


Now something about the garbage collector (GC).
GC is part of the JVM. HotSpot technology greatly improved its performance.
But what does the garbage collector do?
It finds and destroys Java's garbage, i.e. dead objects that are no longer referenced.

How does it come to know whether some object is to be removed or not?
It uses the root set. Some basic information about the root set is present here.

The root set is the set of those objects which are referenced directly by the JVM (for example from local variables on thread stacks or from static fields) and not only from some other object. E.g. say object A is referenced directly by the JVM. Then any object, say B, whose reference is held inside object A is called an indirectly referenced object. B need not be part of the root set because the JVM knows about it via object A. When the JVM sees that object A is not referenced anywhere in the program, it destroys A as well as B (unless B is also referenced by some other live object or is itself part of the root set). Starting from each object in the root set, GC follows the references and destroys the objects that have no direct or indirect reference.
This is how GC works with the help of the root set.
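
The reachability check described above can be sketched as a small graph walk. This is a simplified model, not how HotSpot is implemented: the Obj class, the refs list and the reachable method are hypothetical names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class RootSetSketch {

    // Hypothetical stand-in for a heap object that holds references to other objects.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    // Returns every object reachable from the root set, directly or indirectly.
    // Anything that is not in the returned set would be garbage.
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {       // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        a.refs.add(b);                        // B is referenced only through A

        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        System.out.println(reachable(roots).contains(b)); // prints true

        roots.clear();                        // A is no longer referenced by the "JVM"
        System.out.println(reachable(roots).contains(b)); // prints false
    }
}
```

Once A leaves the root set, neither A nor B is reachable, which matches the A and B example in the text.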

Can we call GC explicitly in our program anywhere we want?
We can request it. System.gc(); asks the JVM to run GC, but the JVM is free to delay or ignore the request.

You can see the GC output by executing your application with the option -verbose:gc.
e.g. java -verbose:gc Test
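
GC activity can also be read from inside the program through the standard java.lang.management API. The sketch below prints the collection counters before and after a System.gc() request. The class name GcStats is an assumption, and the collector names and numbers depend on the JVM and its settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {

    // Builds one line per collector: its name, number of collections and total time in ms.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Before the request:");
        System.out.print(describe());

        System.gc(); // only a request; the JVM may delay or ignore it

        System.out.println("After the request:");
        System.out.print(describe());
    }
}
```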

GC can run on a single thread or on multiple threads. On multi-processor machines, the multithreaded way shortens GC pauses and so improves the performance of program execution.

Some GC algorithms that HotSpot technology makes use of:
1) Sweeping GC:

It simply removes the dead objects from the memory and frees their space in place. Whenever the next object is born, it is placed at a free location.

2) Compacting GC:

It is a bit more advanced than sweeping GC but still simple. After freeing the memory, it gathers all the surviving objects at one place in the memory, to avoid the fragmentation (many small, scattered free gaps) which is possible in sweeping GC.

3) Copying GC – This is something interesting. It is a bit complex and is generally the most efficient of the three algorithms when there is very much to clean at a time.

It divides the memory heap into an active and an inactive region. Whenever an object is created it gets space in the active region. During a GC call, it copies the surviving objects from the active region into the inactive region and then frees the whole active region at once. The inactive region then becomes the active region and vice versa. This is useful when you want to clean a large number of objects and need to keep only a few, because the work grows with the number of survivors and not with the amount of garbage.
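
The difference between the three algorithms can be sketched on a toy heap modelled as an array of slots. This is a rough model under stated assumptions, not HotSpot's implementation: the class and method names are invented, and the live array stands in for the result of the root set check.

```java
import java.util.Arrays;

class GcAlgorithmsSketch {

    // Sweeping: dead slots are cleared in place; survivors do not move.
    static void sweep(String[] heap, boolean[] live) {
        for (int i = 0; i < heap.length; i++) {
            if (!live[i]) {
                heap[i] = null;
            }
        }
    }

    // Compacting: survivors slide to the front of the same region,
    // leaving one contiguous free block at the end.
    static void compact(String[] heap, boolean[] live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) {
                heap[next++] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, null);
    }

    // Copying: survivors are copied into the inactive region, which the caller
    // then uses as the new active region; the old region is dropped as a whole.
    // (A real collector finds survivors by walking from the root set, so it
    // never has to visit the dead objects at all.)
    static String[] copy(String[] active, boolean[] live) {
        String[] inactive = new String[active.length];
        int next = 0;
        for (int i = 0; i < active.length; i++) {
            if (live[i]) {
                inactive[next++] = active[i];
            }
        }
        return inactive;
    }

    // Length of the longest run of free (null) slots; a small value on a
    // mostly empty heap indicates fragmentation.
    static int largestFreeBlock(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C", "D", "E", "F"};
        boolean[] live = {true, false, true, false, true, false};

        String[] swept = names.clone();
        sweep(swept, live);
        String[] compacted = names.clone();
        compact(compacted, live);
        String[] copied = copy(names, live);

        System.out.println(Arrays.toString(swept));      // [A, null, C, null, E, null]
        System.out.println(Arrays.toString(compacted));  // [A, C, E, null, null, null]
        System.out.println(Arrays.toString(copied));     // [A, C, E, null, null, null]
        System.out.println(largestFreeBlock(swept));     // 1
        System.out.println(largestFreeBlock(compacted)); // 3
    }
}
```

After sweeping, three slots are free but no two of them are adjacent, so an object that needs two slots would not fit. After compacting or copying, the same three free slots form one block.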

To understand this algorithm better, we need to know that there are 2 types of memory heaps
[*In Java, all objects are located inside the memory heap and references to them are held in method stacks. By the way, an object in the memory heap can also carry a reference to any other object.]
1) Monolithic heap – It is as simple as a cupboard which has no shelves.

2) Generational heap –

A generational heap has 2 main regions –
a) New object area (young generation) – this has 3 divisions – Eden, survivor space 1 and survivor space 2
b) Old object area (old generation)
The new object area uses copying GC.
At first, Eden + survivor space 1 = active region and survivor space 2 = inactive region.
Then, after the next iteration of GC, Eden + survivor space 2 = active region and survivor space 1 = inactive region.
It goes on like this. This GC is called minor GC.
(This is for scenarios where objects get created in large numbers and also destroyed in large numbers, i.e. for short living objects. E.g. transaction related or use case related objects in an enterprise application.)

The old object area can use either the sweeping or the compacting GC algorithm. This GC is called major GC.
(This is for the scenario when objects are created once and then deleted much later, i.e. this is for long living objects. E.g. start up objects which are born on server start up and die when the server shuts down.)

For long living objects, sweeping or compacting GC is the more efficient choice: most of these objects survive every collection, so a copying GC would spend its time copying them back and forth.

When is an object promoted to an old object?
There are 2 triggers that we can configure with command line parameters.
a) By counter – an object which lives for more than, say, n iterations of copying GC is moved to the old object area. (In HotSpot, the option -XX:MaxTenuringThreshold sets the upper limit for n.)
b) By percentage – say when n% of survivor space 1 becomes full, m% of it is moved to the old object area. (Older objects are preferred.)
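
Trigger a) can be sketched as a toy simulation in which every object carries an age counter. This is a simplified model and not HotSpot code: the class, field and method names are invented, and the rule "age greater than the threshold" is an assumption taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TenuringSketch {

    // Hypothetical stand-in for an object header that records the object's age.
    static final class Tracked {
        final String name;
        int age; // number of minor GCs this object has survived

        Tracked(String name) {
            this.name = name;
        }
    }

    final List<Tracked> young = new ArrayList<>(); // new object area
    final List<Tracked> old = new ArrayList<>();   // old object area
    private final int threshold;                   // the "n" from trigger a)

    TenuringSketch(int threshold) {
        this.threshold = threshold;
    }

    void allocate(String name) {
        young.add(new Tracked(name));
    }

    // One minor GC: dead objects are dropped, survivors get one iteration older,
    // and survivors that have lived for more than `threshold` iterations are promoted.
    void minorGc(Set<String> liveNames) {
        List<Tracked> survivors = new ArrayList<>();
        for (Tracked t : young) {
            if (!liveNames.contains(t.name)) {
                continue; // unreachable: not copied, so it simply disappears
            }
            t.age++;
            if (t.age > threshold) {
                old.add(t);
            } else {
                survivors.add(t);
            }
        }
        young.clear();
        young.addAll(survivors);
    }
}
```

With a threshold of 2, an object that stays alive remains in the new object area after the first and second minor GC and is moved to the old object area by the third one.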

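When is GC called by the JVM?
The JVM calls GC when the area it allocates from fills up: a minor GC when the new object area is full, a major GC when the old object area runs short of space. We can influence how often this happens by setting the heap and generation sizes with command line parameters such as -Xms, -Xmx and -Xmn.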

Check this for command line parameters.

Of course there is a lot more information on this topic.
