Performance Improvement Techniques in DataStage
I have put together two sets of superb documents on performance tuning:
1) DataStage Enterprise Edition (PX – Parallel Extender)
These documents are the result of experience gained across numerous successful deployments. They present a practical handbook of performance improvement techniques that can be used for ETL architecture, job design and development, as well as for review, analysis and performance optimization.
Information Server Overview
This is a sequel to the DataStage Overview: the Information Server Overview.
Information Server is one of the most significant IBM software releases of recent times: a revolutionary new software platform that helps organizations derive more value from the complex, heterogeneous information spread across their systems. This PPT deck provides an overview of Information Server.
Process Management in DataStage
Purpose:
- How to avoid using “kill -9” (SIGKILL).
- How to release a stuck job from DataStage Director.
- How to kill orphaned / runaway processes using DS Administrator.
- How to use the kill command without “-9”.
- How to clean up zombie / orphaned phantom processes.
- How to increase the number of jobs / processes that can run in DataStage.
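The idea of stopping a process without “-9” can be sketched roughly as follows. This is only an illustration under assumptions, not a DataStage tool: it assumes a POSIX system, and the function name terminate_gracefully and its timeout parameters are hypothetical. It sends SIGTERM (what a plain kill sends), then waits for the process to exit instead of escalating to SIGKILL.

```python
import os
import signal
import time


def terminate_gracefully(pid: int, timeout: float = 10.0,
                         poll_interval: float = 0.2) -> bool:
    """Send SIGTERM to `pid` and wait up to `timeout` seconds for it to exit.

    Returns True if the process is gone, False if it is still present.
    SIGKILL is never sent; the caller decides what to do on False.
    (Hypothetical helper for illustration; POSIX only.)
    """
    try:
        os.kill(pid, signal.SIGTERM)  # same signal as a plain `kill <pid>`
    except ProcessLookupError:
        return True  # the process had already exited

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the pid exists
        except ProcessLookupError:
            return True
        time.sleep(poll_interval)
    return False
```

Note that a child that has exited but has not yet been reaped by its parent (a zombie) still counts as present for this check, so a False result does not necessarily mean the process is still doing work.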
What are Phantom processes?
DataStage Overview | Tutorial | Essentials
I finally found enough time to put the “DataStage Overview” PPT deck online. It is a series of simple and informative slides about this great BI tool. The deck provides an overview of DataStage and is the result of feedback and experience gained across numerous training sessions.
Surrogate Key Generation in DataStage - An elegant way
An elegant and fast way to generate surrogate keys in a parallel job!
This is a hot topic, discussed and attempted by most ETL architects, designers and developers. This article looks at an elegant way to generate surrogate keys in a DataStage parallel job without the overhead of creating multiple jobs or maintaining a state file. It is aimed slightly at advanced or power users, since it involves creating a parallel routine using the DataStage Development Kit (Job Control Interfaces). But the strategy is simple and elegant: you can do it in one job and maintain the surrogate key in a centralised and editable location, an environment variable defined in Administrator. That also lets you use it across the project in different jobs.
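To make the underlying problem concrete, here is a small sketch of one common way to hand out collision-free keys across parallel partitions, starting from a centrally stored last-used key. It is not the parallel routine described in the article, and it does not use the Job Control Interface. The function names partition_keys and next_seed are hypothetical, and the interleaving scheme is an assumption chosen for illustration.

```python
from typing import Iterator, List


def partition_keys(last_key: int, partition: int,
                   num_partitions: int, row_count: int) -> Iterator[int]:
    """Yield surrogate keys for one partition.

    Keys are interleaved by partition number, so two partitions can
    never produce the same key. `last_key` plays the role of the
    centrally stored value (hypothetical; an environment variable in
    the article's approach).
    """
    for row in range(row_count):
        yield last_key + 1 + partition + row * num_partitions


def next_seed(last_key: int, rows_per_partition: List[int]) -> int:
    """Return the highest key handed out, to store back as the new seed."""
    num_partitions = len(rows_per_partition)
    highest = last_key
    for partition, count in enumerate(rows_per_partition):
        if count:
            last_in_partition = last_key + 1 + partition + (count - 1) * num_partitions
            highest = max(highest, last_in_partition)
    return highest


if __name__ == "__main__":
    # Three partitions holding 2, 2 and 1 rows, continuing after key 100.
    for p, n in enumerate([2, 2, 1]):
        print(p, list(partition_keys(100, p, 3, n)))
    print("new seed:", next_seed(100, [2, 2, 1]))
```

With uneven partitions this scheme can leave gaps in the key sequence, which is usually acceptable for surrogate keys because they only need to be unique.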

