DataStage Overview | Tutorial | Essentials
Finally I got enough time to put the “DataStage Overview” PPT deck online: a series of simple, informative slides about this great BI tool. The deck provides an overview of DataStage and is the result of feedback and experience gained across numerous training sessions.
Surrogate Key Generation in DataStage - An elegant way
An elegant and fast way to generate surrogate keys in a parallel job!
This is a hot topic, discussed and attempted by most ETL architects, designers and developers. This article looks at an elegant way to generate surrogate keys in a DataStage parallel job without the overhead of creating multiple jobs or maintaining a state file. It leans slightly towards advanced or power users, because it involves creating a parallel routine using the DataStage Development Kit (Job Control Interfaces). But the strategy is simple and elegant: you can do it in one job and maintain the surrogate key in a centralised, editable location, an environment variable defined in Administrator. That also lets you use it across the project in different jobs.
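The article's own routine is not reproduced in this summary. As a rough illustration only, the sketch below assumes the last used key is held in an environment variable and that each partition of the parallel job hands out keys in an interleaved pattern so that no two partitions collide. The interleaving scheme, the function names `readStartKey` and `nextKey`, and reading the value with `getenv` are all assumptions made for this example; they are not taken from the article or from the DataStage Development Kit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: read the last used key from an environment variable.
// Returns 0 if the variable is unset or does not start with a number.
std::int64_t readStartKey(const char* varName) {
    const char* raw = std::getenv(varName);
    if (raw == nullptr) return 0;
    char* end = nullptr;
    long long value = std::strtoll(raw, &end, 10);
    return (end == raw) ? 0 : static_cast<std::int64_t>(value);
}

// Hypothetical helper: compute the key for the rowIndex-th row (0-based)
// seen by one partition. Partition p of N receives lastKey + p + 1,
// lastKey + p + 1 + N, lastKey + p + 1 + 2N, and so on, so the key
// sequences of different partitions never overlap.
std::int64_t nextKey(std::int64_t lastKey, int numPartitions,
                     int partitionNum, std::int64_t rowIndex) {
    assert(numPartitions > 0);
    assert(partitionNum >= 0 && partitionNum < numPartitions);
    return lastKey + rowIndex * numPartitions + partitionNum + 1;
}
```

With a last key of 100 and four partitions, the first rows on partitions 0 to 3 would receive 101 to 104, and the second row on partition 0 would receive 105. Writing the highest key back to the central location at the end of the job is outside the scope of this sketch.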
DataStage Parallel (C++) routine to create files dynamically
A DataStage parallel (C++) routine to create files dynamically for every record in a single job. This is a feature that SOA or real-time environments often demand: from a stream coming from a single source or multiple sources, create a file for each input record, or for a set of records that meet a condition. With this routine you can dynamically pass the file path, file name and extension, along with the records to be written to the file. If you want multiple records written to one file, accumulate them in a stage variable, separated by newline characters, and pass the result to the routine call. Records can have different metadata.
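The core of such a routine can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the article's actual routine: the function name `writeRecordFile`, its `std::string` parameters and its return convention (0 for success, 1 for failure) are hypothetical, and a real parallel routine would need whatever signature and linkage the DataStage routine definition requires.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write `records` to <dir>/<name>.<ext>, replacing any
// existing file. Several records can be passed in one call by joining them
// with '\n' beforehand. Returns 0 on success and 1 on failure.
int writeRecordFile(const std::string& dir, const std::string& name,
                    const std::string& ext, const std::string& records) {
    // Build the full path, adding a separator only when one is needed.
    std::string path = dir;
    if (!path.empty() && path.back() != '/') path += '/';
    path += name;
    if (!ext.empty()) path += "." + ext;

    // Open for writing and truncate so each call produces a fresh file.
    std::ofstream out(path, std::ios::out | std::ios::trunc);
    if (!out) return 1;

    out << records;
    out.close();
    return out.fail() ? 1 : 0;
}
```

Because the path, name and extension arrive as arguments, each input row can target a different file, and the records are written as an opaque string, so their layout does not matter to the helper.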
Real Time Integration using DataStage
An exploration of an IBM WebSphere DataStage based Real Time Integration Architecture (RTIA) solution featuring event-driven, message-based and trigger-based infrastructure for the cost-effective integration of multiple applications. DataStage can operate in real time, capturing messages or extracting data at a moment’s notice on the same platform that also integrates bulk data. This provides a key advantage over competing offerings that require two separate tools to achieve the same functionality.
Size & Effort Estimation Model for ETL
This model simplifies and accelerates size and effort estimation for ETL job development using an ETL tool. The purpose of this paper is to define a method for estimating ETL job development for DataStage / Informatica jobs by calculating job size and complexity based on the job pattern and the specific challenges within the job. The Size & Effort Estimation model helps to predict how many horizontal and vertical design blocks should be produced, how many developers are required, and how long the work will take. The approach is based on data typically produced during the early stages of software development, drawn from various successfully executed projects.

