Surrogate Key Generation in DataStage - An elegant way

April 26, 2008 · Filed Under DataStage Articles, Tech Articles · 1 Comment 

An elegant and fast way to generate surrogate keys in a parallel job!

Surrogate key generation is a hot topic that most ETL architects, designers and developers have discussed and attempted. This article looks at an elegant way to generate surrogate keys in a DataStage parallel job, without the overhead of creating multiple jobs or maintaining a state file. It leans slightly towards advanced or power users, because it involves creating a parallel routine with the DataStage Development Kit (Job Control Interfaces). But the strategy is simple and elegant: you can do it in one job and maintain the surrogate key in a centralised, editable location, an environment variable defined in Administrator. That also lets you use it across the project in different jobs.
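As a rough illustration of the key arithmetic only, here is a minimal C++ sketch of handing out non-overlapping keys across parallel partitions, starting from the last key already used. It is not the routine from the article: the function name `surrogate_key` and all of its parameters are hypothetical, and how the starting value is read from or written back to the environment variable is left out entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of any DataStage API.
// Returns the key for the row_index-th row (0-based) processed on a given
// partition, so that keys never collide across partitions:
// partition p takes last_used_key + 1 + p, then steps by num_partitions.
std::int64_t surrogate_key(std::int64_t last_used_key,
                           std::int64_t partition,
                           std::int64_t num_partitions,
                           std::int64_t row_index)
{
    assert(num_partitions > 0);
    assert(partition >= 0 && partition < num_partitions);
    assert(row_index >= 0);
    return last_used_key + 1 + partition + row_index * num_partitions;
}
```

With this interleaving, the keys stay unique without any coordination between partitions, at the cost of gaps when partitions receive uneven row counts.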
Read more

DataStage Parallel (C++) routine to create files dynamically

April 26, 2008 · Filed Under DataStage Articles, SOA, Tech Articles · 1 Comment 

A DataStage parallel (C++) routine that creates files dynamically for every record in a single job. This is a feature that SOA or real-time environments often demand: from a stream coming from a single source or multiple sources, create a file for each input record, or for a set of records that meets a condition. With this routine you can dynamically pass the file path, file name and extension, along with the records to be written to the file. If you want multiple records written to one file, accumulate the records in a stage variable, separated by newline characters, and pass the result to the routine call. The records can have different metadata.
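A minimal sketch of what such a routine could look like, assuming a plain C-linkage function that receives the path, name, extension and record text as C strings. The name `write_record_file`, its signature and its return codes are hypothetical and are not taken from the article's routine.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical routine: writes `records` to <dir>/<name>.<ext>,
// replacing any existing file. Returns 0 on success, 1 on failure.
// `records` may hold several newline-separated records.
extern "C" int write_record_file(const char* dir,
                                 const char* name,
                                 const char* ext,
                                 const char* records)
{
    assert(dir && name && ext && records);

    // Build the full path from the three pieces passed by the caller.
    const std::string path = std::string(dir) + "/" + name + "." + ext;

    std::ofstream out(path.c_str(), std::ios::out | std::ios::trunc);
    if (!out.is_open()) {
        return 1;  // directory missing or not writable
    }

    out << records;
    out.close();
    return out.fail() ? 1 : 0;
}
```

Because the function only sees text, records with different layouts can go through the same call; the caller decides how each record is serialised before passing it in.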
Read more

Size & Effort Estimation Model for ETL

April 26, 2008 · Filed Under Tech Articles · 14 Comments 

This model simplifies and accelerates size and effort estimation for ETL job development using an ETL tool. The purpose of this paper is to define a method for estimating ETL job development for DataStage / Informatica jobs by calculating job size and complexity from the job pattern and the specific challenges within the job. The Size & Effort Estimation model helps to predict how many horizontal and vertical design blocks should be produced, how many developers are required and how long the work will take. The approach is based on data typically produced during the early stages of software development, drawn from various successfully executed projects.
Read more

DataStage Enterprise Edition with Teradata

April 26, 2008 · Filed Under DataStage Articles · Comment 

DataStage (DS) has stages that allow you to use FastExport, MultiLoad, FastLoad and TPump. In addition, you can use the Teradata (TD) API stage and the ODBC stage to extract, load, look up and manipulate data. With IBM Information Server (DataStage 8.x, the latest version of DataStage), the long-awaited Teradata Connector stage for Teradata Parallel Transporter (TPT / Teradata PT) joined the fleet of TD stages. There is more good news coming with IBM Information Server:
>> Supports TD stored procedures
>> Supports TD macros
>> Supports restart capability and reject links for bulk loads

Read more

Rock your Data Warehousing with SOA

April 26, 2008 · Filed Under SOA, Tech Articles · Comment 

The lazy big giant who serviced the BI world on a periodic basis is getting an overhaul. It’s all set to keep a finger on the pulse of business and ship all the different [BI] pieces that customers require. Service-oriented architecture makes room for data warehousing! This could be the beginning of a perfect marriage.

The relationship between data warehousing and SOA has always been complicated. They sit at opposite ends of a spectrum: aggregated vs. federated data, data-centric vs. process-centric. But how about bringing them together to deliver data as a service? A service-oriented architecture (SOA) can address this challenge by turning data warehouses into information as a service, though with important conditions and qualifications.
Read more