IBM InfoSphere QualityStage Full Notes | Investigate Stage | Standardize Stage in DataStage Full Notes

QualityStage is part of the IBM InfoSphere DataStage tool set. Its main purpose is data cleansing: the cleansed data it provides supports fraud detection, organizational research, the creation of business intelligence on individuals, and planning.
Out of the box, IBM InfoSphere QualityStage cleanses name and address data and some related types of data, such as e-mail addresses and tax IDs.

QualityStage is an IBM InfoSphere data warehousing tool intended to deliver the high-quality data required for success in a range of organizational initiatives, including business intelligence, legacy consolidation and master data management. It does this primarily by identifying components of data that may be in columns or in free format, standardizing the values and formats of those data, using the standardized results and other generated values to determine likely duplicate values or records, and building a "best of breed" record out of each set of potential duplicates.

Through its intuitive user interface, IBM InfoSphere QualityStage substantially reduces the time and cost needed to implement Customer Relationship Management (CRM), data warehouse/business intelligence (BI), data governance, and other strategic IT initiatives, and maximizes their return on investment by ensuring their data quality.

1. Why use the Investigate stage in DataStage:

-> Discover trends and potential anomalies in the data

-> Identify invalid and default values in the data

-> Verify the reliability of the data in the fields to be used as matching criteria

-> Gain a complete understanding of the data in context

With IBM InfoSphere QualityStage it is possible, for example, to construct consolidated customer and
household views, enabling more effective up-selling, cross-selling and customer
retention, and to help improve customer support and service, for example by
identifying an organization's most profitable customers.

Investigate Stage:

Verify the domain:

Review each field and verify that the data matches the metadata

Identify the data formats and missing and default values

Identify the data anomalies:

Format

Structure

Content

Features of Investigate:

Analyze free form and single domain columns

Provide frequency distribution of distinct values and patterns

Investigate methods:

Character discrete

Character concatenate

Word investigate

Investigate default column names for the Pattern Report:

1. qsInvColumnName

2. qsInvPattern

3. qsInvSample

4. qsInvCount

5. qsInvPercent
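To make the pattern report concrete, here is a minimal Python sketch of a character investigation that produces one row per distinct pattern. It is an illustration only, not QualityStage code: the functions `mask_value` and `pattern_report` are hypothetical, the dictionary keys mirror the report column names listed above, and the T-mask symbols (`a` for alphabetic, `n` for numeric, `b` for blank) are an assumption about how the type mask is rendered.

```python
from collections import Counter


def mask_value(value: str, mask: str) -> str:
    """Apply one mask letter to every character of a value.

    C keeps the characters as they are, T replaces each character with a
    type symbol (assumed: a = alpha, n = numeric, b = blank), X skips them.
    """
    if mask == "C":
        return value
    if mask == "X":
        return ""
    if mask == "T":
        symbols = []
        for ch in value:
            if ch.isalpha():
                symbols.append("a")
            elif ch.isdigit():
                symbols.append("n")
            elif ch == " ":
                symbols.append("b")
            else:
                symbols.append(ch)  # special characters stay visible
        return "".join(symbols)
    raise ValueError(f"unknown mask: {mask}")


def pattern_report(column_name, values, mask="T"):
    """Return a frequency distribution of patterns, most frequent first."""
    patterns = [mask_value(v, mask) for v in values]
    counts = Counter(patterns)
    samples = {}
    for value, pattern in zip(values, patterns):
        samples.setdefault(pattern, value)  # first value seen is the sample
    total = len(values)
    return [
        {
            "qsInvColumnName": column_name,
            "qsInvPattern": pattern,
            "qsInvSample": samples[pattern],
            "qsInvCount": count,
            "qsInvPercent": round(100.0 * count / total, 2),
        }
        for pattern, count in counts.most_common()
    ]


zips = ["60614", "6061A", "10001", "n/a"]
report = pattern_report("Zip", zips)
for row in report:
    print(row)
```

Rare patterns at the bottom of such a report are usually the anomalies worth reviewing, which is the point of the frequency distribution.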

Investigate default column names for the Token Report:

1. qsInvCount

2. qsInvWord

3. qsInvClassCode

Example:

Character discrete: C mask (select one or more columns)

Character concatenate: C mask (select two or more columns to concatenate)

Word investigate: FullName

Token Rpt

Pattern Rpt

Word investigate: Address (pass address line 1, address line 2)

Token Rpt

Pattern Rpt

Word investigate: Area (city, state, zip)

Token Rpt

Pattern Rpt
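The difference between the token report and the pattern report of a word investigation can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `CLASSIFICATIONS` table is hypothetical and not a real rule set, and the class codes used here (`^` for a numeric token, `+` for an unclassified word) are only a stand-in for the codes a rule set would assign.

```python
from collections import Counter

# Hypothetical classification table (token -> class code), for illustration.
CLASSIFICATIONS = {"ST": "T", "AVE": "T", "N": "D", "S": "D"}


def classify(token: str) -> str:
    """Return a one-character class code for a token."""
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]
    if token.isdigit():
        return "^"  # assumed code for numeric tokens
    return "+"      # assumed code for an unclassified word


def word_investigate(values):
    """Build a token report (per word) and pattern counts (per record)."""
    token_counts = Counter()
    pattern_counts = Counter()
    for value in values:
        tokens = value.upper().split()
        token_counts.update(tokens)
        pattern_counts["".join(classify(t) for t in tokens)] += 1
    token_report = [
        {"qsInvCount": count, "qsInvWord": word, "qsInvClassCode": classify(word)}
        for word, count in token_counts.most_common()
    ]
    return token_report, pattern_counts


addresses = ["123 N Main St", "45 Oak Ave", "9 S Elm St"]
token_report, pattern_counts = word_investigate(addresses)
print(token_report[0])
print(pattern_counts)
```

The token report counts individual words across all records, while the pattern report counts the sequence of class codes per record.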

2. Standardize stage:

1. Country identifier:

-> Select the COUNTRY rule set from Others

-> Pass the literal ZQUSZQ and add the columns addressline1, addressline2, city, state, zip

-> Filter the records wherever the flag is 'Y'; those are US records

-> Split US and non-US records into separate targets
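The filter-and-split step above can be sketched in Python as follows. This is an illustration only: the field name `flag` and the function `split_by_flag` are hypothetical, and the rule that 'Y' marks a US record is taken from the notes above.

```python
def split_by_flag(records, flag_field="flag"):
    """Split records into (us, non_us) based on a Y/N flag field."""
    us, non_us = [], []
    for record in records:
        if record.get(flag_field) == "Y":
            us.append(record)
        else:
            non_us.append(record)  # anything not flagged 'Y' goes here
    return us, non_us


records = [
    {"id": 1, "flag": "Y"},
    {"id": 2, "flag": "N"},
    {"id": 3, "flag": "Y"},
]
us_records, non_us_records = split_by_flag(records)
print(len(us_records), len(non_us_records))
```

In a job this corresponds to a filter with two output links, one per target.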

2. Apply the USPREP rule set to separate name components, address components and area components from the input fields

-> Select the USPREP rule set from the standardize rules

-> Pass ZQNAMEZQ and add the column "Fullname"

-> Pass ZQADDRZQ and add the column "addressline1"

-> Pass ZQADDRZQ and add the column "addressline2"

-> Pass ZQAREAZQ and add the column "City"

-> Pass ZQAREAZQ and add the column "State"

-> Pass ZQAREAZQ and add the column "Zip"
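The literal-plus-column sequence above amounts to one delimited string per record, in which each literal tells the rule set which domain the following value belongs to. A minimal Python sketch of that layout, assuming space-separated parts and a hypothetical helper `build_uspre_input` (the stage assembles this internally; this only shows the shape of the result):

```python
# Literal / column pairs in the order given in the steps above.
USPREP_LAYOUT = [
    ("ZQNAMEZQ", "Fullname"),
    ("ZQADDRZQ", "addressline1"),
    ("ZQADDRZQ", "addressline2"),
    ("ZQAREAZQ", "City"),
    ("ZQAREAZQ", "State"),
    ("ZQAREAZQ", "Zip"),
]


def build_uspre_input(record):
    """Concatenate each literal with its column value, skipping empty values."""
    parts = []
    for literal, column in USPREP_LAYOUT:
        parts.append(literal)
        parts.append(str(record.get(column, "")).strip())
    return " ".join(part for part in parts if part)


record = {
    "Fullname": "John Smith",
    "addressline1": "123 Main St",
    "addressline2": "",
    "City": "Chicago",
    "State": "IL",
    "Zip": "60614",
}
print(build_uspre_input(record))
```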

Standardize with USNAME, USADDR and USAREA

1. Select the USNAME rule set from the standardize rules and add the column NameDomain_USPREP

2. Select New Process, select the USADDR rule set and add the column AddressDomain_USPREP

3. Select New Process, select the USAREA rule set and add the column AreaDomain_USPREP

Rules          Columns

USNAME.SET     NameDomain_USPREP

USADDR.SET     AddressDomain_USPREP

USAREA.SET     AreaDomain_USPREP

Investigate unhandled name, address and area patterns

Take the output of the above job as input and use 3 Investigate stages:

1. InvUnhandledName

2. InvUnhandledAddr

3. InvUnhandledArea

InvUnhandledName:

Select the method character concatenate for Name

Select the columns

UnhandledPattern_USNAME -> set C mask

UnhandledData_USNAME -> set X mask

InputPattern_USNAME -> set X mask

NameDomain_USPREP -> set X mask

InvUnhandledAddr:

Select the method character concatenate for Address

Select the columns

UnhandledPattern_USADDR -> set C mask

UnhandledData_USADDR -> set X mask

InputPattern_USADDR -> set X mask

AddressDomain_USPREP -> set X mask

InvUnhandledArea:

Select the method character concatenate for Area

Select the columns

UnhandledPattern_USAREA -> set C mask

UnhandledData_USAREA -> set X mask

InputPattern_USAREA -> set X mask

AreaDomain_USPREP -> set X mask
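The effect of mixing C and X masks in a character concatenate investigation can be sketched in Python as follows. This is a rough illustration with sample data made up for the example: it assumes that C-mask columns form the key that is counted, while X-mask columns are left out of the key and only appear in the sample shown for each key. The function `concatenate_investigate` is hypothetical.

```python
from collections import Counter


def concatenate_investigate(rows, masks):
    """Count rows by their C-mask columns; keep one full sample per key.

    masks maps column name -> "C" (part of the counted key) or
    "X" (excluded from the key, shown only in the sample).
    """
    counts = Counter()
    samples = {}
    for row in rows:
        key = "".join(str(row[col]) for col, mask in masks.items() if mask == "C")
        counts[key] += 1
        samples.setdefault(key, " | ".join(str(row[col]) for col in masks))
    return [
        {"pattern": key, "count": count, "sample": samples[key]}
        for key, count in counts.most_common()
    ]


name_masks = {
    "UnhandledPattern_USNAME": "C",
    "UnhandledData_USNAME": "X",
    "InputPattern_USNAME": "X",
    "NameDomain_USPREP": "X",
}
rows = [
    {"UnhandledPattern_USNAME": "++", "UnhandledData_USNAME": "ACME WIDGETS",
     "InputPattern_USNAME": "++", "NameDomain_USPREP": "ACME WIDGETS"},
    {"UnhandledPattern_USNAME": "++", "UnhandledData_USNAME": "BLUE SKY",
     "InputPattern_USNAME": "++", "NameDomain_USPREP": "BLUE SKY"},
    {"UnhandledPattern_USNAME": "^+", "UnhandledData_USNAME": "42 PARTNERS",
     "InputPattern_USNAME": "^+", "NameDomain_USPREP": "42 PARTNERS"},
]
unhandled_report = concatenate_investigate(rows, name_masks)
for line in unhandled_report:
    print(line)
```

Grouping on the unhandled pattern alone shows which patterns occur most often, and the sample gives one real record to inspect when deciding how to extend the rule set.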


Saturday, January 4, 2014
