IBM InfoSphere QualityStage Full Notes | Investigate Stage | Standardize Stage in DataStage Full Notes | Scenarios - Pharma Jobs

Saturday, January 4, 2014

QualityStage is an IBM InfoSphere tool used together with DataStage. Its main purpose is to provide clean data: cleansed data gives an accurate view of individuals and organizations, which supports research, fraud detection, business intelligence and planning.
Out of the box, IBM InfoSphere QualityStage cleanses name and address data and some related types of data such as e-mail addresses, tax IDs and so on.

QualityStage is an IBM InfoSphere data warehousing tool intended to deliver the high-quality data required for success in a range of organization initiatives, including business intelligence, legacy consolidation and master data management. It does this primarily by identifying components of data that may be in columns or free format, standardizing the values and formats of those data, using the standardized results and other generated values to determine likely duplicate records, and building a “best of breed” record out of each set of potential duplicates.

Through its intuitive user interface, IBM InfoSphere QualityStage substantially reduces the time and cost to
implement Customer Relationship Management (CRM), data warehouse/business
intelligence (BI), data governance, and other strategic IT initiatives and maximizes their
return on investment by ensuring their data quality.

1. Why use the Investigate stage in DataStage:

--- > Discover trends and potential anomalies in the data
--- > Identify invalid and default values in the data
--- > Verify the reliability of the data in the fields to be used as matching criteria
--- > Gain a complete understanding of the data in context

With IBM InfoSphere QualityStage it is possible, for example, to construct consolidated customer and
household views, enabling more effective up-selling, cross-selling and customer
retention, and to help improve customer support and service, for example by
identifying an organization's most profitable customers.

Investigate Stage:
Verify the domain:
--- > Review each field and verify that the data matches the metadata
--- > Identify the data formats and the missing and default values
Identify the data anomalies:
--- > Format
--- > Structure
--- > Content

Features of Investigate:
--- > Analyze free-form and single-domain columns
--- > Provide frequency distributions of distinct values and patterns

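Investigate methods:
--- > Character Discrete (analyzes each selected column on its own)
--- > Character Concatenate (analyzes two or more columns as one combined value)
--- > Word Investigate (parses free-form columns into tokens)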

Investigate default column names for the Pattern Report:

1. qsInvColumnName
2. qsInvPattern
3. qsInvSample
4. qsInvCount
5. qsInvPercent
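
To make the pattern report columns concrete, here is a minimal Python sketch of what a character investigation does conceptually. It is not QualityStage code: the function names apply_mask and pattern_report and the sample Zip values are made up for illustration, and the mask behavior (C keeps the character, T replaces it with a for alphabetic, n for numeric and b for blank, X leaves it out of the pattern) is a simplified reading of how the stage works.

```python
from collections import Counter


def apply_mask(value: str, mask: str) -> str:
    """Build a pattern from a value using one mask letter (C, T or X) per position."""
    out = []
    for ch, m in zip(value, mask):
        if m == "C":            # keep the character as it is
            out.append(ch)
        elif m == "T":          # replace the character with its type
            if ch.isalpha():
                out.append("a")
            elif ch.isdigit():
                out.append("n")
            elif ch == " ":
                out.append("b")
            else:
                out.append(ch)  # special characters are kept unchanged
        # "X": the character is left out of the pattern
    return "".join(out)


def pattern_report(column_name, values, mask_letter="T"):
    """Return one row per distinct pattern, most frequent pattern first."""
    patterns = [apply_mask(v, mask_letter * len(v)) for v in values]
    counts = Counter(patterns)
    samples = {}
    for value, pattern in zip(values, patterns):
        samples.setdefault(pattern, value)  # first value seen for each pattern
    total = len(values)
    return [
        {
            "qsInvColumnName": column_name,
            "qsInvPattern": pattern,
            "qsInvSample": samples[pattern],
            "qsInvCount": count,
            "qsInvPercent": round(100.0 * count / total, 2),
        }
        for pattern, count in counts.most_common()
    ]


# Made-up sample data: three clean ZIP codes and one with a stray letter.
zip_report = pattern_report("Zip", ["10001", "94105", "9410A", "30301"])
for row in zip_report:
    print(row)
```

With a T mask the odd value stands out as its own pattern (nnnna) with a low count, which is the kind of anomaly the report is meant to surface.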

Investigate default column names for the Token Report:

1. qsInvCount
2. qsInvWord
3. qsInvClassCode
Example:

Character Discrete: C mask (select one or many columns)
Character Concatenate: C mask (select two or more columns to concatenate)
Word Investigate: FullName
--- > Token Report
--- > Pattern Report
Word Investigate: Address (pass address line 1, address line 2)
--- > Token Report
--- > Pattern Report
Word Investigate: Area (city, state, zip)
--- > Token Report
--- > Pattern Report


2. Standardize stage:

1. Country identifier:

--- > Select the COUNTRY rule set from Others
--- > Pass the literal ZQUSZQ and add the columns addressline1, addressline2, city, state, zip
--- > Filter the records wherever the flag is ‘Y’; those are US records
--- > Split the US and non-US records into separate targets
2. Apply the USPREP rule set to separate name components and area components from the address fields

--- > Select the USPREP rule set from the Standardize rules
--- > Pass ZQNAMEZQ and add the column “Fullname”
--- > Pass ZQADDRZQ and add the column “addressline1”
--- > Pass ZQADDRZQ and add the column “addressline2”
--- > Pass ZQAREAZQ and add the column “City”
--- > Pass ZQAREAZQ and add the column “State”
--- > Pass ZQAREAZQ and add the column “Zip”
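
One way to picture what the literals do: each literal marks the domain (name, address or area) of the column that follows it, so the rule set sees one delimited string per record. The Python sketch below only illustrates that idea. The Standardize stage builds its input internally, and the function build_standardize_input, the USPREP_LAYOUT list and the sample record are hypothetical.

```python
def build_standardize_input(record, layout):
    """Join (literal, column) pairs into one delimited string for a record."""
    parts = []
    for literal, column in layout:
        parts.append(literal)                      # domain marker, e.g. ZQNAMEZQ
        value = (record.get(column) or "").strip()
        if value:                                  # empty columns add only the literal
            parts.append(value)
    return " ".join(parts)


# Same order as the USPREP steps listed above.
USPREP_LAYOUT = [
    ("ZQNAMEZQ", "Fullname"),
    ("ZQADDRZQ", "addressline1"),
    ("ZQADDRZQ", "addressline2"),
    ("ZQAREAZQ", "City"),
    ("ZQAREAZQ", "State"),
    ("ZQAREAZQ", "Zip"),
]

# Made-up sample record.
record = {
    "Fullname": "John Smith",
    "addressline1": "123 Main St",
    "addressline2": "",
    "City": "Boston",
    "State": "MA",
    "Zip": "02101",
}

prep_input = build_standardize_input(record, USPREP_LAYOUT)
print(prep_input)
```

Because the literal only states where a value came from, USPREP can still move a token to another domain when the content says otherwise, for example a name typed into an address line.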

Standardize with USNAME, USADDR and USAREA

1. Select the USNAME rule set from the Standardize rules and add the column NameDomain_USPREP
2. Select New Process, select the USADDR rule set and add the column AddressDomain_USPREP
3. Select New Process, select the USAREA rule set and add the column AreaDomain_USPREP

Rules                                              Columns
USNAME.SET                               NameDomain_USPREP
USADDR.SET                               AddressDomain_USPREP
USAREA.SET                               AreaDomain_USPREP

Investigate unhandled name, address and area patterns

Take the output of the above job as input and use three Investigate stages:
1. InvUnhandledName
2. InvUnhandledAddr
3. InvUnhandledArea

InvUnhandledName:
Select the Character Concatenate method for Name
Select the columns:
UnhandledPattern_USNAME --- > set C mask
UnhandledData_USNAME --- > set X mask
InputPattern_USNAME --- > set X mask
NameDomain_USPREP --- > set X mask

InvUnhandledAddr:
Select the Character Concatenate method for Address
Select the columns:
UnhandledPattern_USADDR --- > set C mask
UnhandledData_USADDR --- > set X mask
InputPattern_USADDR --- > set X mask
AddressDomain_USPREP --- > set X mask

InvUnhandledArea:
Select the Character Concatenate method for Area
Select the columns:
UnhandledPattern_USAREA --- > set C mask
UnhandledData_USAREA --- > set X mask
InputPattern_USAREA --- > set X mask
AreaDomain_USPREP --- > set X mask
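
The effect of mixing C and X masks in Character Concatenate can be sketched in Python. The assumption here is that a C-masked column contributes to the pattern while an X-masked column is left out of the pattern but still shown in the sample. The function concatenate_report, the output keys and the sample records are made up for illustration and do not come from a real job.

```python
from collections import Counter


def concatenate_report(records, masks):
    """Group records by their C-masked columns; keep one full sample per group."""
    counts = Counter()
    samples = {}
    for rec in records:
        # Only C-masked columns form the pattern that is counted.
        pattern = " ".join(rec[col] for col, mask in masks.items() if mask == "C")
        # The sample shows every selected column, X-masked ones included.
        sample = " | ".join(rec[col] for col in masks)
        counts[pattern] += 1
        samples.setdefault(pattern, sample)
    total = sum(counts.values())
    return [
        {
            "qsInvPattern": pattern,
            "qsInvSample": samples[pattern],
            "qsInvCount": count,
            "qsInvPercent": round(100 * count / total, 2),
        }
        for pattern, count in counts.most_common()
    ]


NAME_MASKS = {
    "UnhandledPattern_USNAME": "C",
    "UnhandledData_USNAME": "X",
    "InputPattern_USNAME": "X",
    "NameDomain_USPREP": "X",
}

# Made-up records: two share the same unhandled pattern.
records = [
    {"UnhandledPattern_USNAME": "++", "UnhandledData_USNAME": "ACME HOLDINGS",
     "InputPattern_USNAME": "++", "NameDomain_USPREP": "ACME HOLDINGS"},
    {"UnhandledPattern_USNAME": "++", "UnhandledData_USNAME": "ZETA PARTNERS",
     "InputPattern_USNAME": "++", "NameDomain_USPREP": "ZETA PARTNERS"},
    {"UnhandledPattern_USNAME": "^+", "UnhandledData_USNAME": "42 SMITH",
     "InputPattern_USNAME": "^+", "NameDomain_USPREP": "42 SMITH"},
]

report = concatenate_report(records, NAME_MASKS)
for row in report:
    print(row)
```

Sorting by count puts the most common unhandled patterns first, which shows where extending the rule set would pay off most.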


