QualityStage is an IBM InfoSphere DataStage tool whose main purpose is data cleansing. The cleansed data provided by IBM InfoSphere QualityStage supports fraud detection, research on organizations, building business intelligence on individuals and organizations, and planning.
Out of the box, IBM InfoSphere QualityStage cleanses name, address, and other related data, as well as some related data types such as e-mail addresses, tax IDs, and so on.
With IBM InfoSphere QualityStage it is possible, for example, to construct consolidated customer and
household views, enabling more effective up-selling, cross-selling, and customer
retention, and to improve customer support and service, for example by
identifying an organization's most profitable customers.
QualityStage is an IBM InfoSphere data warehousing tool intended to deliver the high-quality data required for success in a
range of organizational initiatives, including business intelligence, legacy consolidation, and master data management. It does this primarily by identifying the components of data that may be in columns or in free format, standardizing the values and formats of that data, using the standardized results and other generated values to determine likely duplicate records, and building a "best of breed" record out of each set of potential duplicates.
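To make these four steps concrete, here is a minimal Python sketch of the identify/standardize, match, and "best of breed" (survivorship) flow. It is only an illustration under simplifying assumptions, not how QualityStage implements them: the street-type abbreviation table, the exact-match key, and the "keep the longest value" survivorship rule are hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical abbreviation table; a real rule set uses far richer classifications.
STREET_TYPES = {"street": "ST", "st": "ST", "st.": "ST", "avenue": "AVE", "ave": "AVE"}


def standardize(record: dict) -> dict:
    """Upper-case the name, collapse spaces, and abbreviate street-type words."""
    name = " ".join(record["name"].upper().split())
    words = [STREET_TYPES.get(w.lower(), w.upper()) for w in record["address"].split()]
    return {**record, "name": name, "address": " ".join(words)}


def match_key(record: dict) -> tuple:
    """Treat records with the same standardized name and address as likely duplicates."""
    return (record["name"], record["address"])


def survive(group: list) -> dict:
    """Build a 'best of breed' record: keep the longest non-empty value of each field."""
    fields = sorted({key for rec in group for key in rec})
    return {f: max((rec.get(f) or "" for rec in group), key=len) for f in fields}


def cleanse(records: list) -> list:
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[match_key(rec)].append(rec)
    return [survive(group) for group in groups.values()]


records = [
    {"name": "john  smith", "address": "12 Main Street", "phone": ""},
    {"name": "John Smith", "address": "12 main st", "phone": "555-0100"},
    {"name": "Mary Jones", "address": "3 Oak Avenue", "phone": "555-0199"},
]
for rec in cleanse(records):
    print(rec)
```

Real record matching usually scores field-by-field agreement instead of requiring an exact key; the exact key only keeps the sketch short.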
Through its intuitive user interface, IBM InfoSphere QualityStage substantially reduces the time and cost to
implement Customer Relationship Management (CRM), data warehouse/business
intelligence (BI), data governance, and other strategic IT initiatives, and maximizes their
return on investment by ensuring the quality of their data.
1. Why use the Investigate stage in DataStage:
-> Discover trends and potential anomalies in the data
-> Identify invalid and default values in the data
-> Verify the reliability of the fields to be used as matching criteria
-> Gain a complete understanding of the data in context
Investigate Stage:
Verify the domain:
Review each field and verify that the data matches the metadata
Identify data formats, missing values, and default values
Identify the data anomalies:
Format
Structure
Content
Features of the Investigate stage:
Analyze free-form and single-domain columns
Provide frequency distribution of distinct values and patterns
Investigate methods:
Character discrete
Character concatenate
Word investigate
Default column names in the Investigate pattern report:
1. qsInvColumnName
2. qsInvPattern
3. qsInvSample
4. qsInvCount
5. qsInvPercent
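The pattern report can be approximated in a few lines of Python. This sketch computes, for one column, the frequency distribution of character patterns with a sample value, count, and percentage per pattern, using the default column names listed above as dictionary keys. The pattern alphabet (a for a letter, n for a digit, other characters kept) is an assumption for illustration and may not match the symbols the stage actually prints.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Letters become 'a', digits become 'n', every other character is kept."""
    return "".join("a" if c.isalpha() else "n" if c.isdigit() else c for c in value)


def pattern_report(column: str, values: list) -> list:
    counts = Counter()
    samples = {}
    for value in values:
        pattern = char_pattern(value)
        counts[pattern] += 1
        samples.setdefault(pattern, value)  # first value seen with this pattern
    total = len(values)
    return [
        {
            "qsInvColumnName": column,
            "qsInvPattern": pattern,
            "qsInvSample": samples[pattern],
            "qsInvCount": count,
            "qsInvPercent": round(100.0 * count / total, 2),
        }
        for pattern, count in counts.most_common()
    ]


for row in pattern_report("Zip", ["02134", "10001", "1000A", "02134-1234"]):
    print(row)
```

Sorting by count puts the dominant formats first, so rare patterns such as a letter inside a ZIP code stand out as candidate anomalies.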
Default column names in the Investigate token report:
1. qsInvCount
2. qsInvWord
3. qsInvClassCode
Example:
Character discrete: C mask (select one or more columns)
Character concatenate: C mask (select two or more columns to concatenate)
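The difference between the two character methods is whether columns are investigated one at a time or combined into a single value. In the Python sketch below, C keeps a column's value in the investigation key and X leaves it out. As a simplifying assumption it applies one mask letter to a whole column, and the function names are hypothetical.

```python
from collections import Counter


def masked_values(row: dict, masks: dict):
    """Yield the values of 'C'-masked columns; 'X'-masked columns are skipped."""
    for column, mask in masks.items():
        if mask not in ("C", "X"):
            raise ValueError(f"unsupported mask {mask!r} for column {column}")
        if mask == "C":
            yield row[column]


def character_discrete(rows: list, masks: dict) -> dict:
    """Investigate each 'C'-masked column separately: one frequency table per column."""
    return {
        column: Counter(row[column] for row in rows)
        for column, mask in masks.items()
        if mask == "C"
    }


def character_concatenate(rows: list, masks: dict, sep: str = "|") -> Counter:
    """Investigate the 'C'-masked columns together as one combined value."""
    return Counter(sep.join(masked_values(row, masks)) for row in rows)


rows = [
    {"City": "Boston", "State": "MA"},
    {"City": "Boston", "State": "MA"},
    {"City": "Boston", "State": "NY"},
]
print(character_discrete(rows, {"City": "C", "State": "C"}))
print(character_concatenate(rows, {"City": "C", "State": "C"}))
print(character_concatenate(rows, {"City": "C", "State": "X"}))
```

Concatenation reveals combinations (Boston with two different states) that the per-column tables hide.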
Word investigate: FullName
Token report
Pattern report
Word investigate: Address (pass address line 1 and address line 2)
Token report
Pattern report
Word investigate: Area (City, State, Zip)
Token report
Pattern report
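Word investigation splits free-form text into tokens, classifies each token, and reports both the individual tokens (token report) and the sequence of class codes per record (pattern report). The Python sketch below shows that idea. The tokenizer, the tiny classification table, and the class symbols (^ numeric, + unclassified word, @ mixed, T street type, D direction) are assumptions for illustration; a real rule set supplies its own classifications.

```python
import re
from collections import Counter

# Hypothetical classification table (word -> class code); a real rule set ships its own.
CLASSIFICATIONS = {"ST": "T", "AVE": "T", "RD": "T", "N": "D", "S": "D"}


def tokenize(value: str) -> list:
    """Upper-case the value and split it into runs of letters and digits."""
    return re.findall(r"[A-Z0-9]+", value.upper())


def classify(token: str) -> str:
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]
    if token.isdigit():
        return "^"  # numeric token
    if token.isalpha():
        return "+"  # alphabetic word not in the classification table
    return "@"  # mixed letters and digits


def token_report(values: list) -> list:
    counts = Counter(token for value in values for token in tokenize(value))
    return [
        {"qsInvCount": n, "qsInvWord": word, "qsInvClassCode": classify(word)}
        for word, n in counts.most_common()
    ]


def word_pattern_report(values: list) -> list:
    counts = Counter()
    samples = {}
    for value in values:
        pattern = "".join(classify(token) for token in tokenize(value))
        counts[pattern] += 1
        samples.setdefault(pattern, value)  # first value seen with this pattern
    return [
        {"qsInvPattern": p, "qsInvSample": samples[p], "qsInvCount": n}
        for p, n in counts.most_common()
    ]


addresses = ["12 Main St", "7 N Oak Ave", "12B Main St", "40 Elm Rd"]
for row in token_report(addresses):
    print(row)
for row in word_pattern_report(addresses):
    print(row)
```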
2. Standardize stage:
1. Country identifier:
-> Select the COUNTRY rule set from Others
-> Pass the literal ZQUSZQ and add the columns addressline1, addressline2, city, state, and zip
-> Filter the records wherever the flag is 'Y'; those are the US records
-> Split the US and non-US records into separate targets
2. Apply the USPREP rule set to separate name components and area components from the address fields:
-> Select the USPREP rule set from the Standardize rules
-> Pass ZQNAMEZQ and add the column “Fullname”
-> Pass ZQADDRZQ and add the column “addressline1”
-> Pass ZQADDRZQ and add the column “addressline2”
-> Pass ZQAREAZQ and add the column “City”
-> Pass ZQAREAZQ and add the column “State”
-> Pass ZQAREAZQ and add the column “Zip”
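The literals work as domain hints: each one tells USPREP which domain (name, address, or area) the following column's value most likely belongs to. The Python sketch below only builds such a delimited input from the columns above and routes the tokens by those hints into the three USPREP output columns. It is a simplification: the real USPREP rule set also moves misfielded components (for example, a name found in an address line) to the correct domain, which this code does not attempt.

```python
# Domain literals from the step above, mapped to the USPREP output columns.
DOMAIN_COLUMNS = {
    "ZQNAMEZQ": "NameDomain_USPREP",
    "ZQADDRZQ": "AddressDomain_USPREP",
    "ZQAREAZQ": "AreaDomain_USPREP",
}

# (literal, input column) pairs in the order they were added in the stage.
MAPPING = [
    ("ZQNAMEZQ", "Fullname"),
    ("ZQADDRZQ", "addressline1"),
    ("ZQADDRZQ", "addressline2"),
    ("ZQAREAZQ", "City"),
    ("ZQAREAZQ", "State"),
    ("ZQAREAZQ", "Zip"),
]


def build_input(row: dict) -> str:
    """Prefix every column value with its domain literal."""
    return " ".join(f"{literal} {row[column]}" for literal, column in MAPPING)


def split_domains(text: str) -> dict:
    """Route each token to the domain named by the most recent literal (hints only)."""
    parts = {column: [] for column in DOMAIN_COLUMNS.values()}
    current = None
    for token in text.split():
        if token in DOMAIN_COLUMNS:
            current = DOMAIN_COLUMNS[token]
        elif current is not None:
            parts[current].append(token)
    return {column: " ".join(tokens) for column, tokens in parts.items()}


row = {"Fullname": "John Smith", "addressline1": "12 Main St", "addressline2": "Apt 4",
       "City": "Boston", "State": "MA", "Zip": "02134"}
print(split_domains(build_input(row)))
```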
3. Standardize with the USNAME, USADDR, and USAREA rule sets:
1. Select the USNAME rule set from the Standardize rules and add the column NameDomain_USPREP
2. Select New Process, select the USADDR rule set, and add the column AddressDomain_USPREP
3. Select New Process, select the USAREA rule set, and add the column AreaDomain_USPREP
Rule set      Input column
USNAME.SET    NameDomain_USPREP
USADDR.SET    AddressDomain_USPREP
USAREA.SET    AreaDomain_USPREP
Investigate unhandled name, address, and area patterns
Take the above job as input and use three Investigate stages:
1. InvUnhandledName
2. InvUnhandledAddr
3. InvUnhandledArea
InvUnhandledName:
Select the Character Concatenate method for the name columns
Select the columns:
UnhandledPattern_USNAME -> set C mask
UnhandledData_USNAME -> set X mask
InputPattern_USNAME -> set X mask
NameDomain_USPREP -> set X mask
InvUnhandledAddr:
Select the Character Concatenate method for the address columns
Select the columns:
UnhandledPattern_USADDR -> set C mask
UnhandledData_USADDR -> set X mask
InputPattern_USADDR -> set X mask
AddressDomain_USPREP -> set X mask
InvUnhandledArea:
Select the Character Concatenate method for the area columns
Select the columns:
UnhandledPattern_USAREA -> set C mask
UnhandledData_USAREA -> set X mask
InputPattern_USAREA -> set X mask
AreaDomain_USPREP -> set X mask
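Because only UnhandledPattern_&lt;rule set&gt; carries a C mask, these concatenated investigations effectively group records by their unhandled pattern. The Python sketch below illustrates that report. Treating an empty unhandled pattern as "fully handled", keeping the X-masked columns only as one sample per pattern, and the output dictionary keys are all assumptions; the sample rows are toy values.

```python
from collections import Counter

# Rule set -> USPREP domain column, as in the rule set table above.
DOMAIN_COLUMNS = {
    "USNAME": "NameDomain_USPREP",
    "USADDR": "AddressDomain_USPREP",
    "USAREA": "AreaDomain_USPREP",
}


def unhandled_pattern_report(rows: list, rule_set: str) -> list:
    """Group records by UnhandledPattern_<rule set> (the 'C'-masked column).

    The 'X'-masked columns do not affect the grouping; one sample of them is
    kept per pattern so the unhandled data can be reviewed.
    """
    pattern_col = f"UnhandledPattern_{rule_set}"
    sample_cols = [f"UnhandledData_{rule_set}", f"InputPattern_{rule_set}",
                   DOMAIN_COLUMNS[rule_set]]
    counts = Counter()
    samples = {}
    for row in rows:
        pattern = row[pattern_col].strip()
        if not pattern:  # assumption: empty means the rule set handled the whole value
            continue
        counts[pattern] += 1
        samples.setdefault(pattern, {col: row[col] for col in sample_cols})
    return [{"pattern": p, "count": n, **samples[p]} for p, n in counts.most_common()]


rows = [
    {"UnhandledPattern_USNAME": "", "UnhandledData_USNAME": "",
     "InputPattern_USNAME": "++", "NameDomain_USPREP": "JOHN SMITH"},
    {"UnhandledPattern_USNAME": "+^", "UnhandledData_USNAME": "ACME 42",
     "InputPattern_USNAME": "+^", "NameDomain_USPREP": "ACME 42"},
    {"UnhandledPattern_USNAME": "+^", "UnhandledData_USNAME": "ZETA 7",
     "InputPattern_USNAME": "+^", "NameDomain_USPREP": "ZETA 7"},
]
for entry in unhandled_pattern_report(rows, "USNAME"):
    print(entry)
```

The most frequent unhandled patterns are the natural candidates for new rule set overrides.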