Data Warehousing Interview Questions
This page provides Data Warehousing interview questions that are useful for interview preparation. It explains data warehousing concepts and basics in depth: a data warehouse is a system in which large amounts of data are stored in a target system or server for analysis. The first topics covered are schemas, the data warehouse itself, and data marts.
The questions also cover IBM InfoSphere DataStage and Informatica, with the individual stages and transformations explained in detail. These data warehousing concepts and interview questions are useful for building the technical skills of ETL tool developers.
The main topics on this page are Data Warehousing interview questions, data warehousing concepts, data integration questions, Informatica interview questions, and IBM InfoSphere DataStage interview questions.
Q : What is the difference between a snowflake schema and a star schema?
Ans :
Star Schema | Snowflake Schema |
The star schema is the simplest data warehouse schema. | The snowflake schema is a more complex data warehouse model than the star schema. |
In a star schema each dimension is represented by a single table; there are no hierarchies between dimension tables. | In a snowflake schema at least one hierarchy exists between dimension tables. |
It contains a fact table surrounded by dimension tables. If the dimensions are de-normalized, we call it a star schema design. | It contains a fact table surrounded by dimension tables. If a dimension is normalized, we call it a snowflaked design. |
In a star schema a single join establishes the relationship between the fact table and any one of the dimension tables. | In a snowflake schema, because there are relationships between the dimension tables, many joins are needed to fetch the data. |
A star schema optimizes performance by keeping queries simple and providing fast response times. All the information about each level is stored in one row. | Snowflake schemas normalize dimensions to eliminate redundancy. The result is more complex queries and reduced query performance. |
It is called a star schema because the diagram resembles a star. | It is called a snowflake schema because the diagram resembles a snowflake. |
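As a rough sketch (the fact and dimension names below are invented for the example), the same sales question needs one join in a star design but an extra join in a snowflaked design, because the product dimension is normalized into a separate category table:
-- Star schema: one join per dimension
SELECT d.product_category,
       SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_product d ON f.product_key = d.product_key
GROUP BY d.product_category;
-- Snowflake schema: the category sits in its own normalized table,
-- so an extra join is needed
SELECT c.category_name,
       SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_product d ON f.product_key = d.product_key
JOIN dim_category c ON d.category_key = c.category_key
GROUP BY c.category_name;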
Q : What is the difference between a data mart and a data warehouse?
Ans :
Data Mart | Data Warehouse |
A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a data warehouse with a focused objective. | A data warehouse is a “subject-oriented, integrated, time-variant, nonvolatile collection of data in support of decision making”. |
A data mart is used at the business division/department level. | A data warehouse is used at the enterprise level. |
A data mart is a subset of data from a data warehouse. Data marts are built for specific user groups. | A data warehouse is an integrated consolidation of data from a variety of sources that is specifically designed to support strategic and tactical decision making. |
By providing decision makers with only a subset of the data from the data warehouse, privacy, performance and clarity objectives can be attained. | The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. |
Q : What are the differences between a connected lookup and an unconnected lookup?
Ans :
Connected Lookup | Unconnected Lookup |
A connected lookup is connected to the pipeline and receives its input values from the pipeline. | An unconnected lookup is not connected to the pipeline; it receives input values from the result of a :LKP expression in another transformation, via arguments. |
We cannot use this lookup more than once in a mapping. | We can use this transformation more than once within the mapping. |
It can return multiple columns from the same row. | It has one designated return port (R) and returns one column from each row. |
It can be configured to use a dynamic cache. | It cannot be configured to use a dynamic cache. |
It passes multiple output values to another transformation by linking lookup/output ports to that transformation. | It passes one output value to another transformation; the lookup/output/return port passes the value to the transformation calling the :LKP expression. |
It can use a dynamic or static cache. | It uses a static cache. |
It supports user-defined default values. | It does not support user-defined default values. |
The cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports. | The cache includes all lookup/output ports in the lookup condition and the lookup/return port. |
Q : What is the difference between a joiner and a lookup?
Ans :
Joiner | Lookup |
On multiple matches, a joiner returns all matching records. | On multiple matches, a lookup returns either the first record, the last record, any value, or an error value. |
In a joiner we cannot configure a persistent cache, shared cache, uncached lookup or dynamic cache. | In a lookup we can configure a persistent cache, shared cache, uncached lookup or dynamic cache. |
We cannot override the query in a joiner. | We can override the query in a lookup to fetch data from multiple tables (see the example below). |
We can perform an outer join in a joiner transformation. | We cannot perform an outer join in a lookup transformation. |
We cannot use relational operators (i.e. <, >, <= and so on) in a joiner transformation. | In a lookup we can use relational operators (i.e. <, >, <= and so on). |
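As a small sketch of the query-override point above (table and column names are assumed for the example), a lookup SQL override can join more than one table while building the lookup cache:
-- Hypothetical lookup SQL override that builds the cache from two tables
SELECT o.order_id,
       o.customer_id,
       c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id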
Q : What is the difference between a source qualifier and a lookup?
Ans :
Source Qualifier | Lookup |
A source qualifier returns all the matching records. | In a lookup we can restrict the result to the first value, the last value or any value. |
A source qualifier has no concept of a cache. | A lookup is built around the cache concept. |
When both the source and the lookup table are in the same database, we can use a source qualifier. | When the source and the lookup table exist in different databases, we need to use a lookup. |
Q : What are the differences between a dynamic lookup and a static lookup?
Ans :
Dynamic Lookup Cache | Static Lookup Cache |
With a dynamic lookup, the cache is refreshed as soon as a record is inserted, updated or deleted in the lookup table. | With a static lookup, the cache is not refreshed even when a record is inserted or updated in the lookup table; it is refreshed only in the next session run. |
When we configure a lookup transformation to use a dynamic lookup cache, we can only use the equality operator in the lookup condition, and the NewLookupRow port is enabled automatically. | It is the default cache. |
A good example of where a dynamic cache is needed: suppose the first and the last source record have the same key but the address has changed. The mapping has to insert the first record and update the target with the last record. | With a static lookup, the first record goes to the lookup, finds no match in the cache, returns NULL and is routed to the insert flow. Because the cache is never refreshed, the record is still not in the cache when the last record reaches the lookup; it again finds no match, returns NULL and is routed to the insert flow, although it should have gone to the update flow, because the cache was not refreshed when the first record was inserted into the target table. |
Informatica & IBM InfoSphere DataStage Interview Questions
Q : How do you process multiple flat files into a single target table in Informatica if all the files have the same structure?
Ans :
We can process all the flat files through one mapping and one session by using a list file.
First we create a list file, for example with a Unix script, containing all the flat file names; the extension of the list file is .LST.
The list file contains only the flat file names.
At the session level we set:
the source file directory to the list file path,
the source file name to the list file name,
and the source file type to Indirect.
Q : SCD Type-II Effective-Date Approach?
Ans :
· One of the dimensions in my current project is the resource dimension. We maintain history there to keep track of SCD changes.
· To maintain history in this slowly changing dimension, we followed the SCD Type-II effective-date approach.
· The resource dimension structure contains eff-start-date, eff-end-date, a surrogate key (s.k) and the source columns.
· Whenever I insert into the dimension, I populate eff-start-date with SYSDATE, eff-end-date with a future date, and the surrogate key from a sequence.
· If the record is already present in my dimension but the source data has changed, then
· I update the previous record's eff-end-date with SYSDATE and insert the source data as a new record, as sketched below.
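A minimal SQL sketch of the two steps above, assuming an Oracle-style dimension and illustrative table, column and sequence names (the real work is done inside the mapping; this is only the equivalent DML):
-- Step 1: close the currently active version of the changed record
UPDATE dim_resource
SET eff_end_date = SYSDATE
WHERE resource_id = :src_resource_id
  AND eff_end_date = TO_DATE('9999-12-31', 'YYYY-MM-DD');
-- Step 2: insert the incoming source row as the new active version
INSERT INTO dim_resource
       (resource_sk, resource_id, resource_name, eff_start_date, eff_end_date)
VALUES (seq_resource_sk.NEXTVAL, :src_resource_id, :src_resource_name,
        SYSDATE, TO_DATE('9999-12-31', 'YYYY-MM-DD'));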
Q : Informatica design to implement the SCD Type-II effective-date approach?
Ans :
· Once we fetch a record from the source qualifier, we send it to a lookup to find out whether the record is already present in the target, based on the source primary key column.
· Once we find a match in the lookup, we take the SCD columns and the surrogate key column from the lookup into an expression transformation.
· In the lookup transformation we override the lookup query to fetch only the active records from the dimension while building the cache (see the SQL sketch after this list).
· In the expression transformation I compare the source data with the data returned by the lookup.
· If the source and target data are the same, I set the flag to 'S'.
· If the source and target data are different, I set the flag to 'U'.
· If the source data does not exist in the target, the lookup returns NULL and I set the flag to 'I'.
· Based on the flag values, a router routes the data into the insert and update flows.
· If flag = 'I' or 'U', I pass the record to the insert flow.
· If flag = 'U', I also pass the record to the eff-end-date update flow.
· When we insert, we pass the sequence value to the surrogate key.
· When we update, we update the eff-end-date column based on the surrogate key value returned by the lookup.
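A minimal sketch of the lookup override mentioned above, assuming the same illustrative dimension and a high end date marking the active row, so that only current records are loaded into the cache:
-- Hypothetical lookup SQL override: cache only the active dimension rows
SELECT resource_sk,
       resource_id,
       resource_name,
       eff_start_date,
       eff_end_date
FROM dim_resource
WHERE eff_end_date = TO_DATE('9999-12-31', 'YYYY-MM-DD')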
Q : Complex Mapping
Ans :
· One of our requirements involves an order file. Every day the source system places a file with a timestamp in its name on the Informatica server.
· We have to process that day's file through Informatica.
· The source file directory also contains files with timestamps older than 30 days.
· For this requirement, if I hardcode the timestamp in the source file name, the session processes the same file every day.
· So I created the $InputFilename session parameter for the source file name.
· Then I use a parameter file to supply the value for this session parameter ($InputFilename).
· To update the parameter file I created one more mapping.
· This mapping updates the parameter file with the file name and the appended timestamp.
· I make sure to run this parameter-file update mapping before the actual mapping.
Q : How do you handle errors in Informatica?
Ans :
· One of our sources has numerator and denominator values, and we need to calculate num/deno when populating the target.
· If deno = 0, I should not load the record into the target table.
· We send those records to a flat file instead; after completion of the first session run, a shell script checks the file size (the rule is also sketched in SQL after this list).
· If the file size is greater than zero, the script sends an email notification to the source system POC (point of contact), with the deno-zero record file attached and an appropriate email subject and body.
· If the file size is <= 0, there are no records in the flat file, and the shell script does not send any email notification.
· Or:
· We expect a not-null value for one of the source columns.
· If it is NULL, the record is an error record.
· We can use the same approach as above for this kind of error handling.
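The deno = 0 rule itself is simple; as a sketch in SQL terms (the source table and column names are assumed), the expression/router logic is equivalent to:
-- Rows with a zero denominator are flagged as errors instead of being loaded
SELECT s.*,
       CASE WHEN s.deno = 0 THEN 'ERROR' ELSE 'OK' END AS row_status,
       CASE WHEN s.deno <> 0 THEN s.num / s.deno END AS ratio
FROM src_ratios s;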
Q : Worklet
Ans :
A worklet is a reusable set of sessions. We cannot run a worklet without a workflow.
If we want to run two workflows one after another:
· If both workflows exist in the same folder, we can create two worklets rather than two workflows.
· Then we can call these two worklets from one workflow.
· There we can set the dependency between them.
· If the workflows exist in different folders or repositories, we cannot use worklets.
· One approach is to set the dependency between the two workflows using a shell script.
· The other approach is to use Event-Wait and Event-Raise tasks.
Q : Shell script approach
Ans :
· As soon as the first workflow completes, we create a zero-byte file (an indicator file).
· If the indicator file is available in the expected location, we run the second workflow.
· If the indicator file is not available, we wait for 5 minutes and check for the indicator again, repeating the loop until roughly 30 minutes have elapsed.
· If the file still does not exist after 30 minutes, we send an email notification.
Event-Wait and Event-Raise approach
With an Event-Wait task, the workflow waits indefinitely until the indicator file is available.
Q : Why do we need a source qualifier?
Ans :
Simply put, the source qualifier performs a SELECT statement.
The SELECT statement fetches the data row by row.
The source qualifier selects the data from the source table and identifies the records to read from the source.
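As a rough sketch (table and column names are invented), the query a source qualifier issues looks like a plain SELECT of the connected ports:
-- Default source qualifier query: a SELECT of the connected ports only
SELECT employees.employee_id,
       employees.first_name,
       employees.salary
FROM employees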
Parameter file: it supplies the values for session-level variables and mapping-level variables.
Variables are of two types:
· Session level variables
· Mapping level variables
Session level variables are of four types:
· $DBConnection_Source
· $DBConnection_Target
· $InputFile
· $OutputFile
Mapping level variables are of two types:
· Variable
· Parameter
Q : What is the difference between mapping level and session level variables?
Ans :
Mapping-level variables always start with $$.
Session-level variables always start with $.
Q : Flat File
Ans :
A flat file is a collection of data stored in a file in a specific format.
Informatica supports two types of flat files:
· Delimited
· Fixed width
For a delimited file we need to specify the separator.
For a fixed-width file we need to know the format first, i.e. how many characters to read for each column.
For a delimited file it is also necessary to know the structure, for example whether the file contains a header.
If the file contains a header, then in the source definition we need to skip the first row.
List file:
If we want to process multiple files with the same structure, we do not need multiple mappings and multiple sessions.
We can use one mapping and one session with the list file option.
First we create a list file containing all the file names. Then we can use this file for the main mapping.
Q : Aggregator Transformation
Transformation type:
Active
Connected
The Aggregator transformation performs aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only.
Components of the Aggregator Transformation:
The Aggregator is an active transformation: it changes the number of rows in the pipeline. The Aggregator transformation has the following components and options:
Aggregate cache: The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache.
Aggregate expression: Enter an expression in an output port. The expression can include non-aggregate expressions and conditional clauses.
Group by port: Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
Q : Aggregate Expressions
Ans :
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another aggregate function, such as:
MAX (COUNT (ITEM))
The result of an aggregate expression varies depending on the group by ports used in the transformation.
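As a rough SQL analogue of the nested aggregate MAX (COUNT (ITEM)) (table and column names are assumed), you first count the items per group and then take the maximum of those counts:
-- Count items per store, then take the maximum of those counts
SELECT MAX(item_count) AS max_items_per_store
FROM (SELECT store_id,
             COUNT(item) AS item_count
      FROM order_items
      GROUP BY store_id) per_store;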
Q : Aggregate Functions
Ans :
Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function.
The transformation language includes the following aggregate functions:
· AVG
· COUNT
· FIRST
· LAST
· MAX
· MEDIAN
· MIN
· PERCENTILE
· STDDEV
· SUM
· VARIANCE
When you use any of these functions, you must use them in an expression within an Aggregator transformation.
Tips
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter the data before aggregating it.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
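In SQL terms the same tip looks like the sketch below (names are illustrative): filter first with a WHERE clause, then aggregate, so the aggregation works on fewer rows.
-- Filter before aggregating: the WHERE clause runs before the GROUP BY
SELECT store_id,
       SUM(sales_amount) AS total_sales
FROM daily_sales
WHERE sale_date >= DATE '2024-01-01'
GROUP BY store_id;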
Normalizer Transformation:
Transformation type:
Active
Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data.
The Normalizer transformation parses multiple-occurring columns from COBOL sources, relational tables, or other sources. It can process multiple record types from a COBOL source that contains a REDEFINES clause.
The Normalizer transformation generates a key for each source row. The Integration Service increments the generated key sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence. Each row contains the same generated key value.
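As a rough SQL analogue of what the Normalizer does (table and column names are assumed), one input row with four quarterly columns becomes four output rows:
-- One row with QUARTER1..QUARTER4 columns becomes four normalized rows
SELECT store_id, 1 AS quarter, quarter1_sales AS sales FROM store_sales
UNION ALL
SELECT store_id, 2, quarter2_sales FROM store_sales
UNION ALL
SELECT store_id, 3, quarter3_sales FROM store_sales
UNION ALL
SELECT store_id, 4, quarter4_sales FROM store_sales;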
SQL Transformation
Transformation type:
Active/Passive
Connected
The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.
For example, you might need to create database tables before adding new transactions. You can create an SQL transformation to create the tables in a workflow. The SQL transformation returns database errors in an output port. You can configure another workflow to run if the SQL transformation returns no errors.
When you create an SQL transformation, you configure the following options:
Mode. The SQL transformation runs in one of the following modes:
Script mode. The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row.
Query mode. The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement.
Database type. The type of database the SQL transformation connects to.
Connection type. Pass database connection information to the SQL transformation or use a connection object.
Script Mode
An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation ScriptName port. The script file name contains the complete path to the script file.
When you configure the transformation to run in script mode, you create a passive transformation. The transformation returns one row for each input row. The output row contains results of the query and any database error.
When the SQL transformation runs in script mode, the query statement and query data do not change. When you need to run different queries in script mode, you pass the scripts in the source data. Use script mode to run data definition queries such as creating or dropping tables.
When you configure an SQL transformation to run in script mode, the Designer adds the ScriptName input port to the transformation
An SQL transformation configured for script mode has the following default ports:
Port | Type | Description |
ScriptName | Input | Receives the name of the script to execute for the current row. |
ScriptResult | Output | Returns PASSED if the script execution succeeds for the row. Otherwise contains FAILED. |
ScriptError | Output | Returns errors that occur when a script fails for a row. |
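For illustration, a script file passed through the ScriptName port might contain data definition statements like the sketch below (object names are invented; any SELECT output in the script would be ignored):
-- Example script file content for script mode (DDL only)
CREATE TABLE stg_daily_orders (
    order_id     NUMBER,
    order_date   DATE,
    order_amount NUMBER(12,2)
);
CREATE INDEX idx_stg_daily_orders ON stg_daily_orders (order_date);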
Q : Script Mode Rules and Guidelines?
Ans :
Use the following rules and guidelines for an SQL transformation that runs in script mode:
· You can use a static or dynamic database connection with script mode.
· To include multiple query statements in a script, you can separate them with a semicolon.
· You can use mapping variables or parameters in the script file name.
· The script code page defaults to the locale of the operating system. You can change the locale of the script.
· You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.
· You cannot use nested scripts where the SQL script calls another SQL script.
· A script cannot accept run-time arguments.
· The script file must be accessible by the Integration Service. The Integration Service must have read permissions on the directory that contains the script. If the Integration Service uses operating system profiles, the operating system user of the operating system profile must have read permissions on the directory that contains the script.
· The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row.
Q : Query Mode
When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation. You pass strings or parameters to the query from the transformation input ports to change the query statement or the query data.
When you configure the SQL transformation to run in query mode, you create an active transformation. The transformation can return multiple rows for each input row.
Create queries in the SQL transformation SQL Editor. To create a query, type the query statement in the SQL Editor main window. The SQL Editor provides a list of the transformation ports that you can reference in the query.
You can create the following types of SQL queries in the SQL transformation:
Static SQL query. The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows.
Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a query for each input row.
When you create a static query, the Integration Service prepares the SQL procedure once and executes it for each row. When you create a dynamic query, the Integration Service prepares the SQL for each input row. You can optimize performance by creating static queries.
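As far as I recall, query mode binds an input port value with question marks and substitutes a string (such as a table name) with tildes; the sketch below assumes that syntax and made-up port names:
-- Static query with parameter binding: the statement never changes,
-- only the bound value from the ?EMP_ID? input port
SELECT emp_name, emp_salary
FROM employees
WHERE emp_id = ?EMP_ID?
-- Dynamic query using string substitution: the table name comes from
-- the ~TABLE_PORT~ input port, so the statement changes per row
SELECT emp_name, emp_salary
FROM ~TABLE_PORT~
WHERE emp_id = ?EMP_ID?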
Query Mode Rules and Guidelines
Use the following rules and guidelines when you configure the SQL transformation to run in query mode:
· The number and the order of the output ports must match the number and order of the fields in the query SELECT clause.
· The native datatype of an output port in the transformation must match the datatype of the corresponding column in the database. The Integration Service generates a row error when the datatypes do not match.
· When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQLError port, the pass-through ports, and the NumRowsAffected port when it is enabled. If you add output ports, the ports receive NULL data values.
· When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the output ports.
· You cannot add the "_output" suffix to output port names that you create.
· You cannot use the pass-through port to return data from a SELECT query.
· When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value.
· When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error.
· You can use string substitution instead of parameter binding in a query. However, the input ports must be string datatypes.
Java Transformation Overview
Transformation type:
Active/Passive
Connected
The Java transformation provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment.
For example, you can define transformation logic to loop through input rows and generate multiple output rows based on a specific condition. You can also use expressions, user-defined functions, unconnected transformations, and mapping variables in the Java code.
Transaction Control Transformation
Transformation type:
Active
Connected
PowerCenter lets you control commit and roll back transactions based on a set of rows that pass through a Transaction Control transformation. A transaction is the set of rows bound by commit or roll back rows. You can define a transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date.
In PowerCenter, you define transaction control at the following levels:
Within a mapping. Within a mapping, you use the Transaction Control transformation to define a transaction. You define transactions using an expression in a Transaction Control transformation. Based on the return value of the expression, you can choose to commit, roll back, or continue without any transaction changes.
Within a session. When you configure a session, you configure it for user-defined commit. You can choose to commit or roll back a transaction if the Integration Service fails to transform or write any row to the target.
When you run the session, the Integration Service evaluates the expression for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target or targets. When the Integration Service evaluates a roll back row, it rolls back all rows in the transaction from the target or targets.
If the mapping has a flat file target you can generate an output file each time the Integration Service starts a new transaction. You can dynamically name each target flat file.