What is Data Warehousing?
Database used to generate Reporting and analysis is Called Data Warehousing. The Data Warehousing is Used to mainly large amount of data storing. Extracting data from Any source system and Loading to target system to based various methods. Data Warehousing Concepts are based to large amount of data will be stored on target system, Heavy data storage does not support to any data base directly. But Using Data Warehousing Concepts and Some warehousing rules are used to stored the the data t any target system is called data Warehousing.
Database used to generate Reporting and analysis is Called Data Warehousing. The Data Warehousing is Used to mainly large amount of data storing. Extracting data from Any source system and Loading to target system to based various methods. Data Warehousing Concepts are based to large amount of data will be stored on target system, Heavy data storage does not support to any data base directly. But Using Data Warehousing Concepts and Some warehousing rules are used to stored the the data t any target system is called data Warehousing.
Data Warehousing Concepts & Data Warehousing Interview Questions
Q : What is Difference between OLTP and DWH/DS/OLAP Tool?
Ans :
OLTP | DWH/DSS/OLAP |
OLTP maintains only current information. | OLAP contains full history. |
It is a normalized structure. | It is a de-normalized structure. |
Its volatile system. | Its non-volatile system. |
It cannot be used for reporting purpose. | It’s a pure reporting system. |
Since it is normalized structure so here it requires multiple joins to fetch the data. | Here it does not require much joins to fetch the data. |
It’s not time variant. | Its time variant. |
It’s a pure relational model. | It’s a dimensional model. |
Q : What is Staging area why we need it in DWH?
Ans :
If target and source databases are different and target table volume is high it contains some millions of records in this scenario without staging table we need to design your informatica using look up to find out whether the record exists or not in the target table since target has huge volumes so its costly to create cache it will hit the performance.
If we create staging tables in the target database we can simply do outer join in the source qualifier to determine insert/update this approach will give you good performance.
It will avoid full table scan to determine insert/updates on target.
And also we can create index on staging tables since these tables were designed for specific application it will not impact to any other schemas/users.
While processing flat files to data warehousing we can perform cleansing
Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency.
· Since it is one-to-one mapping from ODS to staging we do truncate and reload.
· We can create indexes in the staging state, to perform our source qualifier best.
· If we have the staging area no need to relay on the informatics transformation to known whether the record exists or not.
Q : What is ODS ?
Ans :
My understanding of ODS is, its a replica of OLTP system and so the need of this, is to reduce the burden on production system (OLTP) while fetching data for loading targets. Hence its a mandate Requirement for every Warehouse.
So every day do we transfer data to ODS from OLTP to keep it up to date?
OLTP is a sensitive database they should not allow multiple select statements it may impact the performance as well as if something goes wrong while fetching data from OLTP to data warehouse it will directly impact the business.
ODS is the replication of OLTP.
ODS is usually getting refreshed through some oracle jobs.
Q : What is the difference between a primary key and a surrogate key?
Ans :
A primary key is a special constraint on a column or set of columns. A primary key constraint ensures that the column(s) so designated have no NULL values, and that every value is unique. Physically, a primary key is implemented by the database system using a unique index, and all the columns in the primary key must have been declared NOT NULL. A table may have only one primary key, but it may be composite (consist of more than one column).
A surrogate key is any column or set of columns that can be declared as the primary key instead ofa "real" or natural key. Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys. So a surrogate is a candidate key. A table could actually have more than one surrogate key, although this would be unusual. The most common type of surrogate key is an incrementing integer, such as an auto increment column in MySQL, or a sequence in Oracle, or an identity column in SQL Server.
Q : Have you done any Performance tuning in informatica?
Ans :
1) Yes, One of my mapping was taking 3-4 hours to process 40 millions rows into staging table we don’t have any transformation inside the mapping its 1 to 1 mapping .Here nothing is there to optimize the mapping so I created session partitions using key range on effective date column. It improved performance lot, rather than 4 hours it was running in 30 minutes for entire 40millions.Using partitions DTM will creates multiple reader and writer threads.
2) There was one more scenario where I got very good performance in the mapping level .Rather than using lookup transformation if we can able to do outer join in the source qualifier query override this will give you good performance if both lookup table and source were in the same database. If lookup tables is huge volumes then creating cache is costly.
3) And also if we can able to optimize mapping using less no of transformations always gives you good performance.
4) If any mapping taking long time to execute then first we need to look in to source and target statistics in the monitor for the throughput and also find out where exactly the bottle neck by looking busy percentage in the session log will come to know which transformation taking more time ,if your source query is the bottle neck then it will show in the end of the session log as “query issued to database “that means there is a performance issue in the source query.we need to tune the query using .
No comments:
Post a Comment