IBM InfoSphere DataStage v8.5 Sample Questions:
1. In the exhibit, the Aggregator stage aggregates over CustomerID. The Join stage joins by CustomerID. When the score is created, DataStage will insert hash partitioners and tsort operators to ensure the correct results.
Which two choices will prevent sort operators from being inserted at run time? (Select two.)
A) Add a Sort stage before the Copy stage and specify Same partitioners for the links going into the Aggregator and Join stages.
B) Add Sort stages in front of the Join stage and specify the Same partitioner for the links going into the Join. Set the $APT_NO_SORT_INSERTION environment variable to "True".
C) Add a Sort stage before the Aggregator stage and specify the Same partitioner for the link going into the Aggregator. Set the $APT_SORT_INSERTION_CHECK_ONLY environment variable to "True".
D) Add a Sort stage before the Copy stage and specify Auto partitioning for the links going into the Aggregator and Join stages.
2. A scenario requires selecting only the most recent transactions for each of 2 million unique customers, from a 20 million row DB2 source table containing order history. Which parallel job design would satisfy this functional requirement?
A) Using the Dynamic Relational stage, use custom SQL to select all DISTINCT customer numbers from the order history table.
B) Select all rows using the ODBC Connector stage, use a Sort Aggregator on customer number key to select the maximum order date.
C) Using the DB2 Enterprise stage, select all rows. Perform a unique Sort using customer number and order date sort keys in ascending order.
D) Using the DB2 API stage, select all rows. Use a Sort stage with customer number and order date sort keys in ascending order, then Remove Duplicates with Last Duplicate to retain.
3. If you do not alter any of the Format settings, the Sequential File stage produces a file with which three format characteristics? (Choose three.)
A) Rows are delimited by a DOS newline.
B) All columns are delimited by a comma.
C) Variable length columns are contained within double quotes.
D) All columns are delimited by a comma, except for the final column in a row.
E) Rows are delimited by a UNIX newline.
4. You have a 3TB dataset hash-partitioned on CustID in a clustered environment. You need to join this dataset with 1GB of reference data on OrderID. Which technique is most appropriate?
A) Use a Join stage; hash-partition and sort both links on OrderID.
B) Use Lookup stage, select auto partitioning for the stream link and entire partitioning for the reference link.
C) Use Lookup stage, select auto partitioning for both links.
D) Use Lookup stage, select auto partitioning for the stream link and hash-partition the reference link on CustI
5. Which two statements are true about the usage of scratch disk? (Choose two.)
A) If a "buffer" scratch disk pool is defined, the framework uses this disk pool and the default disk pool.
B) The Sort stage always uses the scratch disk.
C) You can define multiple scratch disk spaces to distribute disk I/O.
D) The parallel framework uses the disk space specified in the scratch disk setting to buffer virtual data set records.
Solutions:
Question 1 Answer: A, B
Question 2 Answer: D
Question 3 Answer: C, D, E
Question 4 Answer: B
Question 5 Answer: C, D
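The logic behind the answer to question 2 (sort on customer number and order date in ascending order, then keep the last duplicate per customer) can be illustrated outside DataStage. The following is only a minimal Python sketch of that sort-then-retain-last idea, not a DataStage job; the function name `latest_per_customer`, the field names `customer`, `order_date` and `amount`, and the sample rows are all assumptions made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_customer(rows):
    """Return the most recent row for each customer.

    Mirrors the two steps in the question: sort ascending on
    (customer, order_date), then keep the last row of each customer group.
    """
    # Step 1: ascending sort on both keys, like the Sort stage.
    ordered = sorted(rows, key=itemgetter("customer", "order_date"))
    # Step 2: group on the customer key only and retain the last row,
    # like Remove Duplicates with "Last" as the duplicate to retain.
    return [list(group)[-1]
            for _, group in groupby(ordered, key=itemgetter("customer"))]


# Hypothetical order history; ISO dates sort correctly as strings.
orders = [
    {"customer": 2, "order_date": "2011-03-01", "amount": 40},
    {"customer": 1, "order_date": "2011-01-15", "amount": 10},
    {"customer": 1, "order_date": "2011-02-20", "amount": 25},
    {"customer": 2, "order_date": "2010-12-05", "amount": 5},
]

latest = latest_per_customer(orders)
for row in latest:
    print(row["customer"], row["order_date"], row["amount"])
```

Because the rows are sorted ascending on order date within each customer, the last row of each group is the most recent one. Retaining the first row instead would return the oldest transaction.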


By Adam

