As explain earlier merge is like a union operation in which it just collect all the records from provided sources without any condition. suppose if we have Person & PersonPhones two sources as shown below
Below is Person table with two columns personId & Name
PersonId | Name |
1 | Ram |
2 | Shyam |
3 | Ghanshyam |
Below is PersonPhones table with two columns PersonId & Phone
PersonId | Phone |
1 | 9999 |
2 | 3122 |
3 | 3422 |
Now if we talk about merge transformation then the result would be as shown below
Merge Transformation output
PersonId | Name | Phone |
1 | Ram | |
2 | Shyam | |
3 | Ghanshyam | |
1 | 9999 | |
2 | 3122 | |
3 | 3422 |
Now, If we talk about Merge Join transformation for above same inputs (Person, PersonPhones) then we will get following output
PersonId | Name | Phone |
1 | Ram | 9999 |
2 | Shyam | 3122 |
3 | GhanShyam | 3422 |
So, Merge Join is basically a join like Inner join, Left outer join, Full outer join etc.
I hope you got the context. Here we go step by step to implement the Merge join transformation
Step 1:- Add the different sources which you want. For this example I am taking two sources which have AdventureWorks database and one source have Person table & another one have PersonPhone as shown above in the example.
We added data flow task as shown in below image and renamed it to Merge join data flow task.
Now, in this data flow task, we will add data sources, in the first source we use person table and in the second source we use PersonPhone.
from person table, we are choosing BusinessEntityId, FirstName,MiddleName, LastName,suffix as shown in below figure
In the other source, we are using BusinessEntityId & Phone number columns.
If you find any issue in above steps then please, follow previous articles of step by step SSIS on Indiandotnet.
Step 2:- Now once the source is configured, we are going to drag drop merge control as shown in below figure.
Step 3:- Now to configure Merge join, we are going to drag drop inputs from the sources. When you try this at your end then you will get the following screen. Here you have to set left input and right input for the merge join.
Now, When you tried it you might found following warning. Which means the provided inputs of datasource1 & datasource2 are not sorted.
Step 4:- Here you have the option to add a Sort transformation and then provide input to merge join transformation control which is fairly simple and we have discussed in previous tips. Here, I am sharing one more option.
In this, you have to right click on DataSource and choose option Advance Editor by which you will get the following screen. You have to select “OLEDB Source output” of “Input and output Properties” tab.
Here, we need to make the IsSorted property to True as highlighted in below snap.
Step 5:- Once the above Step is done next step is click on output column’s BusinessEntityId value. Here you have to set the SortKeyPosition to 1.
As highlighted in below screen.
Step 6:- Once you configured sorted input sources then next step is to configure the merge join. So when you click on Edit option of merge join. You will get the following screen.
Here if you see, we can configure join type like Inner join, Left outer join, Full outer join.
For current example, we are choosing the option “Inner join” and joining key is BusinessEntityId which is primary key in Person table and foreign key in PersonPhone table.
Once the configuration is done. We can select what are the columns which we need as an output.
Step 7:- Now, Next step is to configure the output in excel. for this, we drag & drop a destination source. In the current example, we are using Excel as a destination. We are mapping the columns of output to excel as shown in below figures.
Step 8:- Once, everything is configured. We run the package and if everything is working fine then we will get the following screen.
To cross check the result we open the excel and we get the output which we expected as shown in below figure.
I hope this article might help you to understand the Merge join.
Enjoy !!!
RJ!!
Comments
Why would one bother to do all this when it is so much easier in SQL? What are the benefits of using a DFT. I struggle to see any?
I think that this is a great example on how to use the Merge Join tool to combine different sources of information. I have found that I use this tool mostly with XML transformations when there is a 1 to many relationship and I need to import the data to a SQL database. You have to use the merge join to combine the different nodes of data. This should be helpful for anyone wanting to import XML with SSIS. Same exact functionality applies with the SORT property and the SORT KEY POSITION.
The use of Merge Join is best used when the sources of your two datasource tasks are from two different SQL Server instances that can't be linked directly or when the data is from two different data source types such as SQL Server and Oracle or SQL Server and an Access database (lord knows why you would do that one!), etc.
One item left out in this article was that the data must be sorted. Just setting the IsSorted property to true doesn't always do the trick. An Order By on the fields that have a sortorder against them must exist in the query. I have done this without ordering my data in the source queries and my results were not what I was expecting! So, always add an order by in the query behind each data source when using Merge Join. If your not using a query and are just grabbing the data from a table then you will want to add a sort step after each data source to sort the data. I always try to use a custom query behind my data source so that I can sort them if needed without a separate sort step.
thank you a lot,i am following these examples.Very much helpful
Thank you for you good work.I am new to SSIS and this is helping me a lot.