If you want to select more than one table when connecting to RDBMs or big data stores, you will need to join your tables using either the SQL Designer or custom SQL (see Selecting multiple tables from a database connection). In the Query Blender, you can use joins to import data from different connections (see Creating joins with the Query Blender).
This article contains the following sections:
Creating join conditions
After you select your connections or tables, you can begin to create your join. The first step in creating a new join is adding a join condition. Your join can contain multiple conditions, but you need at least one.
When you are creating a new join, your connections or tables must have a join key. A join key is composed of attributes that contain common values between both datasets. Your join will be based around the attributes used in your join key. If the join key you select does not contain any matching values, BIME will reject your join. In the join condition, you will select your join key attributes from the Select a field drop-down lists.
For example both tables below contain matching account IDs, so the attribute Account ID is used as the join key.
If you are using the SQL Designer, you can change the operator used in your join condition. The operator calculates how fields are related and will change the amount of data imported. In the Query Blender your fields can only be equal to each other, but in the SQL Designer you can select from greater than, less than, and equal to operators. You can change the operator by clicking the = icon.
Changing the operator is most useful for join keys using date attributes. If the attributes use different date ranges, you can change the operator to select what date ranges are imported.
For example, the Account satisfaction scores table includes the years 2010-2014 and the Add-on data table includes the years 2011-2013. If you wanted to only import data from the years 2010-2013, you would need to change the operator. For this example, the Account satisfaction scores field would be less than or equal to the Add-on data field, so only dates before or during 2013 will be imported. If you then use an inner join, you will only retrieve data from 2010-2013.
After you finish selecting your fields and operator, you can click Add condition. Once you add the condition, you can then select your join type. The next section in this article will discuss join types.
Selecting a join type
When you create a new join, there are three available options for importing your data. Each of these join types will retrieve your data differently. This section will define each join type and provide examples of how data is imported when using the different joins.
The following are the available join types:
Left and right joins
You can use left and right joins to import all data from one table and only the required data from another. Left joins will retrieve all data from the left table and only the data matching the join condition from the right table. Right joins will retrieve all data from the right table and only the data matching the join condition from the left table.
The venn diagrams below show how left and right joins import your data.
Left and right join example
The example below demonstrates how left and right joins will import data. This example first shows how the data appears before being joined, then how the data is imported when using a left and right join. These joins are created in Query Blender using established connections, but if you would like to join tables from a database, you can use the SQL Designer (see Selecting multiple tables from a database connection).
The first connection, Account satisfaction scores, includes data on the accounts that provided a satisfaction score. Only account IDs that submitted a satisfaction score are included in the dataset. The image below displays the data before it is joined.
The second connection, Add-on data, contains information on all accounts that purchased an add-on. Only the accounts IDs with add-ons are included in the dataset. The image below displays the data before it is joined.
You can join these connections to show the satisfaction scores submitted by the accounts that purchased an add-on. Both datasets contain common account IDs, so the attribute Account ID is used as the join key.
The join condition will look like the image below. Please note, that the Add-on data connection is on the left side and Account satisfaction scores is on the right.
If you select the left join option, the results will resemble the image below.
In a left join, all account IDs from Add-on data are pulled, but only the matching account IDs are pulled from Account satisfaction scores. This creates empty Satisfaction Score values. These empty values represent account IDs that purchased an add-on, but did not submit a satisfaction score.
If you select the right join option, the results will resemble the image below:
In a right join, all account IDs from Account satisfaction score are pulled, but only the matching account IDs are pulled from Add-on data. This creates an empty Add-on value. The empty value represents account IDs that provided a satisfaction score, but did not purchase an add-on.
When you select an inner join, only the data matching the join condition is retrieved. The venn diagram below displays how data is imported when you select the inner join.
If you were to use an inner join for the Add-on data and Account satisfaction score connections above, the results would resemble the image below.
With an inner join, only the account IDs that purchased an add-on and submitted a satisfaction score are included in the results. All other account IDs are dropped from the query.
Selecting a join type depends on how many and what kind of results you want to display. In some instances it might be necessary to retrieve all data from one table or only the matching data. Remember that your join type is saved to your storage and will affect all other queries from the connection in the future.