This article is a tutorial aimed at providing a basic introduction to the use of SQL (Structured Query Language) and is meant complement the articles related to Managed and virtual tables. SQL is vital to the usage of these tables, whether it is to manipulate, extract or store data.
The outcomes of this tutorial are as follows:
1. Appropriately apply the following SQL statements/clauses/operators:
- AND, OR, NOT
- ORDER BY
- INSERT INTO
2. Construct and apply a SQL statement to view/extract data from a managed table
3. Construct and apply a SQL statement to modify an existing managed table in terms of:
- Inserting new records
- Updating existing records
- Deleting existing records
For users looking for a full and in-depth tutorial of SQL, you can click here to be re-directed.
1.You will be re-directed to an external website of which SolveXia has no association with.
2.The SolveXia platform requires the SQL code to be written in the same format and convention as Microsoft SQL Server.
1. Extracting data
To extract data from a table into an excel file or text file, you will need to use the Select to Excel/Text instructions (available in the File manipulation action step). This section will provide an introduction to the SQL queries/statements that can be used to extract data.
1.1 The "SELECT" query
A simple SQL query to return data from a managed table called Table_1.
Note: When retrieving data from a managed table, it is important to wrap the table's name in square brackets. In this case, Table_1 should appear as [Table_1] in the query.
1.2 The "WHERE" clause
There may be times where you may only want to extract a subset of the full data stored in the table. A typical example is if you stored your cash flow data in a table called, [Cashflow] and you wish to only extract the data only for a specific data. In such a situation, a "WHERE" clause allows you to filter the data you to a specific condition.
Given the scenario above to extract a specific month's cash flow, the below query demonstrates how to use a "WHERE" clause in tandem with a "SELECT" query
In 1.2, we went through an example of using the "WHERE" clause based on a singular condition. In practice, you may have multiple conditions that you wish to filter on.
The three operators you can use with the "WHERE" cause are:
The "AND" operator is used to retrieve data when all conditions you specified have been met (i.e. all conditions have been evaluated to True).
With our cash flow scenario, the below example demonstrates how to extract data on which you wish to see the cash flow due to a specific item for a specific month.
The 'OR' operator is used to retrieve data when at least one of the conditions you have specified is met.
With our example, the below extract demonstrates how to extract data for two months.
The 'NOT' operator is used to retrieve data when the specified conditions are not met (i.e. conditions evaluated to False).
The below extract shows how to use the 'NOT' operator to extract all cashflow data as long as it isn't associated with a specific label.
Users are kindly reminded to keep in mind of De Morgan's laws when using the 'NOT' operator with 'AND'/'OR' operators.
1.4 Ordered sets
Now that you are able to write a query to display the information you require, you may also need to sort the results in either ascending or descending order. Using our previous example, the syntax for sorting your data is shown below.
By default data is sorted in ascending order. To sort a specific column in descending order, place the keyword "DESC" after the column name.
2. Inserting and deleting data
You can insert data into an existing managed table through the "INSERT INTO" statement. You can insert new records in multiple ways.
The first method allows you to insert records based on values:
The second method allows you to insert records from another table:
Note: If you are adding a record that has values for all the columns in the table AND you are inserting values in the same as order corresponding to the table structure, you do not need to specify the column names after the table name in the "INSERT INTO" query.
Now, if you wanted to delete the data that we just insert into [Table_1]. This can be done through the "DELETE" statement.
Note: The "WHERE" clause is very important to delete data. If you leave out the "WHERE "clause, all data will be deleted (and won't be recoverable). It is also vital to be aware of the data you are deleting with the "WHERE" clause. It is always good practice to do a "SELECT * FROM" with the same conditions before executing the "DELETE FROM". This will allow you to examine the data you are deleting first.
3. Manipulating data
To manipulate data that exist in a Managed Table, you can achieve this by using the "UPDATE" statement. The UPDATE statement allows you to manipulate the data of existing columns by changing the values directly or using formulas.
The below extract shows how you can manipulate a column by changing the values directly.
Alternatively, you can update a table's column based on the values of other columns.
The "UPDATE" Statement allows you to manipulate the data of multiple columns; to do so you will need to separate each column you are manipulating with "," in the SET component. See the below extract for an example of the syntax.
Of course, it goes without saying, the UPDATE statement does not allow you to add/remove columns, only manipulate the data for existing columns. i.e. if Column N does not exist, the statement will fail.
So far we have shown you to update the data for an entire column by changing its values directly or through formulas. However, there may be cases where you only need to change a subset of the data in the column(s). You can achieve this by applying a "WHERE" clause at the end of your UPDATE statement.
To test your understanding of the content of this tutorial. There is a data set provided in this article, test1.csv to be used for the following exercise. The final results of the exercise have also been attached for you to compare your results.
Please start off by completing the following steps.
- Import the test1.csv into a managed table. Name the managed table, test1_A.
- Create two managed tables of the same structure as test1. Name these tables test1_F and test1_M.
If you are having difficulties doing the first two exercises, please refer to this article for some help.
Complete the following task.
- Remove records that have Year before than 1995.
- Create a step to clear test1_F and test1_M managed table. Import data from test1_A using a SQL command to test1_F where all records have Sex = F. Similarly, import data from test1_A where all records have Sex = M.
- Using SQL statements fill in the missing data in Pr_LTCS with the following information
- All females that are have suffered a WC injury, with an EBP equal or greater than 4, should have a Pr_LTCS of 80
- All males that are have suffered a WC injury, with an EBP equal or greater than 4, should have a Pr_LTCS of 90.
- All records that have an EBP of less than 3 should have a Pr_LTCS of 5.
- All records that have an Injury with the words "CAT" in it has Pr_LTCS of 100.
- Use SQL statements, in particular, make use of the NOT operator in at least one of your statements to correct the Gov_spend_wc column of both test1_M and test1_F with the following information
- All records with Injury, "CTP" or "CTP CAT" should have a Gov_spend_wc of 0.
- All records with Injury, "WC" or "WC CAT" should have a Gov_spend_wc of greater than 0
- (Hard) Missing Gov_spend_wc is approximated by averaging the non-zero Gov_spend_wc for injury with "WC" and "WC CAT" separately.
- Create two reports from your set of managed tables:
- A csv extract named "CTP ID" with all records that have Injury, CTP or CTP CAT (including both records that have Sex, "M" and "F"). Ensure the extract only shows the ID, Location, Sex and Year of the records and is ordered by the ID (in ascending order).
- An excel file named "WC Spend", with all remaining records, displaying all associated columns. Ensure records are ordered by ID (in ascending order).