In Oracle many times we want to group rows of values in such a way that aggregated value is the comma separated list/string (csv) of all values.
Also, We might want that the aggregated csv value do not contain any duplicate inside the group function.
Below are two approaches to achieve this.
- Using LISTAGG function
- Using XMLAGG and XMLELEMENT function
To demonstrate the query, consider following sample scenario
There is an Employee table. It has three columns department_id, designation, first_name (There could be many other columns in the table, obvious ones are employee_id, grade, project, etc.. but for the sake of simplicity only three columns are included here. if more columns are there solution will work just fine with them also)
In each department, there could be various designations and multiple persons can have same designation. So for each department_id, there could be multiple designation and first_name
As a final solution following should be achieved,
- For each department combine all designation and first_name into comma separated string.
- Additionaly, duplicate values should be removed So for e.g. if there are multiple Employee has the same designation Software Developer then in the aggregated string Software Developer should appear only once
1. Using LISTAGG function
LISTAGG function has following syntax
LISTAGG(column_name, separator_character) WITHIN GROUP (ORDER BY ord_col_name) [OVER (PARTITION BY)]
- column_name : name of the column, whose values needs to be combined in csv form
- seperator_character : character to be used for separating values
- ord_col_name : column name based on which result will be ordered
- [OVER(Partition by)] : frankly speaking I don’t understand what this does and this is optional so I haven’t used it.
Following query will generate the desired output
SELECT department_id, (SELECT LISTAGG(n.fn, ',') within GROUP (ORDER BY n.fn) FROM ( SELECT DISTINCT department_id, designation AS fn FROM employees e ORDER BY e.designation ) n WHERE n.department_id = emp.department_id) AS designation, (SELECT LISTAGG(n.fn, ',') within GROUP (ORDER BY n.fn) FROM ( SELECT DISTINCT department_id, first_name AS fn FROM employees e ORDER BY e.first_name ) n WHERE n.department_id = emp.department_id) AS first_name FROM employees emp GROUP BY department_id ORDER BY department_id
In the query group by is used for department_id column. So for each department_id all designation and first_name will be combined by the LISTAGG function and csv value will be generated.
Why below select subquery is required, woudn’t ground by with LISTAGG work without it ??
SELECT Listagg(n.fn, ',') within GROUP (ORDER BY n.fn) FROM (SELECT DISTINCT department_id, designation AS fn FROM employees e ORDER BY e.designation) n WHERE n.department_id = emp.department_id
If we use LISTAGG with group by and without select subquery, then the final concatenated values will include duplicate entries.
To remove the duplicate sub-query in the select clause is used. This sub-query first finds out the unique value for each department and then LISTAGG function is applied on the unique value so that final concatenated values contains no duplicate.
For e.g. in the above sub-query first unique designation for each department is found (DISTINCT department_id, designation) then from that list designation for current department is taken (where n.department_id = emp.department_id ). Now these unique values are concatenated and a final csv is generated by applying listagg function on these unique values.
2. Using XMLAGG and XMLELEMENT function
XMLAGG : This function aggregates the values of xml-elements.
xmlagg(xmlelement ORDER BY column_name)
- xmlelement: XML Element
- column_name: Name of the column on which ordering should be done
XMLELEMENT : This function generates xml element from the column value.
xmlelement( xml_element_name , column_name )
- xml_element_name : Any valid XML Node name
- column_name : Name of the column whose values needs to be combined. Here multiple column name can also be used using “||” if value from multiple columns needs to be combined. If some constant value needs to be appended after each row’s value then add that constant value after “||”
extract(‘//text()’) : This function extracts the text part of the xml element. For e.g. if we have node <a>data</a>, It will return “data”, the text part of xml element
GetClobVal : Use this function if in your data, if final CSV is going to contain more than 4000 characters, if in doubt then it is always better to user this function, because it can handle larger data set
RTRIM: It trims the trailing character.
The final query looks like below
SELECT department_id, ( SELECT rtrim( xmlagg( Xmlelement(e, ie.designation || ',').extract('//text()') order BY ie.designation ).getclobval(), ',') FROM ( SELECT DISTINCT department_id, designation FROM employees ) ie WHERE ie.department_id = e.department_id ) AS designation, ( SELECT rtrim( xmlagg( xmlelement(e, first_name || ',').extract('//text()') ORDER BY first_name).getclobval(), ',') FROM ( SELECT DISTINCT department_id, first_name FROM employees ) ie WHERE ie.department_id = e.department_id ) AS first_name FROM employees e GROUP BY department_id ORDER BY department_id
Simlar to approach 1, sub-query is used for removing the duplicates from the result
SELECT rtrim( xmlagg( Xmlelement(e, ie.designation || ',').extract('//text()') order BY ie.designation ).getclobval(), ',') FROM ( SELECT DISTINCT department_id, designation FROM employees ) ie WHERE ie.department_id = e.department_id
1st approach seems simple, but it has limitation that concatenated string should not exceed 4000 character(varchar2 max length)
2nd approach does not have such limitations and can be used for all situatuions
Hope this helps 🙂