Oracle aggregate values into comma separated string

In Oracle many times we want to group rows of values in such a way that aggregated value is the comma separated list/string (csv) of all values.
Also, We might want that the aggregated csv value do not contain any duplicate inside the group function.

Below are two approaches to achieve this.

  1. Using LISTAGG function
  2. Using XMLAGG and XMLELEMENT function

 

Example Scenario:

To demonstrate the query, consider following sample scenario

There is an Employee table. It has three columns department_id, designation, first_name (There could be many other columns in the table, obvious ones are employee_id, grade, project, etc.. but for the sake of simplicity only three columns are included here. if more columns are there solution will work just fine with them also)

In each department, there could be various designations and multiple persons can have same designation. So for each department_id, there could be multiple designation and first_name

As a final solution following should be achieved,

  • For each department combine all designation and first_name into comma separated string.
  • Additionaly, duplicate values should be removed So for e.g. if there are multiple Employee has the same designation Software Developer then in the aggregated string Software Developer should appear only once

 
Sample Input:

department_id designation first_name
1 Developer Pranav
1 Tester Ravi
1 Developer Krupal

Sample Output:

department_id aggregate_designation aggregate_first_name
1 Developer,Tester Krupal,Pranav,Ravi

 

1. Using LISTAGG function

LISTAGG function has following syntax

LISTAGG(column_name, separator_character) WITHIN GROUP (ORDER BY ord_col_name) [OVER (PARTITION BY)]

  • column_name : name of the column, whose values needs to be combined in csv form
  • seperator_character : character to be used for separating values
  • ord_col_name : column name based on which result will be ordered
  • [OVER(Partition by)] : frankly speaking I don’t understand what this does and this is optional so I haven’t used it.

Following query will generate the desired output

SELECT department_id, 
       (SELECT LISTAGG(n.fn, ',') within GROUP (ORDER BY n.fn) 
        FROM  ( SELECT DISTINCT department_id, designation AS fn 
                FROM   employees e 
                ORDER  BY e.designation ) n 
        WHERE  n.department_id = emp.department_id) AS designation, 
       (SELECT LISTAGG(n.fn, ',') within GROUP (ORDER BY n.fn) 
        FROM  ( SELECT DISTINCT department_id, 
                                first_name AS fn 
                FROM   employees e 
                ORDER  BY e.first_name ) n 
        WHERE  n.department_id = emp.department_id) AS first_name 
FROM   employees emp 
GROUP  BY department_id 
ORDER  BY department_id

In the query group by is used for department_id column. So for each department_id all designation and first_name will be combined by the LISTAGG function and csv value will be generated.

Why below select subquery is required, woudn’t ground by with LISTAGG work without it ??

SELECT Listagg(n.fn, ',') 
         within GROUP (ORDER BY n.fn) 
FROM   (SELECT DISTINCT department_id, 
                        designation AS fn 
        FROM   employees e 
        ORDER  BY e.designation) n 
WHERE  n.department_id = emp.department_id

If we use LISTAGG with group by and without select subquery, then the final concatenated values will include duplicate entries.
To remove the duplicate sub-query in the select clause is used. This sub-query first finds out the unique value for each department and then LISTAGG function is applied on the unique value so that final concatenated values contains no duplicate.
For e.g. in the above sub-query first unique designation for each department is found (DISTINCT department_id, designation) then from that list designation for current department is taken (where n.department_id = emp.department_id ). Now these unique values are concatenated and a final csv is generated by applying listagg function on these unique values.

 

2. Using XMLAGG and XMLELEMENT function

XMLAGG : This function aggregates the values of xml-elements.

xmlagg(xmlelement ORDER BY column_name)

  • xmlelement: XML Element
  • column_name: Name of the column on which ordering should be done

XMLELEMENT : This function generates xml element from the column value.

xmlelement( xml_element_name , column_name )

  • xml_element_name : Any valid XML Node name
  • column_name : Name of the column whose values needs to be combined. Here multiple column name can also be used using “||” if value from multiple columns needs to be combined. If some constant value needs to be appended after each row’s value then add that constant value after “||”

extract(‘//text()’) : This function extracts the text part of the xml element. For e.g. if we have node <a>data</a>, It will return “data”, the text part of xml element

GetClobVal : Use this function if in your data, if final CSV is going to contain more than 4000 characters, if in doubt then it is always better to user this function, because it can handle larger data set

RTRIM: It trims the trailing character.

The final query looks like below

 SELECT   department_id,
         (
                  SELECT   rtrim( xmlagg( Xmlelement(e, ie.designation
                                    || ',').extract('//text()') order BY ie.designation ).getclobval(), ',')
                  FROM     (
                                           SELECT DISTINCT department_id,
                                                           designation
                                           FROM            employees ) ie
                  WHERE    ie.department_id = e.department_id ) AS designation,
         (
                  SELECT   rtrim( xmlagg( xmlelement(e, first_name
                                    || ',').extract('//text()') ORDER BY first_name).getclobval(), ',')
                  FROM     (
                                           SELECT DISTINCT department_id,
                                                           first_name
                                           FROM            employees ) ie
                  WHERE    ie.department_id = e.department_id ) AS first_name
FROM     employees e
GROUP BY department_id
ORDER BY department_id 

Simlar to approach 1, sub-query is used for removing the duplicates from the result

SELECT   rtrim( xmlagg( Xmlelement(e, ie.designation
                  || ',').extract('//text()') order BY ie.designation ).getclobval(), ',')
FROM     (
                         SELECT DISTINCT department_id,
                                         designation
                         FROM            employees ) ie
WHERE    ie.department_id = e.department_id 

 
 
Summary:

1st  approach seems simple, but it has limitation that concatenated string should not exceed 4000 character(varchar2 max length)
2nd approach does not have such limitations and can be used for all situatuions
 

Hope this helps 🙂