Wednesday, August 3, 2011

Oracle PL/SQL function to split strings into tokens

I was trying to find how to split or tokenise (tokenize) a string in Oracle SQL. Technically you can do it directly with just instr and substr but your statement becomes very long and hard to read, mainly because you can't reuse the result of an instr. I found this which was interesting but not what I needed, so I wrote my own function to do the job. I'll post it here in the hope that it will be useful to someone.
create or replace function get_token(    the_list  varchar2,    the_index number,    delim     varchar2 := ',' )    return    varchar2 is    start_pos number;    end_pos   number; begin    if the_index = 1 then        start_pos := 1;    else        start_pos := instr(the_list, delim, 1, the_index - 1);        if start_pos = 0 then            return null;        else            start_pos := start_pos + length(delim);        end if;    end if;     end_pos := instr(the_list, delim, start_pos, 1);     if end_pos = 0 then        return substr(the_list, start_pos);    else        return substr(the_list, start_pos, end_pos - start_pos);    end if;  end get_token; /  
Example usage:
select    get_token('foo,bar,baz',1), -- 'foo'    get_token('foo,bar,baz',3), -- 'baz'    --    get_token('a,,b',2),        -- '' (null)    get_token('a,,b',3),        -- 'b'    --    get_token('a|b|c',2,'|'),   -- 'b'    get_token('a|b|c',4,'|')    -- '' (null) from    dual /  
Notes
  • Remember that the index starts at one not zero just like strings in PL/SQL.
  • Empty tokens are counted. You get null if you ask for an empty token.
  • You also get null if you ask for an index greater than the number of tokens.
  • The delimiter defaults to a comma but you can specify any delimiter. A delimiter of more than one char should work also.


Update 17/5/07
I've added the notes above and colourised the code to make it look prettier. To the commenters below, you're welcome!

Update 12/9/08
This version provided by Anonymous below supports negative indexes, eg
get_token('foo,bar,baz',-1), -- 'baz' get_token('foo,bar,baz',-2), -- 'bar' 
Also check the comments below for a regex based version that can handle quoted csv style fields.
create or replace function get_token (     the_list varchar2,     the_index number,     delim varchar2 :=';' )     return varchar2 is     start_pos number;     end_pos number;  begin     if the_index = 1 then         start_pos := 1;      elsif the_index < 0 then         start_pos := instr(the_list, delim, -1, abs(the_index)) + 1;      else         start_pos := instr(the_list, delim, 1, the_index - 1);          if start_pos = 0 then             return null;         else             start_pos := start_pos + length(delim);         end if;      end if;      if the_index < 0 then         end_pos := instr(the_list, delim, start_pos+1, 1);     else         end_pos := instr(the_list, delim, start_pos, 1);     end if;      if end_pos = 0 then         return substr(the_list, start_pos);     else         return substr(the_list, start_pos, end_pos - start_pos);     end if;  end get_token; 

No comments: