Friday, June 26, 2015

REGEXP_REPLACE function & SQL Query to remove Non-Numeric characters from a String

REGEXP_REPLACE function replaces string with regular expression matching supports. The simplest format for this function is:
REGEXP_REPLACE (source_string, pattern_to_find, pattern_to_replace_by)
The general format for the REGEXP_REPLACE function with all the options is


REGEXP_REPLACE (source_string, 
                pattern_to_find, 
               [pattern_to_replace_by, 
                position, 
                occurrence,
                match_parameter])

  • source_string:the string you want to search for
  • pattern-to-find:the pattern used to search.
  • pattern_to_replace_by: pattern used to do the matching
  • position:indicates where to start.
  • occurrence:indicates which occurrence of the pattern-to-find in the source-string you want to search for. For example, which occurrence of "si" do you want to extract from the source string "Mississippi".
  • match-parameter:for further customizing.
    • "i" in match-parameter can be used for caseinsensitive matching
    • "c" in match-parameter can be used for casesensitive matching
    • "n" in match-parameter allows the period to match the new line character
    • "m" in match-parameter allows for more than one line in source-string
MetacharactersMeaning
\Specify the escape sequence
\dDigit character
\DNon-digit character
\wWord character
\WNon-word character
\sWhitespace character
\SNon-whitespace character
\AMatches only at the beginning of a string or before a newline character at the end of a string
\ZMatches only at the end of a string
^Matches the position at the start of the string.
$Matches the position at the end of the string.
*Matches the preceding character zero or more times.
+Matches the preceding character one or more times.
?Matches the preceding character zero or one time.
*?Matches the preceding pattern element 0 or more times
+?Matches the preceding pattern element 1 or more times
??Matches the preceding pattern element 0 or 1 time
{n}Matches a character exactly n times, where n is an integer.
{n,}Matches the preceding pattern element at least n times
{n,m}Matches a character at least n times and at most m times, where n and m are both integers.
.Matches any single character except null.
(pattern)A subexpression that matches the specified pattern.
x|yMatches x or y, where x and y are one or more characters. war|peace matches war or peace.
[abc]Matches any of the enclosed characters.
[a-z]Matches any character in the specified range.
[:alphanum:]matches alphanumeric characters 0-9, A-Z, and a-z.
[:alpha:]matches alphabetic characters A-Z and a-z.
[:blank:]matches space or tab.
[:digit:]matches digits 0-9.
[:graph:]matches non-blank characters.
[:lower:]matches lowercase alphabetic characters a-z.
[:print:]is similar to [:graph:] except [:print:] includes the space character.
[:punct:]matches punctuation characters .,"`, and so on.
[:space:]matches all whitespace characters.
[:upper:]matches all uppercase alphabetic characters A-Z.
[:xdigit:]matches characters permissible in a hexadecimal number 0-9, A-F, and a-f.
[..]Matches one collation element, like a multicharacter element.
[==]Specifies equivalence classes.
\nA backreference to an earlier capture, where n is a positive integer.
Example:
ExampleDescription
\nmatches the newline character
\\matches \
\(matches (
^Amatches if A is the first character in the string.
$Bmatches if B is the last character in the string.
f*dmatches flood, food, and so on.
fo+dmatches fod, food, and so on.
fo?dmatches fd and fod only.
fo{2}dmatches food.
fo{2,3}dmatches food and foood only.
[ab]bcmatches abc and bbc.
[a-c]bcmatches abc, bbc, and cbc.

SQL> SELECT REGEXP_REPLACE('Mississippi', 'si', 'SI', 1, 0, 'i') FROM dual;

REGEXP_REPL
-----------
MisSIsSIppi

SQL> SELECT REGEXP_REPLACE('lord, llll','l[[:alpha:]]{2}', 'ssss') AS result FROM dual;

RESULT
------------
ssssd, ssssl



SQL Query to remove Non-Numeric characters from a String

Using TRANSLATE and REPLACE Function

We can use this method when we are completely aware of what all non-numeric characters that would be present in the input value. Here in this example i am trying to remove non-numeric characters from phone number field
SELECT TRANSLATE(REPLACE(LOWER('Ph +91 984-809-8540'),'(0) -',' '),'abcdefghijklmnopqrstuvwxyz()- +/,.#',' ') OUT_PUT FROM dual;
rs from phone number field
SELECT TRANSLATE(REPLACE(LOWER('Ph +91 984-809-8540'),'(0) -',' '),'abcdefghijklmnopqrstuvwxyz()- +/,.#',' ') OUT_PUT FROM dual;
1.Translate_and_Replace Function
The above query would return a output as “919848098540” but if the input is something like ‘Ph: +91 984-809-8540‘ then this would return an output ‘:919848098540‘ since we are not handling the character ‘:‘. So we can’t go for this method if we are not sure of the all possible non-numeric characters that would come in as input.

Using Regular Expression:

regexp_replace function replaces string with regular expression matching supports. The simplest format for this function is:
REGEXP_REPLACE (source_string, pattern_to_find, pattern_to_replace_by)
For more information about regexp_replace please read this article from oracle.com
SELECT to_number(regexp_replace('Ph: +91 984-809-8540', '[^0-9]+', '')) OUT_PUT FROM dual;
2.regexp_replace The above statement would replace all the characters except the digits 0-9 with null.
SELECT to_number(regexp_replace('Ph: +91 984-809-8540', '\D', '')) OUT_PUT FROM dual;
In this statement ‘\D’ would find all Non-digit characters and the will be replaced by null.
3.regexp_replace
Though the above two statements works well, there is a scenario where these two statements fail to work, let me tell you with an example
SELECT to_number(regexp_replace('0*0-7-', '[^0-9]+', '')) OUT_PUT FROM dual;
SELECT to_number(regexp_replace('0*0-7-', '\D', '')) OUT_PUT FROM dual;
we expect the above statement to return ‘007’ instead they would return ‘7’. this method omitts if we have digit ‘0’ as prefix. 4.regexp_replace
The best method I found is to use regexp Metacharacter ‘:digit:‘ which matches digits 0-9
SELECT REGEXP_REPLACE( '0*0-7-', '[^[:digit:]]', NULL ) OUT_PUT FROM DUAL;
SELECT regexp_replace( 'Ph: +91 984-809-8540', '[^[:digit:]]', NULL ) FROM dual;
5.regexp_replace

No comments:

Post a Comment