SQL Azure – TSQL User Defined Function – Separate multiple emails from NVARCHAR(MAX) field

Stack overflow sourced, adapted and personally tested code
Extract email address from string using tsql

A continuation of working with strings in TSQL specifically linked to emails.

Firstly create the following Function

CREATE FUNCTION [dbo].[fnFindPatternLocation]
(
    @string NVARCHAR(MAX),
    @term   NVARCHAR(MAX)
)
RETURNS TABLE
AS
    RETURN 
    (
        SELECT pos = Number - LEN(@term) 
        FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@string, Number, 
        CHARINDEX(@term, @string + @term, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
        FROM sys.all_objects) AS n(Number)
        WHERE Number > 1 AND Number <= CONVERT(INT, LEN(@string))
        AND SUBSTRING(@term + @string, Number, LEN(@term)) = @term
    ) AS y);

Then create a View of what you are interested in as follows.. Note here I am taking out the carriage return as my subsequent query doesn’t like them and in emails they frequently exist.

CREATE VIEW [dbo].[v001] as SELECT pkid, REPLACE(body, CHAR(13) + CHAR(10),' ') as body1 from t001email

Then run the newly created View through a query.

SELECT pkid, body1, pos, SUBSTRING(body,beginningOfEmail,endOfEmail-beginningOfEmail) AS email
FROM v001
CROSS APPLY (SELECT pos FROM dbo.fnFindPatternLocation(body1, '@')) AS A(pos)
CROSS APPLY (SELECT CHARINDEX(' ',body1 + ' ', pos)) AS B(endOfEmail)
CROSS APPLY (SELECT pos - CHARINDEX(' ', REVERSE(SUBSTRING(body, 1, pos))) + 2) AS C(beginningOfEmail)

Couple of things here
Multiple emails will be picked out and placed as separate records so if there a string that reads

This is a sentence with two emials first@gmail.com and a second second@gmail.com

it will return
first@gmail.com
second@gmail.com

If an email starts the field then this will NOT work after finding the @ symbol it will count forward and fail to find a space and so set space before to Null it will then return just the domain of the email. I will be looking to fix this at some point.

Secondly if the emails within the field contain contiguous special html characters such as < or > these will be picked up and inculded as if they are part of the email addresses.

We can fix this by scanning through the varchar(max) field and stripping out special characters.

NOTE : If you are working with email bodies carriage returns will also screw up the above query in which case consider running the field through some kind of replace view with similar syntax as

CREATE VIEW v002 as SELECT pkid, REPLACE(body, CHAR(13) + CHAR(10),' ') as txtBodyWithoutReturns from t001email