Handling email addresses in SQL Server, Database Articles, Global Guide Line Technology is a Mega IT Portal.
Google
Join Global Guide Line community

     Home                    

   First Website Guide       

   Learn HTML                 

   Learn CSS                    

 

   Learn XML                    

   Learn XSLT                  

   Learn Java Script          

   Learn SEO                   

   Learn SQL                   

   Database Articles         

   Web Hosting Guide      

   Services                       

   Contacts                       

Handling email addresses in SQL Server


     Back             Next     

Storing email addresses in SQL Server databases is quite common, especially in those tables, that store information about customers, clients, subscribers etc. This leads to the obvious question, which datatype one should use for storing email addresses? undoubtedly, it is varchar data type, as handling email addresses in SQL Server vary a great deal in length. Use nvarchar if you need to store email addresses that contain extended characters. Note that nvarchar datatype requires double the amount of storage space, compared to varchar! So use it only if you need it.
Now, the next question. What should be the length of this email column? I would say, use varchar(70). In my experience, an average email address is about 23 characters in length, the shortest handling email addresses being about 10 characters long and the longest ones being 65 to 70 characters long.
 
Now that we decided on the datatype and length, how about validating email addresses? In my humble opinion, email address validation should take place in the client applications and websites. It is the responsibility of your applications, not to send crap email addresses to your database. But then there are always exceptions like, multiple applications with varying degrees of validations, in which case, it is better to centralize the validation at the database level (or validate at the database level anyway, to be doubly sure).

Email validation has always been a tricky bit for programmers. T-SQL lacks sophisticated regular expressions, that are available in other languages like javascript. Keeping that in mind, here's the code for a CHECK constraint that validates the inserted and updated email addresses.

ALTER TABLE [dbo].[Subscribers]
WITH CHECK ADD
CONSTRAINT [EmailValidator]
CHECK
(
CHARINDEX(' ',LTRIM(RTRIM([Email]))) = 0
--
No embedded spaces

AND LEFT(LTRIM([Email]),1) <> '@'  -- '@'
can't be the first character of an email address

AND RIGHT(RTRIM([Email]),1) <> '.' -- '.'
can't be the last character of an email address

AND CHARINDEX('.',[Email],CHARINDEX('@',[Email])) - CHARINDEX('@',[Email]) > 1
-- There must be a '.' after '@'
AND LEN(LTRIM(RTRIM([Email]))) - LEN(REPLACE(LTRIM(RTRIM([Email])),'@','')) = 1
-- Only one '@' sign is allowed

AND CHARINDEX('.',REVERSE(LTRIM(RTRIM([Email])))) >= 3
-- Domain name should end with at least 2 character extension
AND (CHARINDEX('.@',[Email]) = 0 AND CHARINDEX('..',[Email]) = 0)
-- can't have patterns like '.@' and '..'
)
GO

The above CHECK constraint is added on the column "Email", of the table "Subscribers". Notice the comments at the end of each line. Also notice that the constraint is being added with "WITH CHECK" option. So, the constraint will not get created if your table already has invalid email addresses, in which case, you will see the following error message:


Server: Msg 547, Level 16, State 1, Line 1

ALTER TABLE statement conflicted with TABLE CHECK constraint 'EmailValidator'. The conflict occurred in database 'pubs', table 'subscribers'.

This CHECK constraint will prevent most of the invalid email addresses from being added to the database, but it is not 100% fool proof. We still need to check for characters that are not allowed inside an email address. I do not have a definitive list of characters that are allowed/disallowed in an email address at the moment, but the above CHECK constraint can be amended with this check, if you wish. There is no guarantee that the domain name is going to be really valid, unless you look up the DNS records, which is too much to ask for, in an OLTP database. But this kind of checks can be performed in a batch process that runs during off-peak hours.

In SQL Server 2000, this CHECK constraint can be rewritten as a User Defined Function (UDF). Now, let's change gears and move on from storing and validating email addresses, to querying email addresses.

Btw, if the above CHECK constraint fails, because you already have invalid email addresses, run the following query to see those invalid/offending email address:

SELECT *
FROM Subscribers
WHERE NOT
(
CHARINDEX(' ',LTRIM(RTRIM([Email]))) = 0
AND  LEFT(LTRIM([Email]),1) <> '@'
AND  RIGHT(RTRIM([Email]),1) <> '.'
AND  CHARINDEX('.',[Email],CHARINDEX('@',[Email])) - CHARINDEX('@',[Email]) > 1
AND  LEN(LTRIM(RTRIM([Email]))) - LEN(REPLACE(LTRIM(RTRIM([Email])),'@','')) = 1
AND  CHARINDEX('.',REVERSE(LTRIM(RTRIM([Email])))) >= 3
AND  (CHARINDEX('.@',[Email]) = 0 AND CHARINDEX('..',[Email]) = 0)
)
 

Consider that the table "Subscribers", stores the email addresses (along with other details) of people subscribed to your online newsletter. Now you want to find out how many subscribers you have from each domain (for example, Hotmail, Yahoo etc.). Here's the query that does that:

SELECT RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1)
AS [Domain Name], COUNT(Email) AS [Number of Subscribers]
FROM dbo.Subscribers
GROUP BY RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1)
ORDER BY [Number of Subscribers] DESC, [Domain Name]

 

The above query extracts the domain names from the email addresses and groups by the domain name, to get the count of subscribers from each domain. Output is sorted in the descending order of number of subscribers, followed by the domain name in ascending order, so that the most popular domain name appears at the top of the list. Here's the sample output:
 
Domain Namer> -------------------- Number of Subscribers
----------------------------------
hotmail.com 212
yahoo.com 168
msn.com 199
aol.com 185
attbi.com 210
earthlink.net 201
lycos.com 181
mindspring.com 165
vsnl.com 132
worldnet.att.net 145
   
(10 row(s) affected)

Step on Next to read the rest Handling Email addresses in SQL Server please....

     Back             Next     


 

[ About ] [ Contact ] [ Home ]
[ Links ] [ Site Map ] [ Services ] [ Privacy ]

Copyright © 2005-2006 www.globalguideline.com All rights reserved. Join Global Guide Line community.