|
Now that we decided on the datatype and length, how about validating email addresses? In my humble opinion, email address validation should take place in the client applications and websites. It is the responsibility of your applications, not to send crap email addresses to your database. But then there are always exceptions like, multiple applications with varying degrees of validations, in which case, it is better to centralize the validation at the database level (or validate at the database level anyway, to be doubly sure).
Email validation has always been a tricky bit for programmers. T-SQL lacks sophisticated regular expressions, that are available in other languages like javascript. Keeping that in mind, here's the code for a CHECK constraint that validates the inserted and updated email addresses.
ALTER TABLE [dbo].[Subscribers]
WITH CHECK ADD
CONSTRAINT [EmailValidator]
CHECK
(
CHARINDEX(' ',LTRIM(RTRIM([Email]))) = 0
--
No embedded spaces
AND LEFT(LTRIM([Email]),1) <> '@'
-- '@'
can't be the first character of an email address
AND RIGHT(RTRIM([Email]),1) <> '.'
-- '.'
can't be the last character of an email address
AND CHARINDEX('.',[Email],CHARINDEX('@',[Email])) -
CHARINDEX('@',[Email]) > 1
-- There must be a
'.' after '@'
AND LEN(LTRIM(RTRIM([Email]))) -
LEN(REPLACE(LTRIM(RTRIM([Email])),'@','')) = 1
-- Only one '@' sign
is allowed
AND CHARINDEX('.',REVERSE(LTRIM(RTRIM([Email])))) >= 3
-- Domain name should
end with at least 2 character extension
AND (CHARINDEX('.@',[Email]) = 0 AND CHARINDEX('..',[Email])
= 0)
-- can't have
patterns like '.@' and '..'
)
GO
The above CHECK constraint is added on the column "Email", of the table "Subscribers". Notice the comments at the end of each line. Also notice that the constraint is being added with "WITH CHECK" option. So, the constraint will not get created if your table already has invalid email addresses, in which case, you will see the following error message:
Server: Msg 547, Level 16, State 1, Line 1
ALTER TABLE statement conflicted with TABLE CHECK constraint 'EmailValidator'. The conflict occurred in database 'pubs', table 'subscribers'.
This CHECK constraint will prevent most of the invalid email addresses from being added to the database, but it is not 100% fool proof. We still need to check for characters that are not allowed inside an email address. I do not have a definitive list of characters that are allowed/disallowed in an email address at the moment, but the above CHECK constraint can be amended with this check, if you wish. There is no guarantee that the domain name is going to be really valid, unless you look up the DNS records, which is too much to ask for, in an OLTP database. But this kind of checks can be performed in a batch process that runs during off-peak hours.
In SQL Server 2000, this CHECK constraint can be rewritten as a User Defined Function (UDF). Now, let's change gears and move on from storing and validating email addresses, to querying email addresses.
Btw, if the above CHECK constraint fails, because you already have invalid email addresses, run the following query to see those invalid/offending email address:
SELECT *
FROM Subscribers
WHERE NOT
(
CHARINDEX(' ',LTRIM(RTRIM([Email]))) = 0
AND LEFT(LTRIM([Email]),1) <> '@'
AND RIGHT(RTRIM([Email]),1) <> '.'
AND CHARINDEX('.',[Email],CHARINDEX('@',[Email])) -
CHARINDEX('@',[Email]) > 1
AND LEN(LTRIM(RTRIM([Email]))) -
LEN(REPLACE(LTRIM(RTRIM([Email])),'@','')) = 1
AND CHARINDEX('.',REVERSE(LTRIM(RTRIM([Email])))) >= 3
AND (CHARINDEX('.@',[Email]) = 0 AND CHARINDEX('..',[Email])
= 0)
)
Consider that the table "Subscribers", stores the
email addresses (along with other details) of people subscribed to your online
newsletter. Now you want to find out how many subscribers you have from each
domain (for example, Hotmail, Yahoo etc.). Here's the query that does that:
SELECT RIGHT(Email, CHARINDEX('@',
REVERSE(Email))-1)
AS [Domain Name], COUNT(Email) AS [Number of Subscribers]
FROM dbo.Subscribers
GROUP BY RIGHT(Email, CHARINDEX('@', REVERSE(Email))-1)
ORDER BY [Number of Subscribers] DESC, [Domain Name]
The above query extracts the domain names from the
email addresses and groups by the domain name, to get the count of subscribers
from each domain. Output is sorted in the descending order of number of
subscribers, followed by the domain name in ascending order, so that the most
popular domain name appears at the top of the list. Here's the sample output:
|
Domain Namer>
-------------------- |
Number of Subscribers
---------------------------------- |
|
hotmail.com
|
212 |
|
yahoo.com
|
168 |
|
msn.com
|
199 |
|
aol.com
|
185 |
|
attbi.com
|
210 |
|
earthlink.net
|
201 |
|
lycos.com
|
181 |
|
mindspring.com
|
165 |
|
vsnl.com
|
132 |
|
worldnet.att.net
|
145 |
| |
|
|
(10 row(s) affected) |
Step on Next to read the rest Handling Email addresses in SQL Server
please....
Back
Next
|