Jul 17 2008

extract/get email address from string using vb.net

If you're new here, you may want to subscribe to my RSS feed. Thanks for visiting!

Here is a simple VB.Net code to extract email addresses from a string

Imports System.Text.RegularExpressions
 
Public Class Form1
 
    Private Sub btnExtract_Click_1(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnExtract.Click
 
        Dim textstring As Array
        Dim strEmail As String
        Dim strEmailChars As String
        Dim StrOutPut As String = ""
 
        strEmail = txtMailContent.Text
        StrOutPut = txtResult.Text
        strEmailChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_@."
        For cnt As Long = 0 To strEmail.Length - 1  'Loop throug each charecters in the content
 
            'Check whether the charecter is anything other than valid email charecters (Charecters in the strins "strEmailChars")
            If strEmailChars.Contains(strEmail(cnt)) = False Then
                strEmail = strEmail.Replace(strEmail(cnt), " ") 'Replace that String with space
            End If
        Next
 
        textstring = Split(strEmail) ' no delimeter means use space
 
        Dim emailpattern As String = "^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
 
        Dim i As Integer
        Dim r As New Regex(emailpattern, RegexOptions.IgnoreCase + RegexOptions.Multiline)
 
        For i = 0 To textstring.GetLength(0) - 1
            Dim m As Match = r.Match(textstring(i))
 
            While m.Success
                If StrOutPut.Contains(m.ToString) = False Then
                    If StrOutPut.Length > 0 Then    'Put a comma before appending new email address if the
                        '                            taken email id is not the first one
                        StrOutPut = StrOutPut & ", "
                    End If
                    StrOutPut = StrOutPut & m.ToString()
                    lblCount.Text = lblCount.Text + 1
                End If
                m = m.NextMatch()
            End While
        Next i
        StrOutPut = StrOutPut.Remove(StrOutPut.LastIndexOf(",") + 1, 0)
 
        txtResult.Text = StrOutPut
 
        MsgBox("Completed")
 
    End Sub
 
    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        lblCount.Text = 0
    End Sub
End Class

Now here is a detailed explanation.
Create a VB.Net 2005 application. Design a form with two text boxes (txtMailContent, txtResult), a label box (lblCount) and a button (btnExtract). Take the code window of the form and delete all the content and paste the above mentioned code.

Declare variables

Imports System.Text.RegularExpressions
‘Import name space for regular expressions.
Dim textstring As Array
Dim strEmail As String
Dim strEmailChars As String
Dim StrOutPut As String = ""

Assign string to extract to the variable strEmail and assign the current result to StrOutPut. This code will extract unique email address from the supplied string.

strEmail = txtMailContent.Text
StrOutPut = txtResult.Text

Assign legal characters to a string.

strEmailChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_@."

The below loop will replace all illegal characters with an “Space”

For cnt As Long = 0 To strEmail.Length - 1  'Loop throug each charecters in the content
 
'Check whether the charecter is anything other than valid email charecters
'(Charecters in the strins "strEmailChars")
If strEmailChars.Contains(strEmail(cnt)) = False Then
strEmail = strEmail.Replace(strEmail(cnt), " ") 'Replace that String with space
End If
Next

Split each word and store it in an array.

textstring = Split(strEmail) ' no delimeter means use space

The belowloop will extract each word in the array and check if that is a valid email. If that is a valid email then it will check whether its a duplicate. If not then it will add that email address to result. The label lblCount will display the total number of emails extracted.

Dim emailpattern As String = "^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
Dim r As New Regex(emailpattern, RegexOptions.IgnoreCase + RegexOptions.Multiline)
For i = 0 To textstring.GetLength(0) - 1
Dim m As Match = r.Match(textstring(i))
 
While m.Success
If StrOutPut.Contains(m.ToString) = False Then
If StrOutPut.Length > 0 Then    'Put a comma before appending new email address if the
'                            taken email id is not the first one
StrOutPut = StrOutPut & ", "
End If
 
StrOutPut = StrOutPut & m.ToString()
lblCount.Text = lblCount.Text + 1
End If
m = m.NextMatch()
End While
Next i
StrOutPut = StrOutPut.Remove(StrOutPut.LastIndexOf(",") + 1, 0)
txtResult.Text = StrOutPut

Now Display the result in to the text box txtResult. :)

Add to Del.cio.us RSS Feed Add to Technorati Favorites Stumble It! Digg It!
    www.sajithmr.com

1 Comments on this post

Trackbacks

  1. Arun James said:
    Thanks for the info … Keep going .. :)
    July 18th, 2008 at 11:09 am

LEAVE A COMMENT

Subscribe Form

Subscribe to Blog