Extract email from text file

The Linux command line has many tools that we can use to extract information from text files. In this tutorial, we’ll use a few different methods to extract email addresses from a text file on Linux.

All methods will accomplish the same goal, so use whichever one you find most convenient.

In this tutorial you will learn:

  • How to extract email addresses from a text file
Extracting emails from a text file in Linux
Software Requirements and Linux Command Line Conventions

Category: Requirements, Conventions or Software Version Used
System: Any Linux distro
Software: N/A
Other: Privileged access to your Linux system as root or via the sudo command.
Conventions:
  # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
  $ – requires given Linux commands to be executed as a regular non-privileged user

Extract email addresses from a text file




We are assuming that your text file contains one or more email addresses, sprinkled throughout some ordinary text. The objective here is to use various Linux utilities to sift through the text file for us and extract the email addresses.

To illustrate our examples, we are using the following text file, saved as emails.txt.

Name: Luke Reynolds
Email: luke@example.com
For business inquiries, contact luke@example.net instead
Alternatively, send mail to my boss admin@example.com
The End

Check out the various methods we use below to extract each of the three email addresses in this example file.

  1. The following grep regular expression can be used to extract the email addresses in our file.
    $ grep -oE "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" emails.txt
    
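  2. The following sed command can also be used to extract email addresses. It relies on GNU sed extensions, such as \n in the replacement text and the case-insensitive i flag.
    $ sed -r 's/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/\n&\n/ig;s/(^|\n)[^@]*(\n|$)/\n/g;s/^\n|\n$//g;/^$/d' emails.txt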
    
  3. The following Python code can also be used to extract email addresses from text.
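    import re

    # Sample text; each line ends with "\n" so adjacent lines do not run together.
    text = (
        "Name: Luke Reynolds\n"
        "Email: luke@example.com\n"
        "For business inquiries, contact luke@example.net instead\n"
        "Alternatively, send mail to my boss admin@example.com\n"
        "The End\n"
    )

    # Find every address; re.IGNORECASE also accepts uppercase letters.
    emails = re.findall(r"[a-z0-9._+-]+@[a-z0-9.-]+\.[a-z]+", text, re.IGNORECASE)
    print(emails)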
    

    Save your file as script.py and then execute it to extract the emails.

    $ python3 script.py
    
  4. Finally, we can also use the following perl code to extract email addresses.
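    #!/usr/bin/perl

    use strict;
    use warnings;

    my $email_count = 0;    # start at 0 so the summary is numeric even with no matches

    while (my $line = <>) {    # read from the named file(s) or STDIN
        foreach my $email (split /\s+/, $line) {
            # keep only whitespace-delimited tokens that look like an email address
            if ($email =~ /^[-\w.]+@([a-z0-9][a-z0-9-]+\.)+[a-z]{2,}$/i) {
                print "$email\n";
                $email_count++;
            }
        }
    }

    print "Emails Extracted: $email_count\n";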
    

    Save the file as script.pl, make it executable, and then run it to extract the emails.

    $ chmod +x script.pl
    $ ./script.pl emails.txt
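
The Python example in step 3 searches a hard-coded string. As a minimal sketch, the same idea can be pointed at a file such as emails.txt and made to skip repeated addresses. The function name extract_emails, the script name extract.py, and the pattern itself are illustrative assumptions, not part of the scripts above. The pattern is a pragmatic approximation and does not cover every address the mail standards allow.

```python
import re
import sys

# Pragmatic pattern: local part, "@", domain labels, and an alphabetic TLD.
# It is an approximation and will miss some unusual but valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the addresses found in text, in order, without duplicates."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()  # compare case-insensitively when de-duplicating
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


if __name__ == "__main__":
    # Hypothetical usage: python3 extract.py emails.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            for address in extract_emails(handle.read()):
                print(address)
```

Keeping the matching logic in a function separates it from the file handling, so the same function can be reused on text that comes from somewhere other than a file.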
    
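
The patterns in the steps above are dense, so it may help to see one spelled out piece by piece. The sketch below uses Python's re.VERBOSE flag, which allows whitespace and comments inside a pattern. The name EMAIL_RE and the exact character classes are assumptions chosen for illustration, not a definitive email grammar.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows trailing comments.
EMAIL_RE = re.compile(
    r"""
    [A-Za-z0-9._%+-]+   # local part: letters, digits, and a few symbols
    @                   # the separator every address contains
    [A-Za-z0-9.-]+      # domain labels, such as example or mail.example
    \.                  # a literal dot; an unescaped dot matches any character
    [A-Za-z]{2,}        # top-level domain of two or more letters
    """,
    re.VERBOSE,
)

print(EMAIL_RE.findall("contact luke@example.net instead"))
```

Escaping the dot before the top-level domain matters: without the backslash, the pattern would also accept strings that have no dot in the domain at all.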


Closing Thoughts

In this tutorial, we showed several different methods for extracting email addresses from a text file in Linux. As with most tasks in Linux, there are multiple ways to accomplish the same result. Use whichever method you find most convenient, whether that is a standard command-line utility such as grep or sed, or the Python or Perl programming languages.


