SYNOPSIS
extractDomainName [URL]
EXAMPLES
extractDomainName http://www.amazon.com/
returns: amazon.com
extractDomainName eemadges.com
returns: eemadges.com
extractDomainName http://en.wikipedia.org?search=%s
returns: en.wikipedia.org
extractDomainName http://seek.sing365.com:8080/cgi-bin/s.cgi?q=ladytron
returns: sing365.com
extractDomainName https://www.cia.gov/cia/publications/factbook/geos/.html
returns: cia.gov (thanks to Frank Raiser for noticing the https bug!)
DESCRIPTION
Extracts the domain name from the given URL.
It's a tad more complex than that. Since I made this command explicitly as a building block for another command (">"), it has some quirks to fit my needs. For instance, I usually wanted the domain with all of its subdomains (e.g. en.wikipedia.org, not just wikipedia.org) unless the leading subdomain was the search section of a website (e.g. I preferred nytimes.com to query.nytimes.com). Details are in the code below.
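As a rough illustration of that subdomain rule on its own (this is not the command's actual code, which follows), here is a minimal sketch. The names SEARCH_PREFIXES and drop_search_subdomain are hypothetical; the prefix list is taken from the regexp shown further down.

```ruby
# Hypothetical helper, not part of the original command: it isolates the
# "keep subdomains unless they are a search section" rule.
SEARCH_PREFIXES = %w[www seek query search].freeze

def drop_search_subdomain(host)
  labels = host.split('.')
  # Drop the first label only when it is a known search-style prefix and
  # at least two labels (e.g. "nytimes" and "com") remain afterwards.
  if labels.length > 2 && SEARCH_PREFIXES.include?(labels.first)
    labels.drop(1).join('.')
  else
    host
  end
end

puts drop_search_subdomain('en.wikipedia.org')   # en.wikipedia.org
puts drop_search_subdomain('query.nytimes.com')  # nytimes.com
```

Unlike the regexp version, this sketch compares whole labels, so a host such as research.example.com is left alone.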
Here's the basic regexp behind extractDomainName:
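def extractDomainName(url)
  # Capture the host (and any port) that follows an optional scheme.
  r = url =~ (/^(?:\w+:\/\/)?([^\/?]+)(?:\/|\?|$)/) ? $1 : 'Not a valid URL!'
  # Drop a leading www/seek/query/search subdomain; ^ keeps hosts like "research.x.com" intact.
  r = r.gsub(/^((?:www)|(?:seek)|(?:query)|(?:search))\.(([^\.]+)\.([^\.]+)(\.([^\.]+))?)/, '\2')
  # Strip a trailing port. gsub is used instead of gsub!, which returns nil when nothing changes.
  r.gsub(/\:\d+$/, '')
end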
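One way to check the regexps against the EXAMPLES section is a small table-driven script. The sketch below is a restatement under assumptions, not the author's code: extract_domain_name and CASES are hypothetical names, the prefix match is anchored to the start of the host, and non-destructive sub is used so that a string is always returned.

```ruby
# Hypothetical snake_case restatement of the regexps shown above.
def extract_domain_name(url)
  # Host (plus optional port) after an optional scheme such as http://
  m = url.match(%r{\A(?:\w+://)?([^/?]+)(?:/|\?|\z)})
  return 'Not a valid URL!' unless m

  host = m[1]
  # Drop a leading www/seek/query/search label when 2-3 labels follow it.
  host = host.sub(/\A(?:www|seek|query|search)\.(([^.]+)\.([^.]+)(\.([^.]+))?)/, '\1')
  # Strip a trailing port such as :8080.
  host.sub(/:\d+\z/, '')
end

# Inputs and expected results copied from the EXAMPLES section.
CASES = {
  'http://www.amazon.com/' => 'amazon.com',
  'eemadges.com' => 'eemadges.com',
  'http://en.wikipedia.org?search=%s' => 'en.wikipedia.org',
  'http://seek.sing365.com:8080/cgi-bin/s.cgi?q=ladytron' => 'sing365.com',
  'https://www.cia.gov/cia/publications/factbook/geos/.html' => 'cia.gov'
}.freeze

CASES.each do |url, expected|
  actual = extract_domain_name(url)
  status = actual == expected ? 'ok' : "MISMATCH (got #{actual})"
  puts "#{url} -> #{expected}: #{status}"
end
```

Each line of output pairs an input with its expected result, so a regression in any of the three steps (host capture, prefix removal, port removal) shows up as a MISMATCH.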
Please email me (ely[dot]parra[gmail]) if you find bugs or have suggestions.
-elzr.com
==========
Old implementation:
http://eemadges.com/extractDomainName?id=%s