API Docs

domain_utils.domain_utils module

domain_utils.domain_utils.get_etld1(url, **kwargs)

Returns the eTLD+1 (aka PS+1) of the url.

Parameters

url (string) – The url from which to extract the eTLD+1 / PS+1
extractor (tldextract::TLDExtract, optional) – A tldextract::TLDExtract instance can be passed with the keyword extractor; otherwise one is created and updated automatically.
kwargs – The method preprocesses the url with stem_url before extracting the domain. You can pass in stem_url parameters if you wish to change the behavior in some specific way.

Returns

The eTLD+1 / PS+1 of the url passed in. If no eTLD+1 is detectable, an empty string will be returned. Returns an IP address if the hostname of the url is a valid IP address.

Return type

string
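
The return rules above can be illustrated with a small, self-contained sketch. It is not the library's implementation: toy_etld1 and TOY_SUFFIXES are hypothetical names, the suffix set is a tiny stand-in for the Public Suffix List that tldextract manages, wildcard and exception rules are ignored, and the function takes a bare hostname rather than a full url.

```python
import ipaddress

# Hypothetical, tiny stand-in for the Public Suffix List (assumption for illustration).
TOY_SUFFIXES = {"com", "net", "co.uk"}


def toy_etld1(hostname):
    """Return the eTLD+1 of a bare hostname, using only TOY_SUFFIXES."""
    # An IP address has no public suffix; it is returned unchanged.
    try:
        ipaddress.ip_address(hostname)
        return hostname
    except ValueError:
        pass

    labels = hostname.lower().split(".")
    # Scan from the left so that the longest matching suffix wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in TOY_SUFFIXES:
            if i == 0:
                # The hostname is itself a public suffix: no eTLD+1 exists.
                return ""
            # eTLD+1 is the suffix plus exactly one label to its left.
            return ".".join(labels[i - 1:])
    # No known suffix found: no eTLD+1 is detectable.
    return ""


print(toy_etld1("www.example.co.uk"))
print(toy_etld1("192.168.0.1"))
```

The two calls print example.co.uk and 192.168.0.1 respectively; a hostname with no known suffix yields an empty string.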

domain_utils.domain_utils.get_port(url, extractor=None)

Given a url, extracts its port, if present.

Parameters

url (string) – The URL from which to extract the port
extractor (tldextract::TLDExtract, optional) – A tldextract::TLDExtract instance can be passed with the keyword extractor; otherwise one is created and updated automatically.

Returns

The port in the url. If no port is found, returns None.

Return type

int or None
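
As a rough illustration of the port-or-None contract, the standard library's urllib.parse.urlparse exposes a similar attribute. The helper name sketch_get_port is hypothetical, and this sketch makes no claim about how get_port itself is implemented or how it treats urls without a scheme.

```python
from urllib.parse import urlparse


def sketch_get_port(url):
    """Return the port of url as an int, or None if the url has no port."""
    # ParseResult.port is already an int, or None when no port is present.
    return urlparse(url).port


print(sketch_get_port("http://example.com:8080/path"))
print(sketch_get_port("https://example.com/path"))
```

The first call prints 8080 and the second prints None.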

domain_utils.domain_utils.get_ps_plus_1(url, **kwargs)

An alias for get_etld1.

domain_utils.domain_utils.get_scheme(url, no_scheme='no_scheme')

Given a url, extracts its scheme.

Parameters

url (string) – The URL from which to extract the scheme
no_scheme (any) – The value to return if no scheme is detected. Default is the string 'no_scheme'.

Returns

The scheme of the url, or the value of no_scheme if no scheme is detected.

Return type

string
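
The fallback behaviour of no_scheme can be sketched with urllib.parse.urlparse. The helper sketch_get_scheme is a hypothetical name, and the sketch only mirrors the documented contract; the library's own scheme detection may differ in edge cases.

```python
from urllib.parse import urlparse


def sketch_get_scheme(url, no_scheme="no_scheme"):
    """Return the scheme of url, or no_scheme if none is detected."""
    scheme = urlparse(url).scheme
    # urlparse reports a missing scheme as an empty string.
    return scheme if scheme else no_scheme


print(sketch_get_scheme("https://example.com/"))
print(sketch_get_scheme("//example.com/path"))
```

The first call prints https; the second prints no_scheme, because a protocol-relative url carries no scheme.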

domain_utils.domain_utils.get_stripped_url(url, **kwargs)

An alias for stem_url.

domain_utils.domain_utils.hostname_subparts(url, include_ps=False, **kwargs)

Returns a list of slices of a url’s hostname down to the eTLD+1 / PS+1.

Parameters

url (string) – The url from which to extract the hostname parts
include_ps (boolean, optional) –
If include_ps is set, the hostname slices will include the public suffix. For example, http://a.b.c.d.com/path?query#frag would yield:
- ["a.b.c.d.com", "b.c.d.com", "c.d.com", "d.com"] if include_ps == False
- ["a.b.c.d.com", "b.c.d.com", "c.d.com", "d.com", "com"] if include_ps == True
kwargs – All keyword arguments accepted by get_etld1 can also be passed to this method.

Returns

List of slices of a url’s hostname down to the eTLD+1 / PS+1.

Return type

list (string)
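
The slicing itself can be sketched as below, reproducing the a.b.c.d.com example from the include_ps description. This is a simplified illustration only: sketch_hostname_slices is a hypothetical helper that takes a bare hostname and an already-known public suffix, whereas hostname_subparts takes a url and determines the suffix itself.

```python
def sketch_hostname_slices(hostname, public_suffix, include_ps=False):
    """Slice hostname label by label down to the eTLD+1.

    Assumes hostname ends with public_suffix (not checked here).
    """
    labels = hostname.split(".")
    suffix_len = len(public_suffix.split("."))
    # Drop one leading label at a time, stopping at suffix + one label.
    slices = [".".join(labels[i:]) for i in range(len(labels) - suffix_len)]
    if include_ps:
        slices.append(public_suffix)
    return slices


print(sketch_hostname_slices("a.b.c.d.com", "com"))
print(sketch_hostname_slices("a.b.c.d.com", "com", include_ps=True))
```

The two calls print the same two lists shown in the include_ps description.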

domain_utils.domain_utils.is_ip_address(hostname)

Checks whether the given string is a valid IP address.
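
One way to perform such a check with the standard library is shown below. The helper sketch_is_ip_address is a hypothetical name, and the library's own check may accept or reject different inputs.

```python
import ipaddress


def sketch_is_ip_address(hostname):
    """Return True if hostname parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(hostname)
    except ValueError:
        return False
    return True


print(sketch_is_ip_address("127.0.0.1"))
print(sketch_is_ip_address("example.com"))
```

The first call prints True and the second prints False.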

domain_utils.domain_utils.stem_url(url, return_unparsed=True, scheme_default='http', parse_ws=True, scheme=False, path=True, use_netloc=True, extractor=None)

Returns a url stripped down to its host, optionally with the scheme and path.

More formally it returns (scheme)?+(netloc|hostname)+(path)?.

For example, https://my.domain.net/a/path/to/a/file.html#anchor?a=1 becomes my.domain.net/a/path/to/a/file.html. URL parsing is done using the standard library's urllib.parse.urlparse.

A url is parsed if it has a qualifying scheme. The qualifying schemes are http, https, ws and wss. WebSocket schemes can be excluded by setting the parse_ws parameter to False. Additionally, the scheme_default parameter supplies a scheme when the url doesn’t contain one. The default is http, so urls without a scheme will, by default, be treated as http and therefore parsed.

What is returned for unparsed urls is determined by the return_unparsed parameter.
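
The qualifying-scheme rule described above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: sketch_is_parsed is a hypothetical helper, and it only decides whether a url would count as parsed under the stated rule.

```python
from urllib.parse import urlparse


def sketch_is_parsed(url, scheme_default="http", parse_ws=True):
    """Return True if url has a qualifying scheme under the rule above."""
    qualifying = {"http", "https"}
    if parse_ws:
        qualifying |= {"ws", "wss"}
    # scheme_default is only used when the url itself carries no scheme.
    scheme = urlparse(url, scheme=scheme_default).scheme
    return scheme in qualifying


print(sketch_is_parsed("//example.com/path"))
print(sketch_is_parsed("about:blank"))
print(sketch_is_parsed("wss://example.com/socket", parse_ws=False))
```

The calls print True, False and False: the scheme-less url falls back to http, about is not a qualifying scheme, and wss is excluded once parse_ws is False.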

Parameters

url (string) – The URL to be parsed
return_unparsed (boolean, optional) – Action to take if the url is not parsed, e.g. file: or about:blank. If False, the result for unparsed urls will be an empty string. If True, the result will be the original url, e.g. about:blank -> about:blank, even if scheme=False. See the method description to understand whether a URL is parsed or not. Default is True.
scheme_default (string, optional) – Passed as the scheme parameter of urllib.parse.urlparse, so that urls without a scheme are assigned this default scheme. Default is http.
parse_ws (boolean, optional) – If True, then ws and wss urls are parsed. Default is True.
scheme (boolean, optional) – If True, scheme will be prepended in parsed result. Default is False.
path (boolean, optional) – If True, path will be included in parsed result. Default is True.
use_netloc (boolean, optional) – If True, urlparse’s netloc will be used. If False, urlparse’s hostname will be used. Using netloc means that a port is included, for example, if the url contained one. Default is True.
extractor (tldextract::TLDExtract, optional) – A tldextract::TLDExtract instance can be passed with the keyword extractor; otherwise one is created and updated automatically.

Returns

A url stripped to (scheme)?+(netloc|hostname)+(path)?. An empty string is returned for unparsed urls when return_unparsed is False.

Return type

string
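
The (scheme)?+(netloc|hostname)+(path)? assembly can be sketched with urllib.parse.urlparse. The helper sketch_stem is hypothetical and deliberately partial: it covers only the scheme, path and use_netloc switches for urls that already carry a qualifying scheme, and leaves out return_unparsed, scheme_default and parse_ws.

```python
from urllib.parse import urlparse


def sketch_stem(url, scheme=False, path=True, use_netloc=True):
    """Assemble (scheme)?+(netloc|hostname)+(path)? from a parsed url."""
    parts = urlparse(url)
    # netloc keeps any port (and credentials); hostname drops them.
    result = parts.netloc if use_netloc else (parts.hostname or "")
    if path:
        result += parts.path
    if scheme:
        result = parts.scheme + "://" + result
    return result


print(sketch_stem("https://my.domain.net/a/path/to/a/file.html#anchor?a=1"))
print(sketch_stem("https://my.domain.net:8443/index.html", use_netloc=False))
```

The first call prints my.domain.net/a/path/to/a/file.html, matching the example in the description; the second prints my.domain.net/index.html because hostname omits the port.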
