<scheme>://<username>:<password>@<host>:<port>/<path>?<query>#<fragment>
scheme
  • names the protocol used to reach the resource: http, https, ftp, etc.
  • separated from the rest of the url by :
username and password
  • separated from each other by : and from the host by @
  • common in ftp, rare in http
host
  • can be a domain name or an IP address
  • identifies the server machine we're trying to access
port
  • tells us which network port the application on the server machine is listening on.
  • if the port is omitted, the default for the scheme is assumed: 80 for http, 443 for https.
path
  • separated from the preceding url components by /
  • tells us where on the server machine a resource lives.
query
  • the usual way to pass parameters to the server
  • key=value pairs, separated from the rest of the url by ?
  • separated from each other by &
fragment
  • usually used to link to a particular section of an html document
  • separated from the rest of the url with a #
For example,
https://www.akshaykhot.com/some/crazy/path.html?param1=foo&param2=bar
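The example above leaves out the username, password, port and fragment. As a minimal sketch of how the template maps onto a real parser, the snippet below splits a url that uses every component with Python's standard urllib.parse module. The url and all values in it are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical url that exercises every component of the template.
url = "https://alice:secret@www.example.com:8443/docs/intro.html?lang=en&page=2#setup"

parts = urlsplit(url)

scheme = parts.scheme          # "https"
username = parts.username      # "alice"
password = parts.password      # "secret"
host = parts.hostname          # "www.example.com"
port = parts.port              # 8443
path = parts.path              # "/docs/intro.html"
query = parse_qs(parts.query)  # {"lang": ["en"], "page": ["2"]}
fragment = parts.fragment      # "setup"

# When the url omits the port, the parser reports None; it is up to the
# client to fall back to the scheme's default (80 for http, 443 for https).
no_port = urlsplit("https://www.akshaykhot.com/some/crazy/path.html?param1=foo&param2=bar").port
```

Note that parse_qs returns a list for each key, because the same key may appear more than once in a query string.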
Special Characters:
  • characters that have special meaning within a url are known as reserved characters:
  • ; / ? : @ & = + $ ,
  • if one of these characters appears as literal data inside a url component, it must be escaped before being included in the url.
  • to url encode (escape) a character, write % followed by its two-digit ascii hex value.
  • for example, a space (ascii hex 20) is encoded as %20
  • a url should always be in its encoded form. 
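  • each part of the url must be encoded separately, before the parts are joined
  • don't encode a completely constructed url: that would also escape the delimiters (: / ? & =) that give the url its structure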
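A rough sketch of encoding each part separately, using quote and urlencode from Python's standard urllib.parse module. The host, file name and parameters are invented for illustration. quote_via=quote is passed so that spaces come out as %20; by default urlencode writes them as + instead.

```python
from urllib.parse import quote, urlencode

# Hypothetical raw values containing a space and several reserved characters.
segment = "annual report/2024.pdf"
params = {"q": "cats & dogs", "redirect": "https://example.com/?a=1"}

# safe="" makes quote() escape "/" as well, so the slash stays inside
# a single path segment instead of starting a new one.
encoded_segment = quote(segment, safe="")
# -> "annual%20report%2F2024.pdf"

# urlencode() escapes every key and value, then joins the pairs with "&".
encoded_query = urlencode(params, quote_via=quote)
# -> "q=cats%20%26%20dogs&redirect=https%3A%2F%2Fexample.com%2F%3Fa%3D1"

# Only now are the already-encoded parts joined with the real delimiters.
url = "https://www.example.com/files/" + encoded_segment + "?" + encoded_query

# Encoding the finished url instead would destroy its structure.
broken = quote(url, safe="")  # starts with "https%3A%2F%2F..."
```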
absolute vs relative urls
  • if a url contains a scheme, e.g. http, then it is an absolute url.
  • a relative url is always interpreted relative to another url, known as the base url.
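To illustrate, here is a small sketch that resolves a few relative urls against a base url with urljoin from Python's standard urllib.parse module. The base url and the is_absolute helper are hypothetical and only apply the "contains a scheme" rule from these notes.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base url, e.g. the page on which the links were found.
base = "https://www.example.com/docs/guide/intro.html"

def is_absolute(url):
    """Treat a url as absolute when it carries a scheme (illustrative helper)."""
    return bool(urlsplit(url).scheme)

# Same directory as the base document.
sibling = urljoin(base, "setup.html")
# -> "https://www.example.com/docs/guide/setup.html"

# ".." climbs one directory before descending again.
parent = urljoin(base, "../api/index.html")
# -> "https://www.example.com/docs/api/index.html"

# A leading "/" replaces the whole path but keeps scheme and host.
rooted = urljoin(base, "/about")
# -> "https://www.example.com/about"

# An absolute url ignores the base entirely.
other = urljoin(base, "http://other.example.org/x")
# -> "http://other.example.org/x"
```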