on Tuesday, 20 April, 2021

URI versus URL

This is a trivial blog post for some, but an interesting distinction that I didn't know about until recently.

Oftentimes, one can usually infer the meaning of a word from the context of its use, given plenty of exposure to it, and assuming the author is coherent. However, for me these two words (acronyms) seem to be used synonymously. But let's see what they mean.

As mentioned, they are both acronyms:

URI—: Uniform Resource Identifier;
URL—: Uniform Resource Locator.

From Merriam-Webster:

[URL] the address of a resource (such as a document or website) on the Internet that consists of a communications protocol followed by the name or address of a computer on the network and that often includes additional locating information (such as directory and file names);

And from Collins:

[URI] ABBREVIATION FOR: uniform resource identifier: any unique code for identifying an item on the internet.

But how are they different? In fact URL is a subtype of the space URI. There are two things that exist within the URI:

URN: (Uniform Resource Name), which name the resource (e.g., isbn: 978-1860071300).
URL: (aforementioned); a locator which tells you how to access the item (i.e., https, ftp, etc.), and the domain name where the item is accessible.

A URI is more broadly just an identifier to a specific resource, like a page or a book or a document. Futhermore, an IRI (internationalized resource identifier) is an identifier which may be expressed as a string of non-Latin-alphabet characters, such as Cyrillic or Chinese.

So that's it, really!

An interesting note about how URIs are processed:

When URIs are used on the Web to identify information resources, the protocol (e.g., HTTP) making use of the URIs defines the set of operations that may be performed on the resources. To perform these operations, a URI must first be resolved, which means determining how to access it, and then dereferenced, which means to perform the protocol-specified actions on the resource. Typically, in this context, to dereference means to retrieve the resource.

One article, however, has explained the messy history of what is explained above. There is a quote here which I should reference as a warning for the following section:

Writing is like sausage making in my view; you'll all be happier in the end if you just eat the final product without knowing what's gone into it. — George R. R. Martin

So with that said, let's get into how the sausage was made, as it were.

RFC 3986, in section 1.1.3, states that

A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

But the same RFC states just a little further down, in section 1.2.2, that

The URI itself only provides identification; access to the resource is neither guaranteed nor implied by the presence of a URI.

Earlier on it states in section 1.1.1

Each URI begins with a scheme name, as defined in Section 3.1, that refers to a specification for assigning identifiers within that scheme.

But...this seems contradicting. Or at least very confusing. This can happen with multiple authors, but to this extent, in a single RFC? The RFC then gives some examples of URIs:

ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2

So the same RFC has told us that a URI can be a name, a locator, or both—but a URI only provides identification, and a way to access it isn't guaranteed or implied—and that each URI begins with a scheme name (which in many cases tells you exactly how to access the resource).

...So is the resource always accessible or not? It it's locator always there?

As the article mentioned above says:

The reason the internet's been fighting about this for over a decade is that the RFC is poorly written.

Read the aforementioned article for further explaination.