DOM - Document Object Model

The Document Object Model (DOM) represents XML or HTML documents as a tree of nodes. Using DOM methods and properties, you can access any element on the page, modify or delete elements, and add new ones. DOM is a language-independent Application Programming Interface (API) that can be implemented not only in JavaScript, but also in any other programming language. For example, you can generate pages on the server side using the PHP implementation of DOM (php.net/dom).

Any HTML document can be represented as a DOM tree in which every node except the root has a parent and may have children. Each node in this tree is an object with its own properties and methods. Whitespace between tags (line breaks and indentation) and comments also become nodes in the tree.

All examples use the simple HTML document shown below. A separate page, dom_example.html, has been created for viewing it and working with it in the console.

<!DOCTYPE html>
<html>
    <head>
        <title>My page</title>
    </head>
    <body>
        <p class="opener">first paragraph</p>
        <p><em>second</em> paragraph</p>
        <p id="closer">final</p>
        <!-- and that's about it -->
    </body>
</html>

Accessing DOM Nodes

The document Node

The document node represents the entire HTML or XML document and is the root of the DOM tree. It is also the starting point for accessing any element or node in the document.

You can access the document node using the document object, which is a global variable in web browsers.

To investigate this node, execute the command console.dir(document). The console.dir command will display all properties and methods of this node.

All nodes (including document node, text nodes, element nodes, and attribute nodes) have nodeType, nodeName, and nodeValue properties.

> document.nodeType;
9

There are 12 types of nodes represented by integer numbers. As you can see, the document node is represented by the number 9. The most commonly encountered types are 1 (element), 2 (attribute), and 3 (text).
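
As a rough illustration, the numeric codes can be mapped to readable names. The helper below is only a sketch: describeNodeType is a hypothetical name, not part of the DOM, and the table covers just a handful of the node types.

```javascript
// Names for the node types that come up most often (not the full list).
const NODE_TYPE_NAMES = {
    1: 'element',
    2: 'attribute',
    3: 'text',
    8: 'comment',
    9: 'document'
};

// describeNodeType is a hypothetical helper; it only reads node.nodeType.
function describeNodeType(node) {
    return NODE_TYPE_NAMES[node.nodeType] || 'other (' + node.nodeType + ')';
}
```

In the browser console, describeNodeType(document) should return "document".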

All nodes also have names. For HTML tags, the node name is the tag name (also available through the tagName property). Text nodes have the name #text, and the document node is named #document:

> document.nodeName;
"#document"

Nodes can also have a value associated with them. For example, for text nodes, the value is the text content itself. The document node doesn't have any value:

> document.nodeValue;
null


documentElement

The documentElement property of the document object refers to the root element of the document. In an HTML document, this is the <html> element.

The root element is the ancestor of all other elements in the document.

> document.documentElement;
<html>…</html>

Its nodeType equals 1, which corresponds to an element node:

> document.documentElement.nodeType;
1

For element nodes, the nodeName and tagName properties contain the name of the tag itself:

> document.documentElement.nodeName;
"HTML"
> document.documentElement.tagName;
"HTML"


Child nodes

To check if a node has child nodes, you can call the hasChildNodes() method:

> document.documentElement.hasChildNodes();
true

The html element has three child nodes: head, body, and a whitespace text node between them (most browsers, but not all, create whitespace text nodes). You can access the child nodes using the childNodes property:

> document.documentElement.childNodes.length;
3

> document.documentElement.childNodes[0];
<head>…</head>

> document.documentElement.childNodes[1];
#text

> document.documentElement.childNodes[2];
<body>…</body>

Each child node has a parentNode property that can be used to access its parent node:

> document.documentElement.childNodes[1].parentNode;
<html>…</html>

Let's save a reference to the body node:

> const bd = document.documentElement.childNodes[2];

Let's see how many child nodes the body has:

> bd.childNodes.length;
9

Let's recall what's inside the body tag:

<body>
    <p class="opener">first paragraph</p>
    <p><em>second</em> paragraph</p>
    <p id="closer">final</p>
    <!-- and that's about it -->
</body>

But why does the body node contain 9 child nodes? The 3 p nodes and 1 comment node make up 4 child nodes. The whitespace between these 4 nodes gives us 3 more text nodes, for a total of 7. The whitespace between the <body> tag and the first <p> tag is the eighth child node, and the whitespace between the comment and the closing </body> tag is the ninth. To check these claims, you can run the command bd.childNodes in the console.
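
The count can also be checked programmatically. The following is a minimal sketch; countByNodeType is a hypothetical helper name, and it assumes only that the node has a childNodes collection whose items carry a nodeType.

```javascript
// countByNodeType is a hypothetical helper: it tallies the children of a
// node by their nodeType (1 = element, 3 = text, 8 = comment).
function countByNodeType(node) {
    const counts = {};
    for (let i = 0; i < node.childNodes.length; i++) {
        const type = node.childNodes[i].nodeType;
        counts[type] = (counts[type] || 0) + 1;
    }
    return counts;
}
```

For the example page, countByNodeType(bd) should report three element nodes, five text nodes and one comment node, nine in total.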



Attributes

Since the first child node of the body node is a whitespace node, the second node (index 1) is the first paragraph in our HTML document:

> bd.childNodes[1];
<p class="opener">first paragraph</p>

To check if an element has attributes, the method hasAttributes() is used:

> bd.childNodes[1].hasAttributes();
true

How many attributes? In our example, there is one attribute, class:

> bd.childNodes[1].attributes.length;
1
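
Since attributes is an array-like collection, a loop can gather every attribute into a plain object. This is a minimal sketch: attributesToObject is a hypothetical helper name, and it assumes each item in the collection exposes name and value properties.

```javascript
// attributesToObject is a hypothetical helper. It assumes `el.attributes`
// is an array-like collection whose items expose `name` and `value`.
function attributesToObject(el) {
    const result = {};
    for (let i = 0; i < el.attributes.length; i++) {
        const attr = el.attributes[i];
        result[attr.name] = attr.value;
    }
    return result;
}
```

For the first paragraph of the example page, attributesToObject(bd.childNodes[1]) should give an object with a single class property set to "opener".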

You can access attributes both by index and by name. You can also get the value of an attribute using the getAttribute() method:

> bd.childNodes[1].attributes[0].nodeName;
"class"

> bd.childNodes[1].attributes[0].nodeValue;
"opener"

> bd.childNodes[1].attributes['class'].nodeValue;
"opener"

> bd.childNodes[1].getAttribute('class');
"opener"


Accessing the Content of an Element

Let's take a look at the first paragraph of our document:

> bd.childNodes[1].nodeName;
"P"

You can retrieve the text content inside a paragraph by using the textContent property. The textContent property does not exist in older versions of IE, but another property, innerText, returns the same value in this example:

> bd.childNodes[1].textContent;
"first paragraph"

> bd.childNodes[1].innerText;
"first paragraph"

There is also the innerHTML property, which returns (or sets) the HTML code contained within the node. This behavior somewhat contradicts the DOM model, which represents the document as a tree of nodes rather than a string of tags. However, innerHTML has proven so convenient that it is widely used.

> bd.childNodes[1].innerHTML;
"first paragraph"

The first paragraph contains only text, so both innerHTML and textContent (innerText in IE) will return the same value. However, the second paragraph contains an em node, so we can observe the differences in the properties:

> bd.childNodes[3].innerHTML;
"<em>second</em> paragraph"

> bd.childNodes[3].textContent;
"second paragraph"

Another way to retrieve the text within the first paragraph is to use the nodeValue property of the text node contained within the p element:

> bd.childNodes[1].childNodes.length;
1

> bd.childNodes[1].childNodes[0].nodeName;
"#text"

> bd.childNodes[1].childNodes[0].nodeValue;
"first paragraph"

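The relationship between text nodes and the text of an element can be sketched with a short recursive function. collectText is a hypothetical helper name; it assumes every node has nodeType, nodeValue and childNodes, and it simply joins the values of all text nodes below the given node.

```javascript
// collectText is a hypothetical helper that gathers the values of all
// text nodes (nodeType 3) below a node, in document order.
function collectText(node) {
    if (node.nodeType === 3) {
        return node.nodeValue;
    }
    let text = '';
    for (let i = 0; i < node.childNodes.length; i++) {
        text += collectText(node.childNodes[i]);
    }
    return text;
}
```

For the second paragraph of the example page, collectText(bd.childNodes[3]) should return "second paragraph", the same string that textContent gives.
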
Methods for efficient DOM access

By using the properties childNodes, parentNode, nodeName, nodeValue and attributes, you can traverse the DOM tree and perform various operations on document nodes. However, the fact that whitespace between tags also produces text nodes makes this traversal method unreliable. If the page structure changes, your script may no longer work correctly. Additionally, if you want to access nested elements of a particular node, you would need to write additional code before you can do so. That's where the fast access methods come into play, namely getElementsByTagName(), getElementsByName(), and getElementById().

getElementsByTagName() takes a tag name (element node name) as an argument and returns a collection (array-like object) of nodes that match the tag name. For example, the following script will count the number of paragraphs (the <p> tag) in the document:

> document.getElementsByTagName('p').length;
3

You can access each element of the collection using square brackets or the item() method, specifying the index of the desired element (0 for the first element). For example:

> document.getElementsByTagName('p')[0];
<p class="opener">first paragraph</p>

> document.getElementsByTagName('p').item(0);
<p class="opener">first paragraph</p>

To retrieve the content of the first <p> tag, you can use the innerHTML property:

> document.getElementsByTagName('p')[0].innerHTML;
"first paragraph"

To access the last <p> tag:

> document.getElementsByTagName('p').item( document.getElementsByTagName('p').length - 1 );
<p id="closer">final</p>

To access the attributes of an element, you can use the attributes collection or the getAttribute() method, as shown before. A shorter way is to use the attribute name as a property of the element you're working with. For example, to get the value of the id attribute:

> document.getElementsByTagName('p')[2].id;
"closer"

However, this approach does not work for the class attribute, because class is a reserved word in ECMAScript. Use the className property instead:

> document.getElementsByTagName('p')[0].className;
"opener"

By passing '*' to getElementsByTagName(), you can retrieve an array-like collection of all elements on the page:

> document.getElementsByTagName('*').length;
8
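
Because the returned collection is array-like rather than a true array, array methods such as map() are not available on it directly. One possible approach is to convert it with Array.from(). The helper name tagNamesOf below is hypothetical.

```javascript
// tagNamesOf is a hypothetical helper. It accepts any array-like collection
// of element nodes and returns their tag names as a real array.
function tagNamesOf(collection) {
    return Array.from(collection, function (el) {
        return el.tagName;
    });
}
```

On the example page, tagNamesOf(document.getElementsByTagName('*')) should return an array of eight tag names beginning with "HTML".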


getElementById() is the most common method for accessing elements. You simply assign an id attribute to the elements you intend to work with later, and then access them using the following approach:

> document.getElementById('closer');
<p id="closer">final</p>


Additional methods for fast access, introduced in modern browsers:

  • getElementsByClassName(): searches for elements by the value of the class attribute.
  • querySelector(): returns the first element that matches a specified CSS selector.
  • querySelectorAll(): similar to the previous method, except that it returns all matching elements, not just the first one.
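
The three methods listed above can be tried side by side. The function below is only a sketch: lookupExamples is a hypothetical name, the selectors are chosen to match the example page, and root is assumed to be the document object in a browser.

```javascript
// lookupExamples is a hypothetical helper; `root` is expected to be the
// document object in a browser.
function lookupExamples(root) {
    return {
        // all elements whose class attribute contains "opener"
        byClass: root.getElementsByClassName('opener'),
        // the first element matching the CSS selector, or null if none does
        firstMatch: root.querySelector('p#closer'),
        // every element matching the selector
        allMatches: root.querySelectorAll('body > p')
    };
}
```

On the example page, lookupExamples(document) should find one element by class, the final paragraph as firstMatch, and three paragraphs in allMatches.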


Elements that share the same parent

nextSibling and previousSibling are two convenient properties for navigating the DOM tree when you already have a reference to a specific element.

  • nextSibling refers to the next node at the same level of the DOM hierarchy. This may be a text or comment node, not necessarily an element.
  • previousSibling refers to the previous node at the same level of the DOM hierarchy.

These properties allow you to traverse the DOM tree horizontally, moving to the next or previous sibling element or node from the current reference point.

> var para = document.getElementById('closer');
> para.nextSibling;
#text

> para.previousSibling;
#text

> para.previousSibling.previousSibling;
<p>…</p>

> para.previousSibling.previousSibling.previousSibling;
#text

> para.previousSibling.previousSibling.nextSibling.nextSibling;
<p id="closer">final</p>
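
As the output shows, whitespace text nodes sit between the paragraphs, so reaching the next element takes more than one step. A minimal sketch of skipping them follows; nextElement is a hypothetical helper name. Browsers also provide the built-in nextElementSibling and previousElementSibling properties for the same purpose.

```javascript
// nextElement is a hypothetical helper: it follows nextSibling until it
// reaches an element node (nodeType 1) or runs out of siblings.
function nextElement(node) {
    let current = node.nextSibling;
    while (current && current.nodeType !== 1) {
        current = current.nextSibling;
    }
    return current;
}
```

On the example page, nextElement(document.getElementsByTagName('p')[0]) should return the second paragraph.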



document.body

document.body is a property that represents the <body> element of an HTML document. It provides access to the content within the <body> tag, which is the main area of the webpage visible to the user.

> document.body;
<body>…</body>

> document.body.nextSibling;
null

> document.body.previousSibling.previousSibling;
<head>…</head>


firstChild and lastChild

firstChild and lastChild are properties of a DOM node that provide access to its first and last child nodes, respectively. firstChild is equivalent to childNodes[0], and lastChild is equivalent to childNodes[childNodes.length - 1]:

> document.body.firstChild;
#text

> document.body.lastChild;
#text

> document.body.lastChild.previousSibling;
<!-- and that's about it -->

> document.body.lastChild.previousSibling.nodeValue;
" and that's about it "



Traversing DOM

Finally, here is a function that takes any DOM node and visits that node and all of its descendants:

function walkDOM(n) {
    // Log the current node first...
    console.log(n);

    // ...then visit each child in turn. Siblings of the starting
    // node are deliberately left alone.
    for (let child = n.firstChild; child; child = child.nextSibling) {
        walkDOM(child);
    }
}

Usage example:

> walkDOM(document.documentElement);
> walkDOM(document.body);
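
The same kind of traversal can collect results instead of logging them. The sketch below builds an indented outline of node names; outline is a hypothetical helper name, and it relies only on the nodeName, firstChild and nextSibling properties.

```javascript
// outline is a hypothetical helper. It returns one line per node, indented
// by depth, for the given node and all of its descendants.
function outline(node, depth, lines) {
    depth = depth || 0;
    lines = lines || [];
    lines.push('  '.repeat(depth) + node.nodeName);
    for (let child = node.firstChild; child; child = child.nextSibling) {
        outline(child, depth + 1, lines);
    }
    return lines;
}
```

In the browser console, console.log(outline(document.body).join('\n')) should print the structure of the body, whitespace text nodes included.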


Modifying DOM Nodes

Let's save a reference to the last <p> tag in a variable (remember that we are using a separate test page for all the examples):

> var my = document.getElementById('closer');

By changing the value of the innerHTML property, we modify the content of the <p> tag:

> my.innerHTML = 'final!!!';
"final!!!"

Since innerHTML accepts an HTML-formatted string, you can create a new DOM element node as follows:

> my.innerHTML = '<em>my</em> final';
"<em>my</em> final"

The new em-node becomes part of the DOM tree:

> my.firstChild;
<em>my</em>

> my.firstChild.firstChild;
#text "my"

Another way to change the text inside a tag is to directly access the text node and modify its nodeValue property:

> my.firstChild.firstChild.nodeValue = 'your';
"your"


Changing Styles

More often, we need to change the presentation of elements rather than their content. All elements have a style property, which in turn contains properties that correspond to CSS properties. Here's an example of how you can change the style of a paragraph by adding a red border to it:

> my.style.border = "1px solid red";
"1px solid red"

CSS property names often contain hyphens, which are not allowed in JavaScript identifiers. In such cases, omit the hyphen and capitalize the letter that follows it. Thus, the CSS property padding-top becomes paddingTop, margin-left becomes marginLeft, and so on:

> my.style.fontWeight = 'bold';
"bold"

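The naming rule is mechanical enough to express in code. The following is a sketch under stated assumptions: toCamelCase and setStyles are hypothetical helper names, and setStyles assumes only that the element has a style object whose properties can be assigned.

```javascript
// toCamelCase is a hypothetical helper: "padding-top" becomes "paddingTop".
function toCamelCase(cssName) {
    return cssName.replace(/-([a-z])/g, function (match, letter) {
        return letter.toUpperCase();
    });
}

// setStyles is also hypothetical. It assumes `el.style` is an object whose
// properties can be assigned, as on a DOM element.
function setStyles(el, styles) {
    Object.keys(styles).forEach(function (name) {
        el.style[toCamelCase(name)] = styles[name];
    });
}
```

With these helpers, setStyles(my, { 'font-weight': 'bold', 'padding-top': '4px' }) would assign my.style.fontWeight and my.style.paddingTop.
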
Additionally, there is access to the cssText property, which allows you to work with styles as a string:

> my.style.cssText;
"border: 1px solid red; font-weight: bold;"

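To modify the style this way, modify the string. The console echoes the string that was assigned; the browser then parses it and applies the new border style:

> my.style.cssText += " border-style: dashed;"
"border: 1px solid red; font-weight: bold; border-style: dashed;"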


Creating new element nodes

To create new nodes in the Document Object Model, you should use the methods createElement() and createTextNode(). Once you have created a new node, you can add it to the DOM tree using methods like appendChild() (or insertBefore(), or replaceChild()).

Creating a new p-element node and setting its text content:

> var myp = document.createElement('p');
> myp.innerHTML = 'yet another';
"yet another"

The new element automatically inherits all default properties, including the style property, which you can modify:

> myp.style;
CSSStyleDeclaration

> myp.style.border = '2px dotted blue';
"2px dotted blue"

By using the appendChild() method, you can add the new node to the DOM tree. Calling this method on the document.body node adds the new node after the body's last child. In our case, the new p element will be added to the end of the page.

> document.body.appendChild(myp);
<p style="border: 2px dotted blue;">yet another</p>


insertBefore()

Using the appendChild() method, you can only add a new child node to the end of the selected element. To specify the exact position for the new node, you can use the insertBefore() method. It works like appendChild(), but takes an additional parameter that indicates where (before which node) the new node should be placed.

For example, the following code will insert a text node at the end of the BODY:

> document.body.appendChild(document.createTextNode('boo!'));

The following code creates another text node and inserts it as the first child node of the BODY node:

> document.body.insertBefore(
    document.createTextNode('first boo!'),
    document.body.firstChild
);
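
Two common positions, "first child" and "right after a given node", can be wrapped in small helpers built on insertBefore(). This is a minimal sketch; prependChild and insertAfter are hypothetical names. It relies on the fact that insertBefore() appends at the end when its second argument is null.

```javascript
// prependChild and insertAfter are hypothetical helpers built on insertBefore().
function prependChild(parent, node) {
    // Inserting before the current first child makes `node` the new first child.
    return parent.insertBefore(node, parent.firstChild);
}

function insertAfter(node, reference) {
    // When reference.nextSibling is null, insertBefore() appends at the end.
    return reference.parentNode.insertBefore(node, reference.nextSibling);
}
```

For example, insertAfter(document.createTextNode('boo!'), document.getElementById('closer')) would place the text right after the last paragraph.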


Creating element nodes using pure DOM

Using innerHTML makes it easier to create new nodes compared to using pure DOM methods. When creating element nodes exclusively with DOM methods, you need to follow several steps:

  1. Create a text node containing the text "yet another".
  2. Create a paragraph node.
  3. Add the text node as a child node to the paragraph node.
  4. Add the paragraph node as a child node to the body node.
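
The four steps above can be sketched as a single function. appendParagraph is a hypothetical helper name, and doc is assumed to be the browser's document object; it is passed in as a parameter only to keep the sketch self-contained.

```javascript
// appendParagraph is a hypothetical helper; `doc` is expected to be the
// browser's document object.
function appendParagraph(doc, text) {
    const textNode = doc.createTextNode(text);   // step 1: the text node
    const paragraph = doc.createElement('p');    // step 2: the paragraph node
    paragraph.appendChild(textNode);             // step 3: text into paragraph
    doc.body.appendChild(paragraph);             // step 4: paragraph into body
    return paragraph;
}
```

In the browser console, appendParagraph(document, 'yet another') should add a new paragraph at the end of the page.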

With this method, you can create any number of nodes and organize their nesting as desired. For example, let's say you need to add the following HTML code to the end of the body tag:

<p>one more paragraph <strong>bold</strong></p>

The hierarchy of nodes will look as follows:

P (paragraph) element node
    Text node with the value "one more paragraph "
    STRONG element node
        Text node with the value "bold"

Thus, the code to create and insert these new elements into the document looks as follows:

// Create a P element node
var myp = document.createElement('p');

// Create a text node and append it to the P element node
var myt = document.createTextNode('one more paragraph ');
myp.appendChild(myt);

// Create a STRONG element node and append a text node to it
var str = document.createElement('strong');
str.appendChild(document.createTextNode('bold'));

// Append the STRONG element node to the P element node
myp.appendChild(str);

// Append the P element node to the BODY
document.body.appendChild(myp);
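
When many nested nodes are needed, the repetition can be folded into a small helper. The following is one possible sketch, not an established API: buildNode is a hypothetical name, doc is assumed to be the document object, and strings in the children array are turned into text nodes.

```javascript
// buildNode is a hypothetical helper. `doc` is expected to be the document;
// each child is either an existing node or a string, which becomes a text node.
function buildNode(doc, tagName, children) {
    const node = doc.createElement(tagName);
    (children || []).forEach(function (child) {
        if (typeof child === 'string') {
            child = doc.createTextNode(child);
        }
        node.appendChild(child);
    });
    return node;
}
```

For instance, buildNode(document, 'p', ['one more paragraph ', buildNode(document, 'strong', ['bold'])]) should produce the same structure as the code above, ready to be passed to document.body.appendChild().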


cloneNode()

Another way to create new nodes is by copying (or cloning) an existing node. The cloneNode() method is used for this purpose and accepts a boolean parameter (true for deep cloning, including all child elements, false for shallow cloning of only the current element). Let's use this method.

Let's save a reference to the element we want to copy in a variable:

> var el = document.getElementsByTagName('p')[1];

Now the variable el refers to the second paragraph, which looks like this:

<p><em>second</em> paragraph</p>

Let's perform a shallow copy of this element and insert it into the body:

> document.body.appendChild(el.cloneNode(false));

You won't see any changes on the page because with shallow copying, only a copy of the <p> element is created without its nested elements. This means that the text inside the paragraph (which is a child text node) will not be copied.

Because this paragraph has no attributes, the executed code is equivalent to the following:

> document.body.appendChild(document.createElement('p'));

If deep copying is performed, the entire DOM tree starting from the P-element will be copied, including the text nodes and the EM-element. The following code will fully copy the second paragraph to the end of the document:

> document.body.appendChild(el.cloneNode(true));

If you prefer, you can copy only the EM node:

> document.body.appendChild(el.firstChild.cloneNode(true));
<em>second</em>

... or only the text node with the value 'second':

> document.body.appendChild(el.firstChild.firstChild.cloneNode(false));
"second"
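
One thing to watch for: cloneNode() copies attributes as well, so cloning an element that has an id attribute (such as the final paragraph) leaves two elements with the same id on the page. A minimal sketch of one way around this follows; cloneWithoutId is a hypothetical helper name.

```javascript
// cloneWithoutId is a hypothetical helper: a deep copy whose own id
// attribute is removed. Ids on descendants, if any, are left untouched.
function cloneWithoutId(el) {
    const copy = el.cloneNode(true);
    copy.removeAttribute('id');
    return copy;
}
```

For example, document.body.appendChild(cloneWithoutId(document.getElementById('closer'))) would append a copy of the final paragraph without its id.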


Remove Nodes

To remove nodes from the DOM tree, the removeChild() method is used.

Here's how you can remove the second paragraph (remember that we are using a separate test page for all the examples):

> var myp = document.getElementsByTagName('p')[1];
> var removed = document.body.removeChild(myp);

The removeChild() method returns the removed element, in case you need to use it further. You can still use all DOM methods on the removed element, even though it no longer exists in the DOM tree.

For example:

> removed;
<p>…</p>

> removed.firstChild;
<em>second</em>

There is also a method called replaceChild() that removes a node and inserts a new one in its place.

Here's how you can replace what is now the second paragraph (the one with id="closer") with the paragraph stored in the removed variable:

> var p = document.getElementsByTagName('p')[1];
> var replaced = document.body.replaceChild(removed, p);

Just like removeChild(), replaceChild() also returns a reference to the node that was removed from the DOM tree.

> replaced;
<p id="closer">final</p>

A quick way to remove all content inside an element is to assign an empty string to its innerHTML property. The following code will remove all the children of the BODY tag:

> document.body.innerHTML = '';
""

To check if the BODY tag no longer has any children, you can use the following code:

> document.body.firstChild;
null

To remove all of a node's content using only DOM methods, you remove its children one at a time; each child's own descendants go with it. Here's a small function that removes all child nodes of the provided node:

function removeAll(n) {
    while (n.firstChild) {
        n.removeChild(n.firstChild);
    }
}

You can call this function with the desired node as an argument to remove all its descendants. For example, to remove all nodes within the BODY tag, you can use:

> removeAll(document.body);
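
The same removeChild() call can be used selectively. The sketch below removes only the whitespace text nodes discussed earlier; removeWhitespaceNodes is a hypothetical helper name. Because childNodes is a live collection that shrinks as nodes are removed, the loop runs from the end towards the start.

```javascript
// removeWhitespaceNodes is a hypothetical helper. It removes the child text
// nodes (nodeType 3) of `parent` that contain nothing but whitespace.
function removeWhitespaceNodes(parent) {
    // childNodes is a live collection, so walk it backwards while removing.
    for (let i = parent.childNodes.length - 1; i >= 0; i--) {
        const child = parent.childNodes[i];
        if (child.nodeType === 3 && /^\s*$/.test(child.nodeValue)) {
            parent.removeChild(child);
        }
    }
}
```

On a freshly loaded copy of the example page, calling removeWhitespaceNodes(document.body) should reduce document.body.childNodes.length from 9 to 4.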