@ooneex/html wraps Cheerio in a small, typed Html class for parsing HTML and pulling out structured data. Load markup from a string or a URL, then call focused extractors that return plain typed objects for images, links, headings, videos, and checkbox tasks. It’s built for scraping and content analysis rather than DOM mutation.

Installation

Install the package with Bun.
bun add @ooneex/html

Usage

Create an Html instance from a markup string (or with no argument), then query it. The extractors for headings, links, images, videos, and tasks return typed arrays you can iterate directly.
import { Html } from "@ooneex/html";

const html = new Html(`
  <article>
    <h1 id="title">Getting Started</h1>
    <p>Read the <a href="/docs" title="Docs">documentation</a>.</p>
    <img src="/hero.png" alt="Hero" width="800" />
  </article>
`);

html.getHeadings();
// [{ level: 1, text: "Getting Started", id: "title" }]

html.getLinks();
// [{ href: "/docs", text: "documentation", title: "Docs", target: null, rel: null }]

html.getImages();
// [{ src: "/hero.png", alt: "Hero", title: null, width: "800", height: null }]
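
Because missing attributes come back as null, the image list is easy to audit. The sketch below is not part of @ooneex/html. It flags images with no alt attribute and assumes the entry shape shown in the output above. It also assumes an empty alt="" comes back as "" rather than null, so deliberately decorative images are not flagged. The sample array stands in for a real html.getImages() result.

```typescript
// Assumed shape of one getImages() entry, mirrored from the sample output above
// (missing attributes appear as null); the package's real exported type may differ.
interface ImageInfo {
  src: string | null;
  alt: string | null;
  title: string | null;
  width: string | null;
  height: string | null;
}

// Flag images that have no alt attribute at all. An explicit alt="" marks a
// decorative image, so it is intentionally NOT reported here.
function findImagesMissingAlt(images: ImageInfo[]): ImageInfo[] {
  return images.filter((img) => img.alt === null);
}

// Stand-in for html.getImages(); replace with the real call in your code.
const sample: ImageInfo[] = [
  { src: "/hero.png", alt: "Hero", title: null, width: "800", height: null },
  { src: "/spacer.gif", alt: null, title: null, width: null, height: null },
  { src: "/divider.svg", alt: "", title: null, width: "16", height: "16" },
];

const missing = findImagesMissingAlt(sample);
console.log(missing.map((img) => img.src).join(", ")); // /spacer.gif
```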

Loading from a URL

Use loadUrl to fetch and parse a remote page. It is asynchronous and resolves to the instance, so you can await it and call an extractor right away.
import { Html } from "@ooneex/html";

const page = await new Html().loadUrl("https://example.com");

const links = page.getLinks();
const text = page.getContent(); // trimmed plain text of the whole document
You can also reuse one instance and replace its markup with load, which returns this so calls can be chained.
const html = new Html();

html.load("<h2>Section</h2>").getHeadings();
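
Links come back with href exactly as written in the markup, for example "/docs" in the Usage output. That means links scraped with loadUrl often need resolving against the page address. The following is a minimal sketch that is not part of the package. It assumes the getLinks() entry shape shown earlier and uses the standard URL class to make hrefs absolute, drop non-http(s) schemes, ignore #fragments, and de-duplicate. The hardcoded array stands in for page.getLinks().

```typescript
// Assumed shape of one getLinks() entry, mirrored from the Usage output.
// href is typed as nullable as a precaution; the real type may be narrower.
interface LinkInfo {
  href: string | null;
  text: string;
  title: string | null;
  target: string | null;
  rel: string | null;
}

// Resolve hrefs against the page URL (the one passed to loadUrl) and
// return unique absolute http(s) URLs in first-seen order.
function absoluteLinks(links: LinkInfo[], pageUrl: string): string[] {
  const seen = new Set<string>();
  for (const link of links) {
    if (!link.href) continue;
    let url: URL;
    try {
      url = new URL(link.href, pageUrl); // handles "/docs", "../a", "?q=1", ...
    } catch {
      continue; // skip hrefs the URL parser rejects
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // mailto:, javascript:, ...
    url.hash = ""; // "/docs#install" and "/docs" count as the same page
    seen.add(url.href);
  }
  return [...seen];
}

// Stand-in for page.getLinks().
const links: LinkInfo[] = [
  { href: "/docs", text: "documentation", title: "Docs", target: null, rel: null },
  { href: "/docs#install", text: "install", title: null, target: null, rel: null },
  { href: "mailto:team@example.com", text: "email", title: null, target: null, rel: null },
  { href: "https://other.example/", text: "other", title: null, target: "_blank", rel: "noopener" },
];

const resolved = absoluteLinks(links, "https://example.com/guide/");
// resolved → ["https://example.com/docs", "https://other.example/"]
console.log(resolved.join("\n"));
```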

Extracting videos and tasks

getVideos collects <video> elements with their attributes and nested <source> tags, and getTasks reads checkbox list items.
import { Html } from "@ooneex/html";

const html = new Html(`
  <video poster="/poster.jpg" controls>
    <source src="/clip.webm" type="video/webm" />
  </video>
  <ul>
    <li><input type="checkbox" checked /> Write docs</li>
    <li><input type="checkbox" /> Ship release</li>
  </ul>
`);

html.getVideos();
// [{ src: null, poster: "/poster.jpg", controls: true, autoplay: false,
//    loop: false, muted: false, width: null, height: null,
//    sources: [{ src: "/clip.webm", type: "video/webm" }] }]

html.getTasks();
// [{ text: "Write docs", checked: true }, { text: "Ship release", checked: false }]
You can also get the serialized markup back with getHtml().
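
Since getTasks returns plain { text, checked } objects, turning them back into a checklist or a progress count is a short map and filter. The sketch below is not a package feature. It assumes the entry shape shown in the output above and renders GitHub-style Markdown task-list lines. The sample array stands in for html.getTasks().

```typescript
// Assumed shape of one getTasks() entry, mirrored from the sample output above.
interface TaskInfo {
  text: string;
  checked: boolean;
}

// Render tasks as GitHub-style Markdown task-list lines ("- [x] ..." / "- [ ] ...").
function tasksToMarkdown(tasks: TaskInfo[]): string {
  return tasks.map((t) => `- [${t.checked ? "x" : " "}] ${t.text}`).join("\n");
}

// Count completed tasks against the total.
function progress(tasks: TaskInfo[]): { done: number; total: number } {
  const done = tasks.filter((t) => t.checked).length;
  return { done, total: tasks.length };
}

// Stand-in for html.getTasks().
const tasks: TaskInfo[] = [
  { text: "Write docs", checked: true },
  { text: "Ship release", checked: false },
];

console.log(tasksToMarkdown(tasks));
// - [x] Write docs
// - [ ] Ship release

const { done, total } = progress(tasks);
console.log(`${done}/${total} done`); // 1/2 done
```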

When to use it

  • Scraping or analyzing remote pages — fetch with loadUrl and pull out links, images, or headings.
  • Extracting a table of contents or outline from rendered HTML via getHeadings().
  • Collecting media references (getImages, getVideos) from user-supplied or fetched markup.
  • Parsing checkbox task lists out of HTML (e.g. rendered Markdown) with getTasks().
  • Grabbing the clean text content of a document with getContent().
You don’t need it if you only build or template HTML strings, or if you’re already running in a browser with direct DOM access. Reach for it when you need to parse existing markup and extract data from it on the server.
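
For the table-of-contents use case, getHeadings() gives a flat list in document order, so the outline has to be rebuilt from the levels. Here is a minimal sketch, not part of the package, that nests each heading under the nearest preceding heading with a lower level and renders the result as an indented Markdown list. It assumes the { level, text, id } shape from the Usage output and that id is null when the heading has no id attribute (as target and rel are for links). The sample array stands in for html.getHeadings().

```typescript
// Assumed shape of one getHeadings() entry, mirrored from the Usage output.
interface HeadingInfo {
  level: number;
  text: string;
  id: string | null;
}

interface TocNode {
  heading: HeadingInfo;
  children: TocNode[];
}

// Nest a flat, document-ordered heading list using a stack of open ancestors.
function buildToc(headings: HeadingInfo[]): TocNode[] {
  const roots: TocNode[] = [];
  const stack: TocNode[] = []; // open ancestors, shallowest first
  for (const heading of headings) {
    const node: TocNode = { heading, children: [] };
    // Close any open sections at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1].heading.level >= heading.level) {
      stack.pop();
    }
    if (stack.length === 0) roots.push(node);
    else stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return roots;
}

// Render as an indented Markdown list; headings with an id become anchor links.
function renderToc(nodes: TocNode[], depth = 0): string[] {
  return nodes.flatMap((n) => {
    const label = n.heading.id ? `[${n.heading.text}](#${n.heading.id})` : n.heading.text;
    return [`${"  ".repeat(depth)}- ${label}`, ...renderToc(n.children, depth + 1)];
  });
}

// Stand-in for html.getHeadings().
const headings: HeadingInfo[] = [
  { level: 1, text: "Getting Started", id: "title" },
  { level: 2, text: "Install", id: "install" },
  { level: 3, text: "With Bun", id: null },
  { level: 2, text: "Usage", id: "usage" },
];

console.log(renderToc(buildToc(headings)).join("\n"));
// - [Getting Started](#title)
//   - [Install](#install)
//     - With Bun
//   - [Usage](#usage)
```

The stack approach also tolerates skipped levels (an h4 directly under an h2 simply nests one step deeper), which is common in scraped pages.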