Node.js runs JavaScript on a single thread, so any CPU-intensive code blocks the event loop. This is a serious problem for production web servers.
So how could one discover the bottleneck in their web server?
Tracing
Well, you could try tracing by auto-instrumenting your code with OpenTelemetry. If the bottleneck is in one of the auto-instrumented third-party libraries, you will have found it. If not, you'll have to manually instrument all of your code (or at least the parts you suspect are the bottleneck, and hope you are right).
Profiling
According to the Node.js documentation:
Profiling a Node.js application involves measuring its performance by analyzing the CPU, memory, and other runtime metrics while the application is running. This helps in identifying bottlenecks, high CPU usage, memory leaks, or slow function calls that may impact the application’s efficiency, responsiveness and scalability. There are many third party tools available for profiling Node.js applications but, in many cases, the easiest option is to use the Node.js built-in profiler. The built-in profiler uses the profiler inside V8 which samples the stack at regular intervals during program execution. It records the results of these samples, along with important optimization events such as jit compiles, as a series of ticks.
```sh
NODE_ENV=production node --prof app.js
```
However, profiling has overhead, so you probably shouldn't run your backend with profiling enabled 100% of the time. Instead, you should sample: run the profiler periodically, for example for X seconds every Y minutes.
But if a spike occurs between the sampling intervals, it won't be caught by the profiler, so you're relying on pure luck here.
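The "sample for X seconds every Y minutes" approach could be sketched as a small scheduler. Note that `profileOnce` here is a hypothetical stand-in, not a real API; it represents whatever routine starts the profiler, waits, and writes the result:

```javascript
// Sketch of interval-based sampling: invoke `profileOnce(sampleMs)` every
// `intervalMs` milliseconds. `profileOnce` is a hypothetical stand-in for
// code that starts the profiler, waits `sampleMs`, and writes the profile
// to disk.
function schedulePeriodicProfiling(profileOnce, sampleMs, intervalMs) {
  const timer = setInterval(() => profileOnce(sampleMs), intervalMs);
  // Return a handle so the caller can stop sampling later.
  return () => clearInterval(timer);
}

// e.g. sample for 10 seconds every 5 minutes:
// const stop = schedulePeriodicProfiling(profileOnce, 10_000, 5 * 60_000);
```

The blind spot is visible right in the scheduler: anything that happens between two invocations of `profileOnce` is simply never observed.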
Conditional profiling
What if the application could start profiling conditionally, like when the event loop lag spikes, and stop profiling after the spike is resolved?
This way you’re not running the profiler 24/7, and you’re not running it in sampling intervals and hoping for it to catch the bottlenecks.
So how can we do it? We need to:
1- Have a way of getting event loop lag statistics periodically.
2- Have a way of starting and stopping the profiler on demand.
Monitoring Event Loop Lag
Node.js has a built-in way to monitor the event loop lag called monitorEventLoopDelay.
Here’s the JSDoc for it:
```js
/**
 * _This property is an extension by Node.js. It is not available in Web browsers._
 *
 * Creates an `IntervalHistogram` object that samples and reports the event loop
 * delay over time. The delays will be reported in nanoseconds.
 *
 * Using a timer to detect approximate event loop delay works because the
 * execution of timers is tied specifically to the lifecycle of the libuv
 * event loop. That is, a delay in the loop will cause a delay in the execution
 * of the timer, and those delays are specifically what this API is intended to
 * detect.
 *
 * ```js
 * import { monitorEventLoopDelay } from 'node:perf_hooks';
 * const h = monitorEventLoopDelay({ resolution: 20 });
 * h.enable();
 * // Do something.
 * h.disable();
 * console.log(h.min);
 * console.log(h.max);
 * console.log(h.mean);
 * console.log(h.stddev);
 * console.log(h.percentiles);
 * console.log(h.percentile(50));
 * console.log(h.percentile(99));
 * ```
 * @since v11.10.0
 */
```
Let's assume we want to trigger our conditional profiling based on the maximum event loop lag, so we will use h.max.
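Since the histogram reports delays in nanoseconds, the threshold check is just a unit conversion plus a comparison. A minimal sketch (the helper name `lagExceedsThreshold` is mine, not part of any API):

```javascript
import { monitorEventLoopDelay } from "node:perf_hooks";

// IntervalHistogram values are in nanoseconds, so convert before comparing.
const NS_PER_MS = 1e6;

function lagExceedsThreshold(histogram, thresholdMs) {
  return histogram.max / NS_PER_MS > thresholdMs;
}

const h = monitorEventLoopDelay({ resolution: 20 });
// h.enable();
// ... later, e.g. inside a monitoring interval:
// if (lagExceedsThreshold(h, 200)) { /* start profiling */ }
// h.reset(); // reset so each check sees only the latest window's maximum
```

Resetting the histogram after each check matters: without it, a single old spike would keep `h.max` above the threshold forever.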
Starting the profiler on demand
Node.js has a built-in way for starting and stopping the V8 inspector on demand by the running code itself.
```js
import { Session } from "inspector/promises";
```
The `inspector.Session` is used for dispatching messages to the V8 inspector back-end and receiving message responses and notifications.
Here’s how to use it:
```js
import { Session } from "inspector/promises";
import { writeFileSync } from "fs";
import { setTimeout as sleep } from "timers/promises";

const session = new Session();
session.connect();
await session.post("Profiler.enable");
// Set sampling interval to 10 microseconds for finer granularity
await session.post("Profiler.setSamplingInterval", { interval: 10 });
await session.post("Profiler.start");
// Profile for 1 second
await sleep(1000);
const { profile } = await session.post("Profiler.stop");
const filename = `cpu-profile-${new Date().toISOString()}.json`;
writeFileSync(filename, JSON.stringify(profile));
```
Now you have a JSON file that contains information about the execution of your program.
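Before reaching for a visualization tool, you can already extract a rough answer from the raw JSON. A V8 CPU profile contains a `nodes` array, where each node carries a `callFrame` (function name, file, line) and a `hitCount` saying how many samples landed in that frame. A small sketch that ranks frames by hit count (the helper `hottestFrames` is mine, and the filename in the usage comment is a placeholder):

```javascript
import { readFileSync } from "node:fs";

// Rank the profile's call frames by how many samples hit them.
function hottestFrames(profile, topN = 5) {
  return [...profile.nodes]
    .sort((a, b) => (b.hitCount ?? 0) - (a.hitCount ?? 0))
    .slice(0, topN)
    .map((n) => ({
      functionName: n.callFrame.functionName || "(anonymous)",
      url: n.callFrame.url,
      hitCount: n.hitCount ?? 0,
    }));
}

// Usage, with whatever filename the profiler wrote:
// const profile = JSON.parse(readFileSync("cpu-profile-<timestamp>.json", "utf8"));
// console.table(hottestFrames(profile));
```

This flat ranking loses the call-tree context, which is why a proper flamegraph viewer is still the better tool for anything non-obvious.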
Let's tie this into a full example.
Full example
We will write a simple web server that has an endpoint that runs a CPU-intensive task that blocks the event loop, and another endpoint that will allow us to end this spike to test that the profiling will indeed stop after the spike is over. The threshold for max event loop lag that triggers the profiling is 200 milliseconds.
```js
import express from "express";
import { monitorEventLoopDelay } from "perf_hooks";
import { Session } from "inspector/promises";
import { writeFileSync } from "fs";
import crypto from "crypto";

function cpuIntensiveFunction() {
  for (let i = 0; i < 100; i++) {
    crypto.pbkdf2Sync("password", "salt", 10000, 64, "sha512");
  }
}

const histogram = monitorEventLoopDelay({ resolution: 1 });
histogram.enable();

const session = new Session();
session.connect();

let isProfiling = false;

// Monitor event loop delay and trigger profiling if threshold exceeded
setInterval(async () => {
  const maxEventLoopLagInMS = histogram.max / 1000000;
  console.log({ maxEventLoopLagInMS });
  if (maxEventLoopLagInMS > 200 && !isProfiling) {
    console.log(
      `Event loop latency exceeded 200ms (${maxEventLoopLagInMS.toFixed(2)}ms) - Starting CPU profiling`,
    );
    isProfiling = true;
    try {
      await session.post("Profiler.enable");
      // Set sampling interval to 10 microseconds for finer granularity
      await session.post("Profiler.setSamplingInterval", { interval: 10 });
      await session.post("Profiler.start");
      // Profile for 1 second
      try {
        await sleep(1000);
        const { profile } = await session.post("Profiler.stop");
        const filename = `cpu-profile-${new Date().toISOString()}.json`;
        writeFileSync(filename, JSON.stringify(profile));
        console.log(`CPU profile saved to ${filename}`);
      } catch (error) {
        console.error("Error stopping profiler:", error);
      } finally {
        isProfiling = false;
      }
    } catch (error) {
      console.error("Error starting profiler:", error);
      isProfiling = false;
    }
  }
  histogram.reset();
}, 5000);

/**
 * @param {number} ms
 */
async function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * @type {NodeJS.Timeout | null}
 */
let loadInterval = null;

async function main() {
  const app = express();
  app.get("/hello", (_, res) => {
    res.end("world");
  });
  app.post("/load/start", (_, res) => {
    console.log("Received request to start CPU load");
    loadInterval = setInterval(() => {
      cpuIntensiveFunction();
    }, 2000);
    res.send("CPU load started");
  });
  app.post("/load/stop", (_, res) => {
    if (!loadInterval) {
      res.status(400).send("No load running");
      return;
    }
    console.log("Received request to stop CPU load");
    clearInterval(loadInterval);
    loadInterval = null;
    res.send("CPU load stopped");
  });
  app.listen(3000, () => console.log("Started web server on port 3000..."));
}

main();
```
I ran this code and here are the logs:
```
node main.js
Started web server on port 3000...
{ maxEventLoopLagInMS: 3.401727 }
{ maxEventLoopLagInMS: 3.692543 }
Received request to start CPU load
{ maxEventLoopLagInMS: 9.060351 }
{ maxEventLoopLagInMS: 1169.162239 }
Event loop latency exceeded 200ms (1169.16ms) - Starting CPU profiling
CPU profile saved to cpu-profile-2025-10-28T21:19:33.391Z.json
Received request to stop CPU load
{ maxEventLoopLagInMS: 2.600959 }
{ maxEventLoopLagInMS: 2.187263 }
{ maxEventLoopLagInMS: 2.646015 }
```
So at first the event loop lag was normal (around 3 ms). After calling the CPU-intensive endpoint, the lag spiked to 1169 ms, which exceeded the 200 ms threshold, so the profiler started. After the spike was over, the event loop lag returned to normal levels and the profiler stopped.
So, now we have a JSON file that contains data about the execution of our code. How can we get useful information from it? There are many visualization tools, and for this example I will use Speedscope.
Here's the visualization after uploading the file.
We can see that the bottleneck is the function named cpuIntensiveFunction on line 7 of the main.js file, and that the underlying cost of this function is its calls to pbkdf2Sync.
Now we know the bottleneck and can fix it in the appropriate way.
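For this particular bottleneck, one common fix is to switch from pbkdf2Sync to the asynchronous pbkdf2, which runs the hashing on libuv's threadpool instead of the event loop. A sketch of that change (cpuFriendlyFunction is a hypothetical replacement for the example's cpuIntensiveFunction):

```javascript
import { pbkdf2 } from "node:crypto";
import { promisify } from "node:util";

// Promisified async variant of the hash used in the example.
const pbkdf2Async = promisify(pbkdf2);

// Same work as the blocking version, but each hash is computed on
// libuv's threadpool, so the event loop stays free between iterations.
async function cpuFriendlyFunction() {
  for (let i = 0; i < 100; i++) {
    await pbkdf2Async("password", "salt", 10000, 64, "sha512");
  }
}
```

The total CPU cost is unchanged; what improves is that other requests can be served while the hashing runs, so the event loop lag no longer spikes.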
Here’s the repo for the example code: https://github.com/adhamsalama/event-loop-based-profiling
