YDN Blog Category Archive: performance
November 24, 2009
An Engineer's Guide to DNS
This article is the second in a series and part of ongoing research on web app performance. Get updates on the latest YDN articles via Twitter, follow @ydn.
The Domain Name System (DNS) is part of the "dark matter" of the internet. It's hard to observe the DNS directly, yet it exerts an obscure, pervasive influence without which everything would fly apart. Because it's so difficult to probe, people tend to take it for granted, which I think is a mistake. DNS problems can hurt the speed and reliability of your applications without you even noticing. In this article we'll take a look at the behavior of the DNS and walk through some experiments you can run to gather valuable data about your users' network performance.
A Clever Shambles
Before two computers can talk to each other on the 'net, one of them has to know the numeric IP address of the other. Using the DNS is often compared to looking up a number in the phone book. But that can give the impression the information is in one place, close to hand.
Instead, imagine it's 1982. You live in Tucson and you want to call a hotel in Toronto. You don't have a Toronto phone book so you call your local library. They don't have one either. Life is boring in Tucson, so the librarian uses her New York phone book to call another library. The nice lady in New York looks up the hotel's number in her copy of the Toronto phone book, tells it to your local librarian, who then calls back to give it to you. Doing all this is a hassle, so everyone in the chain writes down the number just in case the question ever comes up again.
The DNS is even more complex because of the hierarchy of internet domains. Consider the host name foo.bar.example.net. To look it up your computer will have to look up every part of the name, in reverse order. That means resolving ".", then "net.", then "example.net.", "bar.example.net.", and finally "foo.bar.example.net."[0]. It's not just a matter of finding the Toronto book. It's looking up someone who knows someone who has the Canada book and from there who has the Ontario book, then the Toronto book, and so on.
If this sounds ridiculously complex and fragile, that's because it is. Writing down the answer to common queries, aka caching, is the only reason we're able to get away with it. In practice the root domain "." is known to everyone. During normal operation "net." should be cached at all levels, including at your local librarian, aka your ISP. Anything beyond that requires some lookups unless the domain is already very well-known.
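To make the right-to-left walk concrete, here is a minimal sketch that lists the names a resolver conceptually has to work through for a given hostname. The function name `resolutionChain` is made up for illustration, and the sketch ignores caching entirely.

```javascript
// Sketch: list the names looked up, from the root "." down to the full
// hostname. `resolutionChain` is a hypothetical helper, not a real API.
function resolutionChain(hostname) {
  // Drop a trailing dot if the caller already wrote a fully qualified name.
  const labels = hostname.replace(/\.$/, "").split(".");
  const chain = ["."];
  // Walk the labels right to left, keeping everything from label i onward.
  for (let i = labels.length - 1; i >= 0; i--) {
    chain.push(labels.slice(i).join(".") + ".");
  }
  return chain;
}

console.log(resolutionChain("foo.bar.example.net"));
// [ '.', 'net.', 'example.net.', 'bar.example.net.', 'foo.bar.example.net.' ]
```

In practice the first one or two entries are almost always answered from a cache, which is the whole point of the paragraph above.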
How long does it take to look up a hostname?
A single DNS lookup may involve several recursive lookups to machines all over the world. Because of this hassle, information is cached for short periods of time at every level, including on your computer. So "the time it takes to do a DNS lookup" can vary wildly depending on the state of affairs in many different places, and the quality of the network connections between them.
On Mac OS X the dscacheutil command will tell you about your computer's latency and cache hit ratio:
$ dscacheutil -statistics
Overall Statistics:
Average Call Time - 0.118626
Cache Hits - 236152
Cache Misses - 231052
Total External Calls - 279350
Statistics by procedure:
Procedure Cache Hits Cache Misses External Calls
------------------ ---------- ------------ --------------
gethostbyname 161252 39952 6749
gethostbyaddr 60 151 211
...
These numbers are interesting but fairly useless for our purposes. They combine cached and uncached lookups into one "average". Also, browsers often cache and even precache DNS information, bypassing whatever the operating system is doing. So we can't rely on what the machine tells us. We need to do some experimenting on our own.
First, I ran long series of tests against Yahoo hostnames from the office, my house, and other locations. For 100 seconds I ran as many DNS lookups as I could and timed them. Each lookup was for a wildcard hostname. A wildcard like *.dnstest.example.net means you can make up random new hostnames on the fly, eg x9zzy.dnstest.example.net, that will resolve to a real IP address. This ensures that each test will be a full end-to-end DNS lookup without any caching to skew the numbers [1].
Figure 0: Average DNS latency at various locations
This graph is useful mostly to illustrate that it's possible for users on "broadband" connections to have invisible performance problems related to DNS. But it doesn't tell you which users or how many.
How can we figure out the response time distribution (distribution, not average) for a wide range of users? How can we get a better idea of the role the DNS plays in the performance of web applications? Conditions on the internet change constantly. The tests would have to be large-scale and continuous to mean anything.
Let's scope things down a bit. We don't really care about how quickly users resolve any hostname. We care about how quickly our users resolve our hostnames. So maybe you can get the data you want by observing your users. Unfortunately DNS lookups happen mostly through computers we do not control. Worse, they happen over UDP, which doesn't expose performance data to the callee. The request and response packets are sent without any error correction or acknowledgement. So we can't just look at the usual logs we collect on our servers.
The librarian in New York will never know how long it took the librarian in Tucson to call you back. The hotel staffer in Toronto has no idea how you found their number. That is, unless you tell him. And that's what we'll do: run a special series of tests from the perspective of the caller, ie the users, and report back results.
A DNS Observatory
It's tricky but not impossible to gather some statistics on user DNS latency without running benchmark software on their computers. One way works like this:
- Set up a wildcard hostname, preferably one that does not share cookies with your main site. Give it a low TTL, say, 60 seconds, so you don't pollute downstream caches.
- Set up a webserver for the wildcard hostname that serves zero-byte files as fast as possible. Make sure that KeepAlive, Nagle, and any caching headers are turned off.
- In the footer of the pages in your main site, add a script similar to Listing 1. It performs two HTTP requests: /A.gif and /B.gif. The first image load, A, will require a full DNS lookup and an HTTP transaction. The second, B, should only involve an HTTP transaction.
- Subtract the time it takes to complete B from the time it takes to complete A, and you have a (very) rough idea of how long it took to perform just the DNS lookup.
- Send the DNS and HTTP statistics back to your server as part of another image request. You can extract the results later from your logs.
- Rinse and repeat over a large sample (>10,000) of page views. Millions if you can.
NB: You will get strange, even negative, numbers from this test. The deviation of individual data points can be greater than the phenomenon you are trying to measure. If you want to get accurate numbers for a specific user you'll need to run many tests over a period of time. But a single test per user works well enough in aggregate.
<script>
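(function() {
    // Time two image loads from a fresh wildcard hostname and report the difference.
    function dns_test() {
        // A random label gives a hostname that no cache has seen before.
        var random = Math.floor(Math.random()*(2147483647)).toString(36);
        var host = 'http://'+random+".dnstest.example.net";
        var img1 = new Image();
        var img2 = new Image();
        var img3 = new Image();
        var ts = null;
        var stats = {};
        // A.gif: full DNS lookup plus an HTTP transaction.
        img1.onload = function() {
            stats['dns'] = (new Date()).getTime() - ts;
            ts = (new Date()).getTime();
            img2.src = host + "/B.gif";
        };
        // B.gif: HTTP transaction only, the hostname was just resolved.
        img2.onload = function() {
            stats['http'] = (new Date()).getTime() - ts;
            stats.dns = stats.dns - stats.http; // the clever bit
            // Report both numbers as query parameters on a third image request.
            img3.src = host + '/dnstest.gif?dns='+stats.dns+'&http='+stats.http;
        };
        ts = (new Date()).getTime();
        img1.src = host + "/A.gif";
    }
    // Start the test about 11 seconds after this script runs.
    window.setTimeout(dns_test, 11337);
})();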
</script>
Below is a graph of the distribution of uncached DNS lookup times from real users in the wild, collected by this script over one week. The sample was heavily skewed towards US broadband connections. The median was 146 milliseconds and the geometric mean was 163 milliseconds [2]. This is rather larger than the 20-120 milliseconds quoted in the Yahoo Performance Guidelines for a "typical" DNS lookup. Beware pithy numbers (even ours).
The distribution is even more interesting than the averages. Twenty percent of users in our sample took more than 500 milliseconds just to resolve one hostname. Granted, these lookups were uncached. Assuming a 50% cache hit rate, that's still one out of ten users in this dataset laboring under crappy DNS performance. As of this writing that's a market as large as Safari, Chrome and Opera combined.
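For readers who want to reproduce this kind of summary outside of R, here is a rough sketch of the three numbers used above: the median, the geometric mean, and the share of samples over a threshold, followed by the cache-hit adjustment. The sample values are invented for illustration and are not the article's data; the function names are likewise hypothetical.

```javascript
// Summary statistics for skewed latency samples (values in milliseconds).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Geometric mean: exp of the mean of the logs. Only defined for values > 0.
function geometricMean(values) {
  const sumOfLogs = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(sumOfLogs / values.length);
}

// Fraction of samples strictly above a threshold.
function shareAbove(values, thresholdMs) {
  return values.filter((v) => v > thresholdMs).length / values.length;
}

// Made-up sample, NOT the article's data.
const sampleMs = [40, 90, 120, 150, 160, 200, 310, 620, 1100, 3000];
const slowUncached = shareAbove(sampleMs, 500); // 3 of 10 samples
// With an assumed 50% cache hit rate, half of those slow lookups never happen.
const slowOverall = slowUncached * (1 - 0.5);

console.log("median:", median(sampleMs));
console.log("geometric mean:", geometricMean(sampleMs).toFixed(1));
console.log("slow, uncached:", slowUncached, "slow, overall:", slowOverall);
```

The same adjustment applied to the article's figures is 0.20 × (1 − 0.5) = 0.10, which is where "one out of ten users" comes from.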

The cause is unclear. It's possible that user network quality is just that bad. It could be physical distance. It could also be the DNS resolvers of ISPs at fault. It could be your DNS server. Or it could be something else. Or all of the above.
Remember that your mileage may vary. Not every combination of site and userbase will have a similar graph. Also remember that a lot of caching is going on at every level of the system. There's not a simple fixed cost to using alternate hosts for your images and scripts. The best strategy may well be to have one and only one "asset host" or CDN that does not share cookies with your main site.
If you run a commercial website, consider setting up with a dedicated DNS hosting provider that has presence on several continents. The DNS hosting service typically thrown in for free by domain registrars is not very good. For most sites, solid DNS hosting costs about USD $50 per year. It's worth the effort. Heck, set up with two different services for failover.
Try this at home
For privacy reasons we can't release the raw data we collected. But if you have a website with a fair amount of traffic, I strongly encourage you to run these DNS measurements for yourself. You can learn a lot by drilling down into the data.
- Play with graphing the distributions of different subnets (eg 18.* for MIT or 12.* for AT&T). You might be surprised at who is fast and who is slow.
- In your webserver logs for /dnstest.gif there should be a User-Agent field as well. So you can look at correlations between DNS performance, browsers, and operating systems. For example, check out those little bumps at 1s and 3s in Figure 1. It turns out that the DNS resolver in Windows has aggressive timeouts. Those bumps are caused by Windows clients timing out then succeeding on a retry.
- We're not just timing DNS latency, we're also timing how long it takes to perform a minimal TCP handshake + HTTP transaction. That gives you interesting information about user connection latencies, for free. But that's a whole 'nother subject.
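As one way to start drilling down, the sketch below groups DNS timings by the first octet of the client IP and reports a median per group. The row shape (`{ ip, dns }`), the function names, and the sample addresses are all assumptions made for this example; in practice you would build the rows from your own /dnstest.gif log lines.

```javascript
// Median of a list of numbers.
function medianOf(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group DNS timings (ms) by the first octet of the client IP.
// Returns an object mapping first octet -> median DNS time.
function medianDnsByFirstOctet(rows) {
  const groups = new Map();
  for (const { ip, dns } of rows) {
    if (!(dns > 0)) continue; // drop the zero/negative noise the test produces
    const octet = ip.split(".")[0];
    if (!groups.has(octet)) groups.set(octet, []);
    groups.get(octet).push(dns);
  }
  const result = {};
  for (const [octet, values] of groups) result[octet] = medianOf(values);
  return result;
}

// Invented rows, for illustration only.
const exampleRows = [
  { ip: "18.1.2.3", dns: 80 },
  { ip: "18.9.9.9", dns: 120 },
  { ip: "12.4.5.6", dns: 300 },
  { ip: "200.7.8.9", dns: 900 },
  { ip: "200.1.1.1", dns: -15 },
  { ip: "200.2.2.2", dns: 700 },
];
console.log(medianDnsByFirstOctet(exampleRows));
```

Grouping by a single octet is crude, but it is the same granularity as the 18.* and 12.* examples above and is enough to spot which networks deserve a closer look.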
This article is the second in a series and part of ongoing research on web app performance. If you have any suggestions or ideas to help improve the experiments, please leave a note in the comments. Next we hope to dig into more detail about user network performance data and how you can use it to improve your websites and applications.
You can also check out the Yahoo! performance guidelines for suggestions on improving the performances of your site, or install the YSlow toolbar for Firebug to help profile the speed of your site.
Carlos Bueno
carlosb@yahoo-inc.com
Software Engineer, Yahoo! Mail
[0] The dot "." at the end is not a typo. Though "com" and "net" are called "top-level" domains there is actually one more layer behind them called the root domain, designated by that trailing dot. The root domain is managed as a global public utility by dozens of internet service providers all over the world.
Fun fact: the entire country of Sweden dropped off the 'net in October 2009 because a network operator forgot to include that last dot in a configuration file.
[1] I'm fudging here a bit. It's possible that during this test, everything up to .dnstest.example.net will be cached at the user's ISP. This is by design, to reduce load on the root and top-level domain servers. But the lookup should always at least do a request to the ISP's resolver and a request in turn to example.net's authoritative DNS server.
[2] These kinds of datasets tend to be log-normal, with long thin tails trailing from a large central spike. The "average" value, or arithmetic mean, would be misleading in this case so we won't discuss it.
[∞] Bonus footnote! Here is the code to generate a table from your webserver logs:
# run a grep for "/dnstest.gif" and save to a file
bzcat /your/apache/logs/access_log.*.txt.bz2 | grep dnstest.gif > /tmp/dnstest.log
# perl magic to split the data into columns
echo "A,B,C,D,dns,http" > latency.csv
perl -lane 'if (/^([\d]+)\.([\d]+)\.([\d]+)\.([\d]+).+dns=(\d+)\&http=(\d+)/) { print "$1,$2,$3,$4,$5,$6" }' /tmp/dnstest.log >> latency.csv
Here are the R commands to generate graphs and poke around at the data. If you've never tried R, it is a wonderful open-source statistics suite. The best introduction on how to use it is here. (PDF)
## R script for generating the histogram
x <- read.csv("latency.csv", header=TRUE)
# take only results greater than 0 and less than 4,000
y <- subset(x, (x$dns > 0 & x$dns < 4000))
# draw the histogram
hist(y$dns, xlab="Milliseconds", main=NA, breaks=200, col="red", border="red", prob=FALSE)
# rug() adds "tassels" to the bottom of the graph to show data point density.
# rug() makes tassels. Get it? Yeah.
rug(sample(y$dns, 5000))
## bonus bonus: more interesting stats
# Show only requests from Brazil. this filter is not strictly true
# (ie, they have more subnets than 200/8) but it's true enough to play with.
brazil = subset(y, (y$A==200))
hist(brazil$dns, xlab="Milliseconds", main="Brazil (200.*)", breaks=200, col="green", border="green", prob=FALSE)
rug(sample(brazil$dns, 5000))
## other interesting subnets
mit = subset(y, (y$A==18))
att = subset(y, (y$A==12))
# the "average" is not very useful with log-normal datasets
#mean(y$dns)
# median and geometric mean are more informative
median(y$dns)
exp(mean(log(y$dns)))
# percentage of users over / under a specific point
table(y$dns > 250) / length(y$dns)
table(y$dns > 500) / length(y$dns)
Posted at 9:18 AM | Comments (9) | Permalink
October 1, 2009
An Engineer's Guide to Bandwidth
Web app developers spend most of our time not thinking about how data is actually transmitted through the bowels of the network stack. Abstractions at the application layer let us pretend that networks read and write whole messages as smooth streams of bytes. Generally this is a good thing. But knowing what's going on underneath is crucial to performance tuning and application design. The character of our users' internet connections is changing and some of the rules of thumb we rely on may need to be revised.
In reality, the Internet is more like a giant cascading multiplayer game of pachinko. You pour some balls in, they bounce around, lights flash and —usually— they come out in the right order on the other side of the world.
What we talk about, when we talk about bandwidth
It's common to talk about network connections solely in terms of "bandwidth". Users are segmented into the high-bandwidth who get the best experience, and low-bandwidth users in the backwoods. We hope some day everyone will be high-bandwidth and we won't have to worry about it anymore.
That mental shorthand served when users had reasonably consistent wired connections and their computers ran one application at a time. But it's like talking only about the top speed of a car or the MHz of a computer. Latency and asymmetry matter at least as much as the notional bits-per-second and I argue that they are becoming even more important. The quality of the "last mile" of network between users and the backbone is in some ways getting worse as people ditch their copper wires for shared wifi and mobile towers, and clog their uplinks with video chat.
It's a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions. A better mental model of bandwidth should include:
- packets-per-second
- packet latency
- upstream vs downstream
Packets, not bytes
The quantum of internet transmission is not the bit or the byte, it's the packet. Everything that happens on the 'net happens as discrete pachinko balls of regular sizes. A message of N bytes is chopped into ceil(N / 1460) packets [1] which are then sent willy-nilly. That means there is little to no difference between sending 1 byte or 1,000. It also means that sending 1,461 bytes is twice the work of sending 1,460: two packets have to be sent, received, reassembled, and acknowledged.
Packet #1 Payload
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
.....................................................................
...........................................................
Packet #2 Payload
.
Listing 0: Byte 1,461 aka The Byte of Doom
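The arithmetic behind the Byte of Doom is a single rounding-up division. A minimal sketch, using the 1,460-byte payload figure from the text (the constant and function names are made up here):

```javascript
// Payload bytes that fit in one packet, per the figure used in the article.
const MSS_BYTES = 1460;

// Number of packets needed to carry a message of the given size.
function packetsFor(messageBytes) {
  return Math.ceil(messageBytes / MSS_BYTES);
}

for (const n of [1, 1000, 1460, 1461]) {
  console.log(n + " bytes -> " + packetsFor(n) + " packet(s)");
}
// 1 bytes -> 1 packet(s)
// 1000 bytes -> 1 packet(s)
// 1460 bytes -> 1 packet(s)
// 1461 bytes -> 2 packet(s)
```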
Crossing the packet line in HTTP is very easy to do without knowing it. Suppose your application uses a third-party web analytics library which, like most analytics libraries, stores a big hunk of data about the user inside a long-lived cookie tied to your domain. Suppose you stuff a little bit of data into the cookie too. This cookie data is thereafter echoed back to your web server upon each request. The boilerplate HTTP headers (Accept, User-agent, etc) sent by every modern browser take up a few hundred more bytes. Add in the actual URL, Referer header, query parameters... and you're dead. There is also the little-known fact that browsers split certain POST requests into at least two packets regardless of the size of the message.
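One rough way to see how close a request is to the line is to add up the bytes of the request line and headers. The sketch below does that under Node.js; the header values are invented, the function name is hypothetical, and real browsers send more headers than shown, so treat the result as a lower bound.

```javascript
// Approximate on-the-wire size of an HTTP/1.1 GET request with no body.
function requestBytes(path, headers) {
  const lines = ["GET " + path + " HTTP/1.1"];
  for (const [name, value] of Object.entries(headers)) {
    lines.push(name + ": " + value);
  }
  // Every line ends in CRLF, and an empty line terminates the header block.
  const raw = lines.join("\r\n") + "\r\n\r\n";
  return Buffer.byteLength(raw, "utf8");
}

// Invented example: modest boilerplate plus a fat analytics cookie.
const exampleHeaders = {
  Host: "www.example.net",
  "User-Agent": "ExampleBrowser/1.0",
  Accept: "text/html,application/xhtml+xml",
  Referer: "http://www.example.net/some/long/path?with=query&parameters=too",
  Cookie: "analytics=" + "x".repeat(1200) + "; mine=abc123",
};

const size = requestBytes("/inbox?folder=all", exampleHeaders);
console.log(size + " bytes, fits in one 1460-byte packet: " + (size <= 1460));
```

Shrinking the cookie is usually the quickest way to get back under the line, since it rides along on every request to the domain.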
One packet, more or less, who cares? For one, none of your fancy caching and CDNs can help the client send data upstream. TCP slow-start means that the client will wait for acknowledgement of the first packet before sending the second. And as we'll see below, that extra packet can make a large difference in the responsiveness of your app when it's compounded by latency and narrow upstream connections.
Packet Latency
Packet latency is the time it takes a packet to wind through the wires and hops between points A and B. It is roughly a function of the physical distance (at 2/3 of the speed of light) plus the time the packet spends queued up inside various network devices along the way. A typical packet sent on the 'net backbone between San Francisco and New York will take about 60 milliseconds. But the latency of a user's last-mile internet connection can vary enormously [2]. Maybe it's a hot day and their router is running slowly. The EDGE mobile network has a best-case latency of 150msec and a real-world average of 500msec. There is a semi-famous rant from 1996 complaining about 100msec latency from substandard telephone modems. If only.
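The distance part of that function is easy to estimate. The sketch below computes only the propagation floor, assuming a signal speed of 2/3 the speed of light as stated above; the 4,100 km used for San Francisco to New York is an assumed round figure for the straight-line distance, not a measured cable route.

```javascript
const SPEED_OF_LIGHT_KM_PER_S = 299792.458;

// One-way propagation delay in milliseconds over a given distance,
// assuming signals travel at 2/3 of the speed of light.
function propagationDelayMs(distanceKm) {
  const signalSpeedKmPerS = SPEED_OF_LIGHT_KM_PER_S * (2 / 3);
  return (distanceKm / signalSpeedKmPerS) * 1000;
}

// Assumed straight-line distance, San Francisco to New York.
const oneWayMs = propagationDelayMs(4100);
console.log("one way: " + oneWayMs.toFixed(1) + " ms, round trip: " + (2 * oneWayMs).toFixed(1) + " ms");
```

This comes out to roughly 20 milliseconds each way. Real routes are longer than a straight line and every hop adds queueing time, so observed latencies sit above this floor.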
Packet loss
Packet loss manifests as packet latency. The odds are decent that a couple packets that made up the copy of this article you are reading got lost along the way. Maybe they had a collision, maybe they stopped to have a beer and forgot. The sending end then has to notice that a packet has not been acknowledged and re-transmit.
Wireless home networks are becoming the norm and they are unfortunately very susceptible to interference from devices sitting on the 2.4GHz band, like microwaves and baby monitors. They are also notorious for cross-vendor incompatibilities. Another dirty secret is that consumer-grade wifi devices you'll find in cafés and small offices don't do traffic shaping. All it takes is one user watching a video to flood the uplink.
Upstream < Downstream
Internet providers lie. That "6 Megabit" cable internet connection is actually 6mbps down and 1mbps up. The bandwidth reserved for upstream transmission is often 20% or less of the total available. This was an almost defensible thing to do until users started file sharing, VOIPing, video chatting, etc en masse. Even though users still pull more information down than they send up, the asymmetry of their connections means that the upstream is a chokepoint that will probably get worse for a long time.
A Dismal Testing Harness

Figure 0: It's popcorn for dinner tonight, my love. I'm doing science!
We need a way to simulate high latency, variable latency, limited packet rate, and packet loss. In the olden days a good way to test the performance of a system through a bad connection was to configure the switch port to run at half-duplex. Sometimes we even did such testing on purpose. :) Tor is pretty good for simulating a crappy connection but it only works for publicly-accessible sites. Microwave ovens consistently cause packet loss (my parents' old monster kills wifi at 20 paces) but it's a waste of electricity.
The ipfw tool on Mac OS X and FreeBSD comes in handy for local testing. The command below will approximate an iPhone on the EDGE network with a 350kbit/sec throttle, 5% packet loss rate and 500msecs latency. Use sudo ipfw flush to deactivate the rules when you are done.
$ sudo ipfw pipe 1 config bw 350kbit/s plr 0.05 delay 500ms
$ sudo ipfw add pipe 1 dst-port http
Here's another that will randomly drop half of all DNS requests. Have fun with that one.
$ sudo ipfw pipe 2 config plr 0.5
$ sudo ipfw add pipe 2 dst-port 53
To measure the effects of latency and packet loss I chose a highly-cached 130KB file from Yahoo's servers. I ran a script to download it as many times as possible in 5 minutes under various ipfw rules [3]. The "baseline" runs were the control with no ipfw restrictions or interference.

Figure 1: The effect of packet latency on download speed

Figure 2: Effect of packet loss on download speed
Just 100 milliseconds of packet latency is enough to cause a smallish file to download in an average of 1500 milliseconds instead of 350 milliseconds. And that's not the worst part: the individual download times ranged from 1,000 to 3,000 milliseconds. Software that's consistently slow can be endured. Software that halts for no obvious reason is maddening.

Figure 3: Extreme volatility of response times during packet loss.
So, latency sucks. Now what?
Yahoo's web performance guidelines are still the most complete resource around, and backed up by real-world data. The key advice is to reduce the number of HTTP requests, reduce the amount of data sent, and to order requests in ways that use the observed behavior of browsers to best effect. However there is a simplification which buckets users into high/low/mobile categories. This doesn't necessarily address poor-quality bandwidth across all classes of user. The user's connection quality is often very bad and getting worse, which changes the calculus of what techniques to employ. In particular we should also take into account that:
- Upstream packets are almost always expensive.
- Any client can have high or low overall bandwidth.
- High latency is not an error condition, it's a fact of life.
- TCP connections and DNS lookups are expensive under high latency.
- Variable latency is in some ways worse than low bandwidth.
Assuming that a large but unknown percentage of your users labor under adverse network conditions, here are some things you can do:
- To keep your users' HTTP requests down to one packet, stay within a budget of about 800 bytes for cookies and URLs. Note that every byte of the URL counts twice: once for the URL and once for the Referer header on subsequent clicks. An interesting technique is to store app state that doesn't need to go to the server in fragment identifiers instead of query string parameters, e.g. /blah#foo=bar instead of /blah?foo=bar. Nothing after the # mark is sent to the server.
- If your app sends largish amounts of data upstream (excluding images, which are already compressed), consider implementing client-side compression. It's possible to get 1.5:1 compression with a simple LZW+Base64 function; if you're willing to monkey with ActionScript you could probably do real gzip compression.
- YSlow says you should flush() early and put Javascript at the bottom. The reasoning is sound: get the HTML <head> portion out as quickly as possible so the browser can start downloading any referenced stylesheets and images. On the other hand, JS is supposed to go on the bottom because script tags halt parallel downloads. The trouble comes when your page arrives in pieces over a long period of time: the HTML and CSS are mostly there, maybe some images, but the JS is lost in the ether. That means the application may look like it's ready to go but actually isn't: the click handlers and logic and ajax includes haven't arrived yet.

Figure 4: docs is loading slowly... dare I click?
Maybe in addition to the CSS/HTML/Javascript sandwich you could stuff a minimal version of the UI into the first 1-3KB, which gets replaced by the full version. Google Docs presents document contents as quickly as possible but disables the buttons until its sanity checks pass. Yahoo's home page does something similar.
This won't do for heavier applications, or those that don't have a lot of passive text to distract the user with while frantic work happens offstage. Gmail compromises with a loading screen which times out after X seconds. On timeout it asks the user to choose whether to reload or use their lite version.
- Have a plan for disaster: what should happen when one of your scripts or styles or data blobs never arrives? Worse, what if the user's cached copy is corrupted? How do you detect it? Do you retry or fail? A quick win might be to add a checksum/eval step to your javascript and stylesheets.
- We also recommend making as much CSS and Javascript as possible external, and parallelizing HTTP requests. But is it wise to do more DNS lookups and open new TCP connections under very high latency? If each new connection takes a couple seconds to establish, it may be better to inline as much as possible.
- The trick is how to decide that an arbitrary user is suffering high latency. For mobile users you can pretty much take high latency as a given [4]. Armed with per-IP statistics on client network latency from bullet #4 above, you can build a lookup table of high-latency subnets and handle requests from those subnets differently. For example if your servers are in Seattle it's a good bet that clients in the 200.0.0.0/8 subnet will be slow. 200.* is for Brazil, but the point is that you don't need to know it's for Brazil or iPhone or whatever; you're just acting on observed behavior. Handling individual users from "fast" subnets who happen to have high latency is a taller order. It may be possible to get information from the socket layer about how long it took to establish the initial connection. I don't know the answer yet but there is promising research here and there.
- A good technique that seems to go in and out of fashion is KeepAlive. Modern high-end load balancers will try to keep the TCP connection alive between themselves and the client, no matter what, while also honoring whatever KeepAlive behavior the webserver asks for. This saves expensive TCP connection setup and teardown without tying up expensive webserver processes (the reason why some people turn it off). There's no reason why you couldn't do the same with a software load balancer / proxy like Varnish.
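The fragment-identifier trick from the first suggestion in the list above is easy to check with the standard URL parser. A small sketch, assuming Node.js or any runtime with a global `URL`; `requestTarget` is a made-up helper that mimics what ends up on the HTTP request line.

```javascript
// The part of a URL that goes on the HTTP request line: path plus query.
// The fragment (everything from "#" on) stays in the browser.
function requestTarget(urlString) {
  const url = new URL(urlString);
  return url.pathname + url.search;
}

console.log(requestTarget("http://www.example.net/blah?foo=bar")); // /blah?foo=bar
console.log(requestTarget("http://www.example.net/blah#foo=bar")); // /blah
```

The state is still available to scripts on the page through `location.hash`, so nothing is lost on the client side.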
This article is the first in a series and part of ongoing research on bandwidth and web app performance. It's still early in our research, but we chose to share what we've found early so you can join us on our journey of discovery. Next, we will dig deeper into some of the open questions we've posed, examine real-world performance in the face of high latency and packet loss, and suggest more techniques on how to make your apps work better in adverse conditions based on the data we collect.
Carlos Bueno
Software Engineer, Yahoo! Mail
Read more about how to optimize your web site performance with the Yahoo! performance guidelines.
Notes
[1] ceil(N / 1460) is the same algorithm you use to figure out how many trips it takes to carry your laundry down the stairs. (ceil is geekspeak for rounding up.) Say you have 50 pounds of clothes and the basket holds 13 pounds. 50 / 13 = 3 remainder 11, so you need to make 4 trips. The bigger the basket the fewer the trips. So why not use huge packets? On private networks you might see configurations for "Jumbo frames". But in the wild you have to consider the cost of packet loss, typical message sizes, old or incompatible routers, etc.
That specific number (aka Maximum Segment Size) comes from the maximum packet size (aka Maximum Transmission Unit) of 1,500 octets (aka bytes) used by Ethernet v2 (see RFC 894; RFC 1191 lists it among the common MTUs), minus the 40 bytes reserved for the IP and TCP headers that carry the source and destination addresses, flags, etc. IPv6, which has been coming any day now since the Clinton administration, will probably converge on an MSS of 1,220 or 1,440 in the wild. Point being, we're stuck with tiny packets for the rest of our lifetimes.
[2] DNS can also cause latency. We tend to take hostname lookups for granted, but an ISP's DNS resolvers are often unloved. It once took me several years to convince BellSouth's customer service that one of their DNS resolvers was actually off the network. User DNS problems are doubly nasty because we as application developers can't control or even detect them.
[3] The script was single-threaded and used a new TCP connection for each request. A single restriction was used per run, ie X milliseconds latency or Y% packet loss. The wifi was a Linksys WRT54G at a distance of 5 meters, with standard firmware in 802.11g mode and WPA2 encryption. The uplink was a "6mbps" home cable connection about 50 miles and ten network hops away from the nearest Yahoo caching server, during off-peak hours.
[4] The Google mobile team recently put out an interesting fact: "On an iPhone 2.2 device, 200k of JavaScript held within a block comment adds 240ms during page load, whereas 200k of JavaScript that is parsed during page load added 2600 ms."
Image Credit: Pachinko by the_toe_stubber on Flickr.
Posted at 8:00 AM | Comments (30) | Permalink
September 22, 2009
Not Just a Pretty Face: Performance and the New Yahoo! Search
Today we announced the new Yahoo! Search Results Page, which ships with a wide array of rich new features. What you might be surprised to learn is that the new design is actually a little faster than the original. Through diligent use of modern performance techniques, we not only held the line on the total page size and number of HTTP requests, but we also made a number of improvements to the load time of the page. Now that you've seen the new search results page, let's walk through some of the performance considerations we used when constructing the new template.
Code Refactoring
Any sweeping design change like this is a great opportunity to refactor, and we took full advantage, rebuilding the Yahoo! Search page's HTML, CSS, and JavaScript foundation from scratch. If you think of the template as a shell that wraps the "10 blue links" in the center of the page, all the markup around the middle content well has been rewritten. This allowed us to get rid of old cruft and take advantage of quite a few new techniques and best practices, reducing core pageweight and render complexity in the process.
As just one quick example of what's new, our search page now uses CSS image flipping. Rather than including separate images in our sprite for up and down arrows, we actually only include a down arrow. To generate an up arrow for all A-grade browsers, we use vendor-provided CSS hooks:
-moz-transform: rotate(180deg);
-webkit-transform: rotate(180deg);
filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=2);
The actual byte savings are small, but every little bit counts, and this was relatively easy to implement.
We also took this opportunity to improve page structure and accessibility. As far as we are concerned, the philosophy that you have to create a separate experience for accessibility is a fallacy; we believe you can write accessible markup without hurting performance. A key improvement in the new design is simply creating better document structure using <h1>, <h2>, and <h3>, which enables screenreaders to navigate the page more easily. We've also added some better keyboard interactions, such as making sure the first tab key press takes you directly to the search box instead of hitting navigation links, and enabling CTRL-SHIFT DOWN to jump past the header and sidebar and put focus on the first web result.
Data URI Images
The new design incorporates several subtle, repeating gradients, which look great but can be absolute performance killers. To help alleviate this problem, we took advantage of an obscure feature supported by all modern browsers called Data URI images. This technique enables you to embed the encoded data for individual images right into your CSS. The technique has been around for a while, but it's only recently become widely supported enough to use in production.
Data URI images enabled us to avoid the extra sprite weight associated with repeating gradients, while at the same time improving perceived performance by avoiding the "pop-in" effect that you sometimes see with template images. In a traditional CSS file that refers to external images, the browser loads the CSS, parses the CSS, and starts rendering the page. Any image references in the CSS spawn a new HTTP request. Depending on your connection speed, the page might have already rendered by the time the image returns, which causes the image to appear like it suddenly popped in to the page. Data URI images helped us eliminate the pop-in effect entirely and significantly reduce the number of HTTP requests.
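To make the idea concrete, here is a minimal sketch of how an image can be turned into a data URI and dropped into a CSS rule at build time. It is an illustration only, not our actual build tooling: the helper names, the `.hd` selector, and the placeholder bytes are all assumptions.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def css_background_rule(selector: str, image_bytes: bytes) -> str:
    """Build a CSS rule that embeds the image instead of linking to it."""
    uri = to_data_uri(image_bytes)
    return f'{selector} {{ background: url("{uri}") repeat-x; }}'


# Placeholder bytes standing in for a small gradient PNG (not a real image).
fake_gradient = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# ".hd" is a hypothetical selector for the page header.
rule = css_background_rule(".hd", fake_gradient)
print(rule)
```

Because the image travels inside the stylesheet, the browser has it as soon as the CSS is parsed and no separate HTTP request is needed. The trade-off is that base64 encoding makes the embedded data roughly a third larger than the raw file, so the technique suits small images best.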
To maintain backwards compatibility, we provided a separate gradient-only sprite for IE6 and IE7. This means that those browsers encounter slightly worse performance than more modern browsers, but the net effect is still an overall win. Of course, managing a split code base is a little risky. Many sites prefer to do this at runtime, using conditional comments or other techniques. In our case, the overall difference is actually pretty small — our build tools push the right static resources to our CDN, and our frontend does browser sniffing and swaps in the right CSS file.
Semantic Page Flushing
Rather than waiting until the server generates the entire page and then sending everything at once, we send the page to the client in three semantically meaningful chunks, which enables the browser to start rendering the page and requesting static resources more quickly.
- The first chunk includes the page header and search box, and is sent before we even request search results from the backend. This enables the browser to begin downloading static resources while our server is still processing the search request.
- The second chunk includes all the visible page content and ads, but no JavaScript. This enables the user to instantly begin scanning and interacting with their search results before the browser downloads and executes any JavaScript code.
- The final chunk includes the JavaScript that adds rich but non-critical functionality like Search Assist and Search Pad.
The net effect is that the user sees the page loading and can begin interacting with it much sooner. By sending the browser the info it needs to download static components as early as possible, we also reduce overall round-trip time.
Note that the old design also used semantic page flushing, but not quite as aggressively. The key difference is that in the previous design, the first chunk only included markup up to the <head>. By refactoring how our backend logic works, we were able to push the chunk down into the <body> and include key visual components such as the page header and search box. Getting the visual markup started down the wire creates the perception that the page is loading, that "something is happening."
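The three-chunk approach can be sketched as a small WSGI application that yields each chunk as soon as it is ready. This is a hedged illustration, not the Yahoo! Search backend: `fetch_results`, the markup, and the file names `core.css` and `extras.js` are hypothetical, and whether each chunk really reaches the browser immediately also depends on the server, proxies, and compression settings in between.

```python
import html
import time


def fetch_results(query):
    """Stand-in for the slow backend search call (hypothetical)."""
    time.sleep(0.01)
    return [f"Result {i} for {query}" for i in range(1, 4)]


def search_page(environ, start_response):
    """WSGI app that yields the page in three semantically meaningful chunks."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    query = html.escape(environ.get("QUERY_STRING", ""))

    # Chunk 1: page header and search box, sent before the backend is queried,
    # so the browser can start fetching static resources right away.
    yield (b"<html><head><link rel='stylesheet' href='core.css'></head>"
           b"<body><form><input name='p'></form>")

    # Chunk 2: all visible content, but no JavaScript.
    items = "".join(f"<li>{r}</li>" for r in fetch_results(query))
    yield f"<ol>{items}</ol>".encode("utf-8")

    # Chunk 3: the JavaScript for rich but non-critical features.
    yield b"<script src='extras.js'></script></body></html>"


# Drive the app by hand to see the chunks in the order they would be sent.
chunks = list(search_page({"QUERY_STRING": "p=dns"}, lambda status, headers: None))
for number, chunk in enumerate(chunks, start=1):
    print(number, len(chunk), "bytes")
```

The point of the generator is ordering: the first `yield` happens before `fetch_results` is even called, which mirrors sending the header and search box while the search request is still being processed.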
Lazy Loading
The core JavaScript and CSS used on the SRP now loads in two distinct chunks. The first chunk includes only the bare minimum CSS and JavaScript required to render 100% of search result pageviews, so that the base experience loads as quickly as possible. The second chunk includes additional (but heavy) functionality such as Search Assist and Search Pad. We also load additional chunks of CSS and JavaScript for shortcuts and other dynamic features only as necessary, ensuring that we never waste time loading code that we're unlikely to need.
As a search site, we can get away with heavy use of lazy-loading because the search experience is such a fundamental user experience on the web. When a user requests a search page, they typically scan and click very quickly. As long as we make that core experience as fast as possible, we can defer other components for later. If your site has some other usage paradigm, you have to be more careful; you can't lazy load components that the user wants to interact with right away.
Designers and Engineers Agree: Performance First!
Beyond the technical considerations listed above, perhaps the most important factor was our philosophy that performance is everybody's problem, at all stages. Our frontend engineering team started thinking about and planning for performance even before the designs were past the rough draft stage. This enabled us to give early feedback to our User Experience Design (UED) department and work closely with them as they refined their designs.
Our designers had already taken into account many performance concepts they had learned over the years, such as sprite optimization. However, the new design uses gradients far more heavily than the previous design, which can get expensive — particularly if you stretch the gradients vertically or horizontally across the page.
Fortunately, our designers brought these considerations to us early, and we were able to brainstorm with them about how to use graphical components more efficiently. Once our designers understood some of the techniques we wanted to use and some of the limitations we had, they were able to knock out some absolutely gorgeous designs that still fit within our performance constraints. After those initial meetings, at each stage in the design our UED team would ping the performance engineers to ensure that we stayed within our performance targets. This close collaboration helped keep us from having to be reactive about performance.
In other words, new tricks and performance techniques only get you so far. Thanks to countless hours of hard work by individual designers and engineers, the new Yahoo! Search Page delivers far more functionality and design components in an even faster package. And we're still working hard with our designer colleagues to make the search experience even faster and more engaging over the coming weeks and months. If you have any questions about Yahoo! Search and frontend performance, we welcome your feedback!
Ryan Grove
Yahoo! Frontend Engineer
Stoyan Stefanov
Yahoo! Performance Engineer
Venkateswaran Udayasankar
Yahoo! Performance Engineer
Posted at 10:57 AM | Comments (14) | Permalink
April 29, 2009
YSlow Release
One of the most frequent complaints I hear about YSlow grades is, "Some of these performance rules do not apply to my website." Most web developers want to evaluate site performance according to their own specific design and content criteria. For example, not all sites use content delivery networks (CDNs). With the latest release of YSlow, it becomes really easy for web developers to configure their own rule sets and get relevant grades for their pages. We've also incorporated nine new rules, in addition to the previous thirteen. The ability to create your own rule set for performance testing is the next step towards opening up YSlow for developers to create and share their own rules.
Improving page performance sometimes involves reducing page weight. Images are one of the biggest contributors to page weight for most sites. Our performance rules talk about optimizing images to improve performance. Most often, in-depth knowledge of tools like Photoshop and other design tools is required to publish highly optimized images. We've removed the pain of optimization in this version of YSlow by integrating with Smush.it. Smush.it finds all images on your web page and applies the right techniques to optimize them without visual quality loss. Developers can also download smushed images in a zip format.
The new look is designed to encourage developers to evaluate their webpage performance more closely and pay attention to small details, like making sure they've used small favicons that are cacheable. Our goal is a better, faster web experience for all.
If you have any questions or feedback, we encourage you to join the conversation on the Yahoo! Exceptional Performance group. We look forward to your continued interest and enthusiasm – stay tuned, there's still a lot more to come.
Pramod Khincha
Exceptional Performance
Note: You can download this new YSlow release for Firefox here. Thanks to cancel bubble (below) for providing the reminder and the link.
Posted at 8:01 AM | Comments (28) | Permalink
December 28, 2008
YSlow 2.0 early preview in China
Earlier this month, I had the pleasure of talking about the next iteration of Yahoo's performance tool YSlow at a conference organized by CSDN in Beijing. While YSlow 2.0 is still under development, it was a great opportunity to share the excitement about the upcoming release and also talk to people who are actually using the current version in their daily development life. We wanted to get a sense of whether we're headed in the right direction.
CSDN stands for China Software Developers Network: a vibrant online community with over 3 million members who create about a million forum posts and 50,000 technology articles, every month. The network runs on an in-house community platform allowing members to join discussions and forums, run blogs, chat, get personal hosting, personalized search and recommendations. The community recognizes and honors contributions through a rating system that rewards the best content with greater visibility. In addition to the online community, CSDN has a book publishing house, prints China's authoritative IT technology magazine Programmer, and provides training and talent recruiting services.

In addition to the YSlow talk, I also gave one about JavaScript; you can check out the slides on Slideshare.
Needless to say it was a great experience to meet and talk to the Chinese developers and answer their challenging questions about YSlow and OOJS. And then again, how can you not like a conference that opens in the spirit of the 2008 Beijing Olympics - with cheerleaders!
Stoyan Stefanov
Performance guy / YSlow 2.0 architect
Posted at 8:04 AM | Comments (6) | Permalink
September 30, 2008
Smushit.com - optimizing images has just become really easy
Nicole Sullivan and Stoyan Stefanov are dedicated to making the web a faster place. As integral parts of the Exceptional Performance Team they have already shared a lot of crucial information on how to make your web sites faster.
One thing they've been pondering a lot lately is image optimization for file size. Image editing tools come with all kinds of great ways to optimize images for visual quality and file size, but when you look at the image in a text or hex editor you'll find that there is a lot of extra information in the file, for example the name of the editing suite, the dates when the picture was created, and lots more.
There are a lot of tools that remove this information safely and get the most out of the images without having an effect on their visual quality. The catch is that there are a lot of tools for a lot of image formats, all of them on the command line.
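To show what "removing this information safely" can look like, here is a minimal sketch that drops text and timestamp chunks from a PNG file while leaving the pixel data untouched. It is not how Smushit.com or any particular command-line tool works; the chunk list, the helper names, and the tiny hand-built test image (including its "Hypothetical Editor" software tag) are assumptions made for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks that carry text or timestamps rather than pixels.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (chunk_type, raw_chunk_bytes) for each chunk in a PNG file."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        yield chunk_type, png[pos:end]
        pos = end


def strip_metadata(png: bytes) -> bytes:
    """Return the same PNG without text and timestamp chunks."""
    kept = (raw for chunk_type, raw in iter_chunks(png)
            if chunk_type not in METADATA_CHUNKS)
    return PNG_SIGNATURE + b"".join(kept)


# Build a 1x1 grayscale PNG that carries an editor name in a tEXt chunk.
text_data = b"Software\x00Hypothetical Editor 1.0"
png = (PNG_SIGNATURE
       + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + make_chunk(b"tEXt", text_data)
       + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
       + make_chunk(b"IEND", b""))

smaller = strip_metadata(png)
print(len(png), "bytes before,", len(smaller), "bytes after")
```

The image data chunks are copied byte for byte, which is why this kind of cleanup has no effect on visual quality. Real optimizers go further (for example by recompressing the pixel data), which this sketch does not attempt.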
So Nicole and Stoyan took their research findings, fired up their code editors and built a web app that does all the optimization for you:
Smushit.com allows you to upload some files or give it a URL. The tool then takes the images, optimizes them and tells you how many bytes you can save. You then get a zip of all the images for download and can replace them on your site.
Here's a video of Stoyan and Nicole presenting Smushit.com at The Ajax Experience in Boston (sorry about the audio):
Chris Heilmann
Yahoo Developer Network
Posted at 8:25 AM | Comments (18) | Permalink
July 10, 2008
Twitter, SearchMonkey, and Caching
Intrepid coder Bart Teeuwisse has written up an excellent technical account of creating "Tweet", a beautifully designed SearchMonkey app for Twitter. From a performance standpoint, writing a Twitter SearchMonkey app is particularly challenging, as Bart explains:
It turns out that execution speed of a SearchMonkey is key. To make the SearchMonkey Gallery a presentation monkey such as Tweet has to complete within a fraction of a second. Any call to fetch 3rd party takes too long to satisfy this requirement. Certainly calling Twitter's API whose fluctuating response times are all over the map.
Secondly, Twitter's profile API call takes a user ID, which first has to be extracted from Yahoo!'s indexed data. An additional data SearchMonkey can do that and whose output is the input to Tweet's profile feching data monkey. However, this chaining of data monkeys makes Tweet only slower.
Fortunately, Bart hit on a really clever solution: a mashup with Google App Engine, which acts as a simple proxy cache for Twitter data, which SearchMonkey can then consume. The result (after also adding Bart's own FriendNet infobar app):

Not only is the caching a nifty way to smooth out the API response times, but it also helps reduce the number of (rate-limited) API calls required. Read more about it at Bart's place.
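The proxy-cache idea can be sketched in a few lines. This is not Bart's App Engine code: `TTLCache`, `slow_profile_lookup`, and the 60-second lifetime are hypothetical names and values chosen to illustrate how a cache in front of a slow, rate-limited API smooths response times and cuts the number of calls.

```python
import time


class TTLCache:
    """Cache results of a slow lookup for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch        # the slow call we want to avoid repeating
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # fresh entry: no upstream call
        value = self._fetch(key)   # miss or expired: ask upstream once
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def slow_profile_lookup(user):
    """Stand-in for a rate-limited third-party API call (hypothetical)."""
    calls.append(user)
    return {"user": user}


cache = TTLCache(slow_profile_lookup, ttl_seconds=60)
cache.get("alice")
cache.get("alice")
print(len(calls), "upstream call(s) for two requests")
```

Only the first request for a given key pays the upstream cost; later requests within the lifetime are answered from memory. The price is staleness, so the lifetime has to be short enough for the data being cached.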
Posted at 11:17 AM | Comments (0) | Permalink
June 30, 2008
So many performance geeks all in one place!
O’Reilly’s Velocity Con, of course.

Kai Hansen, Tony Ralph, Eric Goldsmith, and Artur Bergman during “This is Your Page with Ads,” a panel moderated by Steve Souders.
It turns out I’m not the only person who thinks micro-optimization of CSS files is cool. I learned this lesson a year ago when I joined the Exceptional Performance team at Yahoo! and had it reinforced by the quality of both the presentations and the hallway conversations at the O’Reilly Velocity Conference last week.
Attending Velocity Con was fabulous. I was especially impressed that the sessions on web performance were packed. There were a ton of Yahoos at the conference. Julien Lecomte from Yahoo! Search spoke about “High-performance Ajax Applications”.
“In the past few years, Ajax has become very popular because it has enabled developers to build more complex web applications. However, in the rush to push the browser to new limits, we have created a monster.” – Julien
Julien suggested several detailed strategies and patterns that developers can use to accelerate their applications. Stoyan Stefanov, the lead developer of YSlow, and my colleague in the Exceptional Performance team, spoke about Image Optimization, including the 7 mistakes most sites are making. He showed non-designers how to automate image optimization and reduce image bloat by as much as 30%. After attending the talk, Douglas Crockford shared some love.
“It is good to be able to point with pride at something that Yahoo does that is extremely smart. The Exceptional Performance Team is one of the things that makes me proud to be at Yahoo.” – Doug
John Allspaw from Flickr joined a panel about Surviving Success by preparing to be TechCrunched, Dugg, Slashdotted, or even “Oprahed”. He also presented Capacity Management.
“Your process of capacity planning should be adaptive, adjustable, and include more than just system statistics. Measurement, architecture, and economics are all equally important to having your site perform. Becoming popular doesn’t have to mean being afraid your site will fall over from too much load.” – John
Adam Bechtel, the chief architect covering network, storage and systems infrastructure at Yahoo!, presented “Performance Plumbing”. He believes that scale provides unique opportunities to leverage the network to improve performance.
“As your site scales, don’t overlook the performance opportunities that the plumbing creates.” – Adam
Tony Ralph, who works on ad quality and performance for Yahoo!, participated in a panel, “This is Your Page with Ads.” He made an important point that I hadn’t really thought of before. He indicated that the ad industry and engineers measure performance in very different ways: one via monetization, the other via impact on response time. He emphasized how important it is for engineers to understand both points of view, so that we can effectively measure and convey the impact of end user experience on revenue.
Kai Hansen from Google Ireland also mentioned the need to properly advocate this point of view from within our companies so that quality metrics such as keyword relevance and performance are tied to the cost of displaying ads.
I look forward to Velocity Conference 2009. I do hope that it will focus on the front end with more talks about HTML, CSS, JavaScript, and Ajax. These sessions were the most popular of the conference, and front-end performance is still in its infancy. Douglas Crockford expressed it very well.
“By showing the browser makers how web applications actually perform, the browser makers are now able to make effective changes to the platform. As the platform evolves, we will need new rules and new tools. There is still much to do. (Emphasis mine)” – Doug
Exceptional Performance Yahoo!
Posted at 7:06 AM | Comments (2) | Permalink
June 17, 2008
New YSlow with Firefox 3 support
Just in time for the Firefox 3 Download Day (today, June 17th), last night we released a new version of YSlow that works with Firefox 3. You can install it from the YSlow page or the Mozilla add-ons site.
What's new in this version:
- Firefox 3 and Firebug 1.2 beta support
- improved and simplified check for JavaScript minification
- different coloring for inline vs. external CSS and JS ("All CSS" and "All JS" features)
- clickable list of resources as a Table of Contents ("All CSS" and "All JS" features)
- improved colors and presentation in the "legend" of component pies under Stats
- fixed a bug where the same hostname with different port number was counted as a separate DNS lookup
- misc bugfixes and style tweaks
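The DNS lookup fix in the list above comes down to counting hostnames rather than host-and-port pairs. Here is a minimal sketch of that counting logic; it is not YSlow's actual code, and the function name and example URLs are assumptions.

```python
from urllib.parse import urlsplit


def count_dns_lookups(urls):
    """Count distinct hostnames; the port plays no part in DNS resolution."""
    hostnames = {urlsplit(url).hostname for url in urls}
    hostnames.discard(None)  # ignore URLs that have no host part
    return len(hostnames)


# Hypothetical page components: two on the same host, on different ports.
component_urls = [
    "http://example.com/a.css",
    "http://example.com:8080/b.js",
    "http://static.example.com/c.png",
]
lookups = count_dns_lookups(component_urls)
print(lookups)
```

Comparing on `hostname` instead of the full network location is what keeps `example.com` and `example.com:8080` from being counted twice.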
In this version, as with the previous one, we aimed at supporting all possible combinations of the different Firefox and Firebug development branches, namely the latest Firefox 2 and 3 and the latest Firebug releases: 1.05 (stable), 1.1 (beta) and 1.2 (beta).
Many thanks to everybody who sent kind words of encouragement and questions about the availability of this new release. Sorry we didn't reply to all of you, but now your wait is over.
As always, your feedback is welcome and appreciated. Feel free to use the contact form or join the exceptional-performance mailing list.
Happy download day!
Stoyan Stefanov
Exceptional Performance
Posted at 2:05 PM | Comments (0) | Permalink
April 23, 2008
YSlow 0.9.5b1 Release - Addressing Firefox and Firebug Compatibility
Committed to keeping up with the latest in Firefox and Firebug development, we’re happy to announce that a new version increment of YSlow was released, mainly aiming at addressing compatibility. What’s in this release?
- Firefox 3 beta 5 support
- Support for the latest versions of the different Firebug branches
- Pie chart representation of the components in the Stats tab
- Improved display in the expanded CSS expressions rule report in the Performance tab
- Support for disabled Firebug Net Panel (default behavior in Firebug 1.2)
- Misc fixes for the reports in the Tools section
Given the diversity of Firefox and Firebug versions, here are the currently available branches of each.
Firefox has two active branches:
- Firefox 2 - the current version is 2.0.0.14; this is the stable production version
- Firefox 3 - the latest is 3.0 beta 5
Firebug has 3 active branches:
- Firebug 1 – the latest version is 1.05; this is the stable version
- Firebug 1.1 – the latest is 1.1.0 beta 12
- Firebug 1.2 – the latest is 1.2.0 alpha 21
Firebug 1 doesn’t work with Firefox 3, so there are a total of 5 combinations, and YSlow 0.9.5b1 has been successfully tested on all of them:
| | Firebug 1 | Firebug 1.1 | Firebug 1.2 |
| Firefox 2 | Yes | Yes | Yes |
| Firefox 3 | N/A | Yes | Yes |
You can download the tool here, report bugs here, read the performance rules, and participate in the performance mailing list discussions. Also make sure you keep an eye on our performance-related postings on YDN and YUI Blog.
Enjoy,
Yahoo! Exceptional Performance
Posted at 10:37 AM | Comments (10) | TrackBack | Permalink
April 11, 2008
New Rules for Exceptional Performance
Initially 13, then 14, and now 34 performance best practices have been released. As promised, we've updated our pages to include details on each of these new rules. The rules will gradually find their way into YSlow, at least those that are testable. Huge thanks go out to all those at Yahoo! who helped identify, validate and test the new best practices, and especially to our very own Stoyan Stefanov who put it all together. Stoyan Stefanov is part of the Exceptional Performance team and also the lead developer for YSlow.
We hope you'll find some interesting ideas to help you accelerate the user experience on your pages today. Any comments and feedback are appreciated. Let's make the web a better place!
Tenni Theurer
Yahoo! Exceptional Performance
Posted at 4:44 PM | Comments (1) | TrackBack | Permalink
March 17, 2008
Yahoo!'s Latest Performance Breakthroughs
Stoyan Stefanov made an appearance last week at the PHP Quebec Conference in Montreal. His session debuted Yahoo!’s latest research results and performance breakthroughs. He covered the existing 14 rules, plus 20 new rules for faster web pages. We’ve categorized the optimizations into: server, content, cookie, JavaScript, CSS, images, and mobile.
After YSlow "A"?
If your page isn't getting an "A" in YSlow, I recommend that you tackle those recommendations first. However, if you're getting an "A" and looking for more ways to optimize your web pages, here are 20 new recommendations to accelerate the end-user's experience. Stay tuned, you'll be hearing more about YSlow and these rules at Yahoo! Developer Network and Yahoo! User Interface Blog.
| 1. Flush the buffer early | [server] |
| 2. Use GET for AJAX requests | [server] |
| 3. Post-load components | [content] |
| 4. Preload components | [content] |
| 5. Reduce the number of DOM elements | [content] |
| 6. Split components across domains | [content] |
| 7. Minimize the number of iframes | [content] |
| 8. No 404s | [content] |
| 9. Reduce cookie size | [cookie] |
| 10. Use cookie-free domains for components | [cookie] |
| 11. Minimize DOM access | [javascript] |
| 12. Develop smart event handlers | [javascript] |
| 13. Choose <link> over @import | [css] |
| 14. Avoid filters | [css] |
| 15. Optimize images | [images] |
| 16. Optimize CSS sprites | [images] |
| 17. Don't scale images in HTML | [images] |
| 18. Make favicon.ico small and cacheable | [images] |
| 19. Keep components under 25K | [mobile] |
| 20. Pack components into a multipart document | [mobile] |
Many thanks to all the developers at Yahoo! that have directly or indirectly contributed to this list - you know who you are (see credits at the end of Stoyan's presentation). We share our findings so that others can join us in accelerating the user experience on the web.
Tenni Theurer
Yahoo! Exceptional Performance
Posted at 4:02 PM | Comments (19) | TrackBack | Permalink
February 14, 2008
YSlow 0.9.3 Release with Firefox 3 Support
YSlow, the performance lint tool created by the Yahoo! Exceptional Performance team, was updated to version 0.9.3 today. This minor version increment contains:
- Firefox 3 support (up to and including 3.0b4pre)
- Sortable table of page components in the Components tab
- Beacons (1x1 images that are not part of the DOM) excluded from the overall score. If you want them back in the score, use about:config to set the option extensions.firebug.yslow.excludeBeaconsFromLint to false
- Fixes for minor bugs that caused YSlow to freeze as it peels off the page components (encountering streams, applets, empty URLs)
In the spirit of openness and our commitment to a faster experience on the web, we hope you join us in accelerating the user experience.
You can download the tool here, report bugs here, read the performance rules, and participate in the performance mailing list discussions. Also make sure you keep an eye on our performance-related postings on YDN and YUI Blog.
Posted at 5:40 PM | Comments (25) | Permalink
February 4, 2008
Candidates graded on technical savvy, site performance
Performance is important to users. It influences click through rates, loyalty, and engagement. Users want fast websites that they can view from anywhere, including phones, wireless connections, laptops, and home computers. Users also want a graphically rich user experience with all the bells and whistles. Multimedia blends of video, images, feeds, and other components can be very slow indeed. Here at Yahoo! we spend a lot of time analyzing and improving the performance of our own sites. In honor of Super Tuesday, we thought it would be fun to take a poke at the presidential candidates' web sites and share with you what we found.
How did they do? Overall, atrociously: all the candidates failed the YSlow exam except Mike Gravel, who earned a "D". Page weight was a problem for Barack Obama, whose site weighed in at almost 700Kb. It was even worse for Mitt Romney, whose site weighed a whopping 1,531Kb. I hope he doesn't have supporters trying to make contributions on dialup modems!
Democrats got better grades in almost all performance subjects tested, in particular response times and page weight. They improved user experience for returning visitors by setting an Expires header and improving the full cache user experience. This helped propel them to a performance GPA of "C" despite their failing YSlow grade. Republicans never managed to overcome the deficit and finished the semester with an "F".

Figure 1: Democrats versus Republicans Performance Report Card
Performance is one component in a balance of competing goals. These sites are trying to solicit support, donations, and volunteers. They need to provide a rich user experience that keeps people coming back, and engages them with the candidates' progress. Ultimately, they want to do this in a way that is fast and accessible to as many voters as possible, including those on mobile phones, dialup modems, or low broadband.
Democrats
Mike Gravel and Hillary Clinton's websites had the two best response times tested. Barack Obama's website came in fourth, after the leading Republican. Response time is all about getting the biggest user-experience bang for your buck. One reason Obama's site might be slower is the amount of below-the-fold content. Many voters may not see this extra content (we human beings don't seem to like to scroll), but it still impacts performance.
Clinton, Obama, and Huckabee have graphically rich sites, and yet they are among the fastest. Do these sites succeed in engaging voters, or do voters prefer the more serious, austere look of Gravel's site? The idea that a candidate's website has a serious impact on his or her chances of being elected is relatively new. The rules and strategies are being invented now; we're living history.
Figure 2: Below the fold content might not be visible to voters, but it does affect response times
Figure 3: Democrats Response Time Report Card.
This graph shows all of the candidates' grades plotted together. Reaching the outside band means the candidate got an A; for instance, Obama and Gravel received As in Image Optimization, while Clinton squeaked into a high B. The closer the candidate got to reaching the edge of the octagon, the better their grade. In fact each band is equivalent to one letter grade. From roughly the halfway point to the very center are variations on a failing grade.
Clinton and Gravel provided two of the best performance-based user experiences recorded. However, Clinton had a large number of HTTP requests, which can slow down a site significantly. She counteracted that by setting an Expires header so that returning voters would not have to pay the same performance penalty. Clinton also split static content across more than one domain to enable parallel downloads.
Republicans
Examining Republicans' performance offers a clear opportunity to witness the connection between response time and page weight. To deliver a fast site, hard choices have to be made about which features to include, and which to abandon.
Mike Huckabee's website combined low page weight and fewer HTTP requests to achieve the best response time among Republicans and the third fastest response time overall. He could, however, trim 20Kb of fat from images with no loss of quality. We tested using lossless compression algorithms to determine how much extra baggage the candidates' pages were carrying. Romney was the real surprise in this category. The extra fat in his home page weighed more than Mike Huckabee's entire page!
Image Optimization is the kind of low hanging fruit that makes your site faster with absolutely no loss for the user. The tool we built to test image formats and compression algorithms determined that John McCain was the master of image optimization; we were only able to remove 1Kb from his images.
Mitt Romney's site takes almost eight seconds to load even on a broadband connection. He got so many "F"s, our radar chart looks like Pollock on a bad day.
Figure 4: Republicans Report Card
Every one of these sites had outlying data points, that is, random response times of as much as 19 seconds. Romney even had a data point at 1.825 seconds, despite his more typical 5-10 second load times. These bad user experiences are real, and while most users just press reload and forget about it, it is important that we correct what we can, before they get frustrated and simply don't come back.
Comparing worst-case scenarios, medians, averages, YSlow scores, empty or primed cache experiences, and other measurements can help you get a fuller picture of your user experience. Rather than looking for one magic number, performance requires us to unravel a nuanced puzzle. Try to understand what your users see, and how you can make it better for them, whether they are voters, customers, or people coming to read your blog.
For those of you who like numbers, here's the data:
| | Hillary Clinton | Barack Obama | Mike Gravel | Mike Huckabee | John McCain | Ron Paul | Mitt Romney |
| YSlow | 59 | 52 | 66 | 50 | 42 | 36 | 33 |
| Page Weight (Kb) | 300 | 691 | 195 | 185 | 568 | 483 | 1531 |
| Response Time (s) | 2.6077 | 3.4810 | 2.3733 | 3.1148 | 6.5309 | 4.2125 | 7.5352 |
| HTTP Requests | 96 | 94 | 32 | 56 | 107 | 83 | 77 |
| Cookie Weight | 214 | 333 | 281 | 217 | 299 | 203 | 130 |
| Primed Cache - Page Weight (Kb) | 33 | 242 | 8 | 46 | 237 | 173 | 448 |
| Primed Cache - HTTP Requests | 4 | 93 | 4 | 56 | 107 | 76 | 77 |
| Wasted Image Weight (Kb) | 29 | 6 | 20 | 20 | 1 | 49 | 205 |
All measurements were taken using a MacBook Pro with Firefox, Firebug, and YSlow over a wireless connection. Page weights, cookie weights, and HTTP requests were determined via the YSlow Stats panel.
Should you choose the next president based on their YSlow score? Probably not, but it is one of many interesting ways of evaluating how technically savvy they are in an increasingly technical world. Now don't get me wrong, I know the candidates didn't write their own HTML or optimize their own images, but they did choose the person who would do this work for them. That's what makes this game interesting; they have to choose the right people for the right jobs every day. Happy voting!
Nicole Sullivan, Technical Evangelist
Yahoo!'s Exceptional Performance
Posted at 9:09 PM | Comments (6) | Permalink
January 7, 2008
The 7 Habits for Exceptional Performance
In July 2007 I took over the reins from Steve Souders (my former boss, performance cohort, and someone I greatly respect) as manager of Yahoo!’s Exceptional Performance team. I was humbled and excited about the opportunity to lead Yahoo!’s now worldwide effort on accelerating the user experience and making our products faster, better, and more efficient.
Improvements in web site performance are similar to improvements in energy or fuel efficiency. We make good progress yet we continue to consume more, which reverses the gains from our improvements. The net effect is that optimizing performance is an ongoing battle. To ring in the New Year, the Exceptional Performance team would like to share our 7 Habits for Exceptional Performance:
1. LOFNO – Look out for number one, that is, your users. Be an advocate for your users. You do control the user experience, so don’t settle for excuses and don’t make excuses. A lot of people shift the blame towards things they don’t control. The truth is that even if it’s slow ads or the framework that’s slowing down your site, chances are there are still things you can do personally to optimize performance for your users. Has every image been optimized? Have you evaluated whether users really use that feature you pushed so hard for? Did you run YSlow? Have you set the right tone and leadership so that others know performance is a top priority for your product? Focus on what you can do, not what you can’t do. Leave no stone unturned.
2. Harvest the low hanging fruit – Find the optimizations that give you the biggest bang for your buck. If your web site has many pages, prioritize the pages. Look first at pages with higher traffic since those are the ones your users visit most. Identify strategic pages, ones that are important for the business. Create a list of performance optimizations and then prioritize that list starting with what will improve performance most. Then prioritize the same list again based on how much effort is required. Remember that removing just one image can often improve the user’s perceived response time by as much as an entire rewrite of the backend. Implement the Rules for High Performance Web Sites (aka YSlow Rules). These rules were identified at Yahoo! as the low hanging fruit for making web sites faster without compromising design or features.
3. Balance features with speed – Exceptional performance is a cross-team discipline. Our performance golden rule tells us that 80-90% of the time a user waits for a page to load is spent on the front-end. This means decisions about what goes into the product (design, features, etc.) account for a major chunk of the time a user spends waiting for the components (images, JavaScript, CSS, etc.) to come down the wire. Think Yin and Yang, a constant flux of alternating forces. Designers add visually appealing elements. Product managers add functionally rich features. Engineers add flexible frameworks. All this equates to more time a user waits for your page to load. Remove images, eliminate features, compress components – all that equates to less time a user waits. Faster response time reduces site abandonment and increases usability. Less abandonment and better usability increase page views. And hey, you’ll also have a happier, less frustrated user.
4. Start early and make performance part of the process – Don’t wait until right before your product is about to be launched to discover that your product performs badly. By then, it’ll be too late. Incorporate performance into the product roadmap at design time and requirements gathering. Make performance part of the process early in the development cycle. Run performance tests at every major milestone. Every feature has a performance cost associated with it. Develop a test methodology and measure that cost. If your website requires a login, profile your most-valued users and create test accounts with the features you anticipate them to use. Whether your most-valued users are on dialup or broadband, make sure you run performance tests at the connection speeds they actually have.
5. Quantify and track results – Let’s face it, we all want recognition for good work. There are lots of things we can do to improve the user’s experience. It’s more rewarding when we can quantify those optimizations. Have a portfolio of tools. Quantify performance so that it matches the experience of your users. Understand the differences between the various methodologies and tools your organization uses. If you don’t see an improvement after implementing an optimization, it could be a bad measurement methodology. There are many tools out there and different tools can show you different results. Make sure you are comparing apples to apples. Each tool has its differences, but together they can provide you a complete picture of how your product performs.
6. Set targets – Once you’ve established a methodology to quantify results, set and agree upon a target. Look at your competitors to help you determine a target. Better yet, look at the performance of pages where your users came from. From a quantitative perspective, two pages might take the same amount of time to load but qualitative research has shown us that users’ perception can vary depending on the performance of pages that load right before. Aim high and set a winning target for you, your team, and more importantly, your users.
7. Ask questions and challenge answers – Even smart people make assumptions or repeat incorrect statements. The best thing you can do is ask lots of questions, challenge answers, and if you have time verify the answers yourself. There’s no such thing as a bad question, but there are bad answers. Ask questions that give you the high-level overview. Ask questions that allow you to probe beneath the surface. Where did the information come from? How old is the data? What method was used to obtain the data? What alternative methods were considered and why weren’t they chosen? What assumptions were made? What were the drawbacks to an approach? If there was more time, what else might you have tried? Ask questions before hastily drawing a conclusion.
8. (Bonus) Run YSlow – YSlow analyzes web pages and tells you why they’re slow. Download today and run YSlow on all the pages you visit!
Happy Optimizing and Happy New Year!
[Tenni Theurer is a Product Optimization Manager and manages Yahoo!’s Exceptional Performance team. Tenni has spoken at several conferences including Web 2.0 Expo, The Ajax Experience, The Rich Web Experience, AJAXWorld, BlogHer, and CSDN-DrDobbs. She also blogs regularly on Yahoo! Developer Network and Yahoo! User Interface Blog.]
Posted at 2:01 PM | Comments (7) | TrackBack | Permalink
December 11, 2007
Performance Draws a Crowd in Beijing
Last week Tenni Theurer, manager of Yahoo!'s Exceptional Performance group and my main performance cohort, returned from her appearance at the CSDN-Dr.Dobbs Software Developer 2.0 Conference in Beijing, China. This was a big conference, perhaps the biggest software conference ever in China. I was psyched when Tenni told me her talk drew a crowd and was one of the best talks of the conference! CSDN's SD2.0 web site says, "Based on our SD conference survey result, Tenni Theurer’s session ranked as one of the top 3 sessions and was also selected by our editors as the most popular speaker." It's great to see interest in fast web pages has spread worldwide. Upcoming performance performances include the WebGuild Web 2.0 Conference in Santa Clara on January 29 and Velocity, the web performance conference from O'Reilly on June 23-24 near San Francisco.
Steve Souders
Chief Performance Yahoo!
Posted at 4:28 PM | Comments (0) | Permalink
December 5, 2007
YSlow 0.9 Release - Better Support for Web 2.0
We're excited to announce the release of YSlow 0.9, Yahoo!'s web page performance analysis tool. There are two big features in this release. By integrating more tightly with Firebug's Net Panel, YSlow now finds non-DOM components such as Ajax requests and image beacons. And YSlow now crawls frames and iframes and analyzes those resources as well. There are several other new features and bug fixes described in the release notes including highlighting 404s, better detection of CSS expressions and JavaScript minification, and searching within the YSlow panel.
These features make YSlow stronger at identifying performance improvements for Web 2.0 applications. It's great that YSlow does even better performance analysis of pages, but be forewarned that your previous YSlow scores will drop if these new-found components exhibit bad performance characteristics. As mentioned in Rule 14 - Make Ajax Cacheable, some of the performance improvements that are readily applied to static content (far future Expires header, gzip compression, minification) can also be applied to Ajax responses. Whether it's Web 1.0 or Web 2.0, YSlow 0.9 helps you figure out what to fix to make your pages faster for your users.
Steve Souders
Stoyan Stefanov
Posted at 8:19 AM | Comments (7) | Permalink
November 26, 2007
Virgin America tunes up with YSlow
I get feedback daily on people using YSlow. The emails are all positive, even the ones that report bugs. The best emails describe how YSlow helped a company make their web pages load faster. It was fun to read this article on The Register today: Virgin America tunes up with YSlow. If anyone else has YSlow experiences they wish to share, please post them on the Exceptional Performance Yahoo! group or send them to YSlow feedback.
Steve Souders
Chief Performance Yahoo! and creator of YSlow
Posted at 10:42 AM | Comments (0) | Permalink
November 15, 2007
Velocity Web Performance and Operations Conference
O'Reilly just announced Velocity, their first conference focused on web performance and operations. It's scheduled for June 23-24, 2008 at the San Francisco Airport Marriott in Burlingame. I'm proud to say that Jesse Robbins and I are co-chairing Velocity.
The idea for this conference came after a late night dinner in Seattle with several performance gurus including Jon Jenkins (Amazon), John Rauser (Farecast), and Nate Moch (Zillow). We had spent the night exchanging ops war stories and performance insights. I hated seeing the night come to an end and promised to find a venue for us and others to gather and share best practices. The idea of learning from other experts and sharing our lessons learned to help others avoid the pitfalls we had already discovered was exciting. JJ, Nate, and I, along with Jesse Robbins and Artur Bergman, met with Tim O'Reilly and Brady Forrest at OSCON and got the ball rolling.
If you're the person your company turns to to keep the web site running, you'll want to make this conference. More importantly, if you're someone who wants to learn from these industry leaders to find out how to make your site fast, scalable, and always available, block your calendar for June 23-24. To help make sure we have the most relevant topics and speakers, we've gathered an incredible program committee: Artur Bergman (O'Reilly Radar & Wikia), Cal Henderson (Flickr & author of Building Scalable Web Sites), Jon Jenkins (Amazon), and Eric Schurman (Live Search).
Velocity's Call for Participation is now open. We're looking for proposals in the areas of scalability, networking, Ajax performance, database performance, capacity planning, monitoring, and more. See the CFP for the full list. Proposals will be accepted until January 3, 2008.
Registration opens in March 2008. Until then, stay in touch using the official RSS feed. You can also join the Facebook group and Upcoming event. Please use velocity08 when tagging. I hope to see you in June.
Steve Souders
Chief Performance Yahoo!
Posted at 11:26 AM | Comments (3) | Permalink
October 26, 2007
Web Site Optimization: a Practical Perspective
Stoyan Stefanov just published an article entitled Web Site Optimization: 13 Simple Steps. People have commented that Yahoo!'s Performance Rules are geared towards large web sites (like Yahoo!). Stoyan's article approaches the best practices from a different angle.
This tutorial takes a practical, example-based approach to implementing those rules. It's targeted towards web developers with a small budget, who are most likely using shared hosting, and working under the various restrictions that come with such a setup.
He also brings up some points that aren't mentioned in Yahoo's best practices, such as:
- increasing parallel downloads by splitting resources across multiple domains
- hosting static resources on a domain that's free of cookies
- configuring compression in different environments
- code examples for preloading resources
- tools for minifying CSS
A quick aside about Stoyan: Stoyan blogs on phpied.com and has co-authored several books including PHP Programming with PEAR and Building Online Communities with phpBB 2. Although he started this article a while ago, when we saw his work on performance and optimization we asked Stoyan to join the Exceptional Performance team. Now he's a speedfreak with the rest of us! Watch for more news from Stoyan in the future.
Steve Souders
Posted at 9:16 AM | Comments (0) | Permalink
October 4, 2007
YSlow 0.8 Patches Firebug's Net Panel
At Future of Web Apps in London I announced the release of YSlow 0.8. This update includes a few enhancements, but the biggest change is a patch to Firebug's Net Panel. I discovered that resources (scripts, stylesheets, images) read from the browser's cache (with no HTTP traffic) still show up in Net Panel. This has caused confusion when people thought their cacheable components were not actually being cached by the browser. I talked to Joe Hewitt and settled on a fix that comes with this version of YSlow. The full details are found in the article Bug (fix) in Firebug's Net Panel. Enjoy and send your feedback.
Steve Souders
Chief Performance Yahoo!
Posted at 3:47 PM | Comments (3) | Permalink
September 26, 2007
High Performance Web Sites: Rule 14 - Make Ajax Cacheable
People ask whether these performance rules apply to Web 2.0 applications. They definitely do! This rule is the first rule that resulted from working with Web 2.0 applications at Yahoo!.
One of the cited benefits of Ajax is that it provides instantaneous feedback to the user because it requests information asynchronously from the backend web server. However, using Ajax is no guarantee that the user won't be twiddling his thumbs waiting for those asynchronous JavaScript and XML responses to return. In many applications, whether or not the user is kept waiting depends on how Ajax is used. For example, in a web-based email client the user will be kept waiting for the results of an Ajax request to find all the email messages that match their search criteria. It's important to remember that "asynchronous" does not imply "instantaneous".
To improve performance, it's important to optimize these Ajax responses. The most important way to improve the performance of Ajax is to make the responses cacheable, as discussed in Rule 3: Add an Expires Header. Some of the other rules also apply to Ajax:
- Rule 4: Gzip Components
- Rule 9: Reduce DNS Lookups
- Rule 10: Minify JavaScript
- Rule 11: Avoid Redirects
- Rule 13: Configure ETags
However, Rule 3 is the most important for speeding up the user experience. Let's look at an example. A Web 2.0 email client might use Ajax to download the user's address book for autocompletion. If the user hasn't modified her address book since the last time she used the email web app, the previous address book response could be read from cache if that Ajax response was made cacheable with a future Expires header. The browser must be informed when to use a previously cached address book response versus requesting a new one. This could be done by adding a timestamp to the address book Ajax URL indicating the last time the user modified her address book, for example, &t=1190241612. If the address book hasn't been modified since the last download, the timestamp will be the same and the address book will be read from the browser's cache eliminating an extra HTTP roundtrip. If the user has modified her address book, the timestamp ensures the new URL doesn't match the cached response, and the browser will request the updated address book entries.
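The timestamp technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path `/ajax/addressbook`, the `user` parameter, and the function name are hypothetical, and only the `t` parameter mirrors the `&t=1190241612` example above.

```python
from urllib.parse import urlencode

def address_book_url(base_url, user_id, last_modified):
    """Build an Ajax URL that changes only when the address book changes.

    base_url, user_id, and the "user" parameter name are hypothetical; the
    "t" parameter mirrors the &t=1190241612 example in the text.
    """
    query = urlencode({"user": user_id, "t": int(last_modified)})
    return f"{base_url}?{query}"

# Same last-modified time: identical URL, so the cached response can be reused.
unchanged_a = address_book_url("/ajax/addressbook", "alice", 1190241612)
unchanged_b = address_book_url("/ajax/addressbook", "alice", 1190241612)

# The address book was edited later: new URL, so the browser must fetch again.
changed = address_book_url("/ajax/addressbook", "alice", 1190245000)
```

Because the URL is the cache key, the response behind each timestamped URL can carry a far future Expires header without the risk of serving a stale address book.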
Even though your Ajax responses are created dynamically, and might only be applicable to a single user, they can still be cached. Doing so will make your Web 2.0 apps faster.
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted at 8:51 AM | Comments (14) | Permalink
August 22, 2007
YSlow and Knowing What Matters
Over on Coding Horror, Jeff Atwood writes YSlow: Yahoo's Problems Are Not Your Problems and makes some very good points about not taking YSlow results too literally--at least not without thinking about what you're doing.
But before you run off and implement all of Yahoo's solid advice, consider the audience. These are rules from Yahoo, which according to Alexa is one of the top three web properties in the world. And Rich's company, Topix, is no slouch either-- they're in the top 2,000. It's only natural that Rich would be keenly interested in Yahoo's advice on how to scale a website to millions of unique users per day.
That's good advice when it comes to following any set of recommendations. YSlow was designed for Yahoo's goals and will likely become more general over time. Take its advice with a grain of salt, just like you should anyone's advice.
The comments on that post contain some useful nuggets as well, including some discussion from Steve Souders and the YSlow creators.
And, if you haven't already seen it, check out our Introducing YSlow screencast which was posted about here.
Jeremy Zawodny
Yahoo! Developer Network
Posted at 7:36 AM | Comments (3) | Permalink
August 15, 2007
YSlow and Web Two Point Slow
It's always fun to see what people will do with the tools we release. Sometimes they use them in "interesting" ways. Or, in this case, they're used to come to an amusing conclusion. That's exactly what you'll find in Web 2 Point Slow - Slowcial Communities:
These descriptive statistical data show that successful Web 2.0 communities are pretty slow. This is not necessarily a problem of slow Web servers or Internet connections, but of the amount of data, the number of HTTP requests, too much JavaScript, Flash, images and other media, HTML structure and the time it takes for the browser to render the pages.
Here's the chart:

Luckily, we're not in that slow group. As Ramiro notes "4 sites stand out: yahoo.com and craiglist.org with a grade higher than 90..."
Excellent. :-)
I'm sure this isn't the first time someone will use YSlow to rank some of their favorite sites. Maybe some of those sites will implement a few of our Exceptional Performance Best Practices to speed things up. :-)
Have you seen any particularly surprising results from YSlow so far? Let us know.
Jeremy Zawodny
Yahoo! Developer Network
Posted at 4:04 PM | Comments (3) | Permalink
August 7, 2007
YSlow Podcast Interview and Screencast Demo
Editor's note: Unfortunately, the original video and audio files have gone missing. Please check out the YSlow 2.0 screencast from April 2009 for an introduction to the most recent release.
During the week of the YSlow release, Dan Theurer and I sat down with Steve Souders (Chief Performance Yahoo) to discuss web site performance and YSlow.
The result of that conversation (and some Camtasia learning on my part) is an audio interview and a video demo for your listening and viewing pleasure.
The 8:51 audio recording (8MB MP3) captures the background discussion, including the need for YSlow, how it came to be, performance best practices, Firebug integration, and so on.
The 8:22 video screencast is a continuation of the discussion where we run YSlow against www.yahoo.com to get an idea of how YSlow works. You can jump right in and watch the video without listening to the podcast, but you'll miss a few references from earlier in the discussion.
We also "filmed" two other demos: one using my blog (it gets a "D") and another using the YDN web site (it doesn't score well either). Look for those to appear soon.
Enjoy...
Jeremy Zawodny
Yahoo! Developer Network
Posted at 10:57 AM | Comments (8) | Permalink
July 24, 2007
YSlow Release on YDN
Yahoo! has released YSlow, their web performance tool, on YDN under an open source license. Steve Souders, Yahoo!'s Chief Performance Yahoo!, made the announcement during his session at OSCON.
YSlow measures web page performance based on the best practices evangelized by Yahoo!'s Exceptional Performance team. Since many of these best practices focus on the frontend, YSlow is integrated with Joe Hewitt's Firebug, the web development tool of choice for frontend developers.
YSlow has three main views: Performance, Stats, and Components. Performance view scores the page against each performance rule, generates an overall YSlow grade for the page, and lists specific recommendations for making the page faster. Stats view summarizes the total page weight, cookie size, and HTTP request count. Components view lists each component (image, stylesheet, script, Flash object, etc.) in the page along with HTTP information relevant to page load times. It also contains several tools including JSLint. Try it out!
Posted at 4:31 PM | Comments (11) | Permalink
July 23, 2007
High Performance Web Sites: Rule 13 – Configure ETags
Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for what I've been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component's ETag using the ETag response header.
HTTP/1.1 200 OK
Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
ETag: "10c24bc-4ab-457e1c1f"
Content-Length: 12195
Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned reducing the response by 12195 bytes for this example.
GET /i/yahoo.gif HTTP/1.1
Host: us.yimg.com
If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
If-None-Match: "10c24bc-4ab-457e1c1f"
HTTP/1.1 304 Not Modified
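The validation exchange above can be sketched in Python. This is a simplified illustration, not how any particular web server implements it: the `respond` function and its arguments are hypothetical, and it treats If-None-Match as a single quoted ETag, ignoring lists and weak validators.

```python
def respond(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET, honoring If-None-Match.

    Simplified: treats If-None-Match as a single quoted ETag and ignores
    lists, weak validators, and "*".
    """
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still valid: send headers only, no body.
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag, "Content-Length": str(len(body))}, body

etag = '"10c24bc-4ab-457e1c1f"'
image = b"x" * 12195  # stand-in for the component's bytes

first = respond({}, etag, image)                            # empty cache
revalidated = respond({"If-None-Match": etag}, etag, image)  # cached copy
```

The first request pays for all 12195 bytes; the revalidation returns a 304 with an empty body.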
The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won't match when a browser gets the original component from one server and later tries to validate that component on a different server—a situation that is all too common on web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.
The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.
IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is Filetimestamp:ChangeNumber. A ChangeNumber is a counter used to track configuration changes to IIS. It's unlikely that the ChangeNumber is the same across all IIS servers behind a web site.
The end result is ETags generated by Apache and IIS for the exact same component won't match from one server to another. If the ETags don't match, the user doesn't receive the small, fast 304 response that ETags were designed for; instead, they'll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn't a problem. But if you have multiple servers hosting your web site, and you're using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you're consuming greater bandwidth, and proxies aren't caching your content efficiently. Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.
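A small Python sketch shows why the inode-size-timestamp format breaks validation across a cluster. The hex encoding is an assumption inferred from the sample header earlier in this post, and the second inode is made up; the exact encoding varies by Apache version.

```python
def apache_style_etag(inode, size, mtime):
    """Format an inode-size-timestamp ETag as quoted, hyphen-joined hex fields.

    The hex encoding is an assumption based on the sample ETag in the text.
    """
    return f'"{inode:x}-{size:x}-{mtime:x}"'

# Same file size and timestamp on both servers, but different (made-up) inodes.
server_a = apache_style_etag(0x10c24bc, 0x4ab, 0x457e1c1f)
server_b = apache_style_etag(0x2F9A001, 0x4ab, 0x457e1c1f)

print(server_a)              # "10c24bc-4ab-457e1c1f"
print(server_a == server_b)  # False
```

A browser that cached the component from the first server and revalidates against the second sends an ETag the second server never issued, so it gets a full 200 response.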
If you're not taking advantage of the flexible validation model that ETags provide, it's better to just remove the ETag altogether. The Last-Modified header validates based on the component's timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This Microsoft Support article describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:
FileETag none
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted at 4:06 PM | Comments (54) | Permalink
High Performance Web Sites: Rule 12 – Remove Duplicate Scripts
It hurts performance to include the same JavaScript file twice in one page. This isn’t as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.
Unnecessary HTTP requests happen in Internet Explorer, but not in Firefox. In Internet Explorer, if an external script is included twice and is not cacheable, it generates two HTTP requests during page loading. Even if the script is cacheable, extra HTTP requests occur when the user reloads the page.
In addition to generating wasteful HTTP requests, time is wasted evaluating the script multiple times. This redundant JavaScript execution happens in both Firefox and Internet Explorer, regardless of whether the script is cacheable.
One way to avoid accidentally including the same script twice is to implement a script management module in your templating system. The typical way to include a script is to use the SCRIPT tag in your HTML page.
<script type="text/javascript" src="menu_1.0.17.js"></script>
An alternative in PHP would be to create a function called insertScript.
<?php insertScript("menu.js") ?>
In addition to preventing the same script from being inserted multiple times, this function could handle other issues with scripts, such as dependency checking and adding version numbers to script filenames to support far future Expires headers.
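Here is a minimal sketch of such a script management module, written in Python rather than PHP to keep the logic compact. The `ScriptManager` class, its method names, and the version map are all hypothetical; a real templating system would also need dependency checking.

```python
class ScriptManager:
    """Emit each script tag at most once per page (hypothetical helper)."""

    def __init__(self, versions):
        # versions maps a logical script name to its current version string.
        self.versions = versions
        self.inserted = set()

    def insert_script(self, name):
        """Return a SCRIPT tag for name, or "" if it was already inserted."""
        if name in self.inserted:
            return ""
        self.inserted.add(name)
        stem, dot, ext = name.rpartition(".")
        version = self.versions.get(name)
        # Add the version to the filename so a far future Expires header is safe.
        filename = f"{stem}_{version}{dot}{ext}" if version and stem else name
        return f'<script type="text/javascript" src="{filename}"></script>'

page = ScriptManager({"menu.js": "1.0.17"})
first = page.insert_script("menu.js")   # emits the versioned tag
second = page.insert_script("menu.js")  # duplicate request, emits nothing
```

Templates always ask for the logical name (`menu.js`), so the version number lives in one place and a second request for the same script becomes a no-op.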
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted at 3:39 PM | Comments (6) | Permalink
High Performance Web Sites: Rule 11 – Avoid Redirects
Redirects are accomplished using the 301 and 302 status codes. Here’s an example of the HTTP headers in a 301 response.
HTTP/1.1 301 Moved Permanently
Location: http://example.com/newuri
Content-Type: text/html
The browser automatically takes the user to the URL specified in the Location field. All the information necessary for a redirect is in the headers. The body of the response is typically empty. Despite their names, neither a 301 nor a 302 response is cached in practice unless additional headers, such as Expires or Cache-Control, indicate it should be. The meta refresh tag and JavaScript are other ways to direct users to a different URL, but if you must do a redirect, the preferred technique is to use the standard 3xx HTTP status codes, primarily to ensure the back button works correctly.
The main thing to remember is that redirects slow down the user experience. Inserting a redirect between the user and the HTML document delays everything in the page since nothing in the page can be rendered and no components can start being downloaded until the HTML document has arrived.
One of the most wasteful redirects happens frequently and web developers are generally not aware of it. It occurs when a trailing slash (/) is missing from a URL that should otherwise have one. For example, going to http://astrology.yahoo.com/astrology results in a 301 response containing a redirect to http://astrology.yahoo.com/astrology/ (notice the added trailing slash). This is fixed in Apache by using Alias or mod_rewrite, or the DirectorySlash directive if you're using Apache handlers.
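To see how many extra roundtrips a URL costs, you can count redirect hops. The Python sketch below models the network as a lookup table so it runs without any HTTP traffic; the `follow_redirects` function and the `responses` table are hypothetical, and a real check would issue requests and read the Location header.

```python
def follow_redirects(url, responses, max_hops=5):
    """Follow 301/302 responses and return (final_url, hops).

    responses is a hypothetical stand-in for the network: it maps a URL to
    (status, location). URLs missing from the map are treated as 200 OK.
    """
    hops = 0
    while hops < max_hops:
        status, location = responses.get(url, (200, None))
        if status not in (301, 302):
            break
        url = location
        hops += 1
    return url, hops

# The trailing-slash case from the text, modeled as a lookup table.
responses = {
    "http://astrology.yahoo.com/astrology": (
        301,
        "http://astrology.yahoo.com/astrology/",
    ),
}
without_slash = follow_redirects("http://astrology.yahoo.com/astrology", responses)
with_slash = follow_redirects("http://astrology.yahoo.com/astrology/", responses)
```

Each hop is a full roundtrip during which nothing on the page can render, so linking to the URL with the trailing slash avoids that cost entirely.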
Connecting an old web site to a new one is another common use for redirects. Others include connecting different parts of a website and directing the user based on certain conditions (type of browser, type of user account, etc.). Using a redirect to connect two web sites is simple and requires little additional coding. Although using redirects in these situations reduces the complexity for developers, it degrades the user experience. Alternatives for this use of redirects include using Alias and mod_rewrite if the two code paths are hosted on the same server. If a domain name change is the cause of using redirects, an alternative is to create a CNAME (a DNS record that creates an alias pointing from one domain name to another) in combination with Alias or mod_rewrite.
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted at 3:10 PM | Comments (15) | Permalink
High Performance Web Sites: Rule 10 – Minify JavaScript
Minification is the practice of removing unnecessary characters from code to reduce its size thereby improving load times. When code is minified all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor.
Obfuscation is an alternative optimization that can be applied to source code. Like minification, it removes comments and white space, but it also munges the code. As part of munging, function and variable names are converted into smaller strings making the code more compact as well as harder to read. This is typically done to make it more difficult to reverse engineer the code. But munging can help performance because it reduces the code size beyond what is achieved by minification. The tool-of-choice is less clear in the area of JavaScript obfuscation. Dojo Compressor (ShrinkSafe) is the one I’ve seen used the most.
Minification is a safe, fairly straightforward process. Obfuscation, on the other hand, is more complex and thus more likely to generate bugs as a result of the obfuscation step itself. Obfuscation also requires modifying your code to indicate API functions and other symbols that should not be munged. It also makes it harder to debug your code in production. Although I’ve never seen problems introduced from minification, I have seen bugs caused by obfuscation. In a survey of ten top U.S. web sites, minification achieved a 21% size reduction versus 25% for obfuscation. Although obfuscation has a higher size reduction, I recommend minifying JavaScript code because of the reduced risks and maintenance costs.
In addition to minifying external scripts, inlined script blocks can and should also be minified. Even if you gzip your scripts, as described in Rule 4, minifying them will still reduce the size by 5% or more. As the use and size of JavaScript increases, so will the savings gained by minifying your JavaScript code.
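To make the idea concrete, here is a deliberately naive sketch of what a minifier does. It is not how JSMin or YUI Compressor work internally; naiveMinify is a hypothetical toy that does not parse JavaScript, so it should not be used on real code.

```javascript
// Toy minifier, for illustration only: it drops block comments, full-line
// "//" comments, indentation, and blank lines. It does NOT parse JavaScript,
// so a "/*" inside a string or regex literal would be mangled.
function naiveMinify(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "") // remove /* ... */ comments
    .split("\n")
    .map((line) => line.trim()) // strip indentation and trailing spaces
    .filter((line) => line !== "" && !line.startsWith("//"))
    .join("\n");
}

const original = [
  "/* Adds two numbers. */",
  "function add(a, b) {",
  "    // return the sum",
  "    return a + b;",
  "}",
  "",
].join("\n");

const minified = naiveMinify(original);
const saved = 1 - minified.length / original.length;
console.log(minified);
console.log(`saved ${(saved * 100).toFixed(0)}% of ${original.length} characters`);
```

The savings on a tiny, comment-heavy sample like this are not representative; the 21% figure quoted above comes from the survey of real sites.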
Steve Souders
Posted at 2:32 PM | Comments (16) | Permalink
July 20, 2007
High Performance Web Sites: Rule 9 – Reduce DNS Lookups
The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server’s IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can’t download anything from this hostname until the DNS lookup is completed.
DNS lookups are cached for better performance. This caching can occur on a special caching server, maintained by the user's ISP or local area network, but there is also caching that occurs on the individual user's computer. The DNS information remains in the operating system's DNS cache (the "DNS Client service" on Microsoft Windows). Most browsers have their own caches, separate from the operating system's cache. As long as the browser keeps a DNS record in its own cache, it doesn't bother the operating system with a request for the record.
Internet Explorer caches DNS lookups for 30 minutes by default, as specified by the DnsCacheTimeout registry setting. Firefox caches DNS lookups for 1 minute, controlled by the network.dnsCacheExpiration configuration setting. (Fasterfox changes this to 1 hour.)
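The effect of a fixed cache timeout can be sketched with a small time-limited cache. This is only a model of the behavior described above, not the code any browser actually uses; the DnsCache class, the example hostname, and the address (taken from the documentation-only 192.0.2.0/24 range) are all assumptions.

```javascript
// Minimal time-limited cache keyed by hostname. The current time is passed
// in explicitly so the expiry behavior can be shown without waiting.
class DnsCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  set(hostname, address, now) {
    this.entries.set(hostname, { address, expiresAt: now + this.ttlMs });
  }

  // Returns the cached address, or undefined when a new lookup is needed.
  get(hostname, now) {
    const entry = this.entries.get(hostname);
    if (!entry || now >= entry.expiresAt) {
      this.entries.delete(hostname);
      return undefined;
    }
    return entry.address;
  }
}

const ONE_MINUTE_MS = 60 * 1000; // the Firefox default mentioned above
const cache = new DnsCache(ONE_MINUTE_MS);
cache.set("www.example.com", "192.0.2.10", 0);

console.log(cache.get("www.example.com", 30 * 1000)); // still cached
console.log(cache.get("www.example.com", 61 * 1000)); // expired, lookup needed
```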
When the client’s DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page’s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.
Reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading that takes place in the page. Avoiding DNS lookups cuts response times, but reducing parallel downloads may increase response times. My guideline is to split these components across at least two but no more than four hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads.
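A quick way to see where a page stands is to count the unique hostnames among its component URLs. The sketch below uses made-up example.com URLs, and uniqueHostnames is a hypothetical helper rather than part of any tool mentioned here.

```javascript
// Count the unique hostnames a page references; with an empty DNS cache this
// equals the number of lookups. The URLs below are made-up examples.
function uniqueHostnames(urls) {
  return new Set(urls.map((u) => new URL(u).hostname));
}

const components = [
  "http://www.example.com/index.html",
  "http://img1.example.com/logo.png",
  "http://img1.example.com/banner.png",
  "http://img2.example.com/photo.jpg",
  "http://static.example.com/site.css",
  "http://static.example.com/site.js",
];

const hosts = uniqueHostnames(components);
const withinGuideline = hosts.size >= 2 && hosts.size <= 4;
console.log(`${hosts.size} DNS lookups on an empty cache`, [...hosts]);
console.log(withinGuideline ? "within the 2-4 hostname guideline" : "outside the 2-4 hostname guideline");
```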
Steve Souders
Posted at 10:42 AM | Comments (11) | Permalink
July 18, 2007
High Performance Web Sites: Rule 8 – Make JavaScript and CSS External
Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?
Using external files in the real world generally produces faster pages because the JavaScript and CSS files are cached by the browser. JavaScript and CSS that are inlined in HTML documents get downloaded every time the HTML document is requested. Inlining reduces the number of HTTP requests that are needed, but increases the size of the HTML document. On the other hand, if the JavaScript and CSS are in external files cached by the browser, the size of the HTML document is reduced without increasing the number of HTTP requests.
The key factor, then, is the frequency with which external JavaScript and CSS components are cached relative to the number of HTML documents requested. This factor, although difficult to quantify, can be gauged using various metrics. If users on your site have multiple page views per session and many of your pages re-use the same scripts and stylesheets, there is a greater potential benefit from cached external files.
Many web sites fall in the middle of these metrics. For these properties, the best solution generally is to deploy the JavaScript and CSS as external files. The only exception I’ve seen where inlining is preferable is with home pages, such as Yahoo!'s front page (http://www.yahoo.com) and My Yahoo! (http://my.yahoo.com). Home pages that have few (perhaps only one) page view per session may find that inlining JavaScript and CSS results in faster end-user response times.
For front pages that are typically the first of many page views, there are techniques that leverage the reduction of HTTP requests that inlining provides, as well as the caching benefits achieved through using external files. One such technique is to inline JavaScript and CSS in the front page, but dynamically download the external files after the page has finished loading. Subsequent pages would reference the external files that should already be in the browser's cache.
Steve Souders
Posted at 8:05 PM | Comments (18) | Permalink
July 16, 2007
High Performance Web Sites: Rule 7 – Avoid CSS Expressions
CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They’re supported in Internet Explorer, starting with version 5. As an example, the background color could be set to alternate every hour using CSS expressions.
background-color: expression( (new Date()).getHours()%2 ? "#B8D4FF" : "#F08A00" );
As shown here, the expression method accepts a JavaScript expression. The CSS property is set to the result of evaluating the JavaScript expression. The expression method is ignored by other browsers, so it is useful for setting properties in Internet Explorer needed to create a consistent experience across browsers.
The problem with expressions is that they are evaluated more frequently than most people expect. Not only are they evaluated when the page is rendered and resized, but also when the page is scrolled and even when the user moves the mouse over the page. Adding a counter to the CSS expression allows us to keep track of when and how often a CSS expression is evaluated. Moving the mouse around the page can easily generate more than 10,000 evaluations.
One way to reduce the number of times your CSS expression is evaluated is to use one-time expressions, where the first time the expression is evaluated it sets the style property to an explicit value, which replaces the CSS expression. If the style property must be set dynamically throughout the life of the page, using event handlers instead of CSS expressions is an alternative approach. If you must use CSS expressions, remember that they may be evaluated thousands of times and could affect the performance of your page.
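A one-time expression might look roughly like the sketch below. The function names are hypothetical, and the stand-in element exists only so the code can run outside Internet Explorer; in a real page the browser would pass the element itself.

```javascript
// Same hourly color rule as the expression shown earlier.
function hourlyColor(date) {
  return date.getHours() % 2 ? "#B8D4FF" : "#F08A00";
}

// Hypothetical one-time helper. In Internet Explorer it would be referenced as
//   background-color: expression( setBgOnce(this) );
// Assigning an explicit value to the style property replaces the expression,
// so the function runs once per element instead of on every mouse move.
function setBgOnce(elem, now = new Date()) {
  elem.style.backgroundColor = hourlyColor(now);
}

// Stand-in for a DOM element so the sketch runs outside a browser.
const fakeElem = { style: {} };
setBgOnce(fakeElem, new Date(2007, 6, 16, 13, 0, 0));
console.log(fakeElem.style.backgroundColor);
```

The trade-off is that the color no longer changes on the hour without a reload; if it must keep changing, an event handler or timer is the alternative described above.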
Steve Souders
Posted at 7:52 AM | Comments (10) | Permalink
July 12, 2007
High Performance Web Sites: Rule 6 – Move Scripts to the Bottom
Rule 5 described how stylesheets near the bottom of the page prohibit progressive rendering, and how moving them to the document HEAD eliminates the problem. Scripts (external JavaScript files) pose a similar problem, but the solution is just the opposite: it’s better to move scripts from the top to as low in the page as possible. One reason is to enable progressive rendering, but another is to achieve greater download parallelization.
With stylesheets, progressive rendering is blocked until all stylesheets have been downloaded. That’s why it’s best to move stylesheets to the document HEAD, so they get downloaded first and rendering isn’t blocked. With scripts, progressive rendering is blocked for all content below the script. Moving scripts as low in the page as possible means there's more content above the script that is rendered sooner.
The second problem caused by scripts is blocking parallel downloads. The HTTP/1.1 specification suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. (I've gotten Internet Explorer to download over 100 images in parallel.) While a script is downloading, however, the browser won’t start any other downloads, even on different hostnames.
In some situations it’s not easy to move scripts to the bottom. If, for example, the script uses document.write to insert part of the page’s content, it can’t be moved lower in the page. There might also be scoping issues. In many cases, there are ways to work around these situations.
An alternative suggestion that often comes up is to use deferred scripts. The DEFER attribute indicates that the script does not contain document.write, and is a clue to browsers that they can continue rendering. Unfortunately, Firefox doesn't support the DEFER attribute. In Internet Explorer, the script may be deferred, but not as much as desired. If a script can be deferred, it can also be moved to the bottom of the page. That will make your web pages load faster.
Steve Souders
Posted at 5:04 PM | Comments (37) | Permalink
July 9, 2007
High Performance Web Sites: Rule 5 – Put Stylesheets at the Top
While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages load faster. This is because putting stylesheets in the HEAD allows the page to render progressively.
Front-end engineers who care about performance want a page to load progressively; that is, we want the browser to display whatever content it has as soon as possible. This is especially important for pages with a lot of content and for users on slower Internet connections. The importance of giving users visual feedback, such as progress indicators, has been well researched and documented. In our case the HTML page is the progress indicator! When the browser loads the page progressively, the header, the navigation bar, the logo at the top, etc. all serve as visual feedback for the user who is waiting for the page. This improves the overall user experience.
The problem with putting stylesheets near the bottom of the document is that it prohibits progressive rendering in many browsers, including Internet Explorer. Browsers block rendering to avoid having to redraw elements of the page if their styles change. The user is stuck viewing a blank white page. Firefox doesn't block rendering, which means when the stylesheet is done loading it's possible elements in the page will have to be redrawn, resulting in the flash of unstyled content problem.
The HTML specification clearly states that stylesheets are to be included in the HEAD of the page: "Unlike A, [LINK] may only appear in the HEAD section of a document, although it may appear any number of times." Neither of the alternatives, the blank white screen or flash of unstyled content, is worth the risk. The optimal solution is to follow the HTML specification and load your stylesheets in the document HEAD.
Steve Souders
Posted at 12:26 PM | Comments (25) | Permalink
July 3, 2007
High Performance Web Sites: Rule 4 - Gzip Components
The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It’s true that the end-user’s bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.
Starting with HTTP/1.1, web clients indicate support for compression with the Accept-Encoding header in the HTTP request.
Accept-Encoding: gzip, deflate
If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.
Content-Encoding: gzip
Gzip is the most popular and effective compression method at this time. It was developed by the GNU project and standardized by RFC 1952. The only other compression format you’re likely to see is deflate, but it’s less effective and less popular.
Gzipping generally reduces the response size by about 70%. Approximately 90% of today’s Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses mod_gzip while Apache 2.x uses mod_deflate.
There are known issues with browsers and proxies that may cause a mismatch in what the browser expects and what it receives with regard to compressed content. Fortunately, these edge cases are dwindling as the use of older browsers drops off. The Apache modules help out by adding appropriate Vary response headers automatically.
Servers choose what to gzip based on file type, but are typically too limited in what they decide to compress. Most web sites gzip their HTML documents. It’s also worthwhile to gzip your scripts and stylesheets, but many web sites miss this opportunity. In fact, it’s worthwhile to compress any text response including XML and JSON. Image and PDF files should not be gzipped because they are already compressed. Trying to gzip them not only wastes CPU but can potentially increase file sizes.
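The decision of what to compress can be sketched in a few lines using Node's built-in zlib module. This is an illustration rather than how mod_gzip or mod_deflate are configured: shouldGzip and its list of text types are assumptions, and the check ignores details such as q-values in Accept-Encoding.

```javascript
const zlib = require("zlib");

// Hypothetical policy: compress text-like responses only, and only when the
// client lists gzip in Accept-Encoding. The type list is an assumption.
const TEXT_TYPES = /^(text\/|application\/(javascript|json|xml))/;

function shouldGzip(acceptEncoding, contentType) {
  const clientAcceptsGzip = /\bgzip\b/i.test(acceptEncoding || "");
  return clientAcceptsGzip && TEXT_TYPES.test(contentType);
}

// Repetitive markup, typical of HTML, compresses well.
const html = "<li class=\"item\"><a href=\"/page\">link</a></li>\n".repeat(200);
const gzipped = zlib.gzipSync(Buffer.from(html));

console.log(shouldGzip("gzip, deflate", "text/html")); // text: compress
console.log(shouldGzip("gzip, deflate", "image/png")); // already compressed: skip
console.log(`${html.length} bytes -> ${gzipped.length} bytes`);
```

The sample markup is far more repetitive than a typical page, so its ratio will look better than the roughly 70% reduction cited above.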
Gzipping as many file types as possible is an easy way to reduce page weight and accelerate the user experience.
Steve Souders
Posted at 1:24 PM | Comments (79) | Permalink
May 24, 2007
High Performance Web Sites: Rule 3 - Add an Expires Header
Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.
Browsers (and proxies) use a cache to reduce the number and size of HTTP requests, making web pages load faster. A web server uses the Expires header in the HTTP response to tell the client how long a component can be cached. The following far future Expires header tells the browser that this response won’t be stale until April 15, 2010.
Expires: Thu, 15 Apr 2010 20:00:00 GMT
If your server is Apache, use the ExpiresDefault directive to set an expiration date relative to the current date. This example of the ExpiresDefault directive sets the Expires date 10 years out from the time of the request.
ExpiresDefault "access plus 10 years"
Keep in mind, if you use a far future Expires header you have to change the component’s filename whenever the component changes. At Yahoo! we often make this step part of the build process: a version number is embedded in the component’s filename, for example, yahoo_2.0.6.js.
Using a far future Expires header affects page views only after a user has already visited your site. It has no effect on the number of HTTP requests when a user visits your site for the first time and the browser’s cache is empty. The impact of this performance improvement depends, therefore, on how often users hit your pages with a primed cache. (A "primed cache" already contains all of the components in the page.) We measured this at Yahoo! and found the number of page views with a primed cache is 75-85%. By using a far future Expires header, you increase the number of components that are cached by the browser and re-used on subsequent page views without sending a single byte over the user’s Internet connection.
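For servers other than Apache, the same two ideas (a far future date and a versioned filename) can be sketched by hand. Both helpers below are hypothetical, and the calendar-year arithmetic is only in the spirit of ExpiresDefault, which may compute its offset differently.

```javascript
// Build an Expires header value a given number of years after `now`,
// in the spirit of ExpiresDefault "access plus 10 years".
function farFutureExpires(now, years) {
  const expires = new Date(now.getTime());
  expires.setUTCFullYear(expires.getUTCFullYear() + years);
  return expires.toUTCString(); // HTTP-date format
}

// Hypothetical naming helper following the yahoo_2.0.6.js pattern.
function versionedFilename(base, version, ext) {
  return `${base}_${version}.${ext}`;
}

// A request made on April 15, 2000 at 20:00 GMT, chosen so the result
// matches the header shown earlier.
const requestTime = new Date(Date.UTC(2000, 3, 15, 20, 0, 0));
console.log("Expires: " + farFutureExpires(requestTime, 10));
console.log(versionedFilename("yahoo", "2.0.6", "js"));
```

Bumping the version in the filename is what lets a changed component bypass a cached copy that is not due to expire for years.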
Steve Souders
Posted at 11:10 AM | Comments (95) | Permalink
April 26, 2007
High Performance Web Sites: Rule 2 - Use a Content Delivery Network
The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?
As a first step to implementing geographically dispersed content, don't attempt to redesign your web application to work in a distributed architecture. Depending on the application, changing the architecture could include daunting tasks such as synchronizing session state and replicating database transactions across server locations. Attempts to reduce the distance between users and your content could be delayed by, or never pass, this application architecture step.
Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the Performance Golden Rule, as explained in The Importance of Front-End Performance. Rather than starting with the difficult task of redesigning your application architecture, it's better to first disperse your static content. This not only achieves a bigger reduction in response times, but it's easier thanks to content delivery networks.
A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.
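Server selection by network proximity can be pictured as picking the minimum of some measured metric. The sketch below is a simplification for illustration only: real CDNs make this choice inside their own infrastructure, and the server names and measurements here are invented.

```javascript
// Pick the server with the lowest value for the chosen proximity metric.
function pickServer(servers, metric) {
  return servers.reduce((best, s) => (s[metric] < best[metric] ? s : best));
}

// Invented measurements from one user's point of view.
const edgeServers = [
  { name: "edge-a.example.net", hops: 12, responseMs: 85 },
  { name: "edge-b.example.net", hops: 7, responseMs: 40 },
  { name: "edge-c.example.net", hops: 5, responseMs: 60 },
];

console.log("fewest hops:", pickServer(edgeServers, "hops").name);
console.log("quickest response:", pickServer(edgeServers, "responseMs").name);
```

Note that the two metrics can disagree, as they do here, which is one reason the choice of proximity measure matters.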
Some large Internet companies own their own CDN, but it's cost-effective to use a CDN service provider, such as Akamai Technologies, Mirror Image Internet, or Limelight Networks. For start-up companies and private web sites, the cost of a CDN service can be prohibitive, but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times. At Yahoo!, properties that moved static content off their application web servers to a CDN improved end-user response times by 20% or more. Switching to a CDN is a relatively easy code change that will dramatically improve the speed of your web site.
Steve Souders
Posted at 9:03 AM | Comments (49) | Permalink
April 3, 2007
High Performance Web Sites: Rule 1 - Make Fewer HTTP Requests
In The Importance of Front-End Performance, I reveal that 80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.
One way to reduce the number of components in the page is to simplify the page's design. But is there a way to build pages with richer content while also achieving fast response times? Here are some techniques for reducing the number of HTTP requests, while still supporting rich page designs.
Image maps combine multiple images into a single image. The overall size is about the same, but reducing the number of HTTP requests speeds up the page. Image maps only work if the images are contiguous in the page, such as a navigation bar. Defining the coordinates of image maps can be tedious and error prone.
CSS Sprites are the preferred method for reducing the number of image requests. Combine all the images in your page into a single image and use the CSS background-image and background-position properties to display the desired image segment.
Inline images use the data: URL scheme to embed the image data in the actual page. This can increase the size of your HTML document. Combining inline images into your (cached) stylesheets is a way to reduce HTTP requests and avoid increasing the size of your pages.
Combined files are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining all stylesheets into a single stylesheet. It's a simple idea that hasn't seen wide adoption. The ten top U.S. web sites average 7 scripts and 2 stylesheets per page. Combining files is more challenging when the scripts and stylesheets vary from page to page, but making this part of your release process improves response times.
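Of the techniques above, inline images are the easiest to show in a few lines. The sketch below builds a data: URL and wraps it in a stylesheet rule; both helpers are hypothetical, Buffer is specific to Node, and the six bytes are a placeholder rather than a complete image.

```javascript
// Build a data: URL from raw bytes. Buffer is Node-specific; in a browser the
// base64 step would be done differently.
function toDataUrl(mimeType, bytes) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical stylesheet rule that carries the image inline, so the image
// is cached together with the stylesheet and costs no extra HTTP request.
function inlineBackgroundRule(selector, mimeType, bytes) {
  return `${selector} { background-image: url(${toDataUrl(mimeType, bytes)}); }`;
}

// Placeholder bytes standing in for a real image file.
const gifHeader = Buffer.from("GIF89a");
console.log(inlineBackgroundRule(".logo", "image/gif", gifHeader));
```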
Reducing the number of HTTP requests in your page is the place to start. This is the most important guideline for improving performance for first-time visitors. As described in Tenni Theurer's blog post Browser Cache Usage - Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these first-time visitors is key to a better user experience.
Steve Souders
Posted at 9:41 AM | Comments (45) | Permalink
March 20, 2007
High Performance Web Sites: The Importance of Front-End Performance
In 2004, I started the Exceptional Performance group at Yahoo!. We're a small team chartered to measure and improve the performance of Yahoo!'s products. Having worked as a back-end engineer most of my career, I approached this as I would a code optimization project - I profiled web performance to identify where there was the greatest opportunity for improvement. Since our goal is to improve the end-user experience, I measured response times in a browser over various bandwidth speeds. What I saw is illustrated in the following chart showing HTTP traffic for http://www.yahoo.com.

In the figure above, the first bar, labeled "html", is the initial request for the HTML document. In this case, only 5% of the end-user response time is spent fetching the HTML document. This result holds true for almost all web sites. In sampling the top ten U.S. websites, all but one spend less than 20% of the total response time getting the HTML document. The other 80+% of the time is spent dealing with what's in the HTML document, namely, the front-end. That's why the key to faster web sites is to focus on improving front-end performance.
There are three main reasons why front-end performance is the place to start.
- There is more potential for improvement by focusing on the front-end. Cutting it in half reduces response times by 40% or more, whereas cutting back-end performance in half results in less than a 10% reduction.
- Front-end improvements typically require less time and resources than back-end projects (redesigning application architecture and code, finding and optimizing critical code paths, adding or modifying hardware, distributing databases, etc.).
- Front-end performance tuning has been proven to work. Over fifty teams at Yahoo! have reduced their end-user response times by following our performance best practices, often by 25% or more.
Our performance golden rule is: optimize front-end performance first, because that's where 80% or more of the end-user response time is spent.
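The arithmetic behind the first reason is simple enough to check directly. The sketch below assumes an exact 80/20 split between front-end and back-end time; since the front-end share is usually somewhat higher, the back-end figure is an upper bound.

```javascript
// Overall response-time reduction (in percent) from speeding up one part of
// the page load. `sharePct` is that part's share of total response time and
// `cutFraction` is how much of that part is eliminated.
function overallReductionPct(sharePct, cutFraction) {
  return sharePct * cutFraction;
}

const frontEndSharePct = 80; // assumed from "80% or more"
const backEndSharePct = 100 - frontEndSharePct;

console.log(`halve the front-end: ${overallReductionPct(frontEndSharePct, 0.5)}% faster overall`);
console.log(`halve the back-end:  ${overallReductionPct(backEndSharePct, 0.5)}% faster overall`);
```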
Steve Souders
Posted at 9:16 AM | Comments (27) | Permalink