
Randy Troppmann (with designer/developer Sarah Ramsden) reports continued success with the Yahoo! Maps and Flickr APIs on RunningMap.com. Visitors can create, edit, save, share, and add Flickr photos to maps of their favorite running routes. Or, if you've forgotten your permalink, you can search over 13,000 user-generated routes.

RunningMap is quick, easy to use, and does just the sort of thing we were hoping for when we started opening up the APIs: it meets an ongoing need that could only be expressed and fulfilled by a committed online community. Congratulations, folks; we'll see you on the road.
Kent Brewster, Yahoo! Developer Network
Posted by Kent Brewster at 2:39 PM | Comments (2)
Yahoo's Doug Cutting and Eric Baldeschwieler played to a packed audience at last week's O'Reilly Open Source Convention in Portland. Their talk, called "Meet Hadoop", is a high-level presentation about the popular open source distributed computing platform that is used within Yahoo!, and elsewhere, to provide scalable infrastructure. We have the talk slides in PowerPoint format (part1, part2), a video of the presentation (video iPod format), as well as just the audio portion of the presentation (mp3).
Posted by at 2:30 PM | Comments (2)
For the last several years, every company involved in building large web-scale systems has faced some of the same fundamental challenges. While nearly everyone agrees that the "divide-and-conquer using lots of cheap hardware" approach to breaking down large problems is the only way to scale, doing so is not easy.
The underlying infrastructure has always been a challenge. You have to buy, power, install, and manage a lot of servers. Even if you use somebody else's commodity hardware, you still have to develop the software that'll do the divide-and-conquer work to keep them all busy.
It's hard work. And it needs to be commoditized, just like the hardware has been...
We too have been dealing with this at Yahoo. Analyzing petabytes of data takes a lot of CPU power and storage. And given the way our needs (and the web as a whole) have been growing, there will likely be dozens of similarly demanding applications before long.
To build the necessary software infrastructure, we could have gone off to develop our own technology, treating it as a competitive advantage, and charged ahead. But we've taken a slightly different approach. Realizing that a growing number of companies and organizations are likely to need similar capabilities, we got behind the work of Doug Cutting (creator of the open source Nutch and Lucene projects) and asked him to join Yahoo to help deploy and continue working on the [then new] open source Hadoop project.
What started here as a 20-node cluster in March of 2006 was up to nearly 200 nodes a month later and has continued to grow as it eats terabytes and terabytes of data. Not long after that, our code contributions back to Hadoop really started to ramp up as well.
Here's a quick timeline of how things have progressed since then...
By supporting and contributing to an open source grid computing project, we hope to be part of providing a solid, efficient, and scalable system that anyone can use to attack the types of problems and data sets that are becoming more common on the web. And since it's open source, everyone benefits from the expertise of developers and users around the world. We've already seen similar benefits from our use and support of Apache, PHP, and MySQL (just to name a few).
As we noted last week, Doug and Eric Baldeschwieler (Yahoo's Director of Grid Computing) are presenting Meet Hadoop at the 2007 Open Source Convention this week. While this is one of the first times we're really talking about our involvement with Hadoop in public, it certainly won't be the last.
Looking ahead and thinking about how the economics of large scale computing continue to improve, it's not hard to imagine a time when Hadoop and Hadoop-powered infrastructure are as common as the LAMP (Linux, Apache, MySQL, Perl/PHP/Python) stack that helped power the previous growth of the Web. We're already seeing universities begin to teach about Hadoop (University of Washington) and look at building their own clusters (Carnegie Mellon University).
We're still in the very early days of this revolution and very proud to be part of it.
Jeremy Zawodny
Yahoo! Developer Network
Posted by jzawodn at 10:30 AM | Comments (10)
Yahoo! has released YSlow, their web performance tool, on YDN under an open source license. Steve Souders, Yahoo!'s Chief Performance Yahoo!, made the announcement during his session at OSCon.
YSlow measures web page performance based on the best practices evangelized by Yahoo!'s Exceptional Performance team. Since many of these best practices focus on the frontend, YSlow is integrated with Joe Hewitt's Firebug, the web development tool of choice for frontend developers.
YSlow has three main views: Performance, Stats, and Components. Performance view scores the page against each performance rule, generates an overall YSlow grade for the page, and lists specific recommendations for making the page faster. Stats view summarizes the total page weight, cookie size, and HTTP request count. Components view lists each component (image, stylesheet, script, Flash object, etc.) in the page along with HTTP information relevant to page load times. It also contains several tools including JSLint. Try it out!
Posted by stevesouders at 4:31 PM | Comments (10)
The fourth iteration of Mashup Camp and Mashup University went off without a hitch last week, Monday through Thursday at the Computer History Museum, in Mountain View, CA.
Best Mashup Winners:
First prize: Chime TV, by Taylor McKnight and Chirag Mehta. Really sweet all-Flash video aggregator, featuring video from all over the web. (Here's Mashup Camp founder David Berlind, bouncing off ZDNet, YouTube, and Chime.TV, about how mash-ups work.) This was Mr. McKnight's second first-prize win; his first was two years ago, for Podbop.
Second prize: The Telephone Game, by John Herren. Responding to suggestions from the demo audience during two rounds of Speed Geeking, John built an application that took one input term, ran it through a bunch of different search engines (six or seven, by the end of the second session), and showed the results as it went along.
Third prize: ClubStumbler, by Nate Ritter and Chris Radcliffe. The designated driver's best friend, ClubStumbler helps you plot the best route to take (with map and directions) through a given cloud of clubs, pubs, or other night spots.
Other Fun Stuff:
Seegest: list, rate, exchange, and make plans to attend movies with your friends. User database powered by BBAuth.
EventsPad: social events calendar that aggregates, organizes, and presents interesting calendar events with rich media.
Next Mashup Camp: Dublin, Ireland, at Trinity College, September 10 through 14. Hope to see you there!
Kent Brewster, Yahoo! Developer Network
Posted by Kent Brewster at 10:28 AM | Comments (2)
Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for what I've been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component's ETag using the ETag response header.
HTTP/1.1 200 OK
Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
ETag: "10c24bc-4ab-457e1c1f"
Content-Length: 12195
Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned, reducing the response by 12195 bytes for this example.
GET /i/yahoo.gif HTTP/1.1
Host: us.yimg.com
If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
If-None-Match: "10c24bc-4ab-457e1c1f"

HTTP/1.1 304 Not Modified
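The validation exchange above can be sketched from the server's side. This is a minimal Python illustration, not how any particular web server implements it: the function name respond and its parameters are hypothetical, and it ignores details of real If-None-Match handling such as lists of ETags, the "*" value, and weak validators.

```python
from typing import Dict, Optional, Tuple


def respond(current_etag: str, body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of one component.

    Hypothetical helper: compares the ETag the browser sent in
    If-None-Match with the component's current ETag.
    """
    headers = {"ETag": current_etag}
    if if_none_match is not None and if_none_match == current_etag:
        # ETags match: the cached copy is still valid, so send no body.
        return 304, headers, b""
    # No ETag was sent, or it differs: send the full component.
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


etag = '"10c24bc-4ab-457e1c1f"'
component = b"x" * 12195  # stand-in for the 12195-byte component

first = respond(etag, component, None)
revalidated = respond(etag, component, etag)
print(first[0], len(first[2]))              # 200 12195
print(revalidated[0], len(revalidated[2]))  # 304 0
```

The second call is the case the article cares about: the browser already holds the component, so a matching ETag lets the server skip the body entirely.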
The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won't match when a browser gets the original component from one server and later tries to validate that component on a different server—a situation that is all too common on web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.
The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.
IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is Filetimestamp:ChangeNumber. A ChangeNumber is a counter used to track configuration changes to IIS. It's unlikely that the ChangeNumber is the same across all IIS servers behind a web site.
The end result is ETags generated by Apache and IIS for the exact same component won't match from one server to another. If the ETags don't match, the user doesn't receive the small, fast 304 response that ETags were designed for; instead, they'll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn't a problem. But if you have multiple servers hosting your web site, and you're using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you're consuming greater bandwidth, and proxies aren't caching your content efficiently. Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.
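To make the mismatch concrete, here is a small Python sketch of an inode-size-timestamp ETag. The hex formatting simply mirrors the shape of the sample header earlier in this post; the second inode value is made up, and both function names are hypothetical. The second function shows what happens when the inode is left out, which is the idea behind Apache's "FileETag MTime Size" setting.

```python
def apache_style_etag(inode: int, size: int, mtime: int) -> str:
    """Build an inode-size-timestamp ETag with hex fields (illustrative only)."""
    return '"%x-%x-%x"' % (inode, size, mtime)


def portable_etag(size: int, mtime: int) -> str:
    """Build an ETag from size and timestamp only, leaving out the inode."""
    return '"%x-%x"' % (size, mtime)


# The same file deployed to two servers: same size and timestamp,
# but each filesystem assigns its own inode (second value is made up).
server_a = apache_style_etag(0x10C24BC, 0x4AB, 0x457E1C1F)
server_b = apache_style_etag(0x2F7A310, 0x4AB, 0x457E1C1F)

print(server_a)              # "10c24bc-4ab-457e1c1f"
print(server_a == server_b)  # False: validation fails across servers

portable_a = portable_etag(0x4AB, 0x457E1C1F)
portable_b = portable_etag(0x4AB, 0x457E1C1F)
print(portable_a == portable_b)  # True: no server-specific field
```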
If you're not taking advantage of the flexible validation model that ETags provide, it's better to just remove the ETag altogether. The Last-Modified header validates based on the component's timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This Microsoft Support article describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:
FileETag none
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted by stevesouders at 4:06 PM | Comments (42)
It hurts performance to include the same JavaScript file twice in one page. This isn’t as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.
Unnecessary HTTP requests happen in Internet Explorer, but not in Firefox. In Internet Explorer, if an external script is included twice and is not cacheable, it generates two HTTP requests during page loading. Even if the script is cacheable, extra HTTP requests occur when the user reloads the page.
In addition to generating wasteful HTTP requests, duplicate scripts waste time because the script is evaluated multiple times. This redundant JavaScript execution happens in both Firefox and Internet Explorer, regardless of whether the script is cacheable.
One way to avoid accidentally including the same script twice is to implement a script management module in your templating system. The typical way to include a script is to use the SCRIPT tag in your HTML page.
<script type="text/javascript" src="menu_1.0.17.js"></script>
An alternative in PHP would be to create a function called insertScript.
<?php insertScript("menu.js") ?>
In addition to preventing the same script from being inserted multiple times, this function could handle other issues with scripts, such as dependency checking and adding version numbers to script filenames to support far future Expires headers.
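As a rough sketch of what such a script management module might look like, here is the idea in Python rather than PHP. Everything in it is an assumption made for illustration: the ScriptManager class, the SCRIPT_VERSIONS table, and the name_version.js filename pattern (chosen to match the SCRIPT tag shown above). Dependency checking is left out.

```python
import os
from typing import Dict, Set

# Hypothetical registry mapping a logical script name to its current version.
SCRIPT_VERSIONS: Dict[str, str] = {"menu.js": "1.0.17"}


class ScriptManager:
    """Tracks which scripts a page has already included."""

    def __init__(self) -> None:
        self._inserted: Set[str] = set()

    def insert_script(self, name: str) -> str:
        """Return the SCRIPT tag for name, or "" if it was already inserted."""
        if name in self._inserted:
            return ""  # duplicate request: emit nothing
        self._inserted.add(name)
        base, ext = os.path.splitext(name)
        version = SCRIPT_VERSIONS.get(name)
        # Add the version to the filename so a far future Expires header is safe.
        filename = f"{base}_{version}{ext}" if version else name
        return f'<script type="text/javascript" src="{filename}"></script>'


page = ScriptManager()
first_tag = page.insert_script("menu.js")
second_tag = page.insert_script("menu.js")
print(first_tag)         # <script type="text/javascript" src="menu_1.0.17.js"></script>
print(repr(second_tag))  # ''
```

One manager instance per page render keeps the bookkeeping simple: templates from different teams can all ask for menu.js, and only the first request produces markup.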
Steve Souders
Posted by stevesouders at 3:39 PM | Comments (4)
Redirects are accomplished using the 301 and 302 status codes. Here’s an example of the HTTP headers in a 301 response.
HTTP/1.1 301 Moved Permanently
Location: http://example.com/newuri
Content-Type: text/html
The browser automatically takes the user to the URL specified in the Location field. All the information necessary for a redirect is in the headers. The body of the response is typically empty. Despite their names, neither a 301 nor a 302 response is cached in practice unless additional headers, such as Expires or Cache-Control, indicate it should be. The meta refresh tag and JavaScript are other ways to direct users to a different URL, but if you must do a redirect, the preferred technique is to use the standard 3xx HTTP status codes, primarily to ensure the back button works correctly.
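Because everything needed for a redirect lives in the headers, a client can decide where to go without reading a body. The Python sketch below parses a raw header block like the one above; the function name redirect_target is hypothetical, and the parsing is deliberately simplistic (it only handles the 301 and 302 codes discussed here and does no URL resolution).

```python
from typing import Optional


def redirect_target(raw_headers: str) -> Optional[str]:
    """Return the Location URL if the response is a 301 or 302, else None."""
    lines = raw_headers.strip().splitlines()
    if not lines:
        return None
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    parts = lines[0].split()
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    if int(parts[1]) not in (301, 302):
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None


response = (
    "HTTP/1.1 301 Moved Permanently\n"
    "Location: http://example.com/newuri\n"
    "Content-Type: text/html\n"
)
print(redirect_target(response))            # http://example.com/newuri
print(redirect_target("HTTP/1.1 200 OK"))   # None
```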
The main thing to remember is that redirects slow down the user experience. Inserting a redirect between the user and the HTML document delays everything in the page since nothing in the page can be rendered and no components can start being downloaded until the HTML document has arrived.
One of the most wasteful redirects happens frequently and web developers are generally not aware of it. It occurs when a trailing slash (/) is missing from a URL that should otherwise have one. For example, going to http://astrology.yahoo.com/astrology results in a 301 response containing a redirect to http://astrology.yahoo.com/astrology/ (notice the added trailing slash). This is fixed in Apache by using Alias or mod_rewrite, or the DirectorySlash directive if you're using Apache handlers.
Connecting an old web site to a new one is another common use for redirects. Others include connecting different parts of a website and directing the user based on certain conditions (type of browser, type of user account, etc.). Using a redirect to connect two web sites is simple and requires little additional coding. Although using redirects in these situations reduces the complexity for developers, it degrades the user experience. Alternatives for this use of redirects include using Alias and mod_rewrite if the two code paths are hosted on the same server. If a domain name change is the cause of using redirects, an alternative is to create a CNAME (a DNS record that creates an alias pointing from one domain name to another) in combination with Alias or mod_rewrite.
Steve Souders
Posted by stevesouders at 3:10 PM | Comments (13)
Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified, all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor.
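To show the basic idea, here is a deliberately naive minifier sketched in Python. It is not how JSMin or YUI Compressor work; the function name naive_minify is hypothetical, and the regular expressions would corrupt any string or regex literal that happens to contain "//" or "/*" (a URL such as "http://..." is enough). Real tools tokenize the source instead. The sketch also keeps line breaks so that it never changes where statements end.

```python
import re


def naive_minify(source: str) -> str:
    """Strip comments and surrounding white space from JavaScript source.

    Illustrative only: unsafe for code whose string or regex literals
    contain comment markers.
    """
    # Drop /* ... */ block comments, which may span several lines.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Drop // comments up to the end of the line.
    source = re.sub(r"//[^\n]*", "", source)
    # Trim each line and discard the ones left empty.
    stripped = (line.strip() for line in source.splitlines())
    return "\n".join(line for line in stripped if line)


sample = """
/* Toggle the menu. */
function toggleMenu(menu) {
    // Flip the visibility flag.
    menu.visible = !menu.visible;
    return menu.visible;
}
"""

minified = naive_minify(sample)
print(minified)
print(len(minified) < len(sample))  # True
```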
Obfuscation is an alternative optimization that can be applied to source code. Like minification, it removes comments and white space, but it also munges the code. As part of munging, function and variable names are converted into smaller strings making the code more compact as well as harder to read. This is typically done to make it more difficult to reverse engineer the code. But munging can help performance because it reduces the code size beyond what is achieved by minification. The tool-of-choice is less clear in the area of JavaScript obfuscation. Dojo Compressor (ShrinkSafe) is the one I’ve seen used the most.
Minification is a safe, fairly straightforward process. Obfuscation, on the other hand, is more complex and thus more likely to generate bugs as a result of the obfuscation step itself. Obfuscation also requires modifying your code to indicate API functions and other symbols that should not be munged. It also makes it harder to debug your code in production. Although I’ve never seen problems introduced from minification, I have seen bugs caused by obfuscation. In a survey of ten top U.S. web sites, minification achieved a 21% size reduction versus 25% for obfuscation. Although obfuscation has a higher size reduction, I recommend minifying JavaScript code because of the reduced risks and maintenance costs.
In addition to minifying external scripts, inlined script blocks can and should also be minified. Even if you gzip your scripts, as described in Rule 4, minifying them will still reduce the size by 5% or more. As the use and size of JavaScript increases, so will the savings gained by minifying your JavaScript code.
Steve Souders
Posted by stevesouders at 2:32 PM | Comments (9)
The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server’s IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can’t download anything from this hostname until the DNS lookup is completed.
DNS lookups are cached for better performance. This caching can occur on a special caching server, maintained by the user's ISP or local area network, but there is also caching that occurs on the individual user's computer. The DNS information remains in the operating system's DNS cache (the "DNS Client service" on Microsoft Windows). Most browsers have their own caches, separate from the operating system's cache. As long as the browser keeps a DNS record in its own cache, it doesn't bother the operating system with a request for the record.
Internet Explorer caches DNS lookups for 30 minutes by default, as specified by the DnsCacheTimeout registry setting. Firefox caches DNS lookups for 1 minute, controlled by the network.dnsCacheExpiration configuration setting. (Fasterfox changes this to 1 hour.)
When the client’s DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page’s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.
Reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading that takes place in the page. Avoiding DNS lookups cuts response times, but reducing parallel downloads may increase response times. My guideline is to split these components across at least two but no more than four hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads.
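One quick way to see how many DNS lookups a page could trigger with an empty cache is to count the distinct hostnames among its URLs. The Python sketch below does that with the standard library; the function name unique_hostnames and the example.com URLs are placeholders, not taken from any real page.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def unique_hostnames(urls: Iterable[str]) -> List[str]:
    """Return the distinct hostnames used by a page's URLs, sorted."""
    hosts = {urlparse(url).hostname for url in urls}
    hosts.discard(None)  # relative URLs carry no hostname of their own
    return sorted(hosts)


# Placeholder URLs for the page itself and its components.
page_urls = [
    "http://www.example.com/",
    "http://img.example.com/logo.gif",
    "http://img.example.com/banner.gif",
    "http://static.example.com/site.css",
    "http://static.example.com/menu.js",
    "/favicon.ico",
]

hosts = unique_hostnames(page_urls)
print(hosts)       # ['img.example.com', 'static.example.com', 'www.example.com']
print(len(hosts))  # 3 potential DNS lookups on an empty cache
```

Here the five absolute URLs collapse to three hostnames, which sits inside the two-to-four range suggested above.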
Steve Souders
Posted by stevesouders at 10:42 AM | Comments (9)
At our recent London Hack Day, Nick Bilton and Michael Young of the New York Times Research & Development Lab walked away with the top award ("Best Overall Hack") for their mobile/desktop synchronization and toggling solution: Shifd.
In the second part of this interview, we followed up with these two to talk a bit more about their solution, how they conceived and executed it, what's next for Shifd, and more.
Q: Shifd was one of the few specifically Mobile-focused applications we saw at this Hack Day. What design considerations did you make based on your past usage of mobile devices?
Mobile devices in general have terrible user interfaces and even worse browser specifications, and because of both of these problems, people really don't understand how to create or design content for these devices. It's apparent that mobile devices are really important in our lives, as you can see from the hoopla that arrived with the iPhone. We really aimed to make the design and User Interface simple, clean and easy to understand.
Q: How did the idea of Shifd come about? Did you know you wanted to build this before arriving at Hack Day?
We were trying to come up with an idea for Hack Day a week or so before we came out there, but there was so much going on we didn't really have time to talk about it. A couple of days before we left, we found a white board floating around our office and we attacked it with a marker, making three columns: 1. Devices, 2. Code, and 3. Communication and then started to try to tie ideas together. We soon gave up on the board and barely a day before leaving for London we came up with the idea for Shifd, primarily basing it on the concept of shifting content from your various devices in a seamless manner.
Q: What feature/functionality did you leave out due to the time constraints?
We wanted to add a few features including being able to interact with shifd.com through text message. For example, you would be able to send a text message of a note, or a URL to Shifd and it would automatically be saved to your Shifd page to be viewed or edited later when you're back at your desk.
Q: What user feedback did you bring to the process from your work in the Times R&D lab (or before)?
We have a series of presentations that we have given to The New York Times Company over the past few months explaining where the R&D Lab sees technology in the next 18-24 months, and one of the ideas we talked about was your devices being aware of your presence and even aware of your friends and family's presence making the way we interact with content and each other much more social and dynamic. The real push and idea for Shifd came from the idea of presence and wanting to build something that we can actually use and that we would find useful.
Q: Are there Yahoo! APIs/Web Services that you wish were available (that weren't so at the time of this event)?
The collection of Yahoo! APIs is really impressive! We used the Local Search (version 2) API in the hack, to let users search for local listings. An API or tool to handle the feed aggregation would have helped, but wasn't necessary.
We are huge fans of the Yahoo! APIs and tools in the R&D group and use a lot of them in the prototypes and applications we build in our group: Maps, Local Search, YUI, Geocoding, Content Analysis, etc. The Content Analysis (text extraction) API has been very helpful to us! Any plans for an expanded version that supports entity extraction? We'd love to see that! (Ed. note: Sounds like a good idea. We'll pass this along to the development team.)
Q: You mentioned in your podcast that you'd build a non-RFID-based version of Shifd. What design trade-offs will you need to balance in such a version?
The reason we used RFID was to detect your presence so you wouldn't need to do anything to access your content. You would be able to just pick up your phone, or put it down, and the computer would know what environment you're in. With the non-RFID version there will be a couple of extra steps, like typing in the shifd.com URL to get to the site and also clicking a SEND button when you walk away. Although this isn't that much of a trade-off, it's a few extra steps to a project that was intended to be seamless and ultra simple for the user. We are also going to explore a Bluetooth version.
Q: Have you had any interesting conversations or opportunities regarding Shifd or your work arise since participating in Hack Day?
We are aiming to get this to an Alpha release so we can start to explore how people will use Shifd.com. An interesting idea that has arisen from our time at Hack Day would entail us doing a few videos around hacking Apps together for The New York Times. We'll have more info on this exciting prospect soon.
Q: You received a Wii as the Hack Day prize. Any plans to set up the NY-based alternative to Yahoo! Brickhouse's Wii Wednesdays?
We actually ended up donating the Wii to the Great Ormond Street Hospital for Children in London; besides, we already have one in the Lab. Anyone at Yahoo! feel like getting beaten at Wii Tennis? (Ed. note: Next time we're out in New York, we'll gladly take you up on that challenge, gentlemen.)
All of us at the Developer Network were impressed with the project's scope, execution, and finesse. Further, it was a great demonstration of taking Yahoo!'s tools to a new experience. (While Yahoo! has many mobile offerings, Shifd showed a great offline mobile experience in a manner our products haven't supported.)
For those who haven't seen Shifd in action, be sure to visit the Lab's Shifd product page. And, for those curious, the Yahoo! tools used for this hack were:
If you've used a Yahoo! tool, submit your product to our Gallery (regardless of whether your product is a site, a mashup, a downloadable application, or a video). We want to help evangelize your successful use of our suite of services!
Micah Laaker
Posted by micahl at 10:06 PM | Comments (0)
Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?
Using external files in the real world generally produces faster pages because the JavaScript and CSS files are cached by the browser. JavaScript and CSS that are inlined in HTML documents get downloaded every time the HTML document is requested. Inlining reduces the number of HTTP requests that are needed, but increases the size of the HTML document. On the other hand, if the JavaScript and CSS are in external files cached by the browser, the size of the HTML document is reduced without increasing the number of HTTP requests.
The key factor, then, is the frequency with which external JavaScript and CSS components are cached relative to the number of HTML documents requested. This factor, although difficult to quantify, can be gauged using various metrics. If users on your site have multiple page views per session and many of your pages re-use the same scripts and stylesheets, there is a greater potential benefit from cached external files.
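The trade-off can be put into rough numbers with a toy model, sketched here in Python. All of it is assumption rather than measurement: the byte counts are invented, every page is taken to use the same scripts and stylesheets, the browser cache is empty at the start of the session, and cached files are reused for the rest of the session without any validation requests.

```python
from typing import NamedTuple


class SessionCost(NamedTuple):
    bytes_downloaded: int
    http_requests: int


def inline_cost(page_views: int, html_bytes: int, asset_bytes: int) -> SessionCost:
    """JavaScript and CSS travel inside every HTML document."""
    return SessionCost(page_views * (html_bytes + asset_bytes), page_views)


def external_cost(page_views: int, html_bytes: int, asset_bytes: int,
                  asset_files: int) -> SessionCost:
    """External files are fetched once, then served from the browser cache."""
    return SessionCost(page_views * html_bytes + asset_bytes,
                       page_views + asset_files)


HTML_BYTES = 20_000   # invented size of one HTML document
ASSET_BYTES = 60_000  # invented combined size of the JavaScript and CSS
ASSET_FILES = 2       # one script file and one stylesheet

for views in (1, 5):
    print(views,
          inline_cost(views, HTML_BYTES, ASSET_BYTES),
          external_cost(views, HTML_BYTES, ASSET_BYTES, ASSET_FILES))
# 1 view:  inline 80000 bytes / 1 request,  external 80000 bytes / 3 requests
# 5 views: inline 400000 bytes / 5 requests, external 160000 bytes / 7 requests
```

Under these assumptions a single-page-view session gains nothing from external files and pays for two extra requests, while a five-page session downloads less than half as much with them.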
Many web sites fall in the middle of these metrics. For these properties, the best solution generally is to deploy the JavaScript and CSS as external files. The only exception I’ve seen where inlining is preferable is with home pages, such as Yahoo!'s front page (http://www.yahoo.com) and My Yahoo! (http://my.yahoo.com). Home pages that have few (perhaps only one) page view per session may find that inlining JavaScript and CSS results in faster end-user response times.
For front pages that are typically the first of many page views, there are techniques that leverage the reduction of HTTP requests that inlining provides, as well as the caching benefits achieved through using external files. One such technique is to inline JavaScript and CSS in the front page, but dynamically download the external files after the page has finished loading. Subsequent pages would reference the external files that should already be in the browser's cache.
Steve Souders
Posted by stevesouders at 8:05 PM | Comments (11)
At our recent London Hack Day, Nick Bilton and Michael Young of the New York Times Research & Development Lab walked away with the top award ("Best Overall Hack") for their mobile/desktop synchronization and toggling solution: Shifd.
In the first part of this interview, we followed up with these two to talk a bit more about their backgrounds, what's next from the Times R&D Lab, what lured them to Hack Day and more.
Q: Can you give us a little background info on yourselves?
Nick Bilton: I'm an Art Director, User Interface Designer, Technologist, Writer, Video Journalist, Photographer, Nick-of-all-trades... Master of some. I've also got a bit of a problem with gadgets, and have been known to wait outside in the cold at 6 a.m. to acquire new releases of certain products. I've worked in numerous different industries within the contexts of design and technology. Currently my time is shared between The New York Times newspaper where I'm involved in special editorial projects and the visual integration of the newspaper and website. The other half of my time is spent as the User Interface Specialist at The New York Times Research & Development Lab. I was also the co-designer of the Times Reader.
Michael Young: I'm a Creative Technologist in the Research & Development Lab at The New York Times. I've been at The NYTimes for a little over a year now, focusing primarily on building new mobile applications and exploring a vast array of new areas such as the 'digital living room.' I also spend a lot of my time trying to figure out why Nick calls himself the "Nick-of-all-trades." I hail from the West coast (Oregon) and worked at a few interactive TV startups in San Francisco before moving out to New York about six years ago.
Q: What brought you to The New York Times R&D Lab? What do you do there?
NB: I started working in the web and digital multimedia world in 1995 and floated in and out of different digital contexts since. I have been at The New York Times for about 3 years and began my career here as the Art Director for The New York Times Business & Circuits Section. I was then assigned to work with Microsoft as the User Interface Designer on the Times Reader, the NYTimes version of a digital newspaper that can be synced to your computer or tablet PC and is similar to the news reading devices people used in the movie 'Minority Report.' When there were rumors of the R&D Lab opening up within the company I jumped at the chance to get involved as it would allow me the opportunity to work on a variety of different skill sets including hardware hacking, design and user interface, video and Flash, coding, and our pet project -- news reading helicopter robots. (We're kidding about the helicopters, but that would be pretty neat!)
MY: I was working for an interactive TV company called OpenTV prior to joining The NYTimes, where I was building ITV applications for networks like CNN, ESPN, NBC, etc. I've always been a huge fan of The Times and they had just started the R&D group when I was looking to change jobs. A month prior to interviewing at The Times, I had created a very simple mashup that displayed AP news stories on a map, using both the Google Maps API and Yahoo! Geocoding API. (I wanted to learn both APIs at the same time.) Anyway, the mashup got a little coverage in the press and helped me get in the door for an interview at The Times.
I'm working on a variety of projects in the R&D group, primarily around mobile and ways we can make the paper more interactive through mobile devices. Another area I'm starting to look at is the idea of the 'digital living room' and where our place is in that.
Q: The New York Times R&D Lab sounds like an exciting playground for new technology and applications. Aside from the digital paper prototypes that are surfaced every year or so by the futurists, what types of projects do you tackle?
You can clearly see that in the next few years we will live in a world with ubiquitous WiFi and most Americans will have a broadband-enabled mobile device, that will not only connect you to the Internet, but connect will be location aware. That's a pretty exciting prospect, especially from a news and content delivery standpoint. We're currently working on a wide array of projects including GPS, mobile -- including location aware delivery, 2-D barcodes (semacodes) and SMS -- Analytics and Search. We're also exploring all types of eInk devices, tablet PC's and foldable screens while trying to figure out how our content will be delivered to these different devices and screen sizes. Another area we are starting to dabble in is news delivery on gaming consoles and TV set top boxes.
Q: How did you hear of the London Hack Day? What lured you across the Atlantic for the event?
We've been following what Yahoo! has been doing with their internal Hack Days for the past few years and have really been inspired by the idea. (We are actually hoping to do one internally at NYTimes.) I forget where we first saw mention of the London Hack Day, but it was on either Chad Dickerson's or Tom Coates' blog. The London trip worked out great for us -- it's not too long a trip from New York and it gave us the opportunity to meet with some other news organizations while we were out there.
Q: How did the Hack Day event meet your expectations? (What did you expect?)
We were really surprised and excited by how organized Hack Day was. Yahoo! and the BBC really took care of all of the hackers. Free food and beer tickets -- couldn't beat it. We really didn't have much of an idea what we were going to see when we arrived, and we were chuffed and excited when we walked into a room of 500 super nerds. It was really inspiring to see all of the hacking going on around us -- so much creativity in one room!
Micah Laaker
Posted by micahl at 11:34 AM | Comments (0)
By now, many of you have heard a lot about Facebook’s F8 Platform, a set of services that allows 3rd parties to build applications into Facebook’s site. Since Facebook launched their platform on May 24th, companies and developers have released hundreds of applications that now entertain, delight and overwhelm Facebook users everywhere.
On June 29, we here at Yahoo! Music released our Music Videos application, which lets users post music videos from our extensive catalog to their Facebook Profile, and allows them to dedicate videos to their friends. This release capped a crazy month-long period that started when Matt Kozlov, of our Corporate Strategy group, and I attended the F8 launch. We were impressed at the applications that were already being showcased by developers who had gotten in the game early. Over the following weekend, our General Manager of Music, Ian Rogers, tracked the explosive growth of music applications like iLike, which quickly gained hundreds of thousands of users.
By the following Monday night, Ian was nagging, er, urging us to get into the game. On Tuesday I hijacked half of a meeting with my product colleagues Lucas Gonze and Roberto Fisher, and whipped up a crappy, off-the-cuff product spec centered around music videos (one of our premier products), and the next day I handed a hand-drawn spec to Jim Bumgardner, one of our crack engineers.
That week, we enlisted the help of our very talented designers Ruth Kaufman and Lino Weihen. As Jim dug into the APIs, Ruth and Lino iterated on designs, adjusting as Jim discovered what was possible and what wasn’t, removing functionality that would have been hard and adding features that created more virality (like the video dedications, a weekend brainstorm of Jim’s).
We evolved the application to take particular advantage of the social networking elements of the APIs, including access to a user’s friend list and the contents of their profile, like their favorite bands. The application enables users to post our music videos to their profile pages, and lets the user draw from videos from their favorite artists as well as their friends’ favorite artists. The APIs also allowed us to take advantage of the update features in Facebook, such as the news feed that tells you what your friends are doing. At the same time, the APIs impose limits to avoid spam, like preventing you from inviting more than 10 friends at once (this last one was new – in the first couple of weeks, applications could invite all of your friends, but Facebook later clamped down).
Somewhere along the way, Ian decided he wanted something launched by the end of June, so the team moved fast, with Jim working on this in his spare time, but still making good progress. After two weeks of prototyping and designing, the team started development in earnest. Two more weeks later, they were done. On Thursday, June 28th we had our final meeting to review the finished product.
And I must say, it rocked.
As we thoroughly reviewed it to make the go/no-go decision, Jerry Yang (our CEO) and Jeff Weiner (head of the Yahoo! Networks Division, of which Yahoo! Music is a part) happened to be in our Santa Monica offices. This is rare, since most of the senior executives work up North (as we call it) in Sunnyvale. While our team of product, design, engineering and marketing folks reviewed the application, Ian was showing it to Jerry and Jeff. They walked by the conference room in which we were meeting, and Jerry opened the door.
"Ship it!" he told us.
Who am I to argue with the CEO?
So we launched it the next day.
Over the weekend, the reviews came in. FaceReviews.com, a site specializing in the cottage industry of analyzing Facebook applications, wrote:
"Yahoo has joined facebook and hit a home run in the process with the Yahoo Music Videos Application. This is an official application from Yahoo Music. This application is one of the best designed, executed and deeply integrated facebook applications we have seen yet. The user interface is amazing and very intuitive."
Hell, yeah.
Others were also positive, though of course I chose the most glowing one to quote directly.
Over the past couple of weeks, we’ve seen modest growth. Nothing like iLike, but we’re happy to be bringing music videos to the Facebook audience and have some ideas on how to improve uptake of the application. Most importantly, we got ourselves out there where users are already.
If you’d like to read about this crazy journey from the perspective of Jim Bumgardner, he’s posted on his experiences on our semi-official Yahoo! Music blog.
Oh, and if you want to get the Facebook application, check it out at: http://apps.facebook.com/yahoomusicvideos/
Cheers!
Michael Spiegelman
Director of Product Management, Yahoo! Music
Posted by Matt McAlister at 9:36 AM | Comments (1)
CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They’re supported in Internet Explorer, starting with version 5. As an example, the background color could be set to alternate every hour using CSS expressions.
background-color: expression( (new Date()).getHours()%2 ? "#B8D4FF" : "#F08A00" );
As shown here, the expression method accepts a JavaScript expression. The CSS property is set to the result of evaluating the JavaScript expression. The expression method is ignored by other browsers, so it is useful for setting properties in Internet Explorer needed to create a consistent experience across browsers.
The problem with expressions is that they are evaluated more frequently than most people expect. Not only are they evaluated when the page is rendered and resized, but also when the page is scrolled and even when the user moves the mouse over the page. Adding a counter to the CSS expression allows us to keep track of when and how often a CSS expression is evaluated. Moving the mouse around the page can easily generate more than 10,000 evaluations.
One way to reduce the number of times your CSS expression is evaluated is to use one-time expressions, where the first time the expression is evaluated it sets the style property to an explicit value, which replaces the CSS expression. If the style property must be set dynamically throughout the life of the page, using event handlers instead of CSS expressions is an alternative approach. If you must use CSS expressions, remember that they may be evaluated thousands of times and could affect the performance of your page.
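A one-time expression might look something like the sketch below. The function names are illustrative assumptions, and the stylesheet rule that would call the function is shown only as a comment. The key step is the assignment to the element's style property, which replaces the expression with an explicit value.

```javascript
// Illustrative sketch; the function names are assumptions, not from the article.
// The stylesheet would call the function from an expression, for example:
//   p { background-color: expression( setAlternateBgColor(this) ); }

// Pick the color for a given hour, mirroring the hourly example above.
function colorForHour(hour) {
  return hour % 2 ? "#B8D4FF" : "#F08A00";
}

// Assigning an explicit value to the style property replaces the CSS
// expression, so it is not re-evaluated for this element afterwards.
function setAlternateBgColor(elem) {
  elem.style.backgroundColor = colorForHour(new Date().getHours());
}
```

The trade-off is that the color no longer changes on its own when the hour rolls over; a page that needs that would fall back to an event handler or timer.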
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted by stevesouders at 7:52 AM | Comments (5)
Despite what you may read on the side of Yahoo!'s Caltrain shuttle, Yahoo did not invent PHP. Rasmus Lerdorf invented PHP. Fortunately, Rasmus now works at Yahoo! and he'll be at OSCON on July 23rd along with Thomas Sha, progenitor of the Yahoo! User Interface library, for an intensely practical half-day tutorial called You Got JavaScript in My PHP! And.... In a mere 3.5 hours, the duo will "build a sample modern web application with Lerdorf driving the backend and Sha the frontend," according to the tutorial description. Expect some top-notch AJAX and PHP hacking from two experts in their respective fields.
Jason Levitt
Posted by at 6:04 PM | Comments (3)
Rule 5 described how stylesheets near the bottom of the page prohibit progressive rendering, and how moving them to the document HEAD eliminates the problem. Scripts (external JavaScript files) pose a similar problem, but the solution is just the opposite: it’s better to move scripts from the top to as low in the page as possible. One reason is to enable progressive rendering, but another is to achieve greater download parallelization.
With stylesheets, progressive rendering is blocked until all stylesheets have been downloaded. That’s why it’s best to move stylesheets to the document HEAD, so they get downloaded first and rendering isn’t blocked. With scripts, progressive rendering is blocked for all content below the script. Moving scripts as low in the page as possible means there's more content above the script that is rendered sooner.
The second problem caused by scripts is blocking parallel downloads. The HTTP/1.1 specification suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. (I've gotten Internet Explorer to download over 100 images in parallel.) While a script is downloading, however, the browser won’t start any other downloads, even on different hostnames.
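As a rough sketch of the multiple-hostname idea, the snippet below assigns each image path to one of several hostnames and computes the resulting ceiling on parallel downloads. The hostnames and function names are hypothetical, and the limit of two per hostname is simply the figure quoted above.

```javascript
// Hypothetical hostnames; the article does not name any.
const IMAGE_HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"];

// Per-hostname limit suggested by the HTTP/1.1 specification, as cited above.
const PER_HOST_LIMIT = 2;

// Map a path to the same hostname every time, so a given image always has
// one URL and can be served from the browser cache on later requests.
function hostFor(path, hosts) {
  let sum = 0;
  for (let i = 0; i < path.length; i++) {
    sum = (sum + path.charCodeAt(i)) % hosts.length;
  }
  return hosts[sum];
}

// Upper bound on simultaneous downloads if the browser follows the limit.
function maxParallelDownloads(hostCount, perHostLimit) {
  return hostCount * perHostLimit;
}
```

With three hostnames the ceiling is six image downloads at once, but none of that helps while a script is downloading, which is the point of this rule.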
In some situations it’s not easy to move scripts to the bottom. If, for example, the script uses document.write to insert part of the page’s content, it can’t be moved lower in the page. There might also be scoping issues. In many cases, there are ways to work around these situations.
An alternative suggestion that often comes up is to use deferred scripts. The DEFER attribute indicates that the script does not contain document.write, and is a clue to browsers that they can continue rendering. Unfortunately, Firefox doesn't support the DEFER attribute. In Internet Explorer, the script may be deferred, but not as much as desired. If a script can be deferred, it can also be moved to the bottom of the page. That will make your web pages load faster.
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted by stevesouders at 5:04 PM | Comments (27)
Yahoo!'s Doug Cutting, creator of Lucene and Nutch, and Eric Baldeschwieler, Sr. Engineering Director, will be talking about Hadoop (named after Doug's child's stuffed elephant) at O'Reilly's OSCON in Portland this month. Their session "Meet Hadoop" will be on Wednesday afternoon, July 25th.
Yahoo! has been a major contributor to the Hadoop effort, and we've adopted it internally for some of our large-scale data analysis and research projects. We're big fans here. You won't want to miss this presentation, I'm sure.
Matt McAlister
Posted by Matt McAlister at 2:30 PM | Comments (0)
Just back from the first stop on the Adobe AIR Bus Tour; here are some quick impressions:
In addition to seamless integration with Flash, AIR is very easy for HTML developers to use and understand. You write an XML wrapper around your HTML source, your HTML brings in your CSS and JavaScript, AIR brings everything up in WebKit, and it's running, just like that. I've got a tiny proof-of-concept on my personal site; it's a stand-alone Yahoo! search widget that weighs in at a whopping 4642 bytes. More sample applications are coming online in a steady stream; check out Tweetr, Pownce, and Jack Slocum's ExtJS. Community sites are also popping up; see AIRApps.net, the Adobe Labs Showcase, and the AIR Applications Wiki, for example.
The bus-tour concept is outstanding. They've taken a luxury motor coach, shrink-wrapped it in bright Adobe red, and filled it with everything a hard-working nerd needs--sketchy bandwidth, Guitar Hero, heavily-caffeinated beverages, and sixteen tons of O'Reilly books--to code on the fly. (Remember that scene in Buckaroo Banzai when the Hong Kong Cavaliers pull up? It's sort of like that.)
Oh, and yes, the bus has an API. Feeds include goodies like geolocation, events, videos, weblogs, and Twitter, and they're adding to it constantly. Mike, Kevin, and Daniel are all nice, approachable guys with clear technical expertise, and many of the examples they show are hosted on their own sites. This is a pretty good sign that the outfit you're dealing with "gets" technology evangelism: they're not afraid to allow their people to build their own personal brands.
Tour dates are online; if the Big Red Bus is coming to your town, check it out!
Kent Brewster, Yahoo! Developer Network
Posted by Kent Brewster at 11:09 AM | Comments (1)
While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages load faster. This is because putting stylesheets in the HEAD allows the page to render progressively.
Front-end engineers who care about performance want a page to load progressively; that is, we want the browser to display whatever content it has as soon as possible. This is especially important for pages with a lot of content and for users on slower Internet connections. The importance of giving users visual feedback, such as progress indicators, has been well researched and documented. In our case the HTML page is the progress indicator! When the browser loads the page progressively, the header, the navigation bar, the logo at the top, etc. all serve as visual feedback for the user who is waiting for the page. This improves the overall user experience.
The problem with putting stylesheets near the bottom of the document is that it prohibits progressive rendering in many browsers, including Internet Explorer. Browsers block rendering to avoid having to redraw elements of the page if their styles change. The user is stuck viewing a blank white page. Firefox doesn't block rendering, which means when the stylesheet is done loading it's possible elements in the page will have to be redrawn, resulting in the flash of unstyled content problem.
The HTML specification clearly states that stylesheets are to be included in the HEAD of the page: "Unlike A, [LINK] may only appear in the HEAD section of a document, although it may appear any number of times." Neither of the alternatives, the blank white screen or flash of unstyled content, is worth the risk. The optimal solution is to follow the HTML specification and load your stylesheets in the document HEAD.
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted by stevesouders at 12:26 PM | Comments (17)
There are plenty of web sites that use one or two APIs to add spice to their content and design, but Niall Kennedy's startupsearch.org, a site that tracks Web 2.0 startups, has more API bling than most. The site is written in Python using the Django framework, and Niall used our YUI library for web pages, Flickr to display company images, Yahoo! Pipes to create custom product data feeds, and the Yahoo! Search Site Explorer API to keep track of inbound web links to product domains. He also used a number of Google and Amazon APIs as well as APIs from some smaller companies. Check out his credits tabs for details.
Jason Levitt
Posted by at 11:44 AM | Comments (0)
Thanks to our colleagues in the Berkeley research labs, we can now watch all the Hack Day London demos in this handy video viewer.
"You can see a list of all the hacks (including which hacks won the judges awards), you can launch URL demos for many of the hacks, and, best of all, you can jump to any hack instantly to watch it (no waiting for it to load or fast forwarding through the video to try and find things). We’ve also displayed the list of hacks directly on the video time line, which makes scanning around for hacks incredibly simple."
For example, you've probably heard about the flying Bli.mp hack, the Coke and Mentos rocket hack from Blue Steel, and Hack Hud mashing BBC news with Yahoo! services, among about 70 other great hacks. Hack TV lets you jump right into those demos.
And, of course, you can see the overall winning hack from The New York Times R&D Lab: shifd.com
Enjoy these and all the demos on Hack TV: http://timetags.research.yahoo.com/hackdayuk
Matt McAlister
Posted by Matt McAlister at 3:12 PM | Comments (1)
The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It’s true that the end-user’s bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.
Starting with HTTP/1.1, web clients indicate support for compression with the Accept-Encoding header in the HTTP request.
Accept-Encoding: gzip, deflate
If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.
Content-Encoding: gzip
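The exchange above can be sketched on the server side roughly as follows, using Node.js and its built-in zlib module. The function names are illustrative assumptions, and the header parsing is deliberately simplified (it ignores q-values such as "gzip;q=0").

```javascript
const zlib = require("zlib");

// Return true when the Accept-Encoding request header lists gzip.
// Simplified: any parameters after ";" are dropped, so q-values are ignored.
function clientAcceptsGzip(acceptEncoding) {
  if (!acceptEncoding) return false;
  return acceptEncoding
    .split(",")
    .map((token) => token.split(";")[0].trim().toLowerCase())
    .includes("gzip");
}

// Build the headers and body for a text response.
function encodeResponse(text, acceptEncoding) {
  // Vary tells caches that the response depends on this request header.
  const headers = { "Vary": "Accept-Encoding" };
  let body = Buffer.from(text, "utf8");
  if (clientAcceptsGzip(acceptEncoding)) {
    body = zlib.gzipSync(body);
    headers["Content-Encoding"] = "gzip";
  }
  return { headers, body };
}
```

A client that sends no Accept-Encoding header gets the uncompressed bytes and no Content-Encoding header, which matches the "may compress" wording above.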
Gzip is the most popular and effective compression method at this time. It was developed by the GNU project and standardized by RFC 1952. The only other compression format you’re likely to see is deflate, but it’s less effective and less popular.
Gzipping generally reduces the response size by about 70%. Approximately 90% of today’s Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses mod_gzip while Apache 2.x uses mod_deflate.
There are known issues with browsers and proxies that may cause a mismatch in what the browser expects and what it receives with regard to compressed content. Fortunately, these edge cases are dwindling as the use of older browsers drops off. The Apache modules help out by adding appropriate Vary response headers automatically.
Servers choose what to gzip based on file type, but are typically too limited in what they decide to compress. Most web sites gzip their HTML documents. It’s also worthwhile to gzip your scripts and stylesheets, but many web sites miss this opportunity. In fact, it’s worthwhile to compress any text response including XML and JSON. Image and PDF files should not be gzipped because they are already compressed. Trying to gzip them not only wastes CPU but can potentially increase file sizes.
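As a small illustration of choosing by file type, the check below compresses text responses and skips everything else, including images and PDFs. The function name and the exact list of media types are assumptions made for this sketch; a real server would configure this in its compression module.

```javascript
// Illustrative list covering the text types named above:
// HTML, stylesheets, scripts, XML, and JSON.
const COMPRESSIBLE_TYPES = [
  "text/html",
  "text/css",
  "text/xml",
  "application/javascript",
  "application/x-javascript",
  "application/xml",
  "application/json",
];

// Decide whether a response with this Content-Type should be gzipped.
function shouldGzip(contentType) {
  if (!contentType) return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return COMPRESSIBLE_TYPES.includes(mediaType);
}
```

Here `shouldGzip("text/html; charset=utf-8")` is true, while `shouldGzip("image/png")` and `shouldGzip("application/pdf")` are false.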
Gzipping as many file types as possible is an easy way to reduce page weight and accelerate the user experience.
Steve Souders
[Steve Souders is Yahoo!'s Chief Performance Yahoo!. This is one in a series of Best Practices for Speeding Up Your Web Site. This article is based on Steve's book High Performance Web Sites, published by O'Reilly.]
Posted by stevesouders at 1:24 PM | Comments (64)
Copyright © 2008 Yahoo! Inc. All rights reserved.