Chris Sells posted this recently on Twitter: Pingdom's "Exploring the software behind Facebook, the world’s largest site". It's a very interesting read. Some of the highlights:
- Facebook serves 570 billion page views per month (according to Google Ad Planner).
- There are more photos on Facebook than all other photo sites combined (including sites like Flickr).
- More than 3 billion photos are uploaded every month.
- Facebook’s systems serve 1.2 million photos per second. This doesn’t include the images served by Facebook’s CDN.
- More than 25 billion pieces of content (status updates, comments, etc) are shared every month.
- Facebook has more than 30,000 servers (and this number is from last year!)

Those numbers are just staggering. Astronomical. Mind-boggling. Etc.
A lot of people doing web stuff (usually the non-technical ones anyway) think they have to match these numbers, or something similar, "just in case". The simple answer is this:
Your website will never, ever, do 1% of 1% of 1% of these numbers. The exception to this is if you go work for Facebook, Google, Microsoft or maybe Apple. But the chances of that are statistically small.
The real trick and talent of designing and architecting software is working out what you will actually need, and designing to that, not going for some fictional "what if" number.
For example, I've been working on a site (with others) which has the potential to have quite spiky traffic. We have had frequent discussions about "what happens if we need to scale up?" and so on.
So far, we've been Fry'ed and Palmoa'ed (is that a term?), and various newsletters have gone out (usually to something on the order of half a million people). Our single AWS small instance didn't even break a sweat, let alone fall over.
So the question I think every architect should be asking when they design a system is this:
Does this really need to be this big? This complex? Is there a better way to do this, where we can scale out, not start big?
Sometimes, the answer is yes - you DO need to be this big, but mostly, the answer is no. Is 500K users/day a feature for first release? Or is 10K users/day good enough for first release (and put 500K/day in the second release backlog)?
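To put rough numbers on that question, here is a back-of-envelope sketch that turns users/day into peak requests per second. The `requests_per_user` and `peak_factor` values are pure assumptions for illustration (they are not measurements from the site mentioned above); swap in figures from your own logs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def peak_requests_per_second(users_per_day, requests_per_user=10, peak_factor=5):
    """Estimate peak load from daily users.

    requests_per_user and peak_factor are assumed values, not measured ones:
    - requests_per_user: average requests each visitor makes in a day
    - peak_factor: how much busier the busiest moment is than the daily average
    """
    average_rps = users_per_day * requests_per_user / SECONDS_PER_DAY
    return average_rps * peak_factor


# Compare the "first release" target with the "backlog" target.
for users in (10_000, 500_000):
    rps = peak_requests_per_second(users)
    print(f"{users:>7} users/day -> ~{rps:.0f} requests/second at peak")
```

Under those assumptions, 10K users/day works out to about 6 requests per second at peak, and 500K users/day to about 289. Neither is anywhere near Facebook territory, which is the point: do the arithmetic before you design for it.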
Only design for huge if the data supports huge, and preferably NOT data from overly enthusiastic founders, managers or salespeople.