We recently announced the MocoSpace Games platform, the first mobile-browser-based social gaming platform for smartphones.

The platform is built to take advantage of advanced HTML5-capable smartphone browsers. It leverages our Moco Gold virtual currency system and our ad mediation and optimization platform to help developers monetize effectively, and it uses familiar OpenSocial, REST and JavaScript interfaces to make development and integration quick and easy.

We’ll be releasing additional details on our platform and developer program in the coming weeks, including an SDK to help developers get started quickly. Stay tuned!

Hitwise just published a ranking of the Top 20 Social Networking Websites, where they ranked MocoSpace at #10. Of course, this only accounts for Web traffic – 70% of our traffic is from the mobile Web and apps!

Hitwise Top 20 Social Networking Websites, June 2010


We’re happy to announce the release of our iPhone and Android applications! MocoSpace members can now enjoy the convenience and added benefits of using MocoSpace via an application, in addition to the mobile Website. There are many things that make using the apps especially cool, including direct photo uploads, device notifications of new messages and instant messages, and a very slick interface that makes using the things you love most on MocoSpace even easier and more fun. Plus we’ve got lots more cool things coming up in future updates. Check it out and let us know what you think!

High Scalability recently published a post I wrote, copied below, with some details on the system architecture we’ve developed here at MocoSpace and some of the tools and techniques we’ve employed to scale to 3 billion page views per month.

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system is: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; selecting PostgreSQL over MySQL as a relational DB that can scale.

MocoSpace is a mobile social network, with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone Web browser, ranging from high end smartphones to lower end devices, as well as the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards and blogs. The monetization strategy is focused on advertising, on both the mobile site and the Website, as well as a virtual currency system and a handful of premium feature upgrades.

STATS

  1. 3 billion page views a month
  2. Fourth most trafficked mobile website, after MySpace, Facebook and Google (http://www.groundtruth.com/mobile-is-mobile)
  3. 75% mobile Web, 25% Web
  4. 12 million members
  5. 6 million unique visitors a month
  6. 100k concurrent users
  7. 12 million photo uploads a month
  8. 2 million emails received a day, 90% spam, 2.5 million sent a day
  9. Team of 8 developers, 2 QA, 1 sysadmin

PLATFORM

  1. CentOS + Red Hat
  2. Resin application server — Java Servlets, JavaServer Pages, Comet
  3. PostgreSQL
  4. Memcached
  5. ActiveMQ’s job + message queue, in Red Hat cluster for high availability
  6. Squid – static content caching, tried Varnish but had stability issues
  7. JQuery + Ajax
  8. S3 for user photo & video storage (8 TB) and EC2 for photo processing
  9. F5 BigIP load balancers – sticky sessions, gzip compression on all pages
  10. Akamai CDN – 2 TB a day, 250 million requests a day
  11. Monitoring – Nagios for alerts, Zabbix for trending
  12. EMC SAN – high IO performance for databases by RAIDing (RAID 10) lots of disks, replacing with high performance Fusion-io solid-state flash ioDrives, much more cost effective
  13. PowerMTA for mail delivery, Barracuda spam firewalls
  14. Subversion source control, Hudson for continuous integration
  15. FFMPEG for Mobile to Web and Web to mobile video transcoding
  16. Selenium for browser test case automation
  17. Web tier
    1. 5x Dell 1950, 2x dual core, 16G RAM
    2. 5x Dell 6950/R905, 4x dual core, 32G RAM
  18. Database tier
    1. 2x Sun Fire X4600 M2 Server, 8x quad core, 256G RAM
    2. 2x Dell 6950, 4x dual core, 64G RAM

ARCHITECTURE

  1. All pages are dynamic, with user data and customizations as well as many browser and device specific optimizations. Browser and device fragmentation issues are much greater on mobile than on the Web. Many optimizations and adaptations are required based on browser capabilities, limited support for CSS/JavaScript, screen size, etc. Mobile Web traffic is often served via network proxies (gateways) with poor support for cookies, making session management and user identification a challenge.
  2. A big challenge is handling the device/browser fragmentation on the mobile Web – optimizing for a huge range of device capabilities (everything from iPhones with touchscreens to 5 year old Motorola Razrs), screen sizes, lack of / inconsistent Web standards compliance, etc. We abstract out our presentation layer so we can serve pages to all mobile devices from the same code base, and maintain a large device capabilities database (containing things like screen size, supported file types, maximum allowed page sizes, etc) which is used to drive generation of our pages. The database contains capability details for hundreds of devices and mobile browser types.
  3. Database is sharded based on a user key, with a master lookup table mapping users to shards. We rolled our own query and aggregation layer, allowing us to query and join data across shards, though this is not used frequently. With sharding we sacrifice some consistency, but that’s OK as long as you’re not running a bank. We perform data consistency checks offline, in batches, with the goal being eventual consistency. Large tables are partitioned into smaller sub tables for more efficient access, reducing the time tables are locked for updates and for operational maintenance activities. Log shipping is used for warm standbys.
  4. A multi-tiered caching system is used, with data cached locally within the application servers as well as distributed via Memcached. When making an update we don’t just invalidate the cache and then re-populate it after reading again from the database. Instead, we update Memcached with the new data and save another trip to the database. When the cache is updated, an invalidation directive is sent via the messaging queue to the local caches on each of the application servers.
  5. A distributed message queue is used for distributed server communication, everything from sending messages in realtime between users to system messages such as local cache invalidation directives.
  6. Dedicated server for building and traversing social graph entirely in memory, used to generate friend recommendations, etc.
  7. Load balancer used for rolling deploys of new versions of the site without affecting performance/traffic.
  8. Release every 2 weeks. Longer release cycles = exponential complexity, more difficult to troubleshoot and rollback. Development team responsible for deploying to and managing production systems – “you built it, you manage it”.
  9. Kickstart used to automate server builds from bare metal. Puppet is used to configure a server to a specific personality i.e. Webserver, database server, cache server etc, as well as to push updated policies to running nodes.
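
To make the caching flow in item 4 concrete, here is a minimal Python sketch (the site itself runs on Java). It is an illustration under stated assumptions, not MocoSpace’s actual code: plain dicts stand in for Memcached and the database, an in-process `MessageBus` stands in for the message queue, and the class and method names are hypothetical.

```python
class MessageBus:
    """Stand-in for the message queue: delivers each message to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class AppServerCache:
    """Two-tier cache for one application server (hypothetical class).

    Reads check the local dict first, then the shared cache, then the database.
    """

    def __init__(self, shared, database, bus):
        self.local = {}            # per-server local cache
        self.shared = shared       # stand-in for Memcached (assumption: a dict)
        self.database = database   # stand-in for the database (assumption: a dict)
        self.bus = bus
        bus.subscribe(self._on_invalidate)

    def _on_invalidate(self, key):
        # Drop the local copy; the next read falls through to the shared tier.
        self.local.pop(key, None)

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.database[key]
            self.shared[key] = value
        self.local[key] = value
        return value

    def update(self, key, value):
        self.database[key] = value
        # Write the new value straight into the shared cache instead of
        # invalidating it, so the next reader does not hit the database.
        self.shared[key] = value
        # Tell every application server (including this one) to drop its local copy.
        self.bus.publish(key)


database = {"user:1": "old status"}
shared, bus = {}, MessageBus()
server_a = AppServerCache(shared, database, bus)
server_b = AppServerCache(shared, database, bus)

server_b.get("user:1")                    # warms server_b's local cache
server_a.update("user:1", "new status")   # write-through plus invalidation
print(server_b.get("user:1"))             # prints "new status"
```

In this sketch the second read on `server_b` is served from the shared tier, which is the saved database trip the item describes. A real deployment would also need expiry times and handling for lost invalidation messages, which are left out here.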

LESSONS LEARNED

  1. Make your boxes sweat. Don’t be afraid of high system load as long as response times are acceptable. We pack as many as five application server instances on a single box, each running on a different port. Scale up to the high end of commodity hardware before scaling out. Can pick up used or refurbished powerful 4U boxes for cheap.
  2. Understand where your bottlenecks are in each tier. Are you CPU or IO (disk, network) bound? Database is almost always IO (disk) bound. Once the database doesn’t fit in RAM you hit a wall.
  3. Profile the database religiously. Obsess when it comes to optimizing the database. Scaling Web tier is cheap and easy, database tier is much harder and expensive. Know the top queries on your system, both by execution time and frequency. Track and benchmark top queries with each release, need to catch and address performance issues with the database early. We use the pgFouine log analyzer and new PostgreSQL pg_stat_statements utility for generating profiling snapshots in real-time.
  4. Design to disable. Be able to configure and turn off anything you release instantly, without requiring a code change or deployment. Load and stress testing are important but nothing like testing with live, production traffic via incremental, phased rollouts.
  5. Communicate synchronously only when absolutely necessary. When one component or service fails it shouldn’t bring down other parts of the system. Do everything you can in the background or as a separate thread or process, don’t make the user wait. Update the database in batches wherever possible. Any system making requests outside the network needs aggressive monitoring, timeout settings, and failure handling / retries. For example, we found S3 latency and failure rates can be significant, so we queue failed calls and retry later.
  6. Think about monitoring during design, not after. Every component should produce performance, health, throughput, etc data. Set up alerts when these exceed thresholds. Consolidated graphs showing metrics across all instances, rather than just per instance, are particularly helpful for identifying issues and trends at a glance and should be reviewed daily — if normal operating behavior isn’t well understood it’s impossible to identify and respond to what isn’t. We tried many monitoring systems – Cacti, Ganglia, Hyperic, Zabbix, Nagios, as well as some custom built solutions. Whichever you use the most important thing is to be comfortable with it, otherwise you won’t use it enough. It should be easy, using templates, etc to quickly monitor new boxes and services as you throw them up.
  7. Distributed sessions can be a lot of overhead. Go stateless when you can, but when you can’t, consider sticky sessions. If the server fails the user loses their state and may need to re-login, but that’s rare and acceptable depending on what you need to do.
  8. Monitor and beware of full/major garbage collection events in Java, which can lock up the whole JVM for up to 30 seconds. We use Concurrent Mark Sweep (CMS) garbage collector, which introduces some additional system overhead, but have been able to eliminate full garbage collections.
  9. When a site gets large enough it becomes a magnet for spammers and hackers, both on site and from outside via email, etc. Captcha and IP monitoring are not enough. Must invest aggressively in detection and containment systems, internal tools to detect suspicious user behavior and alert and/or attempt to automatically contain.
  10. Soft delete whenever possible. Mark data for later deletion, rather than deleting immediately. Deletion can be costly, so queue up for after hours, plus if someone makes a mistake and deletes something they shouldn’t have it’s easy to rollback.
  11. N+1 design. Never less than two of anything.
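
The “queue failed calls and retry later” idea from lesson 5 can be sketched in a few lines of Python. This is a simplified illustration, not the production implementation: the `RetryQueue` class, its `max_attempts` parameter and the `flaky_upload` function are hypothetical, and a simulated failure stands in for a slow or failing S3 request.

```python
import collections


class RetryQueue:
    """Runs a call once; on failure, parks it for a later retry (hypothetical class)."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.pending = collections.deque()   # calls waiting for another attempt
        self.dead = []                       # calls that used up all attempts

    def submit(self, call, *args):
        self._attempt(call, args, attempts=0)

    def _attempt(self, call, args, attempts):
        try:
            call(*args)
        except Exception:
            attempts += 1
            if attempts < self.max_attempts:
                self.pending.append((call, args, attempts))
            else:
                # Give up and keep the call for inspection or alerting.
                self.dead.append((call, args))

    def retry_pending(self):
        """Retry each queued call once; meant to run from a background worker."""
        for _ in range(len(self.pending)):
            call, args, attempts = self.pending.popleft()
            self._attempt(call, args, attempts)


calls = []

def flaky_upload(name):
    """Simulated external call that fails on its first two attempts."""
    calls.append(name)
    if len(calls) < 3:
        raise IOError("simulated timeout")

uploads = RetryQueue(max_attempts=5)
uploads.submit(flaky_upload, "photo.jpg")   # fails, so it is queued
uploads.retry_pending()                     # fails again, stays queued
uploads.retry_pending()                     # third attempt succeeds
```

The user-facing request returns after `submit`, and the retries happen off the request path. A real version would persist the queue (for example in the message queue) and wait between attempts, both of which are omitted here.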

I’d really like to thank Jamie for taking the time to write this experience report. It’s a really useful contribution for others to learn from. If you would like to share the architecture for your fabulous system, please contact me and we’ll get started.

Very interesting findings just released showing that nearly 60% of the time spent on the mobile Internet in the US is on social networks.

Percent of Time Spent on Mobile Internet Usage by Category
U.S. Mobile Subscribers
Week ending April 4, 2010

Category Percent
Social Networking 59.83%
Portals 13.65%
Operator 9.02%
Messaging 7.35%
Mobile Downloads 1.27%
All Other 8.88%

Source: Ground Truth, Inc. Census of mobile subscribers for the week ending April 4, 2010 (n=3.05 million U.S. mobile subscribers).

What is likely to surprise many people is that the data also shows mobile-centric social networking sites such as MocoSpace are much better at engaging consumers than heavyweights like Facebook and MySpace.

Mobile Social Networking Usage
U.S. Mobile Subscribers
Week ending April 4, 2010

Site              Sessions per Subscriber   Pages per Subscriber   Pages per Session   Time per Subscriber
Average           68.1                      310                    4.56                0:52:12
MySpace           57.6                      246                    4.28                0:40:19
Facebook          56.9                      205                    3.61                0:30:54
MocoSpace         63.9                      476                    7.45                1:31:02
FunForMobile      17.4                      101                    5.83                0:19:50
AirG              58.8                      520                    8.84                1:31:03
Facebook Photos   18.9                      59.7                   3.15                0:10:10
Cellufun          13.5                      145                    10.8                0:23:55
MBuzzy            64.3                      359                    5.58                1:09:41
MocoSpace Photos  15.7                      57.2                   3.63                0:12:22
MobaMingle        42                        278                    6.62                0:47:06

Source: Ground Truth, Inc. Census of mobile subscribers for the week ending April 4, 2010 (n=3.05 million U.S. mobile subscribers).

We’ve been extremely fortunate to be able to bring some great live interviews with some of the most incredible emerging and chart topping musicians to MocoSpace. These include live streaming video and call-ins, with questions from the community. The results have been awesome, with some of the artists adding over 100,000 friends on MocoSpace!

Featured interviews include:

We’ve also recently launched the MocoSpace Music Charts. Top artists currently include:

So far we’ve seen a real jump in traffic across the board this year for MocoSpace, which is great. However, as we first mentioned in October, what is really interesting is that Android has been, and continues to be, growing faster than iPhone in both absolute and relative terms in the MocoSpace community. Although we like iPhone a lot, it’s never fun to be at the mercy of a monopoly, so the good news here is that won’t be the case.

This week brought the announcement that Microsoft is actually rolling out a pretty cool OS for mobile. Will they surpass iPhone or Android anytime soon? Safe to say no. Are they a longer term competitor in the space? Absolutely. What does this all mean? For consumers there will be plenty of choices. For developers there will be plenty of fragmentation… sorry, I meant opportunity. Will Apple’s innovation and early lead eventually be eroded and surpassed, i.e. a repeat of the PC? Too soon to say.

Today our fastest growth is happening on Android and iPhone. Tomorrow, maybe it’s Microsoft. Fortunately, I don’t have to pick a winner, or rather I think the winner is mobile. And that’s the bet MocoSpace will continue to make by delivering a fun, engaging friend finding experience across all handsets: iPhone, Android, Microsoft, Blackberry, etc.

Mobile is Social

February 2nd, 2010

Great new post today on the dominance of social networking on mobile from our friends at Ground Truth, finding that over 60% of traffic on the mobile Web is on social networks.

Opera’s new State of the Mobile Web report for December 2009, with a special section on social networks, shows MocoSpace as one of the top and fastest growing social networks in the US:

Social networking site Growth rate in 2009 (users)
facebook.com 194%
myspace.com 25%
twitter.com -21%
mocospace.com 58%
peperonity.com 3


I’ve been working with the Mobile Web for 10 years now, and I can tell you there are a number of things that those new to this world would find shocking if they were to catch wind of them. A few examples …

First, there’s a recent AP story documenting an account switching incident involving Facebook users on AT&T. The culprit? Misconfigured gateways on AT&T’s network caused users’ private data, stored as browser cookies, to get mixed up, giving some people access to accounts that don’t belong to them. This is not the first time such a thing has happened; in fact we’ve seen it on nearly every major wireless network in North America. It can take carriers weeks, and in some cases months, to identify these situations and address them. On a related subject, cookie support in general can be somewhat sketchy, with things like expiration times not being honored or data just disappearing.

Next we have frequent outages, either full, such as the recent incident involving T-Mobile Sidekick handsets, or partial as seems to happen every so often on Blackberry’s network. Outages on the Mobile Web are far more common, and often far longer in duration, than outages at major wired ISPs, with the carriers often never acknowledging such incidents to their customers or partners.

Then there are the transcoders. Many of the major networks use them in an attempt to make PC Websites accessible on less capable mobile browsers. This sounds great, however they often cause sites which are made for the Mobile Web to render improperly, or worse, make them completely unusable. In some cases the carrier may have a white list of Mobile Web sites that should bypass the transcoder, leaving it to the site owners to submit a request to be included. In others, there may be no white list and so other workarounds must be sought.
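
One standard HTTP mechanism worth trying as a workaround is the `Cache-Control: no-transform` response directive, which asks intermediaries not to modify the response body. The Python sketch below shows one way to add it to every response of a WSGI application. It is an assumption-laden illustration rather than a description of what any particular site does: the function names are hypothetical, and whether a given carrier’s transcoder honors the directive is not guaranteed.

```python
def add_no_transform(headers):
    """Return a copy of a WSGI header list with no-transform in Cache-Control."""
    result = []
    found = False
    for name, value in headers:
        if name.lower() == "cache-control":
            found = True
            directives = [d.strip().lower() for d in value.split(",")]
            if "no-transform" not in directives:
                # Keep the existing directives and append ours.
                value = value + ", no-transform"
        result.append((name, value))
    if not found:
        result.append(("Cache-Control", "no-transform"))
    return result


def no_transform_middleware(app):
    """Wrap a WSGI app so that every response carries the no-transform directive."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            return start_response(status, add_no_transform(headers), exc_info)
        return app(environ, patched_start)
    return wrapped


def mobile_site(environ, start_response):
    """Tiny example app standing in for a site built for the Mobile Web."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Made for mobile</body></html>"]


application = no_transform_middleware(mobile_site)
```

Doing this in middleware keeps the header out of individual page handlers, so it applies uniformly and can be removed in one place if it turns out not to help on a given network.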

In short, those who only develop for the PC Web don’t know how lucky they are :-)