The Google Fonts catalog now includes Japanese web fonts. Since shipping Korean in February, we have been working to optimize the font slicing system and extend it to support Japanese. The optimization efforts proved fruitful: Korean users now transfer, on average, over 30% fewer bytes than with our previous best solution. This type of ongoing optimization is a major goal of Google Fonts.
Japanese presents many of the same core challenges as Korean:
- Very large character set
- Visually complex letterforms
- A complex writing system: Japanese uses several distinct scripts (explained well by Wikipedia)
- More character interactions: Line layout features (e.g. kerning, positioning, substitution) break when they involve characters that are split across different slices
To begin supporting Japanese, we gathered character frequency data from millions of Japanese webpages and analyzed them to inform how to slice the fonts. Users download only the slices they need for a page, typically avoiding the majority of the font. Over time, as they visit more pages and cache more slices, their experience becomes ever faster. This approach is compatible with many scripts because it is based on observations of real-world usage.
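As a rough illustration of "download only the slices they need", the sketch below picks the slices whose character sets overlap the text of a page. The slice contents are toy data invented for the example, and `slices_needed` is a hypothetical helper, not part of Google Fonts. In practice the browser makes this decision itself from the CSS it receives.

```python
def slices_needed(page_text, slices):
    """Return indices of the slices that contain at least one character of the page."""
    page_codepoints = {ord(ch) for ch in page_text}
    return [i for i, s in enumerate(slices) if page_codepoints & s]


# Toy slices (assumed for illustration): each slice is a set of codepoints.
slices = [
    {ord(c) for c in "のにはをた"},  # a few very common hiragana
    {ord(c) for c in "日本語"},      # a few common kanji
    {ord(c) for c in "鬱麟"},        # rarer kanji
]

print(slices_needed("日本語のページ", slices))  # [0, 1]
```

A page that never uses a rare character never pays for the slice that holds it, which is why most of the font is typically not downloaded.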
[Figure: Frequency of the popular Japanese and Korean characters on the web]
As shown above, Korean and Japanese have a relatively small set of characters that are used extremely frequently, and a very long tail of rarely used characters. On any given page, most of the characters will be from the high-frequency part, often with a few rarer characters mixed in.
For the initial Korean launch, we tried fancier segmentation strategies, but the most performant method turned out to be simple:
- Put the 2,000 most popular characters in a slice
- Put the next 1,000 most popular characters in another slice
- Sort the remaining characters by Unicode codepoint number and divide them into 100 equally sized slices
Finer-grained slicing then became practical because:
- The core features we rely on to efficiently deliver sliced fonts are unicode-range and woff2
- Browsers that support unicode-range and woff2 also support HTTP/2
- HTTP/2 enables the concurrent delivery of many small files
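To make the role of unicode-range concrete, here is a minimal sketch that turns the codepoints of one slice into a CSS `@font-face` rule. The family name, the file name, and both helper functions are assumptions made for illustration; the CSS that Google Fonts actually serves may differ.

```python
def to_unicode_range(codepoints):
    """Collapse codepoints into a CSS unicode-range value, merging consecutive runs."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp          # extend the current run
        else:
            ranges.append([cp, cp])     # start a new run
    return ", ".join(
        f"U+{lo:X}" if lo == hi else f"U+{lo:X}-{hi:X}" for lo, hi in ranges
    )


def font_face_rule(family, url, codepoints):
    """Build one @font-face rule for a single woff2 slice (names are hypothetical)."""
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url({url}) format('woff2');\n"
        f"  unicode-range: {to_unicode_range(codepoints)};\n"
        "}"
    )


print(font_face_rule("Example Sans JP", "slice-0.woff2", [0x3042, 0x3043, 0x3044, 0x65E5]))
```

With one rule per slice, a browser fetches a slice's file only when the page contains a character inside that rule's unicode-range.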
Our analyses of the Japanese and Korean web show that most pages use mostly common characters, plus a few rarer ones. To optimize for this, we tested a variety of finer-grained strategies on the common characters for both languages.
We concluded that the following is the best strategy for Korean, with clients downloading 38% fewer bytes than our previous best strategy:
- Take the 2,000 most popular Korean characters, sort by frequency, and put them into 20 equally sized slices
- Sort the remaining characters by Unicode codepoint number, and divide them into 100 equally sized slices
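The two-step strategy above can be sketched as follows. This is an illustration under assumptions, not the production slicer: `freq` is a hypothetical mapping from character to observed count, ties are broken by codepoint, and "equally sized" is taken to mean slice sizes that differ by at most one when the division is not exact.

```python
def split_evenly(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def slice_characters(freq, head_count, head_slices, tail_slices):
    """Slice the most frequent characters by frequency and the rest by codepoint."""
    ranked = sorted(freq, key=lambda ch: (-freq[ch], ord(ch)))
    head = ranked[:head_count]                    # most popular, in frequency order
    tail = sorted(ranked[head_count:], key=ord)   # the rest, in codepoint order
    return split_evenly(head, head_slices) + split_evenly(tail, tail_slices)


# Toy data (assumed): six characters instead of thousands.
freq = {"a": 5, "b": 9, "c": 1, "d": 7, "e": 3, "f": 2}
print(slice_characters(freq, head_count=2, head_slices=2, tail_slices=2))
```

Under these assumptions, the Korean configuration described above would correspond to `head_count=2000, head_slices=20, tail_slices=100`.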
Now that both Japanese and Korean are live on Google Fonts, we have even more ideas for further optimization—and we will continue to ship updates to make things faster for our users. We are also looking forward to future collaborations with the W3C to develop new web standards and go beyond what is possible with today's technologies (learn more here).