Kaviraj

Python: Generating a PDF containing emojis using reportlab

Understanding Emoji

Emojis 😃 🎉 👍 💕 have become hugely popular. Anyone who has used WhatsApp or Instagram knows how cool emojis are 😎, and they are everywhere now. Emoji originated in Japan. The word “emoji” means “picture (e) letter (moji)” in Japanese.

Technically, emojis are a subset of the Unicode character set. In short, Unicode aims to collect every written symbol in use on the planet, so it includes characters from virtually every language.

There are different encodings used to store Unicode text as bytes. The most popular is “UTF-8”. To understand how Unicode text is encoded, stored and retrieved, here is a great article: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Each emoji used in this post is a single Unicode character that can take up to 4 bytes, depending on the encoding used. Yes, the old assumption that every character is just one byte no longer holds.
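To make the byte sizes concrete, here is a small sketch that encodes one character with a few encodings and counts the bytes. The helper name byte_lengths is just something made up for this illustration, and the "-le" codec variants are used so that Python does not prepend a byte-order mark.

```python
# Encodings to compare; the little-endian variants avoid a byte-order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def byte_lengths(ch):
    """Return {encoding: number of bytes} for a single character."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for label, ch in (("A", "A"), ("smiley", "\U0001f603")):
    print(label, byte_lengths(ch))
```

In Python 3 the letter 'A' comes out as 1, 2 and 4 bytes, while the smiley takes 4 bytes in all three encodings, even though len() reports both strings as one character.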

Interpreting Emoji

The letter ‘A’ corresponds to ‘65’ in decimal and ‘0x41’ in hex. Similarly, the emoji 😃 corresponds to ‘128515’ in decimal and ‘0x1F603’ in hexadecimal. Though every emoji (like any Unicode character) can be written in decimal or hexadecimal, the standard way to denote it is by its ‘code point’.

The code point for 😃 is U+1F603. A code point is written as “U+” (to denote Unicode) followed by the hex value, here “1F603”. In Python it is written as "\U0001f603" (u"\U0001f603" in Python 2). Here is a list of emojis with their code points.

Every emoji has a name. Some apps such as GitHub, Slack and Basecamp provide short names for emojis to make them easy to remember. You can find the list of emojis with their short names here: emoji-cheat-sheet
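Converting between a character, its “U+” notation and its Python escape is mostly string formatting. The three helpers below are illustrative names of my own, not part of any library, and they only handle a single character at a time.

```python
def to_code_point(ch):
    """Return the 'U+XXXX' notation for a single character."""
    return "U+{:04X}".format(ord(ch))


def to_python_escape(ch):
    """Return the Python string escape for a single character."""
    value = ord(ch)
    if value > 0xFFFF:
        # Characters beyond the Basic Multilingual Plane need the 8-digit form.
        return "\\U{:08x}".format(value)
    return "\\u{:04x}".format(value)


def from_code_point(notation):
    """Turn 'U+1F603' back into the character it denotes."""
    return chr(int(notation[2:], 16))


print(to_code_point("\U0001f603"))     # U+1F603
print(to_python_escape("\U0001f603"))  # \U0001f603
```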
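A short name is usually written between colons, for example :tada:. Here is a rough sketch of how such names could be expanded into the actual characters. The SHORT_NAMES table is a tiny hand-picked subset for illustration only, and expand_short_names is a hypothetical helper, not an API of any of the apps mentioned above.

```python
import re

# Illustrative subset only; the real cheat sheet lists many more names.
SHORT_NAMES = {
    "smiley": "\U0001f603",
    "tada": "\U0001f389",
    "+1": "\U0001f44d",
}

# A short name is assumed to be lowercase letters, digits, '_', '+' or '-'
# wrapped in colons.
SHORT_NAME_RE = re.compile(r":([a-z0-9_+-]+):")


def expand_short_names(text):
    """Replace known :short_name: tokens with their emoji character."""
    def replace(match):
        # Unknown names are left untouched.
        return SHORT_NAMES.get(match.group(1), match.group(0))
    return SHORT_NAME_RE.sub(replace, text)


print(expand_short_names("Released :tada: :+1:"))
```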

Interpreting with Python

Get the name and category of an emoji using Python's unicodedata module

In [1]: import unicodedata

# in python2 use u'\U0001f603'

In [2]: print('\U0001f603')
😃

In [3]: ord('\U0001f603')
128515

# use 'unichr' in python2

In [4]: chr(0x1f603) # hex
😃

In [5]: chr(128515) # decimal
😃

In [6]: unicodedata.name('\U0001f603') 
'SMILING FACE WITH OPEN MOUTH'

In [7]: unicodedata.category('\U0001f603') 
'So' # S- Symbol, o- Other

For more info on unicodedata.category, see https://docs.python.org/2/howto/unicode.html#unicode-properties

You have probably seen emojis in many web apps and mobile apps. Rendering these emojis inside a browser or an app is another problem.

If you are able to see all the colorful emojis I have used in this blog, then your browser supports emojis well. Recently at our company we happened to work on some features related to printing text containing emojis to PDF.

Wait... but how do we render these emojis in a PDF? The problem boils down to the font you are using.
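The same two calls can be combined to list the symbols hiding in a piece of text. This is only a sketch: describe_symbols is a name I made up, and the 'So' category is broader than emoji (it also contains characters such as the copyright sign), so treat it as a rough filter.

```python
import unicodedata


def describe_symbols(text):
    """Return (code point, name) pairs for characters in category 'So'."""
    found = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            code_point = "U+{:04X}".format(ord(ch))
            # Fall back to 'UNKNOWN' if this Python build has no name for it.
            found.append((code_point, unicodedata.name(ch, "UNKNOWN")))
    return found


sample = "It's emoji time \u263A \U0001F61C. And some more \u2764 \U0001F436"
for code_point, name in describe_symbols(sample):
    print(code_point, name)
```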

Printing emojis to pdf

We are going to use the reportlab Python library to generate the PDF

sudo pip install reportlab

Note: All the code samples used here were tested using Python 3.4

Here is a simple example that writes text containing emojis to a PDF

from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.pdfgen import canvas
from reportlab.rl_config import defaultPageSize

width, height = defaultPageSize
pdf_content = "It's emoji time \u263A \U0001F61C. Let's add some cool emotions \U0001F48F \u270C. And some more \u2764 \U0001F436"

styles = getSampleStyleSheet()
para = Paragraph(pdf_content, styles["Title"])
canv = canvas.Canvas('emoji.pdf')

para.wrap(width, height)
para.drawOn(canv, 0, height/2)

canv.save()

Here is the preview of how it looks

What!! This is obviously not what we wanted. Unsupported emojis are shown as “black boxes” by the default font (Helvetica-Bold).

Let's try a custom font that supports emoji symbols.

Using custom font

The Symbola font supports most of the emoji symbols.

A simple program that writes emoji content to a PDF using a custom font (Symbola)
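A quick way to predict which characters will end up as black boxes is to check them against the character set the built-in fonts cover. As an approximation, I am assuming here that the standard fonts such as Helvetica cover roughly the Windows-1252 (WinAnsi) characters; outside_winansi is a hypothetical helper and not part of reportlab.

```python
def outside_winansi(text):
    """Return the characters of text that cannot be encoded as cp1252.

    Assumption: this approximates what the built-in PDF fonts cannot draw.
    """
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


pdf_content = "It's emoji time \u263A \U0001F61C"
print(["U+{:04X}".format(ord(ch)) for ch in outside_winansi(pdf_content)])
```

Every emoji in the sample text lands in the returned list, which matches the black boxes in the preview.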

from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.pdfgen import canvas
from reportlab.rl_config import defaultPageSize
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

# Register the font; 'font_file' is the location of the Symbola.ttf file

font_file = '/home/kaviraj/Downloads/fonts/Symbola_full/Symbola.ttf'
symbola_font = TTFont('Symbola', font_file)
pdfmetrics.registerFont(symbola_font)

width, height = defaultPageSize
pdf_content = "It's emoji time \u263A \U0001F61C. Let's add some cool emotions \U0001F48F \u270C. And some more \u2764 \U0001F436"

styles = getSampleStyleSheet()
styles["Title"].fontName = 'Symbola'
para = Paragraph(pdf_content, styles["Title"])
canv = canvas.Canvas('emoji.pdf')

para.wrap(width, height)
para.drawOn(canv, 0, height/2)

canv.save()

Here is the preview of how it looks with the Symbola font.

Cool. Symbola prints these emojis as promised. But wait! Can’t we get colorful emojis like the ones we see in the browser? Yes. Here comes Emojione.

Intro to Emojione

Emojione provides a wide-ranging collection of emojis (almost every emoji supported by Instagram, Facebook, Slack, etc.).

The collection includes PNG and SVG image files for every emoji they support. Emojione provides the following cool features.

Any emoji can be converted from its

  • Unicode to corresponding PNG or SVG image
  • Short name to corresponding PNG or SVG image
  • Unicode to short name (and vice versa)
  • ASCII symbol to corresponding PNG or SVG image (say, convert :) to 😄)

But unfortunately they don’t provide a library for Python. So we have written one ourselves 👍 and we call it Emojipy.

Inserting emoji using Emojipy and reportlab

Usually an app (web or mobile) renders emojis by replacing each Unicode character with the corresponding PNG or SVG image (that's why you can see all the colorful emojis in this blog).

Let's do the same thing with PDF
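To show the idea without any library, here is a bare-bones sketch of that replacement. It assumes a hypothetical folder emoji_png holding one PNG per emoji, named after the lowercase hex code point, and it treats every character in the Unicode category 'So' as an emoji. Real libraries are more careful, for example with emojis made of several code points.

```python
import unicodedata

# Hypothetical layout: one PNG per emoji, named by lowercase hex code point.
IMAGE_DIR = "emoji_png"


def emoji_to_img_tags(text, size):
    """Replace each 'So' character with an image tag of the given size."""
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            src = "{}/{:x}.png".format(IMAGE_DIR, ord(ch))
            parts.append('<img src="{}" height="{}" width="{}"/>'.format(
                src, size, size))
        else:
            parts.append(ch)
    return "".join(parts)


print(emoji_to_img_tags("It's emoji time \U0001f603", 18))
```

The resulting string is plain markup with image tags, which is the kind of input a renderer that understands such tags can turn into colorful emojis.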

Installing emojipy

git clone git@github.com:launchyard/emojipy.git
cd emojipy
sudo python setup.py install

Below is a simple Python program that uses the emojipy library to render emojis in a PDF

from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.pdfgen import canvas
from reportlab.rl_config import defaultPageSize
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from emojipy import Emoji
import re

# Pdf doesn't need any unicode inside <image>'s alt attribute
Emoji.unicode_alt = False


def replace_with_emoji_pdf(text, size):
    """
    Reportlab's Paragraph doesn't accept normal HTML <image> tag attributes
    like 'class' and 'alt'. This is a little hack to remove those attributes.
    """

    text = Emoji.to_image(text)
    text = text.replace('class="emojione"', 'height=%s width=%s' %
                        (size, size))
    return re.sub('alt="'+Emoji.shortcode_regexp+'"', '', text)

# Register the font; 'font_file' is the location of the Symbola.ttf file

font_file = '/home/kaviraj/Downloads/fonts/Symbola_full/Symbola.ttf'
symbola_font = TTFont('Symbola', font_file)
pdfmetrics.registerFont(symbola_font)

width, height = defaultPageSize
pdf_content = "It's emoji time \u263A \U0001F61C. Let's add some cool emotions \U0001F48F \u270C. And some more \u2764 \U0001F436"

styles = getSampleStyleSheet()
styles["Title"].fontName = 'Symbola'
style = styles["Title"]
# replace_with_emoji_pdf already calls Emoji.to_image, so pass the raw text
content = replace_with_emoji_pdf(pdf_content, style.fontSize)

para = Paragraph(content, style)
canv = canvas.Canvas('emoji.pdf')

para.wrap(width, height)
para.drawOn(canv, 0, height/2)

canv.save()

Here is the preview of how it looks with emojipy

Wooo!! This is what we wanted!!
