Mapping concepts to colors (terribly) with the Oklab perceptual colorspace
by epilys
TL;DR
What? I wanted a way to semi-automate associating colors with the
topic tags of stories posted on https://sic.pm.
Why not any color? I was curious to see what results were reasonable
within this approach.
What’s the approach? I had no idea about colorspaces or
image processing before I started. My strategy was to download the
top results of a query using DuckDuckGo image search, calculate the
dominant colors of each, and then calculate the overall dominant colors. Any
suggestions/corrections are most welcome!
Show me the result:
Oklab                  HSL                    RGB                       Hex
[0.45, -0.04, -0.06]   [200.00, 0.44, 0.32]   [46.00, 92.00, 120.00]    #2d5b76
[0.60, 0.18, 0.04]     [350.00, 0.66, 0.54]   [220.00, 62.00, 95.00]    #d73e5f
[0.56, -0.07, -0.05]   [190.00, 0.49, 0.39]   [51.00, 130.00, 150.00]   #338193
[0.45, -0.02, -0.16]   [220.00, 0.70, 0.40]   [30.00, 79.00, 180.00]    #1e4eaf
[0.53, -0.08, 0.03]    [150.00, 0.36, 0.36]   [59.00, 120.00, 89.00]    #3a7c59
[0.32, -0.00, -0.15]   [230.00, 0.70, 0.29]   [23.00, 34.00, 130.00]    #16217e
[0.76, -0.11, 0.11]    [93.00, 0.49, 0.56]    [140.00, 200.00, 88.00]   #89c658
[0.77, 0.05, 0.11]     [31.00, 0.77, 0.63]    [230.00, 160.00, 86.00]   #e9a156
Results for “programming”
What’s a colorspace and what’s “perceptually uniform”?
A colorspace is basically a way to map colors to attributes. The
well-known RGB colorspace maps colors to Red, Green and Blue.
If a space has three attributes, we can view them as coordinates
in a 3D space (any n attributes can be viewed as an
n-dimensional vector space). Then we can define color distance as
the usual Euclidean distance we use for tangible things in the real
world.
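
To make that distance concrete, here is the Euclidean distance between two RGB colors computed with numpy (a minimal sketch; the two colors are arbitrary examples):

import numpy as np

# Treat two RGB colors as points in a 3D space.
red = np.array([255.0, 0.0, 0.0])
orange = np.array([255.0, 165.0, 0.0])

# Euclidean distance: square root of the sum of squared coordinate
# differences.
distance = np.linalg.norm(red - orange)
print(distance)  # 165.0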
A perceptually uniform colorspace aims to have the following
identity: “identical spatial distance between two colors equals
identical amount of perceived color difference”. The precise definitions
of those terms can be found in color science books and research.
Oklab is a
perceptual color space designed by Björn Ottosson to make working with
colors in image processing easier. After reading the introductory blog
post, I wondered if I could apply it to finding dominant colors of an
image.
Oklab has three coordinates:
- L: perceived lightness
- a: how green/red the color is
- b: how blue/yellow the color is
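
A minimal sketch of converting a color between sRGB and Oklab with the colorio library (the same library the sample code at the end of this post uses); the example color is the first “programming” swatch from the table above:

import colorio
import numpy as np

OKLAB = colorio.cs.OKLAB()

# #2d5b76 from the "programming" results above.
rgb = np.array([46, 92, 120])
L, a, b = OKLAB.from_rgb255(rgb)  # roughly [0.45, -0.04, -0.06]

# Converting back recovers (approximately) the original 8-bit values.
rgb_again = OKLAB.to_rgb255(np.array([L, a, b]))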
Uniformly sampling the Oklab colorspace in 8 parts per coordinate.
Uniformly sampling the Oklab colorspace in 16 parts per coordinate.
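
The uniform samples come from splitting each Oklab coordinate into equal steps and taking every combination; a minimal sketch, mirroring the make_grid helper in the sample code at the end:

import itertools

import numpy as np

def make_grid(n=8):
    # n equally spaced steps per coordinate; the cartesian product
    # yields n**3 uniformly spaced points in the colorspace.
    steps = np.linspace(-1.0, 1.0, num=n)
    return list(itertools.product(steps, steps, steps))

print(len(make_grid(8)))   # 512 colors
print(len(make_grid(16)))  # 4096 colors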
Dominant colors
I guess we would take an image and average all colors. What would
that produce?
#d2c6b6
Terrible. Obviously this approach can’t work when multiple distinct
colors appear in a picture. If the picture were mostly one color it’d be
somewhat useful:
#94706f
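
For reference, the naive average is a one-liner; a minimal sketch, assuming the image’s pixels are already loaded into a numpy array:

import numpy as np

def average_color(pixels: np.ndarray) -> np.ndarray:
    # pixels has shape (height, width, 3); flatten the spatial axes
    # and take the mean of each channel.
    return pixels.reshape(-1, 3).mean(axis=0)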
k-means clustering
From signal processing comes this dazzling technique: given a set of
colors c, partition them into k buckets as follows (a scipy sketch
follows the list):
1. Initially assign k average colors somehow; you can pick them
randomly, for example. We will incrementally improve on those averages
to arrive at a centroid color, i.e. the mean (average) color of a
cluster.
2. Assign every color c to the closest average m_κ by calculating the
Euclidean distance to each m.
3. Recalculate m_κ as the average of the updated cluster κ.
4. Repeat until the assignments are the same as in the previous step;
we’ve reached convergence, which is not necessarily correct/optimal.
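
We don’t have to implement the loop ourselves; scipy ships an implementation. A minimal sketch of clustering Oklab colors with it (the input array here is random placeholder data):

import numpy as np
from scipy.cluster.vq import kmeans, vq

# Placeholder input: an (N, 3) array of Oklab colors.
oks = np.random.default_rng(0).uniform(-0.2, 0.8, size=(1000, 3))

# kmeans iterates assignment/recalculation until convergence and
# returns the centroid colors; vq assigns each color to its nearest
# centroid.
centroids, _ = kmeans(oks, 8)
labels, _ = vq(oks, centroids)

# Cluster sizes, i.e. how dominant each centroid color is.
sizes = np.bincount(labels, minlength=len(centroids))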
Since we will use a perceptually uniform colorspace, we expect each
cluster centroid to be perceptually close to the actual colors it
contains. And since we will be working with lots of sample images, we
can calculate the overall dominant colors by putting all the per-image
colors together.
Implementation
To visualize the results, I chose to calculate the dominant colors
for each image, then calculate the overall dominant colors from
those.
I also uniformly split the Oklab colorspace into colors and clustered
all the dominant colors again, in order to see the difference between
the calculated dominant colors and the uniformly sampled ones:
Uniformly partitioning dominant colors.
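
Put together, the per-image and overall steps look roughly like this (a sketch built on the image_to_colors and dominant_colors helpers from the sample code at the end; the filenames are hypothetical placeholders for the downloaded search results):

import numpy as np
from wand.image import Image

pooled = []
for path in ["result-001.jpg", "result-002.jpg"]:  # hypothetical downloads
    with Image(filename=path) as img:
        # Dominant colors of this one image, pooled with the rest.
        pooled.extend(dominant_colors(np.array(image_to_colors(img))))

# Overall dominant colors: cluster the pooled per-image dominants again.
overall = dominant_colors(np.array(pooled))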
The image results for most queries are stock photos or text, hence
there is a lot of black and white. We can deduce how black or
greyscale-looking a color is from its coordinates: in Oklab, the
a, b coordinates will be close to zero; in
HSL (Hue-Saturation-Lightness), a low L value means the
color is close to black. We can discard such colors by checking those
values.
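
A sketch of such a filter; the cutoff values here are assumptions picked by eye, not tuned constants:

def looks_greyscale(ok, ab_cutoff=0.03, l_cutoff=0.15):
    # ok is an (L, a, b) Oklab triple: near-zero a and b means almost
    # no hue, and a very low L means the color is near-black.
    L, a, b = ok
    return (abs(a) < ab_cutoff and abs(b) < ab_cutoff) or L < l_cutoff

colorful = [c for c in dominant if not looks_greyscale(c)]  # hypothetical list of colors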
Results
Searching for non-abstract things such as fruits returns pictures of
the things themselves, so we get good results:
Oklab                  HSL                    RGB                        Hex
[0.74, -0.01, 0.12]    [48.00, 0.52, 0.52]    [200.00, 170.00, 70.00]    #c4ab46
[0.38, -0.02, 0.05]    [59.00, 0.38, 0.20]    [70.00, 69.00, 31.00]      #46451f
[0.89, -0.04, 0.12]    [60.00, 0.63, 0.69]    [230.00, 230.00, 130.00]   #e1e27f
[0.22, -0.00, 0.02]    [49.00, 0.39, 0.08]    [29.00, 26.00, 13.00]      #1c190c
[0.62, -0.00, 0.10]    [46.00, 0.51, 0.41]    [160.00, 130.00, 51.00]    #9d8433
[0.52, 0.00, 0.08]     [42.00, 0.42, 0.34]    [120.00, 100.00, 51.00]    #7c6732
Results for “banana” (is that an impostor?)
Searching for pharmaceuticals returns lots of pictures of
colorful pills:
Oklab                  HSL                    RGB                        Hex
[0.59, -0.04, -0.08]   [210.00, 0.39, 0.49]   [77.00, 130.00, 170.00]    #4d81ae
[0.45, -0.01, -0.01]   [190.00, 0.08, 0.33]   [78.00, 88.00, 91.00]      #4d585a
[0.43, -0.02, -0.15]   [220.00, 0.66, 0.37]   [33.00, 72.00, 160.00]     #20479d
[0.84, -0.02, -0.05]   [210.00, 0.59, 0.81]   [180.00, 210.00, 230.00]   #b1cfea
[0.54, 0.15, 0.07]     [360.00, 0.50, 0.49]   [190.00, 61.00, 62.00]     #ba3d3e
[0.58, -0.10, 0.05]    [140.00, 0.36, 0.40]   [66.00, 140.00, 88.00]     #418b58
Results for “pharmaceuticals”
Searching for ethics returns pictures of signs that point to
stuff such as “Right” and “Wrong” and “Principles”:
Oklab                  HSL                    RGB                        Hex
[0.74, -0.07, 0.11]    [79.00, 0.40, 0.53]    [150.00, 180.00, 88.00]    #99b757
[0.71, -0.04, -0.06]   [200.00, 0.43, 0.62]   [120.00, 170.00, 200.00]   #76a6c8
[0.95, -0.00, 0.01]    [64.00, 0.16, 0.92]    [240.00, 240.00, 230.00]   #edede6
[0.68, 0.02, 0.02]     [20.00, 0.17, 0.60]    [170.00, 150.00, 140.00]   #a99387
Results for “ethics”
Searching for design returns a boring sea of brown and beige
thanks to interior design trends:
Oklab                  HSL                    RGB                        Hex
[0.64, 0.01, 0.02]     [34.00, 0.10, 0.53]    [150.00, 140.00, 120.00]   #93887b
[0.79, 0.01, 0.02]     [35.00, 0.17, 0.71]    [190.00, 180.00, 170.00]   #c2b7a9
[0.84, 0.00, 0.01]     [43.00, 0.08, 0.78]    [200.00, 200.00, 190.00]   #cbc8c2
[0.53, 0.01, 0.02]     [28.00, 0.12, 0.41]    [120.00, 100.00, 92.00]    #74675c
Results for “design”
Searching for programming identifies the classic green
terminal color along with other syntax highlighting palettes:
Oklab                  HSL                    RGB                       Hex
[0.60, 0.18, 0.04]     [350.00, 0.66, 0.54]   [220.00, 62.00, 95.00]    #d73e5f
[0.45, -0.02, -0.16]   [220.00, 0.70, 0.40]   [30.00, 79.00, 180.00]    #1e4eaf
[0.76, -0.11, 0.11]    [93.00, 0.49, 0.56]    [140.00, 200.00, 88.00]   #89c658
[0.77, 0.05, 0.11]     [31.00, 0.77, 0.63]    [230.00, 160.00, 86.00]   #e9a156
Results for “programming”
Finally, philosophy returns pictures of books and statues,
so the results are predictable and omitted:
Results for “philosophy”
Improving the sample source
I’ve had some luck getting “better” results by searching for “book
about {query}” and “book about {query} cover”, expecting topical books to
share color schemes, like the distinctive palettes O’Reilly uses in its
programming books.
I found that Google Images shows fewer junk results, but it has no API
you can use without an account.
Conclusions and notes
As expected, this doesn’t produce particularly mind-blowing results,
since abstract concepts generally lack color associations. Even if you
have some form of visual synesthesia, the colors you perceive are
usually unique to each person.
To get back to the original motivation behind this experiment,
associating post tags with colors: you can achieve this by
clustering the existing tag colors, calculating the dominant colors
for each new tag, and choosing the one that belongs to the smallest
cluster. That way you can avoid common colors like
black/white/blue/orange saturating your tag cloud.
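
A sketch of that idea, reusing scipy’s k-means (pick_tag_color and its inputs are hypothetical; existing_colors would hold the Oklab colors already assigned to tags):

import numpy as np
from scipy.cluster.vq import kmeans, vq

def pick_tag_color(candidates, existing_colors, k=8):
    # Cluster the colors already in use, then pick the candidate
    # dominant color whose nearest cluster is the least crowded,
    # avoiding the black/white/blue/orange pile-up.
    existing = np.asarray(existing_colors, dtype=float)
    centroids, _ = kmeans(existing, k)
    labels, _ = vq(existing, centroids)
    sizes = np.bincount(labels, minlength=len(centroids))
    cand_labels, _ = vq(np.asarray(candidates, dtype=float), centroids)
    return candidates[int(np.argmin(sizes[cand_labels]))]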
Sample code
import decimal
import itertools

import colorio
import numpy as np
from scipy.cluster.vq import vq, kmeans
from wand.image import Image

# Convert a wand Color into an 8-bit RGB numpy array.
wand_color_to_arr = lambda c: np.array([c.red_int8, c.green_int8, c.blue_int8])

OKLAB = colorio.cs.OKLAB()

# Clamp a channel value into the 0..255 range.
color_abs = lambda v: 0xFF if v > 0xFF else v if v >= 0 else 0
oklab_to_rgb255 = lambda o: OKLAB.to_rgb255(o)
rgb_to_hex = lambda rgb: "#%s" % "".join("%02x" % p for p in rgb)
oklab_to_hex = lambda o: rgb_to_hex(map(color_abs, map(int, oklab_to_rgb255(o))))

dec_ctx = decimal.Context(prec=2, rounding=decimal.ROUND_HALF_DOWN)
arr_display = lambda arr: ["%.2f" % dec_ctx.create_decimal_from_float(i) for i in arr]


def image_to_colors(img: Image):
    # Downscale the image, then convert its unique colors to Oklab.
    img.thumbnail(200, 200)
    colors = set(c for row in img for c in row)
    ret = []
    for c in colors:
        ret.append(OKLAB.from_rgb255(wand_color_to_arr(c)))
    return ret


class Bucket:
    # A cluster: a representative (centroid) color plus its members.
    def __init__(self, rep):
        self.rep = rep
        self.colors = []

    def __len__(self):
        return len(self.colors)

    def append(self, color):
        self.colors.append(color)


def dominant_colors(oks, n=20):
    # Cluster an (N, 3) array of Oklab colors with k-means.
    _r, _ = kmeans(oks, min(n, len(oks)))
    # Sort dominant colors by cluster size.
    buckets = [Bucket(rep) for rep in _r]
    _s, _ = vq(oks, _r)
    for idx, c in enumerate(oks):
        bucket_idx = _s[idx]
        buckets[bucket_idx].append(c)
    buckets.sort(key=lambda b: len(b), reverse=True)
    return [b.rep for b in buckets]


def make_uniform_clusters(oks, n=20):
    # Assign colors to a uniform n**3 grid instead of k-means centroids.
    def make_grid(n=20):
        code_steps = np.linspace(-1.0, 1.0, num=n)
        return list(itertools.product(code_steps, code_steps, code_steps))

    prod = make_grid(n)
    buckets = [Bucket(rep) for rep in prod]
    _r, _ = vq(oks, prod)
    for idx, c in enumerate(oks):
        bucket_idx = _r[idx]
        buckets[bucket_idx].append(c)
    buckets.sort(key=lambda b: len(b), reverse=True)
    return buckets