Bo Xuanyi,
Hi everyone, welcome to the talk. My name is Bo, and I'm going to talk about notification volume optimization at Pinterest.
So first, what's the problem we're trying to solve? Why do we even optimize notification volume? At Pinterest, notifications drive a significant amount of user engagement, measured by metrics like DAU and WAU (daily and weekly active users). So the question becomes: what's the right frequency for sending notifications to each user?
There are basically two sides. If we send too few notifications, engagement metrics suffer: DAU and WAU will both drop. On the other hand, if we send too often, it hurts the user experience, and it also hurts the site's long-term metrics, because people get annoyed. They unsubscribe from your emails and delete your app. Gmail and several other email providers will penalize your sender reputation: they become more likely to put your emails into the spam folder, so your users can't even see them. Sending too often can cause all of these consequences, and it also hurts your brand in users' minds.
So the key to solving this problem is to use machine learning for more personalization. We decide the right frequency for each user based on whether that user actually likes the emails, and on whether you really need to send that email or notification to them at all.
We have been working on this problem for a while. Before I joined the company, we already had the first version of a machine-learning-based volume control system. Before that, we had basically no machine learning here. V1 was the first version to use machine learning to control frequency. When it shipped, in early 2016 I think, it had a huge impact. It uses something called dynamic cooldown, which I'll explain in detail.
We have many different types of recommendation emails at Pinterest. Some are about popular pins, some are about boards you recently followed, and so on.
We train machine learning models, like a CTR prediction model, to predict the probability that each user clicks each type of email. Then, for each email type, we compute the user's predicted CTR for that type. We also calculate where that CTR falls as a percentile among all users. Based on that percentile, we decide a number of cooldown days. The definition of a cooldown is this: if you send an email type today, you cannot send the same email type again in the next K days. The general idea is that a user with a higher CTR gets a shorter cooldown, so we can send them that type more often.
The idea makes sense. Every day, we use these CTR models to rank all the email types that are not in their cooldown period, and we send the highest-ranked one. That is quite reasonable.
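Here is a minimal sketch of the V1 dynamic-cooldown selection described above. It assumes a hypothetical tier table that maps a CTR percentile to cooldown days. The 3/7/14-day values are illustrative, not Pinterest's actual parameters. The sketch also treats an email type as blocked for K full days after a send. The names `ctr_percentile`, `cooldown_days`, and `pick_email` are made up for this example.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Hypothetical tiers: (minimum CTR percentile, cooldown days), checked top-down.
# A higher percentile means a shorter cooldown. Values are illustrative only.
COOLDOWN_TIERS = [(0.9, 3), (0.5, 7), (0.0, 14)]


def ctr_percentile(user_ctr, population_ctrs):
    """Fraction of users whose predicted CTR for this email type is below user_ctr."""
    ranked = sorted(population_ctrs)
    return bisect_left(ranked, user_ctr) / len(ranked)


def cooldown_days(percentile):
    """Map a CTR percentile to a cooldown length K in days."""
    for min_pct, days in COOLDOWN_TIERS:
        if percentile >= min_pct:
            return days
    return COOLDOWN_TIERS[-1][1]


def pick_email(today, last_sent, predicted_ctr, population_ctrs):
    """Return the highest-CTR email type that is not in its cooldown window, or None.

    last_sent:       email type -> date it was last sent to this user
    predicted_ctr:   email type -> this user's predicted CTR
    population_ctrs: email type -> predicted CTRs of all users (for percentiles)
    """
    eligible = []
    for email_type, ctr in predicted_ctr.items():
        k = cooldown_days(ctr_percentile(ctr, population_ctrs[email_type]))
        sent = last_sent.get(email_type)
        # Blocked for the K days after a send; eligible again on day K + 1.
        if sent is None or (today - sent) > timedelta(days=k):
            eligible.append((ctr, email_type))
    return max(eligible)[1] if eligible else None


pop = [i / 100 for i in range(1, 11)]  # toy population CTRs: 0.01 .. 0.10
choice = pick_email(
    date(2016, 3, 10),
    last_sent={"popular_pins": date(2016, 3, 8)},
    predicted_ctr={"popular_pins": 0.10, "followed_boards": 0.02},
    population_ctrs={"popular_pins": pop, "followed_boards": pop},
)
# popular_pins has a 3-day cooldown and was sent 2 days ago, so choice == "followed_boards"
```

The tier table holds exactly the kind of hand-set parameters that the talk later calls hard to optimize.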
After we had used V1 for a couple of months, we started to notice some issues. The first is that it gives no direct control over the total volume a user gets in a period, like a week. This has several consequences.
For example, it's very difficult to run rigorous experiments. At Pinterest we're always trying to improve the content or the recommendation algorithms. Many of these experiments can increase the coverage or volume of certain email types. Say you have a control group and an enabled group, and the enabled group ends up with higher volume. Typically, if you increase notification volume, you will see some metric gains, like weekly active user gains. But it's very difficult to know where those gains come from. Do they come from the volume increase alone, or from actual quality improvements? A more rigorous design matches the control group's volume to the enabled group's whenever the enabled group gets more. For example, you can use the extra volume to send the best available existing emails to control users. When control and enabled have the same volume, you can really tell whether the gain comes from quality.
It's also very difficult to iterate on ranking models. The cooldowns are based on CTR percentiles, so every ranking-model update also changes volume. Again, it's hard to judge whether the new model is better or not. And it's very difficult to add new email types, because every new email type increases volume.
That brings me to the second problem: all these cooldown rules are difficult to optimize, and the objective function is unclear. For example, why do we set a 14-day cooldown instead of 12 days? It's not easy to optimize those parameters automatically. Also, some email types are generally better than others, so they should be sent more often across the board. But giving each email type its own parameters makes the system even more complicated, which we don't want.
So we decided to rewrite the system with a better mechanism. We call this V2 the budget-based system. Its first goal is direct control of the total email or notification volume each user gets in a week. In other words, we decide a weekly budget per user. This makes our experiments much more rigorous, and it lets us add new email types easily. It also lets us easily test new ideas for optimizing volume. Our second goal is a clear objective function that machine learning can optimize directly. That makes the process automatic, with no manual parameter tuning.
The system looks like this. A notification service runs every day and processes all users. When a user comes in, the service consults a module called the Budget Pacer. At the start of every week, the Budget Pacer loads the user's budget (their weekly volume) from a store. It then paces that volume across the days of the week. The simplest pacing algorithm is even pacing. For example, if a user has a budget of three a week, you might pace the sends to Monday, Thursday, and Saturday or Sunday.
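Here is a minimal sketch of even pacing. It uses a hypothetical `even_pacing` helper that places the i-th send on day floor(i * 7 / budget). With a budget of three, this lands on the first, third, and fifth day of the week, not the exact days mentioned above (those were only an example). Budgets above seven simply allow more than one send on some days.

```python
def even_pacing(weekly_budget, days_in_week=7):
    """Spread a weekly notification budget as evenly as possible over the week.

    Returns per-day send allowances; index 0 is the first day of the week.
    """
    allowance = [0] * days_in_week
    for i in range(weekly_budget):
        # The i-th send goes to an evenly spaced offset through the week.
        allowance[i * days_in_week // weekly_budget] += 1
    return allowance


print(even_pacing(3))   # [1, 0, 1, 0, 1, 0, 0]
print(even_pacing(10))  # [2, 1, 2, 1, 2, 1, 1]
```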
When the user comes in, the service asks the pacer whether we have budget to send today. It works a lot like an ad system, except the budget isn't money. It's the number of notifications we can send. If we don't have budget, we don't send; that part is easy. If we do have budget, we go to the ranker. The ranker ranks all the available emails, and we send the best one.
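Here is a sketch of the daily send decision described above. `BudgetPacer`, `process_user`, and the `score` callable are hypothetical names. The pacer only holds per-day allowances (for example, the output of an even-pacing step) and counts sends. The ranker is reduced to picking the candidate with the highest score.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetPacer:
    """Hypothetical per-user pacer for one week."""
    daily_allowance: list  # sends allowed per day, index 0 = first day of the week
    sent: dict = field(default_factory=dict)  # day index -> sends made so far

    def has_budget(self, day):
        return self.sent.get(day, 0) < self.daily_allowance[day]

    def record_send(self, day):
        self.sent[day] = self.sent.get(day, 0) + 1


def process_user(pacer, day, candidates, score):
    """One daily step for one user.

    No budget today -> send nothing. Otherwise rank the available
    notifications by `score` (e.g. predicted CTR) and send the best one.
    Returns the chosen notification, or None.
    """
    if not pacer.has_budget(day) or not candidates:
        return None
    best = max(candidates, key=score)
    pacer.record_send(day)
    return best


pacer = BudgetPacer(daily_allowance=[1, 0, 1, 0, 1, 0, 0])
ctr = {"popular_pins": 0.05, "followed_boards": 0.08}
first = process_user(pacer, 0, list(ctr), ctr.get)   # "followed_boards"
second = process_user(pacer, 0, list(ctr), ctr.get)  # None: today's allowance is used up
```

Keeping the budget check separate from ranking mirrors the split in the talk. The pacer alone decides whether to send, and the ranker alone decides what to send.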
There is also an offline component that takes the email data and additional signals from users to train the machine learning models. Every couple of days, it uses those models for budget scoring: it computes the volume for each user and publishes it to the store. That's basically how things work right now.
So what objective function should our machine learning models use? The first question is what we want to optimize for. Let's say I want to optimize directly for site engagement, like daily active users. That means maximizing daily active users for a fixed number of notifications. It follows that if a user is already very active on the site organically, you probably don't want to spend your notification budget on them. They come to the site anyway, even without notifications, so why bother them?
The simplest version of this kind of model is a utility score that captures the value of sending a notification. One simple version is one minus the probability that the user visits the site organically, times the probability that they click the notification. This assumes that your notification channel and your organic channel are independent. Machine learning models can learn all of these probabilities. You can train a CTR model on all your email clicks, and another model to predict how likely a user is to visit the site organically.
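The simple utility score can be written as U = (1 - P(organic visit)) × P(click). The sketch below computes it under the same independence assumption. The three user segments and their probabilities are invented numbers, not real model outputs. They only show how casual users can outscore both dormant and core users.

```python
def notification_utility(p_organic, p_click):
    """Utility of one send, assuming independent channels:
    P(user would NOT visit organically) * P(user clicks the notification)."""
    if not (0.0 <= p_organic <= 1.0 and 0.0 <= p_click <= 1.0):
        raise ValueError("probabilities must be in [0, 1]")
    return (1.0 - p_organic) * p_click


# Made-up (p_organic, p_click) pairs for three activity segments.
users = {
    "dormant": (0.02, 0.01),  # rarely visits, rarely clicks
    "casual": (0.40, 0.10),   # sometimes visits, clicks fairly often
    "core": (0.95, 0.20),     # almost always visits anyway
}
scores = {name: notification_utility(*probs) for name, probs in users.items()}
best = max(scores, key=scores.get)  # "casual" with these numbers
```

The core segment has the highest click probability, but its score is small because the (1 - p_organic) factor is close to zero. This is the penalty on very active users that the talk goes on to question.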
We have two plots here. In the left plot, the x-axis is the user's activity level: the number of days they were active in the past 28 days. The y-axis is email CTR. Generally, users who come to the site more often are more likely to click emails, because they love Pinterest and want to click everything we recommend. The right plot uses this simple formula. The x-axis is still activity level, but the y-axis is the utility score. Users on the far right, who are already very active, have a fairly low score. But users on the extreme left, who don't come to the site at all, don't have the highest score either. The highest scores are in the casual region, among users who are active but not too active: roughly 4 to 15 active days. Those are the users who most need notifications to become more engaged, and they give you the best ROI.
So this simple score is a good starting point, if you need something to begin with. But it has several issues. The fundamental one is that it assumes the channels are independent, which is too simple. That sometimes roughly holds, but in practice your notification channel and your organic channel are quite dependent. As a result, this score penalizes very active users too much, even though notifications can still help bring those users back to the site. There's also a problem if you use this score to set a weekly budget, meaning how many notifications to send in a week. The sends within a week are not independent either. If you send one notification and then a second, the second is less likely to bring the user back to the site.
So we wanted to improve it further, and we built the next version of the model. In this version, we train a full machine learning model to predict your target metrics, which we can call rewards. The rewards are quite flexible. A reward could be DAU, or you could use revenue if you want to optimize that. What the model really learns is a function. The function takes the user's features and the number of notifications you want to send, and it predicts the reward you defined. Once you can train this kind of model, you can calculate the incremental value of each additional email sent. Then you select the most valuable sends across all users. That's how you determine the volume for every user.
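Here is one way to turn a learned reward function into per-user weekly budgets: a greedy allocation over incremental values. It assumes a hypothetical `reward(user, n)` that returns the predicted reward for sending n notifications in a week. It also assumes a global cap on total sends and diminishing returns per user. Under those assumptions, repeatedly taking the largest marginal gain is optimal for this separable objective. The saturating `example_reward` curve is made up for illustration and stands in for the trained model.

```python
import heapq
import math


def allocate_weekly_budgets(users, reward, total_sends, max_per_user=14):
    """Assign sends one at a time to the user with the largest incremental
    reward, reward(u, n + 1) - reward(u, n).

    reward(user, n): predicted reward (e.g. expected active days) when the user
    gets n notifications in the week. Greedy is optimal when every user's reward
    has diminishing returns in n.
    """
    budgets = {u: 0 for u in users}
    # heapq is a min-heap, so store negated marginal gains.
    heap = [(-(reward(u, 1) - reward(u, 0)), u) for u in users] if max_per_user > 0 else []
    heapq.heapify(heap)
    for _ in range(total_sends):
        if not heap:
            break
        neg_gain, u = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # the best remaining send adds no value, so stop early
        budgets[u] += 1
        n = budgets[u]
        if n < max_per_user:
            heapq.heappush(heap, (-(reward(u, n + 1) - reward(u, n)), u))
    return budgets


# Hypothetical saturating reward curves standing in for a trained model.
CAPS = {"casual": 2.0, "core": 0.3, "dormant": 0.5}


def example_reward(user, n):
    return CAPS[user] * (1.0 - math.exp(-0.5 * n))


budgets = allocate_weekly_budgets(list(CAPS), example_reward, total_sends=5)
# -> {"casual": 4, "core": 0, "dormant": 1}
```

This allocation naturally produces the sub-additive behavior described earlier, where a second send is worth less than the first.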
For this model's features, the historical organic visitation rate is very important, and so is email and push notification CTR. One interesting point is that Android and iOS have very different push notification CTRs, so it's important to keep them as two separate features. You also want to compute these features over different time windows, to capture recent behavior as well as longer-term behavior.
It's also important to use the user's signup age: whether the user signed up recently or is a long-time veteran. This lets the model make different predictions for different users. For really new users, we observe that the model cuts volume in the first couple of days compared with control. That's because when users have just signed up, we don't have good signals to generate good recommendations. As the user accumulates more signals, the model gradually increases volume, so the long-term gain is high.
We can also use other site engagement features, like number of pins and number of requests, which capture how likely the user is to engage with the site. Some user demographic features help too, like country, gender, age, et cetera.
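Here is a sketch of how the CTR and visitation features above might be computed over several time windows, with Android and iOS push kept as separate channels. The window lengths, channel names, feature names, and the (day, channel, sent, clicked) event format are all assumptions for illustration. `active_days` is assumed to contain only days with organic visits. A CTR of 0.0 stands in when nothing was sent; a real pipeline might prefer a missing-value marker.

```python
from datetime import date, timedelta

WINDOWS = (7, 28, 90)                            # hypothetical look-back windows, in days
CHANNELS = ("email", "push_android", "push_ios")  # Android and iOS push kept separate


def ctr_features(events, today):
    """events: (day, channel, sent, clicked) tuples for one user.

    Returns CTR per channel and window, e.g. 'push_ios_ctr_7d'.
    """
    feats = {}
    for channel in CHANNELS:
        for w in WINDOWS:
            start = today - timedelta(days=w)
            sent = clicked = 0
            for day, ch, s, c in events:
                if ch == channel and start <= day < today:
                    sent += s
                    clicked += c
            feats[f"{channel}_ctr_{w}d"] = clicked / sent if sent else 0.0
    return feats


def visitation_features(active_days, signup_date, today):
    """Organic visitation rate per window, plus signup age in days."""
    feats = {"signup_age_days": (today - signup_date).days}
    for w in WINDOWS:
        start = today - timedelta(days=w)
        feats[f"active_rate_{w}d"] = sum(start <= d < today for d in active_days) / w
    return feats


today = date(2017, 1, 29)
events = [(date(2017, 1, 27), "push_ios", 2, 1), (date(2017, 1, 10), "push_ios", 4, 0)]
feats = ctr_features(events, today)  # push_ios_ctr_7d = 0.5, push_ios_ctr_28d = 1/6
```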
Here are some results compared with the control group. We split users into three separate groups: users who enabled only email, users who enabled only push, and users who enabled both channels. Overall, we significantly decreased notification volume and dramatically increased CTR for both email and push. That also drove site engagement metrics like DAU. For email, we reduced volume by 15% and increased CTR by 20%. For push, we decreased volume by 5% and increased CTR by 17%. Site-wide DAU is up around 2%, which is about 10% of the DAU driven by all notifications combined. That is very significant.
Another interesting anecdote: because we treat Android and iOS as separate signals, we ended up having a much larger impact on Android users. When we shipped this experiment, it actually helped Pinterest's Android DAU surpass iOS DAU for the first time in history.
Now some analysis. On the left is email volume. The blue plots show the control group's volume at different activity levels, and the red ones show the optimized volume. Both have the same overall average volume, so they are directly comparable. The optimized version tends to send more to people who are not core users but are also not extremely dormant. On the far left there is a significant number of extremely dormant users, and we actually cut their volume. For casual users, we increase it. This means the new algorithm gives each notification more value toward the final metric.
The right plot takes more of a user-experience perspective. The x-axis is users' email open rate, or CTR, and the y-axis is volume for the optimized and control versions. People who generally like to click emails end up receiving a little more. People on the far left, close to zero, never open emails, and they receive much less than in control. So the left plot shows that the new system makes each send more valuable to the site. The right plot shows that, even from the user's perspective, it feels more personal.
So, conclusions. We recently built this V2 volume control system, which uses machine learning to optimize volume globally. It gives us direct control over the total notification volume a user gets. We reduced volume substantially while increasing CTR and improving the user experience. We also raised the value of notifications for site-wide engagement. The system also improved experiment rigor and engineering efficiency, and it makes further ideas easy to test.
This is a collaborative project with College of Management, Kirito and Burkai, who are both engineers on our team, and John. That's all. Thank you.
Any questions? Yes.
So when you say you have direct control over the total volume, is that about the engineering implementation? Or does it also cover, like, the marketing team sending newsletters and emails on their own cadence?
This is more about the total number of notifications a user gets every week.
So does Pinterest have a centralized system where every single email from any part of the company to any user is counted?
Actually, in this project we only consider recommendation-related emails. Some other emails are outside the budget, like legal-related emails. Those have to be sent, so we don't apply this system to them. Our plan is to use this system to control more of the recommendation-based emails. But other kinds should always go out. If you have a chat with your friends, you should always get those emails or notifications. So, yeah.
Yes.
When you measure the success of push notifications, is it just CTR, or do you also measure what happens after the push?
For push, generally CTR, and also our top-line metrics, like DAU, WAU, and so on.
Do you also follow any downstream metrics? Like, after a push, whether the user follows up with some activity, such as posting things or scrolling?
Yeah, sometimes we look at that. But overall, we just look at top-line metrics. We also look at whether the user deleted the app, because if you send too many, they will just delete it. Yeah.
Okay. Last one? No? Good.