Well, first of all, welcome to my talk today.
I'm Jared Scheib. Shout out to all my Paperspace Party Parrots, Papernauts, Paperweights, Pizza Pirates, Papayas, and Pineapples. We did a fun census recently, and that was how folks identified.
I joined Paperspace almost a year ago as a Senior Software Engineer and, with the support of my team, built out GitHub OAuth (1-click sign-up and sign-in),
as well as a lot of the API for the latest major iteration of Gradient, the primary product I work on.
After a few months, I became a PM, and now (we're not always big on definitions at Paperspace) I consider myself a Product Manager.
I've just released my latest major product effort to market: the new Gradient Community Notebook, which I'll be using, in Beta, in today's demo, and which might just become Generally Available today. :)
I want to thank all the great minds who work and have worked at Paperspace, before me and alongside me, for building the technology that makes the work I'm about to share possible, work I find fascinating and profound.
I also want to thank my good friend and collaborator Michael De Sa, who paired with me every step of the way, teaching me, pushing me, and figuring out with me how to make this deepfake stuff happen. The notebook script is Michael's and my work.
And frankly, I don't think we say it enough, and maybe it's the mushroom talking, but I want to thank everyone I've had the honor of meeting, knowing, challenging, collaborating with, and learning from in my life, who have facilitated and stewarded me to this moment, including myself, in every moment.
I didn't really know what to expect for this talk at AI TechWorld 2019 here in San Jose, California. This was called a conference session, so... I appreciate everyone here showing up, including yourself.
My talk today is called Deep Fakes in Public. And it's a very controversial idea right now.
The question of identity, with regard to remixing someone's face, remixing reality, remixing the past, in a video...
to become someone else: to be transformed into a different, specific person, someone new, a new species, something different in some other way... anything.
And to look real. And possibly to be transformed undetectably, so that the new creation looks convincingly real and may then be taken for reality.
That type of technology has a lot of potential power, because of how we represent ourselves in society,
because of how people are represented in the narratives we tell about each other,
how truth is shaped, distorted, and manipulated,
and how stories are crafted, transformed, and expressed.
So, like any technology, it's powerful.
So today, I want to use it in what I think is a compelling way,
which is to play with two pieces of art, and comedy.
I'll apply the deepfake technique to video by training a machine learning model,
one that already knows how to identify faces, to identify two specific faces,
and then, in another video, to swap one of those faces for the other.
Now mind you, this can be done in real time. There's already research out there that uses a very beefy
multinode setup, with multiple GPUs powering the real-time conversion
of one face into another in live video... which is pretty wild, if you think about it.
So I'm not going to do that tonight, though what I'm going to share could probably be refactored to do it...
Well, it almost certainly could.
I'm going to walk everyone through how you can actually do this yourself
by showing you how I did it.
And we're going to do it
in a way that generates very poor results, and then I'll switch over to a model
I trained previously that did decently on another FaceSwap task.
Then I'll show a couple of very successful deepfakes by two artists whose work Paperspace powers,
and whose work you might have seen: Dr. Fakenstein and Ctrl + Shift + Face.
So first, I'm going to open up a notebook. A notebook is a fundamental tool in one approach to machine learning. Many researchers, scientists, mathematicians, statisticians, and other people experimenting with machine learning use notebooks to first put down their ideas and build a neural network architecture that can learn to solve some particular prediction problem: you know, is that a dog? Is that a cat? Something like that... ratatat. :) Then they iterate on that model, running it almost scientifically, as an experiment you can study and improve as you iterate, by observing it in this kind of... petri dish, if you will, which you can manipulate: your Jupyter notebook. You manipulate its DNA through your code. So I'll walk you through the program I wrote to perform all of these operations.
So I'm opening a Jupyter notebook right now, and this is actually a Gradient Community Notebook on Paperspace.
It's a piece of technology, a tool like any other piece of software: it's an app. Just an app, like an app on your phone.
With this app, I can connect to a remote server hosted by Paperspace, pass
Paperspace my code, and have the machine learning model trained, observed, refined, tested, validated, and iterated upon,
with all of the machine learning done for me, given some additional variables, some parameters, that I pass in.
A machine learning model, in this case a neural network, is an arrangement of artificial neurons, organized into layers that pass signals to one another
in different ways,
and that converge, through training, to perform the prediction task, or the creative, generative task,
requested by the programmer, or by the software engineer who writes the infrastructure needed
to take advantage of the model once it has been trained and deployed into the wild.
They can query the model automatically and say, hey,
"What do you think this is? What do you think this is? And what do you think this is?"
and let that happen auto-magically, if you will.
So that's the Jupyter notebook right here.
Let me walk you through the code.
So, how many people in this room consider themselves... programmers? OK, cool.
And how many people consider themselves machine learning engineers? All right.
Or who practice machine learning?
(I want to be both inclusive and specific, definitive, if you will.)
So this is the notebook. These three constructs right here are like three reference books:
you ask, oh, what was the definition of that? OK. And you ask, what is the source
of the first video I want to train my model on? OK, here's that video, the same one I'd visit on YouTube.
OK, so
I want the first subject I train the model to identify to be Dave Chappelle.
And I want the second person I'm going to train it on to be... OK, you can see here that I'm putting in
the YouTube video URL, and you can also see that I'm putting in the timecode of the segment of the video
that I want to train my model on, as in: look at _this_ footage and learn what this person looks like, because it's really great footage of their face.
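The talk doesn't show the notebook's actual variables. A minimal sketch of the same idea, assuming one URL, one start timecode, and one duration per subject, could look like this. `ClipSpec` and the helper names are hypothetical, the URLs and timecodes are placeholders, and the 30-second duration matches the clip length mentioned later in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipSpec:
    """One training segment: which video to use and which part of it."""
    subject: str     # who the model should learn to recognize
    url: str         # YouTube video URL (placeholder values below)
    start: str       # timecode where the good face footage begins, "HH:MM:SS"
    duration_s: int  # how many seconds of footage to keep


def timecode_to_seconds(timecode: str) -> int:
    """Convert an "HH:MM:SS" timecode into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def end_seconds(clip: ClipSpec) -> int:
    """Where the segment stops, in seconds from the start of the video."""
    return timecode_to_seconds(clip.start) + clip.duration_s


# Hypothetical placeholders: the notebook's real URLs and timecodes are not reproduced here.
CLIPS = [
    ClipSpec("dave_chappelle", "https://www.youtube.com/watch?v=VIDEO_ID_A", "00:01:10", 30),
    ClipSpec("prince", "https://www.youtube.com/watch?v=VIDEO_ID_B", "00:00:45", 30),
]

for clip in CLIPS:
    print(clip.subject, clip.start, "->", end_seconds(clip), "s")
```

Keeping the segments as data, rather than hard-coding them into each command, makes it easy to add more clips per person later.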
This is a live Python environment, so you can _script_ on the web.
Basically, you can create apps dynamically, online, in a Python notebook, and watch them
do their thing, already on the web, and you can actually train your machine learning model
inside this Python notebook. So what you end up with is a self-contained machine learning application.
That's the power of a notebook.
So here, I'm taking these two videos and getting these two clips from them.
This is Dave Chappelle, this is Prince.
I'm gonna swap Dave Chappelle and Prince in Charlie Murphy's True Hollywood Stories. You know, the whole, "I'm Rick James, bitch!"
So we're taking a clip from that show, one where he impersonates Prince.
Charlie Murphy, Eddie Murphy's older brother, walks into a club and sees Prince dancing with a posse,
and Dave Chappelle has his hands coming up like this, and he reveals himself to be Prince.
I'm not going to play it right now, but I recommend everybody go watch it -- it's really funny.
So anyhoozlestein, I take this 30-second clip of Dave Chappelle, this 30-second clip of Prince,
and then _this clip_, the one where he impersonates Prince, and I want to change it so that
where it was Dave Chappelle in the original clip, it's actually Prince, instead of Dave just impersonating Prince.
We're going to manipulate the way the story in that video was told.
So then you can start to see the power, and the potential terror, of this technology (should I be saying that part?), but it's true,
because you could imagine people manipulating, in real time, the news that others perceive about reality.
So that's what we're gonna do.
Except we're going to do it to a video from "the past": Charlie Murphy's True Hollywood Stories.
So here we are downloading the YouTube videos and cutting them into clips.
You can see we're using a program called `youtube-dl`, a command-line utility for downloading videos from YouTube.
We give it the locations of those videos online, passing them in as arguments, effectively parameters
to a function, and you can see it downloading now, with the percentage ticking up. I can play you that video.
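The exact download line isn't shown in the talk. Here is a minimal sketch, assuming `youtube-dl` is installed and on the PATH. The URL and the `raw_a.mp4` filename are placeholders. The command is built as an argument list so you can inspect it before running anything.

```python
import shlex
import subprocess


def youtube_dl_command(url: str, output_path: str) -> list[str]:
    """Build a youtube-dl command line that saves one video to output_path.

    -f mp4 asks for the best format available as a single .mp4 file;
    -o sets the output filename template (a plain path is used as-is).
    """
    return ["youtube-dl", "-f", "mp4", "-o", output_path, url]


def download(url: str, output_path: str) -> None:
    """Run the download; requires youtube-dl on PATH and network access."""
    subprocess.run(youtube_dl_command(url, output_path), check=True)


# Placeholder URL; print the command instead of running it.
cmd = youtube_dl_command("https://www.youtube.com/watch?v=VIDEO_ID_A", "raw_a.mp4")
print(shlex.join(cmd))
```

Passing a list to `subprocess.run` (rather than one shell string) avoids quoting problems with the `?` and `&` characters that YouTube URLs often contain.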
And this `ffmpeg` line uses another tool called `ffmpeg`. You might have heard of MPEG, the family of video standards behind formats like MP4.
With `ffmpeg`, you say, "Start at this timecode and give me the segment of this duration as a clip."
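That "start here, keep this long" instruction can be sketched as an `ffmpeg` command. The file names and timecode below are hypothetical; `-ss`, `-i`, `-t`, and `-y` are standard `ffmpeg` options.

```python
import shlex


def ffmpeg_clip_command(src: str, start: str, duration_s: int, dst: str) -> list[str]:
    """Build an ffmpeg command that keeps duration_s seconds of src from start.

    Putting -ss before -i seeks in the input; -t caps the output duration.
    Re-encoding (no "-c copy") keeps the cut frame-accurate, at some cost in speed.
    """
    return [
        "ffmpeg",
        "-y",                   # overwrite dst if it already exists
        "-ss", start,           # start timecode, e.g. "00:01:10"
        "-i", src,
        "-t", str(duration_s),  # segment length in seconds
        dst,
    ]


cmd = ffmpeg_clip_command("raw_a.mp4", "00:01:10", 30, "clip_a.mp4")
print(shlex.join(cmd))
# To actually cut the clip: subprocess.run(cmd, check=True)
```

With `-c copy` instead, the cut would be much faster but could only start on a keyframe, so the clip might begin a little earlier than requested.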
OK, so now we have our two clips. Normally, you'd want to train on a lot more clips, from different videos.
That's the latest thing we're updating this software to do more effectively: take multiple clips from any number of videos.
For now, though, for simplicity and to prove the concept, it's fixed at one clip from each raw video,
and that will decrease the accuracy of the model you're going to train,
because you want more diversity in the data you ingest for the training that's coming up.
But for the sake of demonstration, I'm using just these two input clips to train the FaceSwap model.
So then we use Python. We're saying, "Hey, Python" (you know, this language, and the engine that runs it),
"execute faceswap.py and train this FaceSwap model." FaceSwap is an open-source project that someone published on GitHub; it's a freely available model architecture.
We could go look at the code to see which sub-models compose the module we're using,
but basically it's some combination of convolutional neural networks, maybe with some feature engineering, I don't know,
to recognize faces in an image. I could walk folks through how that works in a minute.
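The talk describes this call only in words. Below is a rough sketch of FaceSwap's three-stage command-line workflow (extract, train, convert). The flags are as I recall them from the project's README: `-i`/`-o` for input/output, `-A`/`-B` for the two face folders, and `-m` for the model folder. They have changed between FaceSwap versions, so treat them as assumptions and check `python faceswap.py <command> -h`. All paths are hypothetical.

```python
import shlex
from pathlib import Path

# Assumption: the FaceSwap repository has been cloned into this (hypothetical) folder.
FACESWAP_DIR = Path("faceswap")


def faceswap_command(subcommand: str, *args: str) -> list[str]:
    """Build `python faceswap.py <subcommand> <args...>` as an argument list."""
    return ["python", str(FACESWAP_DIR / "faceswap.py"), subcommand, *args]


# Hypothetical file and folder names throughout.
pipeline = [
    # 1. Find and crop the faces in each training clip (this also records alignments).
    faceswap_command("extract", "-i", "clip_a.mp4", "-o", "faces_a"),
    faceswap_command("extract", "-i", "clip_b.mp4", "-o", "faces_b"),
    # 2. Train one model on both sets of faces.
    faceswap_command("train", "-A", "faces_a", "-B", "faces_b", "-m", "model"),
    # 3. Find the faces in the target video, then swap them using the trained model.
    faceswap_command("extract", "-i", "target.mp4", "-o", "faces_target"),
    faceswap_command("convert", "-i", "target.mp4", "-o", "converted", "-m", "model"),
]

for step in pipeline:
    print(shlex.join(step))
```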
But it also has to be able to say, OK, here's where that face was within this grid of all the pixels in the image:
the face starts here, where X is 30 and Y is 40, and that's the left cheekbone...
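FaceSwap stores this kind of per-frame face-location data in what it calls an alignments file. Below is a sketch of the idea using a hypothetical, simplified JSON layout. It is not FaceSwap's exact schema: the keys, frame names, and numbers are invented for illustration.

```python
import json

# Hypothetical, simplified alignments data; NOT FaceSwap's exact on-disk schema.
# Per frame: a list of detected faces, each with a bounding box in pixel
# coordinates (origin at the top-left corner, x grows rightward, y grows
# downward) and a few named landmark points as [x, y].
ALIGNMENTS_JSON = """
{
  "frame_0001.png": [
    {"x": 30, "y": 40, "w": 120, "h": 150,
     "landmarks": {"left_cheekbone": [45, 110], "nose_tip": [90, 105]}}
  ],
  "frame_0002.png": []
}
"""


def faces_per_frame(alignments: dict) -> dict:
    """How many faces were detected in each frame."""
    return {frame: len(faces) for frame, faces in alignments.items()}


def box_center(face: dict) -> tuple:
    """Center of a face's bounding box, in pixels."""
    return (face["x"] + face["w"] / 2, face["y"] + face["h"] / 2)


def landmarks_inside_box(face: dict) -> bool:
    """Sanity check: every landmark should fall within the face's box."""
    return all(
        face["x"] <= lx <= face["x"] + face["w"] and face["y"] <= ly <= face["y"] + face["h"]
        for lx, ly in face["landmarks"].values()
    )


alignments = json.loads(ALIGNMENTS_JSON)
first_face = alignments["frame_0001.png"][0]
print(faces_per_frame(alignments))       # {'frame_0001.png': 1, 'frame_0002.png': 0}
print(box_center(first_face))            # (90.0, 115.0)
print(landmarks_inside_box(first_face))  # True
```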
When I ran that, it created what it calls an "alignments" file, a JSON file on your computer.
And then, basically, we train the model on both people's faces, so it has a memory of each,
and we train it so that when it sees Dave Chappelle's face, it puts Prince's face there instead,
based on the training where it learned to recognize each of their faces,
and it uses what it learned to generate that face with, presumably, a generative network,
which basically means a network that creates things instead of classifying things.
Creating and classifying are actually kind of the inverse of one another, apparently, if I understood the fast.ai course by Jeremy Howard correctly.
You know, if you can tell that's a house, then you can just as easily "imagine a house," "paint a house," or "generate a house."
You can generate what you think the thing looks like, because you perceive it and have learned to understand what it is and to classify it:
oh, that's a house. I just recognized a house because, as a human, I learned what a "house" is.
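Here is a toy analogy for the classify/generate intuition. It is not how FaceSwap works. A nearest-centroid model learns one prototype per class. The same prototype can then be used to classify ("which prototype is closest?") or to generate ("produce something near the prototype"). The 2-D feature vectors below are invented; real generative networks are far richer.

```python
import math
import random


def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fit_centroids(examples):
    """Learn one prototype (centroid) per label from {label: [feature vectors]}."""
    return {label: mean(vs) for label, vs in examples.items()}


def classify(centroids, x):
    """Classification: pick the label whose prototype is closest to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def generate(centroids, label, noise=0.1, rng=random):
    """Generation: produce a new vector near the learned prototype for label."""
    return [c + rng.gauss(0.0, noise) for c in centroids[label]]


# Toy 2-D "features"; real images would have thousands of dimensions.
examples = {
    "house": [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]],
    "tree": [[-1.0, 0.1], [-0.9, -0.1], [-1.1, 0.0]],
}
centroids = fit_centroids(examples)
print(classify(centroids, [0.8, 1.2]))  # house
sample = generate(centroids, "house", rng=random.Random(0))
print(classify(centroids, sample))  # the small noise keeps the sample near the "house" prototype
```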
So, anyway: you train on what these two faces look like, you figure out the alignments in the target video you want to convert,
and then the FaceSwap library generates a transformation, using the essence of Prince's face,
and generates Prince's face onto the alignments in that other video.
It can do that because it has also learned what Prince's face looks like in all these different rotations, you know?
So: dream up a Prince in that alignment there, with the emotional signature you saw in Dave Chappelle's face, ya deep learnin' model. And it does.
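Below is a minimal sketch of the shared-encoder, two-decoder autoencoder design popularized by the original deepfakes code. As far as I know, FaceSwap's basic "Original" trainer follows this design, but the notebook may use a different trainer. One encoder is shared by both people, and each person gets their own decoder. Training teaches each decoder to rebuild its own person's faces from the shared code. The swap is then just encoding Dave's face and decoding it with Prince's decoder. Everything here is linear, tiny, and untrained, purely to show the data flow. Real models are deep convolutional networks trained on many aligned face crops.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class SwapAutoencoder:
    """Shared encoder plus one decoder per identity (linear and untrained, for illustration)."""

    def __init__(self, face_dim, code_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(code_dim, face_dim, rng)    # shared by both people
        self.decoder_a = random_matrix(face_dim, code_dim, rng)  # rebuilds person A
        self.decoder_b = random_matrix(face_dim, code_dim, rng)  # rebuilds person B

    def encode(self, face):
        # Shared code; after training it would ideally capture pose, expression, lighting.
        return matvec(self.encoder, face)

    def reconstruct_a(self, face):
        return matvec(self.decoder_a, self.encode(face))

    def reconstruct_b(self, face):
        return matvec(self.decoder_b, self.encode(face))

    def training_loss(self, faces_a, faces_b):
        """What training would minimize: each decoder rebuilds its own person's faces."""
        loss_a = sum(mse(self.reconstruct_a(f), f) for f in faces_a) / len(faces_a)
        loss_b = sum(mse(self.reconstruct_b(f), f) for f in faces_b) / len(faces_b)
        return loss_a + loss_b

    def swap_a_to_b(self, face_a):
        """The swap: encode A's face, then decode it with B's decoder."""
        return self.reconstruct_b(face_a)


model = SwapAutoencoder(face_dim=8, code_dim=3)
face_a = [0.1 * i for i in range(8)]  # stand-in for a flattened, aligned face crop of A
face_b = [1.0 - x for x in face_a]    # stand-in for a face crop of B
swapped = model.swap_a_to_b(face_a)
print(len(swapped))  # 8
print(model.training_loss([face_a], [face_b]))
```

The small `code_dim` is the point of the design. It forces the shared code to keep what both people have in common, and leaves identity to the decoders.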
So that step is where it all comes together, because you say, hey, take this clip where Dave Chappelle's face is actually in the video,
and, using that stuff I just talked about, generate Prince's face onto it instead.
And here's the video it generated: it took all of those frames and recombined them into a video.
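Depending on version and settings, FaceSwap's convert step may already write the output for you. If you end up with a folder of numbered frames, here is a sketch of stitching them back together with `ffmpeg`. It assumes hypothetical file names and a 30 fps source clip; use your clip's real frame rate.

```python
import shlex


def frames_to_video_command(frames_pattern: str, fps: int, dst: str) -> list[str]:
    """Build an ffmpeg command that stitches numbered frames into an H.264 video.

    frames_pattern uses printf-style numbering, e.g. "converted/frame_%05d.png".
    The original clip's audio is not included.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-framerate", str(fps),  # should match the source clip's frame rate
        "-i", frames_pattern,
        "-c:v", "libx264",       # H.264 encoder (needs an ffmpeg build with libx264)
        "-pix_fmt", "yuv420p",   # pixel format most players expect
        dst,
    ]


cmd = frames_to_video_command("converted/frame_%05d.png", 30, "swapped.mp4")
print(shlex.join(cmd))
```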
So there you have it.
Here's the original video of Dave Chappelle.
And here's the one where we transposed Prince onto it.
If we had more diverse input footage, like more of Dave Chappelle's and Prince's faces from more angles, in different lighting, with different emotional expressions,
this would be a much more accurate model. So watch this over the next 10 days as we train a more accurate model.
And you can do that because I'm about to make this notebook Public, for anyone to view and fork.
But, yeah, that's a deep fake.
And now, we have this product called Gradient, by Paperspace,
and with Gradient, you can run this notebook for free, right now, thanks to the product I just released:
Gradient Community Notebooks. Because I've written this in a Public notebook that anyone can view and fork,
and because of this new approach we're taking to spreading machine learning, you can actually share this with the world,
and you can train it for free on your own videos.
This means that literally anyone can go to this URL and, by hitting Play here,
clone this machine learning model architecture and program, this app, onto their own Gradient account, and do whatever they want with it.
Talk about the limitations of the current implementation, how Gradient can currently be used for multinode, serverless, and deployment workloads, and directions for future product expansion.