CppCast - TensorFlow

Rob and Jason are joined by Andrew Selle from Google. They first discuss ranges support being added to Visual Studio, and Compiler Explorer's support for using some libraries. Then they talk to Andrew Selle from Google about machine learning with TensorFlow and TensorFlow Lite.

00:00:00
Episode 257 of CppCast with guest Andrew Selle, recorded July 23rd, 2020.

00:00:07
This episode of CppCast is sponsored by the PVS-Studio team. The team promotes regular usage of static code analysis and the PVS-Studio static analysis tool.

00:00:33
In this episode, we discuss library support in Compiler Explorer.

00:00:38
And we talk to Andrew Selle from Google.

00:00:42
Andrew talks about machine learning with TensorFlow and TensorFlow Lite.

00:01:08
Welcome to episode 257 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host Rob Irving, joined by my co-host Jason Turner. Jason, how are you doing today? I'm doing okay. I don't think I have too much news to share; it's very hot here. Here in North Carolina you actually had a slight cooldown; it's only been in the low 90s here instead of the upper 90s. One comment: I did just make my CppCon registration, like ten minutes ago, so I am planning to attend CppCon virtually.

00:01:51
Just a reminder to our listeners that the early bird registration expires sometime soon, right? I still need to do that myself. It'll be, like, two weeks, and it saves you $100; it's $200 versus $300, so it's a pretty significant savings, around 33%, and worth it to sign up early. Okay. Well, at the top of every episode I like to read a piece of feedback. We got this email from Michael about the episode from two weeks ago with the Disney Hyperion renderer team, and they said: this episode was great, I really enjoyed listening to Yining and David; hearing about how Disney renders their movies was super fascinating, keep up the good work. And Yining Karl was actually the one who recommended our guest for today, so that's cool. We'd love to hear your thoughts about the show; you can always

00:02:51
reach out to us on Facebook, Twitter, or email us at feedback@cppcast.com, and don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Andrew Selle. Andrew is a senior software engineer for TensorFlow Lite at Google and is one of its initial architects. He's also working on improvements to the core APIs of TensorFlow. Previously, he worked extensively in research and development of highly parallel numerical physical simulation techniques for physical phenomena for film and physically based rendering. He worked on several Walt Disney Animation films including Frozen and Zootopia. He has a PhD in computer science from Stanford University. Andrew, welcome to the show. Thanks for having me. I'm kind of curious what drove this move from animation and simulation over to machine learning and TensorFlow. I think if you look at the core of what you have to do to do physical simulation for film, or physical simulation in physics, it's actually a lot of similar skill sets to what you do for machine learning, and at the same time,

00:03:51
as I was doing things in film, I started getting more interested in what was happening in machine learning and I wanted to give it a try. What does highly parallel mean in your universe? Yeah, so I think this has varied over time. When I got into physical simulation, a big problem with doing things in film was that people were using single computers. So one of the things that I worked on at Stanford was extending physical simulation to work on MPI. And of course this had been done for supercomputers for a long time, but we took the algorithms that had been applied to film and we scaled them. So we did the first simulation of clothing that was, like, a million triangles, where most people were simulating on 10,000 at the time. That was a little while ago, so people have gotten to these levels as just bread-and-butter type of scale. But since then, you know, I've done a lot of GPU stuff,

00:04:51
a lot of distributed parallelism, and just a lot of micro-optimization on single-core as well. I distinctly remember the first time I saw a fluid thing or a cloth simulation, a physical simulation; it was at SIGGRAPH and I was in high school, so it was like 1994 or something like that. It seemed like complete magic at the time, and it was like they had a single piece of fabric that they were able to simulate, you know, so things have come a long way since then. It turns out the hard part is the collision detection. So when you have a cloth fold over itself, you have to make sure it doesn't fly through itself. And that's where sort of trivial parallelism, where you say, oh, this part of the cloth is not anywhere near this other cloth, starts to fail, and you start to have this problem of, like, N-squared possible interactions, where any point of the cloth can contact any other point, and you have to check all those in an efficient way. So do you actually have to check them all, or is there some

00:05:51
shortcut? Well, I mean, you use spatial structures to accelerate that. The most common one is a bounding box hierarchy, where you say, well, this triangle is within this bounding box, and this neighbor of it is within this bounding box; we put those together and then we can say there's a bounding box that contains both of them, and you continue with that and you create a hierarchy of bounding boxes. Then it reduces, you know, not the worst-case scenario, but the average case, so that it's tractable. I mean, in the worst case you could have all the clothing compressed into a single tiny box, and that would be very difficult to solve. Right, cool. Okay, well Andrew, we have a couple of news articles to discuss; feel free to comment on any of these, and then we'll start talking more about TensorFlow. Okay? Alright. So for this first one we have an article on the Visual C++ blog, and this is that initial support for C++20 ranges is now available in the latest version of Visual Studio 2019,

00:06:51
version 16.6: the first user-visible pieces of ranges support. They've been working on it under the covers for a while, but now you can actually go and test the parts that are there. I feel like it's more significant than that as well; it's not just ranges, it's ranges built on concepts. So for a long time the ranges implementations that we've had were built with concepts emulation; now we can actually see this stuff coming together, and concepts, it looks like, has been in there for the past three releases. So now we're using ranges with full concepts. Good stuff. Have you had a chance to play around with ranges at all, Andrew? Unfortunately, no; a lot of the time we've been limited to C++14 in TensorFlow land. There is a request here before I move on: please go try it out, kick the tires, submit bug reports, and a reminder that all this stuff is being developed on GitHub now. The STL implementation from

00:07:51
Microsoft is all there. So check it out if you can, submit bug reports, whatever. Okay, the next thing we have is some new features in Compiler Explorer. Do you know how long they've been working on these, Jason? Being able to link libraries is the new feature. I don't know how long they've been working on it, although I know that Matt would like us to point out that this was not just Matt's work — I don't know how much involvement he had in it — it is quite the team that helps with Compiler Explorer at this point. But yeah, I think it's awesome, because there have been a few times when I tried to use {fmt} in Compiler Explorer and it was just unusable, because you had to use the header-only version, or it has to already be compiled, whatever. And so now they have library support, and they have a full list of libraries here. It looks like they're listing a couple of the unit testing libraries, but I'm not sure what other ones are available now besides {fmt}. The only other ones that I know are specifically called out are the unit testing ones; I haven't looked myself.

00:08:51
I may have just found it. Yeah: Google Benchmark, Intel TBB, {fmt}, Catch2, Google Test. Benchmark also being supported is interesting, because there's the Quick Bench / Compiler Explorer integration. If you haven't seen that, from Compiler Explorer you can hit a button and go to Quick Bench and it'll just copy the code over, and with libraries now shared between Quick Bench and Compiler Explorer, that means you can get a more and more complete integration between the two — I'm guessing, I haven't tried it yet, though. The last thing we have is that CMake 3.18 is released; anything worth pointing out with this, Jason? It's a big release. I'll ask Andrew: did you read this? Do you have any interest in CMake? I've used CMake a lot in the past, but I haven't recently. TensorFlow was using CMake for Windows support before Bazel supported Windows, but now I think we're not

00:09:51
using it that much. But yeah, I enjoy using CMake for small projects, and the cross-platform support brings me back there when I need to do that. Right, absolutely. There's nothing specific that I feel like I need to call out here; there are just so many little changes. Actually, there is one thing I'll call out: profiling support. So if your CMake project currently takes forever to configure and generate, you can run the profiler and figure out where it's spending its time. Worth a try. Alright. Well, Andrew, we've had a couple of people recently ask us to do an episode on TensorFlow. So can we maybe start off by just letting you explain what exactly TensorFlow is, for listeners who've never worked with it, which I think includes both me and Jason?

00:10:42
Sure. I mean, it's a big project and it offers a lot of functionality, but the core idea behind it is that it's an open source library that lets you develop and train machine learning models. The basis behind it was the idea that you can start from research — researchers can create a model — and you can take it all the way to production and deployment on a wide variety of devices, including smaller devices like mobile devices. So given that scope, it has a lot of features, and I'm not even an expert on a large majority of them. Okay, mobile applications — what does that look like, to be doing machine learning on a phone? I'm curious. So that's the other half of it. Once you train a machine learning model using data, you often want to integrate it into applications. So a lot of the work has been doing inference on a mobile device, where you carry the weights

00:11:42
that describe the function and the program that describes the function, and then you can evaluate it on whatever inputs there are. We can get more into TensorFlow Lite, which is the mobile product, in a bit. You used words like weights and models, and for those of us who don't use it, I would love, like, the explain-like-I'm-five description of what machine learning is. Why are we training models? What are we doing with them? How does this work? Yeah, definitely. So machine learning is about creating a function that, you know, processes some inputs and produces some outputs. How do you create that function? Without machine learning, we have a way of doing that algorithmically. We say, I want to make a function that computes the sum of two numbers; I can do that, I know how to write code to do that, it's fairly mechanical. The big difference in machine learning is that you use data, and how you use data can vary in a lot of different ways.

00:12:42
But if you think of the simplest thing you could do with machine learning, the simplest form of machine learning would be something like, you know, linear regression. So you have a bunch of x,y points and you want to find the line that most closely matches them. So think about that function: the way you describe a line is basically a slope and an intercept, or two points — there are many ways to describe a line — but the idea is that, given that data, what is the best line that fits that data? And "best" can be described in many different ways: is it, you know, the distance, the perpendicular distance, an L2 norm, etcetera. But without getting into the details of how you describe these kinds of functions, let's just imagine something like, I want to separate red marbles from white marbles. How do I do that in real life? I might lay them on the floor, and they're all mixed up, and then I might want to say, well, which ones are red and which

00:13:42
ones are green, which ones are yellow, etcetera. I might start pushing them into piles, right? And by pushing them into piles it becomes much more obvious how many there are, and I can quickly draw a line between them. And that's kind of what machine learning tries to do: it tries to warp the data through a function so that these kinds of decisions become obvious at some point in the function. Is this sort of making some sense? Shall I be even more concrete? Sure. So one type of problem that machine learning is usually used for is classification. So if we have an image and we want to put it through a function, that function could maybe describe, you know, what that image is in terms of a category. So if I have a classifier that determines what kind of dog something is, and I have a bunch of different dogs — you know, one of them might be a Boston Terrier, one of them might be a German Shepherd — I want to give it that image and have it output sort of a number which represents a class, or a string which represents a class.
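To make the line-fitting idea concrete, here is a minimal sketch (not from the episode; the data points and learning rate are made up) of fitting a slope and intercept to some x,y data using TensorFlow's gradient machinery:

```python
import tensorflow as tf

# Made-up data roughly following y = 3x + 1 (purely illustrative).
xs = tf.constant([0.0, 1.0, 2.0, 3.0, 4.0])
ys = tf.constant([1.1, 3.9, 7.2, 9.8, 13.1])

w = tf.Variable(0.0)  # slope, badly initialized on purpose
b = tf.Variable(0.0)  # intercept

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for step in range(200):
    with tf.GradientTape() as tape:
        predictions = w * xs + b                              # the "model": a line
        loss = tf.reduce_mean(tf.square(predictions - ys))    # how badly the line fits
    grads = tape.gradient(loss, [w, b])                       # derivatives of the loss w.r.t. the parameters
    optimizer.apply_gradients(zip(grads, [w, b]))             # nudge the line toward the data

print(w.numpy(), b.numpy())  # should approach slope ~3, intercept ~1
```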

00:14:42
And machine learning is about how I create that function. Okay, so the model is the function, effectively? Exactly. And this is not that different from traditional modeling. If I drop a ball from a height and I want to compute what its velocity is and what its position is, I might develop a model — a physics model, Newtonian physics — and I might do that empirically, or I might try to come up with a mathematical function that describes it really accurately, which is sort of what Newton did. The problem with that approach is that it works really well for simple phenomena, but if you have complicated things, like this example of identifying different dogs in images, it becomes much harder to do that algorithmically. So the idea with machine learning is that if you can create a class of functions — a model architecture — that is perhaps really complicated, can we make a way of getting the particular

00:15:42
parameters of that model to do a good job? Okay. Yeah. So back to my linear example: if I have a particular line and then I look at my data, I can measure how well my line is doing against that data, and if it's doing badly, can I improve it? The way I might improve it is I push that line until it's at the right angle, I push that line up and down until it matches the right bias, and that's what's happening in machine learning, but at a higher dimensionality, using much more complicated, higher-order matrix things and nonlinearities. If you make that function more complicated, it's harder to intuitively see it, but it's basically doing the same thing. So for all of these machine learning algorithms, do you specify when you're training a model what dimensionality you want it to try to fit to the data, or does it just do whatever it does?

00:16:37
Yeah, exactly. In traditional machine learning you would specify the model architecture. So you would say, okay, if I want a linear fit to something, I would choose how many dimensions in and how many dimensions out. That would correspond to the image size in our example, and to how many classifications there are. So concretely, if I'm just going to use a simple matrix as my model and I say the image is 256 by 256, then I'm going to have, you know, 256 squared elements on the rows of my matrix (or the columns of my matrix), and then on the other dimension of my matrix I'll have the number of classifications. And my desired output for that function might be what's called a one-hot vector. So if I give it a Boston Terrier and Boston Terrier is class 3, then I would get zeros everywhere except a one in position 3. And if I say my German Shepherd is class zero, then I would get a one followed by all zeros.
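As a tiny illustration of the one-hot idea (the four dog classes and indices here are made up):

```python
import tensorflow as tf

# Hypothetical labels for a 4-class dog classifier:
# class 0 = German Shepherd, class 3 = Boston Terrier.
print(tf.one_hot(0, depth=4).numpy())  # [1. 0. 0. 0.]
print(tf.one_hot(3, depth=4).numpy())  # [0. 0. 0. 1.]
```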

00:17:37
And in that encoding you have to tell me how many classifications there are up front. So how are models trained, specifically with TensorFlow? To help you train ML models, there are a couple of things that come up over and over when you try to train a model. One is: how do you get data, how do you represent data, and how do you shove it through the system efficiently? The second one is: how do you describe models and how do you specify them? So TensorFlow provides a library, tf.data, which is a way of feeding data in. The second thing it provides is a way of specifying model architecture. There are a lot of conventions that have come across from successful research projects, like fully connected — which would be basically a matrix-multiply layer — and then on top of that you might stack another thing, which is maybe a convolutional layer,

00:18:37
which knows how to do things like blurs and edge detections. All of these kinds of layers can be accumulated in the library to create a potential model architecture. Then the next phase is when you actually start training. When you start training, you need to go from a particular set of parameters — which were perhaps randomly initialized and don't do a good job at all — and you try to perturb them until they're good quality. And the way you do that is using gradients, differentiation. If you evaluate your function on a set of data, that gives you an output; from that output you can also compute the derivative with respect to all the variables, all the parameters in the model, and that will give you sort of a perturbation that you can apply to all those variables that will make it better, that will make the error less. Okay, and you keep doing that over and over again, running your same data over and over against your model, computing these small perturbations to the model using your gradients.
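A rough sketch of how those pieces fit together in TensorFlow's Python API — a tf.data pipeline feeding a small stack of layers, with the gradient-descent loop handled by Keras. The shapes, layer sizes, and random stand-in data are placeholders, not anything from the episode:

```python
import tensorflow as tf

# Placeholder data: 100 fake flattened images and integer labels for 10 classes.
images = tf.random.uniform([100, 64 * 64])
labels = tf.random.uniform([100], maxval=10, dtype=tf.int32)

# tf.data: how you represent data and shove it through the system efficiently.
dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(100)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))

# Model architecture: a "fully connected" (Dense) layer is basically a matrix multiply.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64 * 64,)),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
])

# compile/fit run the whole training loop, including automatic differentiation,
# so the chain rule is never applied by hand.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.fit(dataset, epochs=1)
```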

00:19:37
So TensorFlow is helping you define that architecture, and in terms of the gradients it's also doing automatic differentiation, which basically allows you to specify your model architecture in a straightforward way and then compute the gradients automatically, so you don't have to apply the chain rule and, you know, all the differentiation rules that everybody's forgotten from Calc I. So I want to maybe focus on something — it might not be the appropriate thing to focus on — but you were saying you can run the input through a convolution matrix or whatever, edge detection or something like that. It kind of sounded to me like that means, for something like this dog classification example, you're not necessarily just training one model. You're not saying here's an image, now give me the output. Do you then split it up and, like, say, okay, well, I'm going to train a model that's based on edge detection, and I'm

00:20:37
going to train a model based on colors, and I'm going to train a model on something else, and then have those things work collaboratively? Or is that all part of one process? So essentially you're trying to create a huge composite function, and composite functions can have multiple stages, and I think that's what you're sort of feeling — your intuition is basically correct. The example of a matrix, or the example of a line (which is also a matrix), those are simple linear models. Those aren't that powerful, and it turns out there's a, you know, hard limit to what you can do with them. So the way people solved that is they add multiple layers that do more steps, maybe just more steps of linear, and between those linear layers there are also nonlinear functions, and all these kinds of functions allow you to have more resolving power and represent more complexity. The problem with that is that it gets harder to reason about, and it gets harder to train when you get deeper. So the idea of deep learning, which came out a number of years ago and is one of the major

00:21:37
drivers of AI recently, is that you can now tractably train those multi-layer models, because traditionally you were not able to do that. It was not tractable at all: you didn't have enough computation power, and you didn't have any way of dealing with the numerical issues that occur when you do gradients across multiple layers. So you can imagine — think about the butterfly effect, right? I perturb some air somewhere; does it cause a tsunami somewhere else? This is kind of what happens when you go through many, many layers of a machine learning model: you try to figure out what the causality of a particular output is, and it becomes harder, and that's called the vanishing gradient problem. It turns out there were techniques, like dropout and data augmentation, that helped make these types of problems tractable, and that's what allowed deep learning. And so deep learning allows you to make multiple hierarchies, or sort of multiple layers. So when you were talking about the edge

00:22:37
detection: what happens in a deep learning image model is that you have something that's very close to the image, which is computing very low-level features, like edges, like blurs, and then the layers subsequent to that create higher-level features — like, you know, maybe coarse shapes, coarse orientations — and as it goes down and down through the model it gets higher and higher level features, until it can successfully do a classification.
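A hedged sketch of that layering in Keras terms — early convolutional layers near the image computing low-level features, later layers building higher-level ones, ending in the classification itself. The layer sizes are illustrative, not a recommended architecture:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Close to the image: low-level features such as edges and blurs.
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    # Subsequent layers: coarser shapes and orientations.
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    # Higher-level features still.
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Finally, one output per class.
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```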

00:23:09
This episode is sponsored by the PVS-Studio team. The company develops the PVS-Studio static code analyzer, designed to detect errors in the code of programs written in C, C++, C#, and Java. Recently the team has released a new version. In addition to working under Windows, the C# part of the analyzer can also operate under Linux and macOS. However, for C++ programmers it will be much more interesting to find out that now you can experiment with the analyzer online, on the godbolt.org website. The project is called Compiler Explorer and lets you easily try various compilers and code analyzers. This is an indispensable tool for studying the capabilities of compilers; besides that, it's a great assistant when it comes to demonstrating code examples. Find all the links in the description of this episode.

00:23:51
I want to see if we can get more into the C++ being used for this, but before we do that, could you tell us a little bit about TensorFlow Lite, which in your bio you mentioned you were one of the architects for? Yeah. So TensorFlow is aimed at training ML models and at deploying them on the server for serving. If you wanted to do inference over many users that were hitting the same server at once, TensorFlow has been used for that; there's a library, TF Serving, that does that. But what emerged as we started deploying ML models on device is that the overheads were high in TensorFlow — which is okay for a server-based setting, because you basically have many inferences happening, many pieces of data coming in at once, so any sort of interpreter inefficiencies were not such a big deal, because they're amortized over really large amounts of data.

00:24:51
At the same time, there are other constraints, like binary size, that became really important on mobile devices. An app developer doesn't want to have a huge binary that they have to ship around, so TensorFlow Lite had the goal of making the overhead of individual operations much smaller. It had the goal of having a very small binary size — when we first shipped it was about 100 kilobytes for the interpreter — and it also had the goal of having very low latency to start up. So you can imagine that there are a lot of kernels that you can use, and, you know, if you have a big binary size it's not such a big deal to load that huge binary if you're not going to use it all the time; even if you're initializing large parts of it, if you're going to run for, like, ten days doing machine learning training, it doesn't matter that it takes, you know, ten seconds to load or whatever. But on a mobile device, when somebody's starting an interaction with their app and they want to

00:25:51
get the result in, like, two seconds, then you want to minimize the latency. So in TensorFlow Lite we focused on those design constraints, and we made a subset of features that would work well on mobile, and we made a way in which you can take models from TensorFlow and put them into TensorFlow Lite, so that you have a continuous authoring process. Can TensorFlow Lite do the learning also, or just the inference, the running of them — am I getting these terms right? So it doesn't support the learning as a first-class citizen. There are ways to do the gradient propagation manually — in fact, I think we have a blog post or an instruction on how to do that — but that's typically not done. Increasingly, though, as mobile devices are getting more powerful and people want to do more adaptive algorithms, it does occur; there are certain types of deployed applications that do training, but it's not the most common path right now.
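A hedged sketch of that authoring hand-off from Python: convert a trained Keras model into the TensorFlow Lite flatbuffer, then run inference through the TF Lite interpreter (the Python interpreter API mirrors what you would do from the device-side bindings; the tiny model here is just a stand-in):

```python
import numpy as np
import tensorflow as tf

# Stand-in for a trained Keras model (architecture and sizes are placeholders).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(4,)),
])

# Authoring side: serialize the model (weights + topology) into the TF Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization
tflite_model = converter.convert()

# Device side: load the flatbuffer into the interpreter and run inference.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

sample = np.random.rand(1, 4).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))
```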

00:26:51
Okay, so I'm curious what the model actually looks like — what is this thing that you generate and then hand off to your mobile device? So as we talked about, there are these different layers, and each layer might have something: if it's a convolutional model, it might have a set of filters. You don't put a layer in and say it's an edge-detection model; you say it's a convolutional model, and it should have learned the filters so that that layer can be an edge detection, or it could be a blur, it could be some other way of swizzling the data. So those are the weights. When you put a model on a device, when you serialize it, you remember the weights and you also remember the topology. So in TensorFlow Lite we have a flatbuffer, which is a memory-mappable file that contains the weights and the topology. Okay, and so it has essentially a directed graph of these layers — or, we call them ops, I guess — and

00:27:51
the weights that are associated with them, so it can run. Do developers or data scientists use TensorFlow from C++, or are they mostly using Python bindings? How do they usually interact with it? So I guess I haven't mentioned that TensorFlow is often interacted with through Python so far. It's written in C++ and written in Python, and most researchers that are training models use Python, so that's how they interact with it. Most people that are deploying onto mobile devices are using a different language. So, you know, if you're using Android you would probably be using Kotlin or Java; if you're using iOS you might be using Objective-C or Swift; and in both of those you could write a library in C++ and use that. So in terms of the bindings, TensorFlow has C bindings, which are kind of the way in which you can create bindings:

00:28:51
you can call TensorFlow from them to load a model, and you can run inference on a model using the C API. Similarly, TF Lite has a C API that you can bind to. So if you want to bind to Java, you would use JNI to write a binding layer that connects to TensorFlow Lite. Or, for example, if you're going to use TensorFlow or TensorFlow Lite from Objective-C — well, Objective-C can call C, so you can just do it directly with no bindings. So I think most bindings are written by hand using the C layer, and due to ABI compatibility it's usually required that you write things in a C, lowest-common-denominator way to make them compatible across multiple compilers. ABI comes up again — we've had lots of conversations on the show about breaking ABI compatibility across C++ versions, so it's been a while, so I know how much effort

00:29:51
it is. I'm trying to imagine, like, how high-level, how small — what kind of a footprint do you try to keep with the C binding layer to make sure that all this is maintainable? Like, what is the surface area of the C API, how big should you make it, that kind of thing? Yeah, basically any time I've done language bindings to another language, I use SWIG: it parses my C++ header files and it does that work for me. I've never personally done this, like, official C binding and then let people use that kind of thing, so I'm kind of curious what that ends up looking like in your world. I think if you look at machine learning inference, or even training, typically you interact with a small surface of the interface. If you're not involved in authoring individual nodes, you really don't need to do anything except send inputs in and get outputs out. So it's not as big of a surface as the whole thing, and that means that for a long time the Python bindings were kind of a

00:30:51
special case in TensorFlow. The other thing that I would say is that even for that small surface area, it's often useful to make bindings idiomatic. So going back to your SWIG example: I've seen a lot of libraries, a lot of applications, that have bound with SWIG, and they bound their whole C++ API, and that turns out to be really good if you're kind of a C++ developer and you want to prototype C++ things in Python. I think Maya had a good, fairly direct API for doing this — it's a 3D animation tool — and I used it extensively, and I've also done this in my own projects. But if you're trying to make something that's idiomatic Python, or idiomatic Swift, or idiomatic Objective-C, you tend to want to, like, rewrite the bindings manually. You want to use the language features and the language idioms that are considered good for that particular language, and that's why a lot of people hand-write the bindings. So your footprint using it is pretty low, and basically just comes down to, like, loading a model and executing the model from the

00:31:51
bindings? As far as TensorFlow Lite goes, I believe there are ways to construct a model — I haven't looked at what the current situation of it is — but in terms of what most people use from C, it is to run inference, or possibly to run a training loop. In terms of things we do: we actually use a wrapper generator, we just don't wrap the entire C++ library; we wrap a smaller interface, so it is a special-case thing. And traditionally we used SWIG as well; more recently we switched to pybind11 for the TensorFlow library, and pybind11 is a library created by a graphics researcher who does a lot of rendering work, Wenzel Jakob, and he has a lot of other interesting C++ things. You should look into him as a future guest, I would say; he's done a lot of things on machine learning combined with graphics, which is really interesting — differentiable rendering. That's cool. I'm also curious

00:32:51
about passing large chunks of data around through the C API — like, how do you do object lifetimes? Because the C++ side of things is going to have some notion of ownership, and on the C side do you have to do the typical thing, like create an object and then ask the binding to destroy it for you, that kind of thing? Yeah, I mean, you know, typically you'll have a free function associated with a handle in the C API for TensorFlow. In terms of what we do in TensorFlow Lite, a lot of the time we are memory-mapping the model, in which case we assume everything — kind of all those weights — has an infinite lifetime, and we try not to copy any of the really big data; we only copy and create internal representations of the topology of the graph from the memory map. Whenever you need it — yeah, exactly. And in fact, one of the big differences between what we did in TensorFlow and TensorFlow Lite is how we dealt with memory allocation in general.

00:33:51
TensorFlow, for its traditional runtime, uses kind of a reference-counted tensor handle. So there's, like, a buffer, and then tensors are sort of — not quite copy-on-write, but that same kind of feeling — which is that you have these handles, and multiple reference-counted views of them, to emulate kind of a value-semantic situation. In TF Lite we do ahead-of-time memory planning, where we try to create an arena that's the whole memory that you would need. So it's kind of like an über activation frame of what's needed for the model, where we can overlap different parts of the computation that are used at different times, to get minimal overhead in terms of memory allocation — or at least a smaller minimum.
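This is not TF Lite's actual planner, but a toy sketch of the idea: if you know ahead of time which ops each intermediate tensor is live for, tensors whose lifetimes don't overlap can share space inside one pre-allocated arena instead of each getting its own allocation (the tensor names, lifetimes, and greedy placement strategy are all made up for illustration):

```python
# Toy ahead-of-time memory planning: tensors described as (first_use_op, last_use_op, size_bytes).
tensors = {
    "conv1_out": (0, 1, 4096),
    "relu1_out": (1, 2, 4096),
    "conv2_out": (2, 3, 8192),
}

def plan_arena(tensors):
    placements = {}   # name -> (offset, size) inside the arena
    arena_size = 0
    for name, (first, last, size) in sorted(tensors.items(), key=lambda kv: kv[1][0]):
        offset = 0
        # Greedy and conservative: slide past every already-placed tensor whose lifetime overlaps ours.
        for other, (o_off, o_size) in placements.items():
            o_first, o_last, _ = tensors[other]
            if not (last < o_first or first > o_last):  # lifetimes overlap, cannot share bytes
                offset = max(offset, o_off + o_size)
        placements[name] = (offset, size)
        arena_size = max(arena_size, offset + size)
    return placements, arena_size

print(plan_arena(tensors))  # one arena allocation instead of one allocation per tensor
```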

00:34:41
"Minimal" becomes very tricky if that's truly your goal. So is TensorFlow Lite written as a separate library, or is it like a pared-down, ifdef'd-out version of TensorFlow? It's a separate library right now, but it shares a number of the same operations — I think you said it shares similar operations. There are some differences in the operations: TF Lite defines some of its operations as fused versions. So, you know, we were talking about convolution, and it turns out that after convolution — or before it, depending on how you look at it — you often do a bias, which is just adding a vector to everything, and then you do some nonlinear thing. And you can imagine doing those all at once: while you're loading a single element you do the bias and the activation, which is kind of like a nonlinear function, and then you have reduced memory bandwidth — a typical kernel-fusion type strategy. So TF Lite defines some of the key operations,

00:35:41
like convolution with bias-add, as a fused operation to get higher performance. There are different implementations: a lot of the TF Lite implementations are optimized by hand using assembly, and we use another Google open source library called ruy for that, which does basically fast quantized and floating-point matrix multiplication, which is a primitive used by a lot of these types of machine learning operations. And you mentioned before that you are limited to C++14 with TensorFlow? Yeah. So, I mean, there's kind of a long story: if you want to support software on a wide variety of machines, it tends to be that you get a lot of complaints if you go too new too fast. The other thing that's interesting on the TensorFlow Lite side, which, you know, I'm admittedly much more familiar with, is that we have a lot of people that are trying to take TensorFlow Lite and take it onto

00:36:41
small devices like microcontrollers. And while microcontrollers have gotten to be a lot better at handling new compiler toolchains — since Arm is so ubiquitous, and RISC-V as well coming up, that means you can use C++ where traditional embedded developers would never touch C++ — they're still sometimes a little bit behind. And sometimes the chips that you might choose, you choose for hardware reasons, and their toolchain might not be as advanced, even though there are definitely microcontrollers that make every version of their compiler source available. You know, it's a shame that you can't move beyond C++14, but it's also really cool that you're able to at least use C++14 on these devices. I mean, I see a lot of possible benefits of, you know, the C++17 and 20 features. I think if you look at tf.data and some of the streaming operations that we do,

00:37:41
they look very similar to coroutines, and using coroutines directly might be really interesting. And then lots of other creature comforts.
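For what it's worth, that streaming, coroutine-ish flavor of tf.data is visible even from Python, where a dataset can be built from a generator that yields elements lazily (a trivial sketch, nothing specific to the internals being discussed):

```python
import tensorflow as tf

def sample_stream():
    # A lazy producer: elements are pulled one at a time, much like resuming a coroutine.
    for i in range(10):
        yield (float(i), float(2 * i))

dataset = tf.data.Dataset.from_generator(
    sample_stream,
    output_signature=(tf.TensorSpec(shape=(), dtype=tf.float32),
                      tf.TensorSpec(shape=(), dtype=tf.float32)))

for x, y in dataset.batch(4):
    print(x.numpy(), y.numpy())
```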

00:37:51
We've talked a lot about training these models and deploying them, and TensorFlow Lite and small devices and everything, and now I'm wondering: what are people actually using TensorFlow Lite for? What kinds of models are they executing on handheld devices or microcontrollers? Yeah. I mean, if you look at TensorFlow Lite, it's been deployed on, like, over four billion devices now. So it's been used in a lot of Google's core applications, and you can imagine what types of things it might be useful for. If you look at an app like Google Photos — you know, typing something into the Google Photos search, you can say I want to find a flower pot and it'll show me the flower pots — that's basically an image classification, and that's a model that's been trained. There are some parts of that that can run on device, and some parts of that that run on a server.

00:38:51
If you look at other models, speech recognition is another big one. There's a lot of work on speech recognition, and that's enabled devices like Google Home, where you can talk to it instead of having to interact with traditional input devices. You can also do that on your phone. And then, again, some of that can be done on device and some of that can be done on the server, and it turns out that as mobile devices get better and better, you can do more and more on the mobile device and rely on the server less. Okay, fascinating. I'm just thinking about, in my personal career, how many times we've gone: everything on the server; no, everything local; no, everything on the server; okay, now let's find the balance that makes sense for everyone. So it's interesting. I think it's going to be a constant push and pull, and there are always tradeoffs, and this also happened in graphics, where you think about what you can render on a GPU versus what you can render on a CPU,

00:39:51
or what you can render at, you know, 30 frames per second versus what you can render overnight, and you're always going to be pushing both of those at the same time. Another application that is really interesting, that we demoed at Google I/O and that I worked on, was the idea of using pose. Like, if you point a camera at someone, you can actually figure out what orientation all their limbs are in, and you can use that to help teach things. So in that case, we took a dance instructor showing people how to do dance moves, and then we slowed it down, and then we used that pose matching to tell them when they're doing a good job, gave them a score, and allowed them to improve and gave them feedback on what they're doing. So a lot of different kinds of applications that people have done with, you know, very specialized equipment — like motion capture, like specialized cameras — can now be done with regular cameras on device, and that's a really exciting thing, because it's going to just make it more ubiquitous.

00:40:51
So, has there been any — I'm thinking about, you know, your desire to run these things fast and quickly on small devices — has anyone done any work on, like, actually compiling the model itself? Yeah. So this is kind of an interesting question; there's been a lot of work on compilers for machine learning. So part of what's released with TensorFlow is XLA, which is actually a compiler, and it takes as an input HLO, which is an instruction set that is basically linear algebra operations. So you can actually take TensorFlow operations and they can be lowered to this dialect, and then they can be compiled to CPU, to GPU — and this is actually how Google's TPU is targeted: you compile programs into these XLA programs.
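A small hedged sketch of asking TensorFlow to compile a function through XLA from Python. In recent releases the flag on tf.function is spelled jit_compile (earlier 2.x versions spelled it experimental_compile); the function and data here are arbitrary:

```python
import tensorflow as tf

# Ask TensorFlow to lower this function through XLA and compile it
# (e.g. fusing the multiply and add into one kernel) instead of running op-by-op.
@tf.function(jit_compile=True)
def scale_and_shift(x, w, b):
    return x * w + b

x = tf.random.uniform([1024])
print(scale_and_shift(x, 2.0, 1.0)[:4])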

00:41:51
And the way this works in TensorFlow, for the Tensor Processing Unit, is that it gives you the ability to compile things. A lot of the benefit you can get from that is kernel fusion, like we had talked about with TensorFlow Lite; you can imagine doing that on the fly, you can imagine this basically being integrated as a JIT — so you can basically tell TensorFlow to compile this thing, and then that becomes a new function that actually goes to a compiled version of it. Cool. There's also another framework that's been created called MLIR, which was started at Google by Chris Lattner, and it is essentially a framework for creating multiple levels of representations, of IRs, so you can build compilers for a wide variety of things. The observation there is that you might have different sorts of dialects, or different IRs, that are useful at different times. So you can imagine XLA actually being a dialect within it, or you can imagine

00:42:51
TensorFlow being a dialect within it. And in fact, the converter from TensorFlow to TensorFlow Lite is actually implemented using MLIR technology. So there are a lot of exciting things happening with compilation. What traditional TensorFlow did for its implementation is it used a lot of Eigen. So this is kind of using C++ as a compiler for an embedded DSL, essentially. A lot of the operations in TensorFlow were implemented in terms of Eigen. So at that stage — you asked if there's, like, heavy templatization of TensorFlow — it's mostly at the Eigen stage; if you look at some of the operations, their kernel implementations are often templatized in terms of dimension and in terms of type. So you get a specialized version for type and dimension, and Eigen can sometimes do better with its vectorization, where it's mapping these high-level linear algebra

00:43:51
tensor operations into packetized, SSE forms that work better on the CPU. All right, that's pretty awesome. Eigen sounds familiar, but I don't think it's something we've talked about in quite a while. It was created several years ago now to handle linear algebra; it aims to be a C++-oriented linear algebra library, and it uses expression templates to sort of describe repetitive operations. So if you're doing, like, a bunch of operations on small matrices, you'll get a version specialized for that exact small-matrix size and type that can outperform the general case — that's the basic idea of expression templates, where you try to take all the dynamic-ness and turn it into templates. I believe that the tensor functionality was added by a TensorFlow developer at the time.

00:44:51
That's the extension that was used to create the first version of TensorFlow. I'm always in favor of more things being done at compile time, if they can be, of course. Oh yeah, there are a lot of things that you can imagine doing at compile time. As I said, kernel fusion is one thing; the other thing is this memory planning. The idea is, if you can infer what the shape of a tensor is — so tensors are basically multidimensional arrays, and many of these machine learning operations can be formulated in a way that lets you infer what the final shape of those tensors is. But if you change certain inputs, like the image size, then the shape of the output tensor might be completely different. If you can infer that it's the same, then you can allocate the memory more efficiently, you can pre-partition for parallelism, etcetera, etcetera. So for lots of models you definitely know the sizes of these things; for some types of models — there are also dynamic models — this is not the case.

00:45:51
It depends on the function. There are some types of functions — like a convolution: if I stipulate that the image size is always something, or I resize it first, then everything below that resize has a known shape. But if I take an operation and I say, give me a tensor that has all the positive elements, then the size of that output tensor will be smaller than or equal to the input tensor, so the shape is unknown there. That's kind of a nice case, because the shape is at least bounded by the input shape, so you might have to do more there, but in general you could create a function where there are no guarantees.
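A quick sketch of that static-versus-data-dependent shape distinction in TensorFlow terms, using the "give me all the positive elements" example:

```python
import tensorflow as tf

x = tf.constant([-1.0, 2.0, -3.0, 4.0, 5.0])

doubled = x * 2.0                       # shape is known statically: same as x, (5,)
positives = tf.boolean_mask(x, x > 0)   # shape depends on the data: at most (5,)

print(doubled.shape)    # (5,)
print(positives.shape)  # (3,) — only knowable once the actual data has been seen
```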

00:46:51
Something I was just chatting with my nephew about recently is some of the articles and stuff I've seen, you know, come up on Twitter or wherever about unintentional bias built into our machine learning models, and I'm just kind of curious if you have any opinion, if you'd like to talk at all about the ethics of building these models, and the bias, intentional or otherwise, that can be built into them. Yeah, so I'm not an expert in this area, but I think it's super important, and one of the nice things is that Google is actually looking at this, because we take it really seriously. There's a whole website devoted to AI responsibility that Google has — I'd have to check the URL, I think it's under "responsibility" — and it talks about, you know, some of the sources of bias and some best practices to avoid it, and I think there are even extensions to some of our tools. We didn't talk about it, but TensorFlow has a visualization tool that's really nice for understanding what's going on in your model, called TensorBoard, and one of the nice things, I believe, is there's an extension that allows you to understand what the characterization of your data is. But personally speaking, I would imagine one of the downsides of using data is that, you know, you're vulnerable to not choosing the right data, to having some kind of sampling problem with it.

00:47:51
And you need to actively work against that, and you actively need to compensate for it and be aware of what you're putting into it. There is no magic bullet to it; people have to be vigilant. It sounds like it would be the kind of thing that would be, at least today, impossible to automatically detect — bias in the system, right? Because if you imagine an automatic detector, how would you make that? You would have to use data to make that detector, so it's detectors against other detectors or something, right? The nice thing about machine learning models is that they are kind of observable: you can put new inputs into them and see how they do. That doesn't mean you understand the internals. There's a whole branch of AI about AI understandability, which is trying to take these models that are kind of trained through automatic means and try to make them interpretable, and some classes of training only want to create interpretable models, you know, so you can think of

00:48:50
them as something a human can follow — you're actually understanding how they work. These areas are still under active research and I'm not an expert, so I can't really say anything too intelligent about them, but I think these are the kinds of tools that we're going to need to improve the situation. With any new technology there's always this period of understanding how to use it, understanding the second-order implications, that is really important. Fascinating. If I understood you right, you're saying that the average model that's created, a human cannot understand? I mean, it depends on what types of models you use. That is one of the criticisms often made against deep learning: that it's harder to understand the model. People have found techniques to kind of analyze them, and they're certainly being validated against their validation set, so they're producing seemingly the right answer for all the stuff that you look into, but it's fundamentally an unsolved problem right now.

00:49:50
Is there anything else you want to go over before we let you go? I think we covered a lot of things; I don't want to overwhelm everybody. But, I mean, I think it's a really easy area to get into. There are lots of tutorials on doing machine learning, there are lots of YouTube channels, lots of content on how to get started with this, and it's really fun. Pick some problem that you want to do and do a weekend project with it, and I think you'll find that it's really exciting what you can achieve with very little code. And the frameworks have made it a lot easier, especially TensorFlow. One of the major things that we've been focusing on over the last couple of years is making it easier to use TensorFlow: the tf.keras library, which is kind of a high-level way of creating models, has made things way more understandable, and, you know, there's still the lower-level library if you need to dive into a lot of details. So I encourage everyone to give it a try. Thanks very much for coming on. Thank you. Thanks. Thanks so much for listening.

00:50:51
We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic — we'd love to hear about that too. You can email all your thoughts to feedback@cppcast.com. We'd also appreciate it if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me, @robwirving, and Jason, @lefticus, on Twitter. We'd also like to thank all our patrons who help support the show through Patreon; if you'd like to support us on Patreon, you can do so at patreon.com/cppcast. And of course you can find all that info and the show notes on the podcast website at cppcast.com.

00:51:29
Theme music for this episode was provided by podcastthemes.com.