So, I'm Martin Robinson, I'm an Igalian, and I work on WebKit.

Today I want to talk a bit about the work we've been doing with WebKitGTK, and I'm going to focus especially on some practical things for people who embed WebKitGTK: some changes you'll have to make if you port your application to WebKit2.

I want to preface this talk by saying that for us in WebKitGTK, this stable release was really a revolutionary step in the development of the library rather than an evolutionary step. It really changed one of the core characteristics of the library, so we're actually really excited about it.

So I suppose I'll start with a quick review for those of you who aren't intimately familiar with WebKit, and talk a little bit about what it is and what it's for.

WebKit is what's referred to as a web content engine, which basically means that if you have a web browser, everything inside the chrome, in that little box, is rendered web content, and that's what the library is responsible for, as well as some of the ways in which that content touches the outside world. So it processes and renders web content, and processing includes parsing the HTML and the CSS and rendering it, as well as running the JavaScript.


It was started as a fork of KHTML, and for a little while it was closed source, but eventually it was open sourced in 2005. On the project page, one of the goals of the project is actually that it's open source, that it's usable and visible to everyone.

There are also these two companion goals, compatibility and compliance. Compatibility means that there's a lot of content on the web and the engine should be able to render that content. It shouldn't break websites that exist. They actually have criteria for breaking websites: it has to be for something very important, and the affected websites have to be a very small percentage of the sites on the internet. For instance, on the Blink mailing list recently they were talking about removing a feature, and the feature was used on something like a percent of websites, and someone said, "That's a lot." And it is a lot: when you have millions and millions of pages, that's a lot of pages.

The other part of this is compliance, which means that the engine should be compliant with the specs. These are kind of competing goals in a way, because sometimes to be compatible with pages you need to not be compliant with the spec, so it's always this kind of back-and-forth conversation we have.

Obviously stability and performance are important, because the web browser should be fast and it shouldn't crash. Also security, which I'll talk a little bit more about; the security issue is very important. Portability: it should be written in a way that makes it useful on a lot of systems, not just a Mac, not just Intel computers. Then usability and hackability. Hackability is really a statement about the quality of the code: the code should be written in a way that's easily readable and easily changeable. It should be abstracted in the right amount, not too much, not too little, just enough to make it easily hackable. You never want it to be a pain to have to go change the code to fix a bug. Any time there's a barrier in the way, that means fewer bugs will be fixed.

They also state some non-goals on the website, which are in some sense equally important, because you shouldn't be turning this library into a full web browser. It's not meant to be a web browser; it's meant to be a component that's reusable inside web browsers. So there needs to be a dividing line between what features go in the library and what features belong in the embedding application, or client.

It's also not a science project, which means that it should be relevant to what exists in the world today. It's made to render web content that exists. It shouldn't necessarily be a place to experiment with things that people will never use or that aren't important right now. Those things can be worked out elsewhere, and WebKit can meet them halfway.

The third thing here is that it's not meant to be split into a bunch of reusable components, which is sometimes in contrast with what goes on in GNOME, because a lot of times in GNOME, when we see that there's a piece of GNOME that's useful for a lot of other tools, we split it into a library. In WebKit the thought is a little different: every time you split something out into a library there's some overhead in maintaining that, and you have more consumers. So it's a little bit more, I guess, of a hermit community, where we're together working on this one thing.


you don't always wanna likes but also means we can

Right, so another interesting thing about WebKit is that it's split into things called ports. You can kind of see what's going on here: there's a GTK port, and there are ports for Mac and Windows, which Safari uses, and so on. Ports are essentially the common WebKit code, which is most of the code, plus a layer at the bottom which abstracts away the platform: for instance networking, or how to draw to a canvas, or how to talk to the system.

So that's at the bottom, and then at the top is the API layer. The API layer is what the embedding application uses. And the way WebKit is designed, every port's layers are a little different. So for instance, for the WebKitGTK port, in the platform layer we use libsoup for networking, Cairo for rasterization, OpenGL for making the scene graph (which I'll talk more about later) and for WebGL, and GStreamer for media.

WebKit is made in such a way that, in most of the WebKit code, these components are totally abstracted away into wrapper classes that have the same semantics whether you're running on a Mac or on GTK. Any time the semantics differ, it's kind of like a little bug that needs to be fixed. Usually there are some tricky bits in getting the semantics of different platforms to match up, because a CG canvas (Core Graphics) isn't necessarily the same as a Cairo canvas. For instance, in Cairo you store the path on the canvas, but it's a little different on some other platforms.


Then at the top of WebKitGTK there is the API layer, which is essentially a single GTK widget, the WebKitWebView. That widget is the browser's window into the web content, plus some GTK-style APIs around it. Some of the consumers of WebKitGTK, the embedders, are Epiphany and others, so maybe you're familiar with these applications.

OK, so here's an example of what I was talking about. This is a simplified architecture diagram of WebKit. At the bottom there's this thing called WTF, which is essentially a little bit like Boost. It makes C++ a little nicer to use; it includes some collections and some platform abstractions, and it abstracts away things like threads. Next to it is JavaScriptCore, which is the JavaScript engine. These days, now that Blink has forked, JavaScriptCore is the only JavaScript engine in WebKit.

Sitting on top of that is WebCore, which includes a platform layer and the rest of WebCore. I'm separating those because, again, the platform layer is our classes that wrap Cairo, for instance, whereas the rest of WebCore is functionality that's common to all platforms, like the functionality that takes a stream of data and parses out CSS rules.

Sitting on top of that is WebKit, which is (how do I describe it) sort of the glue between WebCore and the browser. It includes the API layer, but it also includes some code for handling different situations and translating them into API concepts. That's a little fuzzy, but on top of that sits the application.


Notice that right now in this diagram (and again, this is WebKit1) these are all in the same process. This is just a normal library.


Before I start talking about WebKit2, I want to talk a little bit about the motivation for WebKit2, so some minor philosophical points, which I think reflect the thinking that drove the creation of Chromium and drove the creation of WebKit2, and which I think mean that this is the future of the web.


Code has bugs: bugs that crash the program, or just bugs. All code has bugs. And code has bugs that allow arbitrary code execution, especially if that code includes a JavaScript engine that's writing machine code into memory. And it's not only WebKit: code has dependencies that have bugs. So maybe you've written perfect code, but you're using a library like fontconfig or Cairo that has one of these bugs. And the fourth point is that even if everything is good in your code and in your dependencies, you're going to be processing things from the world that you don't trust. They're like little programs: fonts, images, SVG images. These are all like small sets of instructions, which means that the scope of the data you're processing is wide, and there's a real chance of someone writing a font that can crash your browser. It's very hard to eliminate these problems.


So what's a pragmatic response to this? Maybe you can say that we're going to fix all the bugs in our browser so that it doesn't crash, and we're going to eliminate these security issues. But you also have to eliminate the security issues in your dependencies, and you also have to work on sanitizing your input data, which is very hard. Instead we say: yes, let's keep working on fixing the crashes in our browser, but let's also make sure that if something goes wrong, it doesn't leave our users vulnerable to attack.


For instance, when we talk about arbitrary code execution, one thing to keep in mind is that these days web applications are applications. They're like desktop applications now. And not only are they like desktop applications: you might be running Angry Birds in your browser, and right alongside it is your banking information, and maybe Angry Birds can reach over and touch your bank account. This isn't a hypothetical situation; these are things that actually happen. The web is huge, remember.

So this is what we can do. We can acknowledge that the web platform is huge and every day it's getting bigger. It's adding more functionality, and each time you add functionality you add more chances for vulnerabilities and crashes. We can think of a way to make the crashes less inconvenient for users: maybe when the web rendering crashes, it doesn't crash the browser; it just crashes that tab, or just crashes the web rendering part.

We can prevent crashes and security errors from exposing data from outside the scope of the current page. The way we can do that is to put that data in another address space, where it's harder to get to, and put some more separation between the data of the different applications.

We can also prevent bugs and crashes from damaging the system or executing arbitrary code; that's another name for sandboxing. So even if some page crashes the browser, it can't write to the hard disk, because that process can't write to the hard disk.

And finally, even if we're not talking about a malicious page, just a page that has a really heavy load, it shouldn't prevent you from using other pages or clicking a menu. It shouldn't prevent you from closing the browser to get away.

So this is the thinking that drives this, because to be honest, WebKit2 and Chromium are very complicated architectures, and they deserve a good reason.

So this is the end result: we can put each web rendering part into its own process and have some parent process. In WebKit2 we call the web rendering process the web process, and we call the parent process the UI process, because the actual chrome of the browser is in this UI process. And we can sandbox the web rendering, because once you separate out the web rendering, it doesn't need to write to the hard disk or even read from the hard disk. I'll talk a little bit more later about how to make sandboxing easier.

So this is sort of the first WebKit2 architecture diagram. On the left you can see the older architecture diagram, drawn a little bit differently, but you can see the API boundary was between the application and WebKit. Here we now have two processes. The API is in the UI process, but underneath that API it's talking over IPC, inter-process communication, to another process which has the rest of the library. So even if this web process crashes, it's not going to be able to crash the browser, or indeed read arbitrary information from the address space of the UI process.
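To make the crash-isolation point concrete, here is a minimal Python sketch, not WebKit code. It assumes a hypothetical "renderer" that is just a snippet of Python run in a child process; the names `render_in_child` and `CRASHING_RENDERER` are made up for illustration. The parent plays the role of the UI process and survives the child's failure.

```python
import subprocess
import sys

# Hypothetical stand-in for web content that triggers a bug in the renderer.
CRASHING_RENDERER = "raise RuntimeError('renderer hit a bug')"

def render_in_child(source):
    """Run renderer code in a separate process and report whether it survived."""
    completed = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0

def demo():
    # The child dies, but the parent only sees a non-zero exit status.
    crashed = not render_in_child(CRASHING_RENDERER)
    # The parent is still running and can simply start a fresh child.
    restarted_ok = render_in_child("pass")
    return crashed, restarted_ok
```

The design point is only that a failure in the child is reported to the parent as data (an exit status) rather than taking the parent down with it.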

Before going on, are there any questions about this particular part? OK. That's reasonable; it's a pretty old concept at this point, since Chromium has been around for a few years.

So, a few details about what's inside. I put this here to make it easier to understand the practical bits. Essentially we have two processes now, and they need some way to communicate. I've split those ways into three types.

The first is messaging. Say the web process reads the page title, and then it needs to tell the UI process, "I've read the title, change the title bar to reflect that." It sends a message with some arguments. The arguments in the message are serialized into a chunk of data, sent across a socket to the other side, and then deserialized.
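As a rough illustration of that messaging path, here is a minimal Python sketch. It is not WebKit's wire format: the JSON encoding, the 4-byte length prefix, and the message name `DidChangeTitle` are all assumptions chosen to keep the example short, and a socket pair inside one process stands in for the channel between the web process and the UI process.

```python
import json
import socket
import struct

def encode_message(name, *args):
    """Serialize a message name and its arguments into a length-prefixed frame."""
    payload = json.dumps({"name": name, "args": list(args)}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload

def recv_exact(sock, size):
    """Read exactly `size` bytes from the socket."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)

def decode_message(sock):
    """Read one frame from the socket and deserialize it."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    message = json.loads(recv_exact(sock, length).decode("utf-8"))
    return message["name"], message["args"]

def demo():
    # One end plays the web process, the other the UI process.
    web_end, ui_end = socket.socketpair()
    try:
        web_end.sendall(encode_message("DidChangeTitle", "Example Page"))
        return decode_message(ui_end)
    finally:
        web_end.close()
        ui_end.close()
```

The length prefix is there because a stream socket does not preserve message boundaries, so the receiver needs to know how many bytes make up one message.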


There's also shared memory, which is used for sending big chunks of data. Say the web process has finished rendering the page to an image and wants to send it; that's too big for the socket. So it sends it as a chunk of shared memory to the UI process, and we avoid making unnecessary copies.
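The shared-memory idea can be sketched in a few lines of Python using the standard `multiprocessing.shared_memory` module. This is only an illustration under assumptions: both sides run in one process here, the 2x2 RGBA "page" is invented, and the helper names are hypothetical. The point is that only the segment's short name and size would need to travel over the socket.

```python
from multiprocessing import shared_memory

def publish_pixels(pixels):
    """Copy rendered bytes into a new shared-memory segment and return it."""
    segment = shared_memory.SharedMemory(create=True, size=len(pixels))
    segment.buf[:len(pixels)] = pixels
    return segment

def read_pixels(name, size):
    """Attach to an existing segment by name and read `size` bytes from it."""
    view = shared_memory.SharedMemory(name=name)
    try:
        # bytes() copies for the sake of the demo; a real consumer could
        # draw straight from view.buf without copying.
        return bytes(view.buf[:size])
    finally:
        view.close()

def demo():
    pixels = bytes([255, 0, 0, 255]) * 4  # a tiny 2x2 RGBA "rendered page"
    segment = publish_pixels(pixels)
    try:
        # Only segment.name and the size would be sent in the message.
        return read_pixels(segment.name, len(pixels)) == pixels
    finally:
        segment.close()
        segment.unlink()
```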

The third is shared surfaces, which are different from shared memory because they typically live on the GPU. The web process has put something on the GPU and wants to send it to the UI process without downloading the data from the GPU again, putting it in shared memory, and then re-uploading it. So for instance, in the X11 version of WebKitGTK we use XComposite and XDamage. It's sort of like we make a little window manager, and we send these GPU surfaces to the UI process to draw.

Why do we have to do that? Because web pages these days are more like scene graphs, like Clutter scene graphs, for three main reasons.

The first is that we want to prevent unnecessary redraws. Say some div is moving, animating on top of the rest of the web content. Only this div is changing, and maybe only in its position. So instead of constantly redrawing the entire page, what if we just stored all the different layers of the page in textures and composited those textures on the GPU? GPUs are actually really good at compositing, it turns out, so it's quite fast to do.

The second thing is 3D CSS transforms. The way those usually work is that they're done on the GPU with OpenGL, and once you start doing work on the GPU it's really expensive to stop and bring it back into main memory only to re-upload it again so you can display it. That's actually enough to kill your frame rate, so it's sort of a non-starter to do that.

And the same goes for WebGL. WebGL obviously is OpenGL, which is on the GPU; downloading and re-uploading again will bring the frame rate below the limits of the human eye.

Right, so the way it works is that the scene graph is built in the web process. Once the scene graph is there, and all the rendering is there, and the compositing is there, you need some way to send those results to the UI process, and that's where XComposite and XDamage come in. It's sort of like the way an application does all its rendering and sends it to the window manager. The way this will work in Wayland is probably that we'll use an embedded Wayland compositor. So we're working on that.

All right, so that's sort of the high-level overview of WebKit2. And, you know, WebKit ends up embedded in a few places, so some of you may be asking: should I port my application to WebKit2, if I use WebKitGTK or even any other port of WebKit? The answer is yes. You should port your application to WebKit2, in fact, even if you don't think it'll be useful. The reason is that WebKit1 GTK is moving into maintenance mode.


It turns out that it takes a lot of work to maintain a WebKit port, so when your team has to maintain two, it's a bit harder. In addition, WebKit1 GTK will be deprecated at some point, because once you stop maintaining WebKit1, you start worrying about security vulnerabilities and unfixed bugs.


The good thing about this is that WebKit2 is a better API. It's richer, it exposes more functionality, and it's more in line with the other WebKit2 ports. It's just an all-around better API, because it's the second time around that we've made an API, so we got a lot better at it. On top of all that, if you port your application to WebKit2, without doing anything other than porting, it will be faster and more responsive. When some random web content crashes, it won't crash your application; you can just restart it. It's very nice.


But it's not necessarily easy for all use cases. One of the problems is that there's not yet a WebKit2 porting guide, which is a bit of a shame, because we've been promising it for a while and we don't have it yet.


But there is really good API documentation. And the differences between the two basically boil down to the second point, which is that before, it made sense to do things synchronously: when you wanted to save the page, you just waited until the save was done. In WebKit2 that makes a little less sense, because now you're sending a message to the web process, which again you don't necessarily trust anymore. We're starting to distrust things across a process boundary. So instead of waiting for it, maybe it's better to just send the request: "Save the page, and when you're done with that, let me know."


What this means is that a lot of the APIs are asynchronous now, and they look a little bit harder to use: you have to pass a callback, GIO-style. GIO style is intrinsically asynchronous.
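The shape of that callback style can be sketched without any toolkit. The following Python sketch imitates the "start the operation, get a callback, call a finish function" convention; it is not GIO or WebKit code, and `MainLoop`, `AsyncResult`, `save_page_async`, and `save_page_finish` are hypothetical names invented for this example.

```python
from collections import deque

class MainLoop:
    """A tiny stand-in for an event loop: runs queued callbacks in order."""
    def __init__(self):
        self._pending = deque()

    def post(self, func, *args):
        self._pending.append((func, args))

    def run(self):
        while self._pending:
            func, args = self._pending.popleft()
            func(*args)

class AsyncResult:
    """Carries the outcome of an operation to the matching finish call."""
    def __init__(self, value=None, error=None):
        self.value = value
        self.error = error

def save_page_async(loop, path, callback):
    """Start the operation and return immediately; the callback fires later."""
    # Pretend the other process has replied; the reply is delivered by the loop.
    result = AsyncResult(value="saved " + path)
    loop.post(callback, result)

def save_page_finish(result):
    """Return the value or raise the error, mirroring the finish convention."""
    if result.error is not None:
        raise result.error
    return result.value

def demo():
    loop = MainLoop()
    log = []
    save_page_async(loop, "page.html",
                    lambda res: log.append(save_page_finish(res)))
    log.append("request sent")  # recorded first: the call did not block
    loop.run()
    return log
```

The caller keeps running after issuing the request, and the outcome (success or error) is only unpacked inside the callback.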

The really tricky bit is that if you were doing some kind of deep integration with the web content, if you were interacting with the page and changing it in real time, then it actually becomes quite a bit trickier. Before, you could reach down into the library and modify the actual DOM in memory. But now it's not in your memory anymore; it's in some other process, a process that we don't trust. So what you have to do is use one of these four techniques: injected script source, custom protocols, the GObject DOM bindings, or page access via the JSC API.

Injected script source is essentially an API on the web view: you give it a string of JavaScript source, and that is sent to the web process to be executed in the page context. The resulting JavaScript return value is serialized and sent back to you. So you can imagine writing a small JavaScript program to walk the elements of the page and do some processing, maybe find, say, the password field, or the contents of the password field, and get back a string from the web process.

That looks a bit like this. You call webkit_web_view_run_javascript with the web view, and the string here is the script you're running, and then you give it a callback. Pretty simple. Then in the callback you call webkit_web_view_run_javascript_finish, GIO-style again, and you get this serialized return value. Everything below that is getting the actual JavaScriptCore values from the return value. These funky JS APIs are the JavaScriptCore API; this is the API for touching the JavaScript engine. You can see that we're just converting this value into a string and then converting that string into a C string. It's a little bit of a pain, a bit verbose, but really, other than this callback, it's similar to what you would have done before.

Next, custom protocols. Maybe you've used Chromium before, and you type "about:" and you get a web page. It's almost like instead of HTTP you're using this "about" protocol. And that's exactly what custom protocols are: you're integrating with the networking library to add a new protocol to the web engine.

And not only can you access pages by loading them, you can actually use the protocol to interact with the UI process. For instance, we have an about:plugins page, and it's not there yet, but eventually there'll be a button that says "disable". What that could do is send an AJAX request to the protocol, and when the UI process gets that request it processes it as if it were a web server, and can disable the plugin without reloading the page.

The big issue with this is that it's a web browser and it's subject to same-origin security restrictions, which essentially means that if you're doing AJAX or loading resources, there are restrictions on accessing resources in another scheme/host/port triplet. So if you try to access your custom protocol from a web page on HTTP, it's not going to work; it's going to be blocked by the security restrictions, and those are restrictions you don't want to disable.
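The scheme/host/port comparison can be shown with a small Python sketch. It is a simplification and not the browser's actual algorithm: the `origin` and `same_origin` helpers and the default-port table are assumptions for illustration, and real engines treat schemes without a host as opaque origins with extra rules that are ignored here.

```python
from urllib.parse import urlsplit

# Assumed default ports, so that "http://host/" and "http://host:80/" match.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triplet for a URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    """True when both URLs share scheme, host, and port."""
    return origin(a) == origin(b)
```

For example, `same_origin("http://example.com/page", "about:plugins")` is false because the schemes differ, which is the situation described above: a page loaded over HTTP cannot freely make requests to a custom scheme.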

So this is what this looks like. Again, we're just registering this "about" protocol, and again with just a callback. What happens here is that we get the request and we can read the different properties of the request, like the path. Here I'm just using the path to print out a response, and I'm sending the response back to the browser as if I were a web server.
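The register-a-scheme-with-a-callback pattern can be sketched independently of WebKit. In this Python sketch, `SchemeRegistry` and `about_handler` are hypothetical names, and the "response" is just a string; the real API hands the callback a request object and expects a stream back, which this example does not attempt to reproduce.

```python
from urllib.parse import urlsplit

class SchemeRegistry:
    """Maps URI schemes to callbacks that produce a response body."""
    def __init__(self):
        self._handlers = {}

    def register(self, scheme, callback):
        self._handlers[scheme] = callback

    def load(self, uri):
        """Dispatch a URI to the handler registered for its scheme."""
        parts = urlsplit(uri)
        handler = self._handlers.get(parts.scheme)
        if handler is None:
            raise LookupError("no handler for scheme: " + parts.scheme)
        return handler(parts.path)

def about_handler(path):
    """Answer like a tiny web server: build a page from the request path."""
    return "<html><body><p>Loaded about:" + path + "</p></body></html>"
```

Usage would be `registry = SchemeRegistry()`, `registry.register("about", about_handler)`, then `registry.load("about:plugins")`, which returns a small HTML page built from the path `plugins`.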

Before I talk about the other ones, I want to talk about web extensions. Web extensions are essentially the way that we've exposed some of the more common techniques for interacting with the page in this multiprocess environment. Essentially it's a shared object that the web process finds and loads into its own address space. You don't have to do any IPC, really, if you're just working inside the confines of the web extension. It's a bit like a plugin that loads in the web process.

So you can do things synchronously, like walk through the DOM, and it won't block the UI process at all; the UI process maybe doesn't even know. And you don't have to worry about the overhead of IPC. It's great because you have actual direct access to the DOM objects, just like you did before. This is built on top of the common idea of an injected bundle, which is something that WebKit2 exposes in all ports.

Sometimes inside a web extension you want to communicate with the UI process, in which case you can just use D-Bus or whatever IPC you want. Typically we use D-Bus.

This is what that looks like. Here is a source file with this webkit_web_extension_initialize, which is the name of the entry point to the shared object. What happens is that once we compile this into a shared object and set the extensions directory, the web process will find the shared object, load it, and call this function. Here you can print, but you can also use the GObject DOM bindings.


I guess I should probably explain those a little bit too, if you're not familiar with them. Essentially, there's the DOM. If you're familiar with web development, you use the DOM in JavaScript to access the internal structure of the page. So you can say, "Page, give me your divs," and you can look at all the divs; you can see their contents, their other properties, their CSS properties, whatever. Those are the JavaScript DOM bindings. What that means is that these things, which inside are C++ objects, are exposed to JavaScript.

Likewise, we've written GObject DOM bindings, which means that you can walk the DOM with GObject, and that means you can walk the DOM in C or in any other language that supports GObject introspection, which is quite nice. Unfortunately, now that the DOM is in another process, we can't just do that from the UI process anymore; we have to do it in the web extension.

Again we see the webkit_web_extension_initialize function, in which we connect to the page-created signal of this extension object. Page-created is like: you open a browser tab, and now we have a new browser tab. Here in the callback for page-created we attach to the document-loaded signal, which obviously fires when the document finishes loading. At that point maybe we read the title, using the exact same DOM bindings API that we had in WebKit1. So with a few more steps we kind of get to feature parity with WebKit1.


So at this point we're weighing the value of all those things I mentioned before (security, stability, not exposing users' banking information to phishers and scammers) against a couple of function calls and compiling a shared object.

Finally, the most flexible approach, which will be available in an upcoming WebKitGTK release, is that we can directly use the JavaScriptCore API to interact with the page. What this means is that not only can we walk the DOM, but we can make new JavaScript objects that are backed by native code. Say you make a new object; the page can actually interact with that object.

For instance, maybe you want to expose some system functionality to the page. If you're making a hybrid application, you may want it to be able to put the screen to sleep, or maybe prevent the screen from sleeping, if you want your video player application not to go to sleep while it's playing a video. What you can do is use this API to expose new objects into the world of the page and have the page's JavaScript interact with them, and so interact with the application. As well as that, you can just execute arbitrary JavaScript in the web process.

For this you need to know the JavaScriptCore API, which isn't actually so complicated, but at some point we'd really like to be able to just expose GObjects directly. That's a ways off. This is the most flexible approach, and if you really need deep interaction with the page, you'll have to do this.

All right, so that was the practical section. I hope it was useful for some embedders to see what's involved in porting to WebKit2, and I hope I've convinced you that it's worth it.

Keep in mind that this is not just WebKitGTK; the whole web is beginning to look like this, with multiple processes. It's beginning to look like this because the web is beginning to look like an operating system; the web platform is beginning to look like an application platform. And we already use our browsers like this. Many of you probably keep a web browser open all the time with one application running. That's no different from keeping an application running in your window manager. The distinction between web applications and applications is almost gone. I keep saying it's coming, but it's like it has already happened.


So what's going to happen with WebKit2 in the future? The architecture diagram gets a little bit more complicated. We have more processes, because we did it once and it worked, so why not keep doing it until we run out of process handles?


What we have here is that not only do we have web processes, we have a network process, a worker process, and a storage process. At first it seems a little bit superfluous, and you might ask why there are so many different processes, but really it makes good sense. When you think about it, we really wanted to sandbox the web process. We didn't want it to be able to read the disk or even access the network. Maybe it's dangerous to allow arbitrary code execution to talk to the network.

One interesting thing is that the way WebKit2 works now, when the web process crashes, all your tabs crash. Really it would be nice if it were like Chromium, where when a tab crashes it's just that tab. That means we need multiple web processes running, which means that they're all trying to talk to the network. That should be fine; they could do that separately. But once they've talked to the network, they take all their data and try to put it into the cache, and they try to write to the cookie store, and maybe that cookie store is shared among different processes. That means we start having contention issues, and we have to worry about multiple writers and multiple readers. So instead of handling all that, we just split all the networking and all the cookie storage out into its own process, and we have all these different processes talk to this one network process.
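The single-owner idea can be illustrated with a short Python sketch. It is only an analogy under stated assumptions: threads stand in for processes, a queue stands in for the IPC channel, and the names `network_process` and `call` are hypothetical. What it shows is that when one owner serves all requests in turn, the store itself never sees concurrent writers.

```python
import queue
import threading

def network_process(requests):
    """Sole owner of the cookie store; serves requests one at a time."""
    cookies = {}
    while True:
        request = requests.get()
        if request is None:  # shutdown sentinel
            return
        op, key, value, reply = request
        if op == "set":
            cookies[key] = value
            reply.put(True)
        elif op == "get":
            reply.put(cookies.get(key))
        else:
            reply.put(None)  # unknown operation: answer rather than hang

def call(requests, op, key, value=None):
    """Send one request to the owner and wait for its reply."""
    reply = queue.Queue(maxsize=1)
    requests.put((op, key, value, reply))
    return reply.get()

def demo():
    requests = queue.Queue()
    owner = threading.Thread(target=network_process, args=(requests,))
    owner.start()
    # Several "web processes" write concurrently without touching the store.
    clients = [
        threading.Thread(target=call,
                         args=(requests, "set", "tab%d" % i, "cookie%d" % i))
        for i in range(4)
    ]
    for client in clients:
        client.start()
    for client in clients:
        client.join()
    values = [call(requests, "get", "tab%d" % i) for i in range(4)]
    requests.put(None)
    owner.join()
    return values
```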


There are APIs in the web platform, web APIs, that actually write to the disk, and if we sandbox the web process to disallow writing to the disk, those APIs won't work. So instead of having that capability to write to the disk sitting there with this possibly malicious JavaScript code, we split the disk access out into a worker process, or storage process.

The way that we want to think about these process communications, again, is that we distrust the process on the other side. We have to code as if that process has already been compromised and is sending us the most evil messages possible. But that's a lot easier than if there were no single point of communication between the processes, if we had to make that decision all the time, everywhere, rather than just where we're doing the IPC handling.
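A minimal Python sketch of that "validate at the single IPC entry point" idea follows. The message shape, the allow-list, the `DidChangeTitle` name, and the length limit are all assumptions made for illustration; they do not describe WebKit's actual message handling.

```python
# Assumed upper bound on a title sent by the untrusted side.
MAX_TITLE_LENGTH = 1024

def validate_title(args):
    """Accept exactly one string argument of bounded length."""
    return (len(args) == 1
            and isinstance(args[0], str)
            and len(args[0]) <= MAX_TITLE_LENGTH)

# Allow-list: only messages named here are ever acted upon.
VALIDATORS = {"DidChangeTitle": validate_title}

def handle_message(message):
    """Return the validated (name, args) pair or raise ValueError."""
    if not isinstance(message, dict):
        raise ValueError("message must be a dict")
    name = message.get("name")
    args = message.get("args")
    validator = VALIDATORS.get(name) if isinstance(name, str) else None
    if validator is None:
        raise ValueError("unknown message")
    if not isinstance(args, list) or not validator(args):
        raise ValueError("bad arguments")
    return name, args
```

Because every message passes through `handle_message`, the checks live in one place instead of being scattered through the code that acts on the messages.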

And as I was saying, now we isolate applications from each other as well as from the UI. When a web process crashes, it doesn't crash all the tabs, just that one page. It also makes sandboxing a lot easier. The nice thing about this storage process is that disk access is really slow, so there's always some blocking going on. If we always do that asynchronously in another process, there's no issue with that. It could be a thread, but then we couldn't sandbox it.


That's the future of WebKit2, and that was my talk. If there are any questions, I can answer them now.