>>Zach Maier: Alright. Hi, everybody. Welcome
to this session. I can't believe it's standing
room only. This is pretty exciting [laughs]
I wasn't expecting that. So my name is Zach
Maier and I'm the product manager for our
API Infrastructure team. Joining me on stage
later will be Mark Stahl, the tech lead for
the API Infrastructure team, and Yaniv Inbar
and Joey Schorr, two of the engineers on the
API Infrastructure team. So, just like every
other session we've had so far during IO,
you can follow along with live notes and leave
comments and questions for the end at bit.ly/apiwave
and if for some reason you really, really
want to live tweet what I'm saying right now,
you can do it with the #googleapi8 so people
can follow along. Alright, so this session
is a little bit different from the other sessions
we've had at Google IO. It's kind of a sneak
peek at how Google's been building APIs in
the past and what we've noticed and how we're
going to build APIs in the future so you're
going to get to see like the foundation of
our APIs for the past five years and then
some upcoming technology that we're really,
really excited about. So, what we're going
to talk about - this is a 200 level session
so we're talking about a few advanced technologies
and some terms that not everybody might know
like off the top of their head so I'm going
to give you a quick refresher as to Google's
API 101, the underlying technology that all
Google APIs are built on. We'll then go into
- Mark will come up and talk about how we're
making future APIs awesome, things we've noticed
in the past, and how we're going to be fixing
those. Some things that have already been
fixed and some things will be fixed in the
future so you can be on the lookout there
for some power features that you can use as
you're programming Google APIs, and then what
you're really probably all here waiting to
see, Joey's going to come up and show a really
awesome demo of the internal tools we use
to build our own APIs. Kind of a behind the
scenes. Nobody has ever seen this before outside
of Google so it's really exciting stuff and
then of course, questions and comments. If
you have any questions or comments, again,
bit.ly/apiwave. Alright, so Google's API 101.
What are the underlying technologies that
all Google APIs work on? Well, first off,
since we're a web based company all Google
APIs use REST. REST is short for Representational
State Transfer and very simply speaking, it's
clients and servers exchanging resource representations
somehow. It's good for cached and layered
systems which is basically the Internet, right?
So cached and layered systems; Representational
State Transfer. In HTTP, every time you make
a get request to a resource, you are using
a REST based system so in this case
if you're using the G data YouTube API, you
get the resource and it will return like a
list of videos or something like that. The
resource representations that we use in Google
APIs - right now we use AtomPub which means
they're modeled as feeds of entries and more
generally it's useful to think of these as
collections of resources and you'll see why
later. So think of - anytime you get a representation,
it's a collection of a bunch of different
resources and just to make sure we're all
on the same page with what a collection of resources
means: this is sitting on a server right now.
I have a collection and I have a bunch of
resources in there. Now, depending on which
API I'm using, this collection can be a collection
of contacts if it's a contacts API, a collection
of videos if it's a YouTube API, a collection
of documents, or anything else living in the
cloud if it's the Documents List API.
Each resource is a document or a contact or
a video in that case. So, what can I do with
this information that sits on the server?
Well, first off, I can get that information
and bring it to my local client, HTTP Verb
Get, everybody should know what that means.
So once this is on my client, this resource
is on my client, I can of course modify the
resource. Once I modify the resource, I can
use PUT Verb and put that resource right back
into the collection, essentially updating
that resource with any of the changes I made
on the client. So, what else can I do to these
resources? Of course, I can delete a resource
which if I delete a resource, guess what happens?
It gets deleted. I can also post a new resource
into the collection. Now this is inherently
different than every other operation we have
up here so far. This post actually operates
on the collection, not the resource because
since there's no resource living on the server
so far, you have to post into the collection
and then once you post into the collection,
new stuff appears in the collection. One other
thing you can do on the collection level is
you can also get everything in a collection
which in essence lists all of the different
resources so you can pull all of the resources
by calling a GET on the collection. Alright,
so that's basic REST: collections and resources,
the underlying fundamental technology of
how all our APIs work to date. So, a quick
question for everybody. Who's seen this logo
before? Anybody? Anybody? A few people have,
which props to you guys, because you've been
around for a long time then because this is
the very, very first Google Data APIs logo
and it's kind of obvious what it looks like,
right? It looks like an atom and as time has
progressed, we've dropped the little electron
arrows flying around and we have the Google
data cube that we use to represent APIs nowadays.
So what I'm trying to get at with this awesome
history of the logo over the past five years
is that right now Google data is basically
equivalent to Atom. You must understand Atom
to use the APIs. The core of our
APIs is built around the Atom Syndication
Format and the Atom Publishing Protocol and
then over time we've extended those core features
with features like query, batch, and concurrency.
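To make the verb-and-collection model above concrete, here is a sketch of the raw HTTP requests involved (the URLs are invented for illustration, not real Google API endpoints):

```
GET    /feeds/contacts          list every resource in the collection
POST   /feeds/contacts          create: add a new resource to the collection
GET    /feeds/contacts/123      retrieve a single resource
PUT    /feeds/contacts/123      update: write the modified resource back
DELETE /feeds/contacts/123      delete the resource
```

Note that only POST and the list GET operate on the collection itself; the other verbs address an individual resource.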
So far, like this approach that we've taken,
this Atom based approach, has been very, very
successful. We have more than 25 APIs at Google,
as I'm sure you're all well aware, and across
all those APIs we get about 2
billion hits per day which is a pretty impressive
traffic number. If you divided that out, you
can figure out what QPS that is. So, a lot
of you have used most of these APIs before;
Blogger, YouTube, calendar, spreadsheets
- all these APIs run on the foundation so
we have this awesome foundation built up but
we
want to keep it going, right? We don't want
to stop with what we have. We want to make
it better, we want to get more APIs out there,
and we want these APIs that we launch to be
higher quality. So, inside of Google, and
this is the look behind the scenes that I've
guaranteed everybody. We are moving to a brand
new API Infrastructure and the cool thing
about this is we've done it transparently
so if you're using any of these new APIs that
have been new in the past few weeks; the Google
Moderator API, the Google Buzz API, or the
Google Latitude API; you are using this brand
new infrastructure. Alright, so I've given
you the foundation, laid the foundation here,
told you that we're moving to a brand new
infrastructure but I'm just a product manager
so I can talk about all these cool things
but to actually tell you how it's really done
here is the tech lead, Mark Stahl. [applause]
>>Mark Stahl: Hello. My name is Mark Stahl
and I'm the tech lead for Google APIs. I have
been building APIs for Google for about five
years, or we've been building infrastructure
that has allowed teams to build APIs for
about five years. In this part of the talk,
I'm going to explain a little bit about some
of the things we've learned over those five
years; we've been trying to listen; and some
of the rough edges we've noticed and the ways
we're trying to improve APIs going forward
and the new features and stuff that we're
trying to build into our new stack. So, one
of the - I'm going to talk about some of the
rough edges in a couple of areas. First is
just that we're dealing with resource representations
on the wire and so the formats on the wire
are very important. This is what you guys
are actually - these are the resources you're
manipulating. I'll deal a little bit with
REST itself and some of the difficulties that
it provides when trying to do certain types
of operations and we also maintain a set of
libraries that we try to make available to
help you use our APIs and I'll discuss some
of the difficulties we've had and the changes,
again, that we're making going forward to
make these things better. So, first, in the
output formats, one of the things about REST
you'll know is that it's based on transferring
a resource or, the technical way to say it, a
representation of a resource; that is, some
wire document that says this is the state
of everything in that resource. Now, in order
to modify, you saw that you have to transfer
that resource representation twice. You have
to pull down a full copy of the state of the
resource, you modify what you like, and you
put back a full copy; and this means that
just to do as much as changing a single flag
requires you actually changing, transferring
the whole resource representation twice, and
a resource can be pretty much anything. It
can be, you know, five lines of configuration
or it can be 400 pieces of meta data and sometimes
that can get a little bit verbose. Also, we
built on AtomPub right from the beginning
which means we're using Atom Syndication Format
and XML as the core document and we also
- maybe some of you have realized that -
resources on the wire can be a little bit
verbose. We get this problem a lot of times.
So, we've been looking at this problem for
a while and we have been looking for RESTful
ways to solve this problem and we have introduced
now, in the last couple of months, you may
have seen come out, we allow for the possibility
of partial operations. Realistically speaking,
you are only operating on part of a resource
90% of the time, only the part that you
want, so partial operations come in two basic
flavors. There is the partial response which
is when you return this resource to me, only
give me the part of the resource that I really
want, and I'll just give you a quick example.
Here's an example from YouTube and I'm just
going to click through and show you. This
is the full XML resource. This is a search
of YouTube for the number one Google IO video
and this is a lot of feed metadata coming
up here at the top. You can see the actual
entry, this is the beginning of the actual
result. Lots and lots of metadata, lots and
lots of metadata; somewhere in here is the
content that I want. Can anyone see it? I
think it's up here. No, I can't find it. I
know it's in here because this is actually
live and you can see that this is an awful
lot of data to transfer for one resource so
assuming that what you want to do is only
say - you say display just the title and of
course once you have the content field, you
only want to actually display the content
itself which is a link at the moment to Flash.
All you really want is these two fields. So,
partial GET, every operation now supports
this concept of a fields parameter and in
the fields parameter, you specify a mask
and a mask just says give me what exactly
matches this mask and just to make life a
little bit easier, let me show you a live
result. Resolving proxy, that's not good,
is it? Ah, there it is! So this is actually
a live result and you'll see this is - if
you're say working in a mobile environment
and you're trying to get just a few things,
this is a big deal. This is something that
people have really been wanting for a long
time. It's still RESTful and it works just
the way you need it to and you'll see here
this is actually a full document in terms
of its structure. It actually has a feed tag,
it has an entry tag, and it has a content
tag so it still parses the same way except
it's only a subset of what Atom Syndication
Format requires. So, going back - we also
have defined another part of partial which
we call the partial update. In this case,
you use the exact same masking syntax and
you say I'm going to send back the same partial
representation and only update some fraction
of that resource. Now I'm not going to go
into this in detail. This was actually launched
and this was discussed actually at last Google
IO and we launched it like about three months
ago so it's available now on about four APIs
and you can actually go and read all the details.
If you are working in a bandwidth constrained
environment or you really need to save space
on a device, this is the type of feature you're
going to want to exploit so this is something
that's currently available and we'll be rolling
out to all APIs, all Google Data APIs, in the future.
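As a rough sketch of what partial operations look like on the wire (the mask and URL below are illustrative; check each API's documentation for the exact syntax it accepts, and note that partial update may use PATCH or a POST with a method override):

```
# Partial response: only return each entry's title and content
GET /feeds/api/videos?v=2&fields=entry(title,content)

# Partial update: send back only the fields being changed,
# using the same mask syntax to say which ones
PATCH /feeds/api/videos/VIDEO_ID?fields=title
```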
Another issue that some of you may have dealt
with; how many people are programming in JavaScript?
Anybody? Not me actually. I'm programming
in Java but perhaps you [laughs] you're
trying to use XML results in JavaScript so
we realized that we need to be able to offer
alternate formats. XML works great in languages
that have a lot of XML support but JSON
is actually another format that a lot of people
want and in fact formats need to be flexible.
Resources are - it doesn't necessarily mean
AtomPub. A resource can actually be represented
any number of ways and so we're supporting
multiple formats, and by multiple formats
I mean these are native to the architecture.
They are both readable and writeable. Now,
this required us to actually deal with some
architectural issues. When we started building
APIs five years ago, AtomPub was all the hotness
and we built our services actually exported
directly into the Atom Syndication Format
so all our services were tied to Atom. When
JSON developers came to us and said, "We want
JSON." Well, we built JSON; however, has anybody
here worked with our old format JSON? Yeah,
there's a reason and the reason is the JSON
is actually XML coded as JSON and it's not
the most pretty thing. For JSON developers, of
course, it's not a natural structure to have
to put namespaces in your JSON objects. It's
not a pretty sight. Another feature of this
particular hack that we implemented is that
it's only one way. This is a read only API
so if you used the JavaScript client libraries,
you actually could write back but what you
didn't see was under the hood we transcoded
it back to XML and then sent it back to the
server, and that was because our servers were
built around the concept of reading and writing
and parsing Atom's Publishing Protocol and
Atom Syndication Format. So, what we've done
is we've re-architected how we build APIs
from the ground up and we've built and introduced
a generic data concept which you'll see here.
We're using, of course, Google's favorite
structure which is Protocol Buffers, and you'll
see a little bit later exactly how we do this
but what we've also introduced then is just
what everybody else is familiar with is templating
languages but these aren't just simple templating
languages, these are bidirectional templating
languages. So by writing a template, you actually
get both a read and a write format so we've
solved some difficult problems in order to
make it possible for people to really build
APIs in new formats so just to give you
a quick, brief idea of how this really impacts
APIs, this is our new Buzz API. You may have
all seen this here. So, this is the Atom that
comes out of Buzz and it's an awful lot of
Atom here. Buzz is built on the activity streams
specification which means they have an awful
lot of metadata but you can see there's an
awful lot of metadata in order to get one
piece of content here and somewhere in here
- here he is, there it is. This was my content,
I made a Buzz. I'm excited to be speaking
at Google IO. Now, if you look at the same
thing, all you have to do now is specify alt=json
and what you're going to get back is
this somewhat ugly blob but it actually gets
neater - if I can find the right key combination
- ah. So, this is a much neater structure.
This is actually native JSON, it's not XMLized
JSON and you'll see somewhere in here, a lot
easier to read, is my actual content labeled
as actual content so APIs going forward will
now support their own native JSON read write
format. We are no longer constrained by the
syntax requirements of Atom, we can actually
build a format that's natural for you to work
with, and one of the other nice features of
this particular change is that the way we
re-architected our system, this templating
model, doesn't restrict us to just these formats.
We can actually start introducing other formats
just as easily and in fact we hope to introduce
new formats in the future. Whatever the new
hotness is, we're ready to be able to introduce
it into our systems so our APIs will grow
as the Web changes and our APIs will be able
to adapt to how they actually change. So,
I'd like to switch to our second topic, things
we've noticed in APIs. REST is very much a
popular approach, an architectural style and
there's a good reason REST is a popular architectural
style. It's built - REST is based on transferring
these resource documents. It's exactly what
all your HTTP and all your web browsing does,
transfer documents. It's really great for
cacheability. It's really simple for people
to use these type of APIs but the way it works
actually means it can be awkward for certain
type of operations so I built up a small example
here. This example starts with the idea of
Picasa Web Albums. Say you want to rotate a photo
and we're going to rotate this photo in binary
and we're going to do this the very traditional
REST way. First, transfer your resource representation
over the wire and get your JPEG. Rotate your
photo and write your photo back. So you see
here, depending on how big this resource representation
is, I've actually transferred a photo twice
over the wire to do what's a fairly simple
operation. Now, how would we solve this in
other ways? Typically you think, "Oh, I should
just be able to send the server a command
saying rotate this photo." Well, there is
no such command in REST. You can do things
like oh maybe I expose metadata that says
let's give the photo a rotate metadata. This
is a certain hack that we've done. This actually
is how Picasa did it because we were constrained
by REST. Now, of course, you have
to transfer this rotate state to set it back.
There is no way to send an imperative command
and there are certain types of imperative
commands that are even more common. Say, send
email to all attendees of a conference event
or reset this machine. This type of imperative
statements are inherently difficult to do
in REST. You can always hack it. Everything
can be done, everything can be faked but it's
not a natural approach. A natural approach
would be RPC but to switch to RPC, you're
giving up on REST and you are now in a world
where you don't have the benefits so what
we've decided to do at Google is we've decided
that we're going to be introducing a very
lightweight form of RPC, a RESTful approach,
the idea that resources can have extra options
on them instead of the basic three or four
verbs that REST gives, we are actually introducing
a form of RPC we call Custom Verbs, and this
says that a resource can export
something that says, when REST is a difficult
way to approach it, here's a custom verb that
lets me do to that resource exactly what I want.
So, in the Picasa web case, this simplifies
our world greatly. All I do is I send along
a command saying identify my resource, here's
the operation I wish to perform on it, here's
the parameters, and away I go. Now you'll
see that we've really reduced the amount of
information on the wire and we've made it
possible. Within the construct, we still have
a RESTful API as our base but we've given
ourselves a way to stop the struggle between
RPC and REST and solve some of the tricky
problems by allowing more capability to write
APIs when it's appropriate. So, I'll just
show another example real quickly. You probably
all have dealt with - has everybody used our
task list API? That's because we don't have
a task list API but if we did this is how
you would say set a task done. You take a
task, you go to its resource, get it, modify
the done bit, and then you're done and basically
put it back. A custom verb approach then is
just to have a method, mark done, and using
the same resource identifier you can now perform
operations on resources so this is a feature
that we're planning on rolling out on APIs
as necessary in order to make them more powerful.
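A sketch of the difference, using that same hypothetical task list API (which, remember, doesn't actually exist):

```
# Traditional REST: transfer the whole resource twice to flip one bit
GET /tasks/123                  -> full task resource
PUT /tasks/123                  <- full task resource with done set to true

# Custom verb: one small imperative request, no resource transfer
POST /tasks/123/markDone
```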
Another tension that we've noticed in API
communities is that not everybody is buying
into the RESTful approach. There are other
approaches out there. How many people have
dealt with, say, OpenSocial APIs? If you've
dealt with OpenSocial, you know that the OpenSocial
community decided on JSON RPC as the
standard approach to APIs. However, if you
look at what the open social JSON RPC API
is, a large portion of those are actually
the same RESTful commands that we operate
in the RESTful world and so what we've decided
to do is, you know, there's no reason that
we have to sit here and say, "We're only offering
the REST, we're only offering the JSON RPC."
We can actually offer these in parallel and
we can let them be good where each one is
best so APIs going forward, again, here's
your simple RESTful model but there's no
reason that these all can't be mirrored as
JSON RPC models. When you introduce a custom
verb, of course, they actually fit into this
framework fairly easily so they work quite
well. Now, you might ask why we are offering RPC.
REST, of course, as I said, benefits really well
because it's the way the Web works. You benefit
from caching, you benefit from the simplicity.
JSON RPC, well, what are its benefits? Probably
the number one benefit is going to be the
batching mechanisms and we'll actually support
batching that spans multiple services. You
can actually start creating more complex systems
through a common API infrastructure. How many
think this is going to be a really nice ability
to use what's best at the time you use it?
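As an illustration of the batching benefit, a JSON-RPC request body can carry several calls at once, even to different services. The method names and parameters below are invented for the sketch, not the real API surface:

```
[
  {"id": "1", "method": "buzz.activities.insert",
   "params": {"userId": "@me", "activity": {"object": {"content": "Hello"}}}},
  {"id": "2", "method": "calendar.events.list",
   "params": {"calendarId": "@me"}}
]
```

The server can execute these together and return an array of responses matched up by id, saving a round trip per call.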
So, and finally, I'd like to talk a little
bit more about the third issue that we've
noticed in building our APIs is our client
library strategy. One of the biggest problems
we've had in client libraries is keeping them
up to date with all the APIs. Google engineers
are very innovative. They keep introducing
new APIs and new features and the core problem
we've had is the way we architect our client
libraries, all of those libraries actually
have - what do we have? We have classes literally
for every XML element in the output streams
of all these libraries so anytime a service
makes a single change to their API, we have
to do another release and what we find is
that some of these libraries get a lot of
love. Java gets a lot of love because Java
is one of our top programming
languages at Google; .NET - some, some love but some
other languages; Python - well, we're not
sure how much love - you know, it depends
on how much love we get, how much time we
have and it becomes very difficult to keep
libraries on the cutting edge so we realized
that this was an architectural issue. How
we had designed our libraries and how we had
designed our client library strategy put us
in a bind. You weren't able to get libraries
that worked with our APIs or they lagged behind
because we had built them in a way that made
it difficult for us to keep up so we started
to rethink client libraries from the ground
up, and the very first thing we decided to
do is we're going to introduce the concept,
we're going to introduce the idea of discovery
into APIs at Google. So, this discovery, what
we're basically - every API now will support
a discovery document. It's just JSON but it's
simple to read and you use it to describe
these resources. You can describe the URLs,
parameters, whatever so now there is a way
for a library to be built that actually leverages
this information and this is built deep into
the infrastructure of how we build APIs. It
actually means that these - once you publish
an API at Google, the discovery document is
always up to date so I'll give you another
quick example. The discovery is just another
API. Buzz being one of the first APIs built
actually has a discovery document and here's
a - you'll notice here, this is a URL. This
little number here, v0.1, is just
to let you know that discovery is an experimental
API. We're not officially releasing it today
but we're innovating in the open, this is
Google IO and you are welcome to go look at
this API and see it and give us feedback on
it. So, here I'll show you a quick example.
This is the Buzz document and yes, it looks
terrible. Now it looks a lot better and you
can see it's pretty straightforward. From
the top, we have the name of the service,
its version. We have a URL and then we start
describing what resources exist in this API.
Here's a set of photos that exist in the API
and here is a URL template that you can use
to construct access to those photos. I've
already mentioned RPC and here's the RPC mechanism
and this is a set of methods that you'll see
here. The insert method is a method for adding
photos to a system. Now you've got both of
these methods and you can now see how to construct
a RESTful request and how to construct an
RPC request without ever having - so the library
is going to be built around this concept.
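Schematically, a discovery document looks something like this (field names simplified for the sketch; the experimental v0.1 format may differ):

```
{
  "name": "buzz",
  "version": "v1",
  "baseUrl": "https://www.googleapis.com/buzz/v1/",
  "resources": {
    "photos": {
      "methods": {
        "list":   {"httpMethod": "GET",  "pathUrl": "photos/{userId}/{albumId}"},
        "insert": {"httpMethod": "POST", "pathUrl": "photos/{userId}/{albumId}"}
      }
    }
  }
}
```

A client library can expand the URL templates and build both REST and RPC requests from this description without any service-specific code.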
So based on this concept, we are re-architecting
our client libraries. I call them generic
client libraries. I hate the name generic
[laughs] Does anybody have a better name?
Because I hate the name generic but what it really
means is client libraries that are built to
be usable with any API built at Google and
that's really the concept. Several features
that go into this generic concept - one is
the libraries themselves will be able to leverage
this discovery so you no longer have to start
scraping URLs out of documents. You can actually
use the names of resources that are a little
more intelligent to get things out of it.
Another concept: I told you the data model
classes were a big problem. We were modeling
XML but it's actually, there's actually a
lot of better ways to go about it and so for
JavaScript, you use JSON. For Java, we've
actually come up with a mechanism to map plain
old Java objects directly to JSON and we'll
show you that in a few minutes and you can
basically create the Java data model classes
yourselves in a few minutes. We're also rethinking
how we expose some of the advanced features,
making them much easier to use and finally
the client libraries have to work on all our
platforms. It's been a long time since we've
had a Java library for G data that worked
on Android and that's one of the things we're
gonna have. The thing that I most like, and
hopefully you'll like, is that once this library
is realized it will work with any API. So,
I've talked about this as a generic concept.
We've actually been building this. Again,
we're innovating it in the open and we have
a sample to show you and I'm going to invite
- I'm just a tech lead which means I have
to invite one of the software engineers up
to show actual code so I'm inviting Yaniv
Inbar to come up and show an example of the
client library [applause]
>>Yaniv: Thank you, Mark. So as Mark has been
talking about, we're all about innovating
in the open and today we've made available
a Java client library for
all Google APIs. It's technology we're still
working on, we're experimenting with, but
we wanted to give it out to developers like
you so you can try it out
and give us feedback. As Mark said, I'm going
to be demoing an Android application for the
recently announced Buzz API so let's take
a look at how it works.
So, the first thing I did when I set up
my Eclipse project here is I checked out the
project from the open source repository where
this sample is hosted.
The second thing I did is that I started the
emulator and you see this is a G1 device from
2008 and the point I'm making here is that
if you're a developer
that's making an Android application, you
want to reach the maximum number of users
and the best way to do that is to target the
1.5 SDK which represents
virtually all of the Android market. If you
are only targeting, say a device like the
Evo that many of you got today, you are only
going to get say less than
a third of the Android market so you have
to make that trade off between a better SDK
and greater reach for your application and
in this application the
first thing the application does is, using an
intent, it starts the web browser and it shows
an OAuth authentication page. The end user
then looks at the set of
permissions that they are granting the application
to do, they have to approve that and then
they have to grant our application access
to the Buzz API. When
they click "Grant Access", what happens is
our application has defined a
custom URI scheme called buzz-demo, and if the
wireless is working we'll redirect back to
the application. It looks like we're having
some technical difficulties here so I'll retry
that. Alright, let me
just show you the code. So, let me show you
a preview of how JSON data is modeled into our
plain old Java objects. The key here - [talking
in background] The
key here is that there's a content key and
that's mapped to a JSON string. In the JSON data
model and the Java data class, there's a field
called content and
we're using an @Key annotation to tell it,
"Okay, take that field and map it into the
JSON key." The type here is String, pretty
straightforward. If you're
used to JPA on web applications, this is a
very familiar concept where they are using
that for persistence. So let's take a look
at the Buzz activity class.
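The @Key annotation Yaniv is describing comes from the alpha Java client library; as a self-contained sketch of the underlying idea, here is a toy re-implementation (not the real library's code) of such an annotation plus a reflection-based JSON serializer:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Toy version of the @Key idea: mark a field as mapped to a JSON key,
// optionally overriding the key name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Key {
    String value() default "";
}

class BuzzObject {
    @Key String content;
}

class BuzzActivity {
    @Key String id;
    @Key("object") BuzzObject activityObject; // Java name differs from JSON key
}

public class KeyDemo {
    // Walk the annotated fields with reflection and emit a JSON object.
    static String toJson(Object obj) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        try {
            for (Field f : obj.getClass().getDeclaredFields()) {
                Key key = f.getAnnotation(Key.class);
                if (key == null) continue;       // only serialize @Key fields
                f.setAccessible(true);
                Object value = f.get(obj);
                if (value == null) continue;     // skip unset fields
                if (!first) sb.append(',');
                first = false;
                String name = key.value().isEmpty() ? f.getName() : key.value();
                sb.append('"').append(name).append("\":");
                if (value instanceof String) {
                    sb.append('"').append(value).append('"');
                } else {
                    sb.append(toJson(value));    // recurse into nested objects
                }
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        BuzzActivity activity = new BuzzActivity();
        activity.id = "tag:example,2010:1";
        activity.activityObject = new BuzzObject();
        activity.activityObject.content = "Excited to be speaking at Google IO";
        System.out.println(toJson(activity));
    }
}
```

Only the fields you declare get serialized, which is the same trick the sample uses to keep the memory profile low on a G1-class device.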
I hope you can see that. Now, the Buzz activity
is just a container for the Buzz object. It
has an ID field which is represented by a
Java field and it has
a Buzz object field called object. Again,
I'm mapping from Java fields to JSON keys
and you'll notice that a Buzz activity actually
has a lot more fields as
Mark showed you earlier but we're only storing
the ones we care about. This is critical on
mobile devices where you really want to keep
a low memory profile
for your application. So, let's take a look
at what discovery looks like in a concrete
Java application. Here's an example of the
post method. Say I want to
make a new Buzz post. I define a method
called activities.insert and I give it a
set of parameters. In this case, the user
id is @me and that's really
all the library needs to know in order to
make an HTTP request. You execute the request
using the Buzz activity data class to serialize
into JSON and we provide a serialization for
XML and in the future we'll provide it for
other formats. Here's another method, the
delete method. Straightforward, they all look
the same. You define the name of the method
I'm running, activities.delete, and a
set of parameters so let's look at the Buzz
activity feed. Again, this is just a container
for Buzz activities and I'm using a list of
Buzz activity as the Java type of the field
and with the @Key annotation, I'm overriding
the field name and I'm using items as a JSON
key. There's also a list method here. I won't
go into any details, any more details and
finally I'll show you the Buzz Parameters
class. The user id, the scope, and the post
id - those were used in the discovery to construct
the URL path. The alt and prettyprint are
query parameters so you might be saying to
yourself, "Wait a second, this isn't JSON."
No, this is for representing a URL but I'm
using the same @Key annotation approach
so let's take a look at the - if I can get
the emulator working again - if this doesn't
work, I'll go back to Mark. Hopefully it's
directing back to the application using the
custom URI scheme that we defined. Alright,
well, it's not working. Oh, it's working.
Great [laughs] and if this works, yes, it
will show up over here and let's test it and
make sure that we're not just faking this
demo. Here's the profile page. Ah. Let's go
back to emulator and let's make another post
[applause] Yes. Ah, let's just delete that
one. Okay, so that's it. I encourage you to download the sample and play with it. Try it in your own application. Try it on Android, try it in a desktop application or a web application. I'll let Mark tell you where you can download it and where you can give us feedback [applause]
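The data-class idea Yaniv described can be sketched in a few lines of plain Java. This is an illustrative reconstruction, not the library's actual API: an @Key-style annotation maps Java fields to JSON keys (with an optional name override, as with the "object" field), and any fields you don't declare are simply never stored, which is what keeps the memory profile low on mobile.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

// Illustrative sketch of @Key-style JSON mapping; names are hypothetical.
public class KeyMappingSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Key {
        String value() default "";  // optional override of the JSON key
    }

    static class BuzzObject {
        @Key String content;
    }

    static class BuzzActivity {
        @Key String id;
        @Key("object") BuzzObject obj;  // Java name differs from JSON key
    }

    // Serialize only the annotated, non-null fields to a JSON string;
    // undeclared fields in the source data would simply be dropped.
    static String toJson(Object o) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Field f : o.getClass().getDeclaredFields()) {
            Key key = f.getAnnotation(Key.class);
            if (key == null) continue;
            f.setAccessible(true);
            Object v;
            try {
                v = f.get(o);
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
            if (v == null) continue;
            if (!first) sb.append(",");
            first = false;
            String name = key.value().isEmpty() ? f.getName() : key.value();
            sb.append("\"").append(name).append("\":")
              .append(v instanceof String ? "\"" + v + "\"" : toJson(v));
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        BuzzActivity a = new BuzzActivity();
        a.id = "post1";
        a.obj = new BuzzObject();
        a.obj.content = "Hello from the emulator";
        System.out.println(toJson(a));
    }
}
```

The real library layers serialization formats and HTTP request execution on top of this mapping, but the field-to-key annotation is the core of the approach.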
>>Mark: Thank you. It's good we have software engineers for these things. I'd just like to give a quick summary of what we think is better about this
approach. You've seen that there was some
leverage of the discovery which made the API
very easy. The amount of code in that sample,
even though we're using
the Buzz API, there was very little Buzz specific
code that you had to write and there was no
Buzz specific code included in the library
itself and finally,
of course, this Java client library runs on
Android as well as App Engine and desktops.
You are welcome to go check it out. This sample
has been archived for
your pleasure. You can go read it and see
it. The library itself is still in alpha.
We have a pre-release version of it available in our public repository.
Please go try it, give us feedback, and tell
us what it needs and what you think of it.
So, that was a quick summary of the things that we're doing on the future of APIs, and I've mentioned several of the things that we've tried to fix around partial data, formats, and so on. There are lots of changes we're trying to make in how we build APIs going forward.
So, now I'll try and get to what you've probably
really come for. How Google really builds
APIs. So, in
order to make all these changes, we've had
to change how we build APIs from the ground
up and here to show you some of that architecture
and some of those tools that we built, I'm
inviting up Joey Schorr who's another software
engineer and he's going to show you exactly
how Google builds APIs.
>>Joey Schorr: Thanks, Mark. I don't need
that actually [applause] So imagine that I'm
an engineer on a team and I have to build
an API. Traditionally
speaking, I would have to hard code my API
into my front end in whatever format was necessary, as we've seen, mostly Atom. I would then have
to manage all the
necessary common functionality such as authentication,
logging, and other production concerns. As
we've seen, this can be problematic and as
a result we've
built a new powerful API stack that allows
Google engineers to create an API in just
a few short steps. To begin, they start by
implementing their internal
service. To do so, they define a set of abstract
resources using protocol buffers, our internal
serialization and deserialization format,
which is also now
an open source project. Then, once the engineer
has defined his or her internal resources,
their next step is to define the set of collections
and those
operations or verbs that can be performed
on the collections by protocol buffer RPC.
Once the internal service has been launched
and other engineers and
Google Apps can use it, the next step is to
configure our new API stack. The configuration
is a simple JSON data file which maps the
REST paths, RPC methods,
and query parameters to the internal collections,
resources, and verbs that are necessary for
the API. The API stack also adds all the common
functionality
that is needed - authentication, caching,
logging - thereby removing the burden from
the Google engineer and putting it on the
stack itself. Finally, the
engineer will write the output templates and
these represent the bidirectional transformations
between the internal format, protocol buffers,
and the
external format, JSON, Atom, XML, etc. Now
I'm going to show right here during this demo
how we can implement a very simple API. In
this case, a task list
API. So to begin, I start by defining all
the resources I need in protocol buffer format.
To do so, I'll define a task message because
I want tasks in my
tasks list. I'll define the fields necessary
for my resource. In this case, the ID field.
Notice that I have to give the internal protocol
buffer identifier
for the field. I'll probably want a description
of my task and I might want to specify whether
my task has been completed. Now, once I've
defined all the
resources that I need in my API, in this case
just a simple task, my next step is to define
the collections necessary. In this case, I'll
define the tasks
collection. I'll specify that the resource
id is of type string because I used a string
here. I will then specify that the resource
itself is the task, of
course, which I just defined right here above
and then I need to list all the operations
or verbs that I want to make available as
part of my internal
service. To begin, I'm going to want the common REST ones: GET to get a task, LIST to list
all the tasks, and INSERT to add a new task.
I might also want a
custom verb. In this case, one to mark a task
as being completed, mark as done. It will
take in, excuse me, the task id of the tasks
that I want to mark as
done, and might want to return the actual
task itself. Now, once I've defined all my
resources and all my collections, my next
step is to run an internal
code generator which spits out an interface
which I can then implement in order to get
this working as an internal service. I've
already done so, so my next
step is to actually configure the API stack itself.
Now, as I mentioned earlier, our configuration
is just a simple JSON data file. However,
we wanted to make
it even easier for Google engineers to create
an API in just a few steps. To that end, we've
written a web based tool which I'm going to
show to you
externally for the very first time today,
which allows Google engineers to create an
API in under 10 minutes. To begin, I click
"Create New API". I give the
name of the API, in this case, task list.
I give a descriptive title, "My Task List
API". I then have to specify the address of
my internal service. In this
case, running on localhost port 2500. Once it has
found my service, my next step is to specify
the mappings of the internal verbs or operations
that I just
defined to those operations or methods that
will be exposed to the external world. To
begin, I hit "Add Method". I give the RPC
name of my method, "Task List
dot tasks dot list". I also give a REST path.
This is ensuring that it's accessible both
as REST and RPC so this will be "Tasks slash
list". I have to choose
the internal operation that will be called
and you can see the system has introspected on those operations I just defined, and
then I can add additional
methods. In this case, I'm going to add one to mark a task as done. I'll give it "Tasks slash the task id" and "done", and I will choose mark as done. Now, as
we saw, my custom operation required a parameter.
In this case, the task ID. Again, you can
see the system has introspected on the fields
I've defined and
given them to me here. Now, once I hit "Save",
my new API has been created. However, I'm
not quite done yet. In order to truly use
this API externally, I
have to define the template that maps the
internal resource representation to the external
world. So you can see here I have my new API in the list. I
choose "Templates" and I'm going to want to
map my list method to JSON so I choose JSON
and now you can see here the template that
represents the
bidirectional mapping between the method's internal resource representation and its external JSON representation. To begin, I'm going to
want to list the tasks
defined in the list, a loop over all the resources defined in the
entity and then for each of the tasks I'm
going to want to list a
JSON object that represents the task's information
itself. In this case, the ID, the description,
and whether it's been done. Now, once I hit
"Save", I now
have a fully functioning bidirectional JSON
API representing my simple task list and to
that end, I'll actually demonstrate it for
you here. Hold on one moment. I'll just do
it over here. To begin, I'm going to want
to list all the tasks defined in my service
and I'm probably going to want to prettyprint
it to see what it actually looks like. As
you can see, I have two very simple tasks
that I've prepopulated into my service. You
can see the tasks JSON representation here
that I've already defined and the ID and some
of the other fields. You can also, again,
call this via JSON RPC and I'm also again
going to prettyprint it and you can see here
it's the exact same representation with the
exception of the lack of the data envelope.
Now, I might want to mark a task as done so
I will do "Mark Done". I'll give it the task
id, in this case, "My Task", as defined right
here and of course I'll want to prettyprint
it again and you can see now the task has
been returned with the done field set to true
and if I go back and list the tasks, refresh,
you can see now the done field has been marked
to true so you've seen how we can implement
a very simple yet powerful API in under five
minutes during an IO presentation and this
demonstrates the true power of our stack [applause]
Zach?
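For reference, the protocol buffer definitions Joey typed during that demo would look roughly like this. This is a reconstruction from his description, not the exact internal schema; internal Google annotations and the generated-interface step are omitted.

```proto
// A task resource: just the three fields from the demo.
message Task {
  optional string id = 1;           // resource identifier
  optional string description = 2;  // human-readable description
  optional bool done = 3;           // whether the task is completed
}

// Minimal request/response messages, sketched for completeness.
message GetTaskRequest {
  optional string id = 1;
}

message ListTasksRequest {}

message ListTasksResponse {
  repeated Task tasks = 1;
}

message MarkAsDoneRequest {
  optional string task_id = 1;  // id of the task to mark as done
}

// The "tasks" collection: the common REST verbs plus one custom verb.
service Tasks {
  rpc Get(GetTaskRequest) returns (Task);
  rpc List(ListTasksRequest) returns (ListTasksResponse);
  rpc Insert(Task) returns (Task);
  rpc MarkAsDone(MarkAsDoneRequest) returns (Task);
}
```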
>>Zach: Alright, thanks Joey. So, as Joey
just said, three steps and Google engineers
can build an API so I just want to make sure
everybody realized what just happened because
it's so cool that every time I think about
it, uh, I don't know [laughs] So he took what
we had as an internal service, wrote a few
config files, connected it to the new API
stack, and then launched a new API and it
was done in literally five minutes as you
guys watched. So, alright, so that being said,
we are going to conclude our presentation
on that note. Questions and comments, check
out bit.ly/apiwave.
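One detail from Joey's demo worth spelling out: the REST and JSON-RPC responses carry the same payload and differ only in the data envelope. Roughly, with illustrative values:

```
REST response (alt=json): payload wrapped in a "data" envelope

  {"data": {"items": [{"id": "myTask", "description": "My Task", "done": true}]}}

JSON-RPC response: same payload, no "data" envelope

  {"items": [{"id": "myTask", "description": "My Task", "done": true}]}
```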
>> I'll pop over there now.
>>Zach: Alright, I'll invite Mark and Yaniv
back up so I can have them answer the hard
questions for me. Alright, and there's mics
throughout the audience if anybody has any
questions as well. Alright. So, want to know
how Google decides what features to offer
when building a new API? Okay, there's two
things that Google - well, the session's started.
Awesome. There's two things that Google wants
to do when we launch a new API. One is Google
really believes that any data that belongs
to you, belongs to you. Whether it's on our
system or on your system so when we build
APIs, we build them so they expose all the
data, that way if you want to leave Google,
we don't lock you into Google services and
you can get all your data out. Now, that said,
you know, exposing all of the data in a simple
API isn't really the best way. We saw a bunch
of examples of mobile clients. You don't want
to get all your data every time you want to
simply update something so when our engineers
build new APIs, they think about what the
use cases for the APIs are and then give you
methods and fields that will work the best for your API. You guys have anything to add? You've built more APIs than I have so [laughs]
>>Joey: I was just going to mention that we add APIs where we can add value. We've added a lot of APIs to services where an API makes new things possible. A lot of our Apps APIs, for example, are designed to make it possible for, say, enterprise customers to leverage the Apps suite, and basically teams are looking for ways that APIs add value. That's just an example of how we choose APIs. Each product, of course, comes with a slightly different story.
>>Zach: Alright. Well, that was all the moderator
questions so audience, do you have questions?
>> Yeah. How about a tasks API?
>>Zach: So, yes. There will be a tasks API
coming soon. I can say that so I think that
answers your question.
>> We have to take Joey. Joey can probably
do one in ten minutes.
>> So, with this discovery API, would it be directly possible to have just one client library that would, like, auto-generate libraries for each and every Google API that supports discovery?
>>Zach: Yeah. So, I mean, as you can see from
Yaniv's demo, there's very little Buzz specific
code in there and so with that client library,
it could be taken and used with any API that
uses discovery without any updates to the
client library itself which is the awesome
part of this. So any API that supports discovery
going forward will work with, actually the
client library that you need is already written.
It's not in its final form yet but it will
work. Does that answer your question? Cool.
>> I think it's worth mentioning though that
this is still technology we're still developing
so there's no guarantee that the way discovery
works right now in our demo is the final
form for it. We are all talking about innovation
in the opening and we wanted to give you a
chance to give us feedback so that's why we
released it.
>> And if you have feedback, come grab Yaniv
afterwards since he'll be building all this
stuff later [laughs]
>> Why don't we take a live question while
we wait?
>>Zach: Yeah. So, go ahead.
>> I'm a big fan of protocol buffers and I
noticed that you've got that sort of exposed
internally but you're only exposing text-based
external formats. Are there any plans to expose
-
>>Zach: For now, for now.
>> Expose a binary one?
>>Zach: So, the benefit of the new stack is
that we can adapt any format very quickly
so be on the lookout for your favorite format
coming soon.
>>Yaniv: And make your voice heard. If you
need a format, let us know.
>>Zach: Alright. So, some more questions here.
What new Google APIs are coming out in the
near future? Now, if I told you guys when new
APIs were coming out in the near future, that
would ruin the surprise when they actually
launched [laughs] but the second question;
when will Google release a tasks API? I kind
of said soon already so there might be a tasks
API coming out in the near future. Maybe.
I can't make any promises. Alright, the idea of a discovery document sounds similar to WSDL. Mark, what are the distinctions between ours and WSDL?
>>Mark: Well, we are much more similar to - I don't know if you are familiar with it - but there's an alternative to WSDL called WADL. Are you familiar with that? If you look at the discovery document format that we're using right now, it's actually almost a JSON-ified version of WADL. The difference between WSDL and WADL is mainly that WSDL is an RPC-focused document, and it also brings along with it the SOAP protocol. WADL, which is a proposed standard, is an XML document that describes RESTful APIs. Ours actually supports REST APIs and JSON-RPC, so it's a slightly different subset of what we're describing, and we will probably be more compatible with WADL. In fact, we may offer a WADL document at some point. It's one of the things that we're actually considering as a feature, so I hope that answers your question.
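To make the comparison concrete, a discovery document describes an API's methods, REST paths, and parameters as JSON, where WADL would express similar information as XML. A fragment might look something like this; this is illustrative, not the exact discovery format:

```
{
  "buzz": {
    "v1": {
      "methods": {
        "activities.delete": {
          "pathUrl": "activities/{userId}/{scope}/{postId}",
          "rpcName": "buzz.activities.delete",
          "httpMethod": "DELETE",
          "parameters": {
            "userId": {"parameterType": "path", "required": true},
            "scope":  {"parameterType": "path", "required": true},
            "postId": {"parameterType": "path", "required": true}
          }
        }
      }
    }
  }
}
```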
>>Zach: So on that same line, I just want
to reiterate what Mark said and reinforce
that with the fact that Google is really committed
to open standards.
You've seen that throughout all of the presentations
so far in the past two days. Obviously, we
are working with JSON RPC and Atom so as we
do discovery, as it progresses further, you
can expect us to make sure that it's as compatible
with as many different discovery formats as
possible. Alright, so - another audience question.
>> How do you deal with cross-cutting issues like throttling?
>>Zach: So throttling and that type of the
stuff is all taken care of by the API stack
so none of the teams that actually build an
API at Google - it's one of those common features
that we take care of. I don't - do you guys
want to talk about that?
>>Joey: Well, traditionally speaking, that was actually not the case, and one of the benefits of the new stack is that teams are no longer responsible, on an individual team basis, for implementing throttling and logging and other production concerns. That's now part of the new stack, and they get all those benefits merely by being part of the stack, as we showed.
>> So do you have like a central operations
team for the production side that deals with
those issues?
>>Joey: Yeah. They're sitting in the front
row, right here [laughs] and shrinking down into their seats as I say that. Believe me, we've been pretty busy the past few days.
>>Zach: What book would I have people read?
Wow. [laughs] So I can honestly say that I
haven't read any books in the past two months
because we've been getting ready for Google
IO. Do you guys have any books you'd like
to recommend to the audience?
>> The number one thing I read a while ago was just Roy Fielding's thesis, just to understand the concepts behind REST and then try to figure out where that really applies - you know, the concepts behind it are abstract and a little bit deep in terms of thinking about APIs. Other than that, I would actually suggest
you go read APIs. Go look at Twitter, go look
at New York Times, go look at Facebook, go
look at Flickr. This is where - these are
the people who are building the new RESTful
APIs that are out there. This is where you
learn more. The books aren't there yet, or
at least not that I found so that would be
my recommendation.
>> And of course, take a look at all the developer
guides for all the new APIs that have launched
in IO. They are very good resources.
>>Zach: And if you're looking for a fiction
book, "Day Watch" and "Night Watch" are awesome
books so you can check those out. Alright,
so another question from the audience.
>> Question about the @me. How do you decide that the API uses authentication or authorization? And the @me, how do you pass that in a message?
>>Zach: I'm sorry. Can you repeat the question?
>> How do you pass the @me in the message? Like when you are defining the message, how do you decide that it uses authentication?
>>Zach: So @me is just a shortcut for the currently authenticated user, so we are not exposing your users' personal email addresses in URIs that are being sent around the Web.
>> But do you define the message as using @me? Like when you're defining a message, how do you say that it has to use @me as a parameter, let's say?
>>Mark: Let me see if I can answer your question.
Typically, we are addressing a resource. There's
a URL template. One of those parameters that
are very common is the user ID and that user
ID can be an email address or some form of encoded ID, or the @me token, a special token saying insert the user ID that is provided by authentication. It's implicit in the URL
template. Does that answer your question?
>> And that comes in from the stack?
>>Mark: So when you are configuring the stack,
one of the fields you give in the tool, which
we didn't demo, is you can actually specify
that certain methods can only be called with
certain credentials and as a result those
credentials are passed on and in the case
of a user ID, will be filled in by the stack.
>> Okay. Got it.
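Mark's answer can be sketched in a few lines of Java. This is an illustrative reconstruction, not the actual stack code: the stack substitutes path parameters into the URL template, and when a parameter's value is the special @me token, it fills in the authenticated user's ID instead.

```java
import java.util.Map;

// Illustrative sketch of URL-template expansion with the @me token;
// class and method names here are hypothetical.
public class UrlTemplateSketch {
    static String expand(String template, Map<String, String> params,
                         String authenticatedUserId) {
        String result = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            // "@me" means "the currently authenticated user", so real
            // email addresses never need to appear in shared URLs.
            String value = "@me".equals(e.getValue())
                    ? authenticatedUserId
                    : e.getValue();
            result = result.replace("{" + e.getKey() + "}", value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("activities/{userId}/{scope}",
                Map.of("userId", "@me", "scope", "@self"),
                "1234567890"));
        // prints activities/1234567890/@self
    }
}
```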
>>Zach: So I think we have time for one more
question before our time is up, so we'll go live.
>> The question I've had is just from designing
different APIs. It's always, as you're building
them trying to document it and I saw you kind
of fill out the API and I didn't know if while
you were building it you had somebody also
in line describing kind of what happens or
- I mean, I've always had the problem like
now I have this API and then I have to go
back and document it and this whole process.
It'd be much better to merge the two, so I didn't know how you guys solved that.
>>Zach: So actually one of the things that we did not show about this new tool is that it auto-generates documentation for the API. If you go out and look at the Latitude API docs, there's a reference guide there, and that entire reference guide was auto-generated with no work by the engineers other than annotating the fields as they were making the API. Obviously, for something more like a developer's guide, where you have to understand the basic concepts, it would be really awesome to auto-generate that too, but the stack's not self-aware yet so it's not going to be able to [laughs]
>>Mark: Just to expand a little bit on that.
If you notice while I was writing the template,
it was a variation of JSON. We support JSDoc-like comments in there, and if you annotate fields with descriptive information, when
you then go to generate the reference guide
for example, it will automatically pull that
information from the template and also from
the configuration information as part of the
same tool I demoed.
>>Zach: Alright. So I think that's the end
of our session. We've run to the max extent
allowed so thank you everybody for coming.
We're glad to have a full room here. Grab
us afterwards if you have any questions [applause]