AWS Made Easy

Ask Us Anything: Episode 19

Episode 19
September 27, 2022
1 h 07 min

Summary

In this episode, Rahul and Stephen continue the theme of Behind the Scenes by showing some of the automation which makes AWS Made Easy possible. In particular, we are going to be talking about:

Generating the show notes from the descriptions
Getting the transcript, and running AWS Comprehend on it.
AWS QuickSight dashboard on the words, perhaps on the analytics. What else can we do with it?
Running Kendra on the transcript
Future plans!

Show Notes Generator

As part of each episode, we generate show notes. These show notes become the contents of our awsmadeeasy.com website. For example, check out episode 16 here. In order to create these show notes, we aggregate the text of the custom fields from all of the different segments. When we are doing article reviews, this includes the star rating and the “Simplifies / Neutral / Complexity” flags that we use.
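The aggregation step can be sketched roughly as follows. This is a minimal illustration rather than the code used on the show: the segment dictionary shape, the `custom_fields` key, and the column names are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical column order for the show-notes table (assumed, not from the show).
COLUMNS = ["name", "article_url", "description", "rating", "verdict"]


def build_show_notes_rows(segments: List[Dict[str, object]]) -> List[List[str]]:
    """Turn per-segment custom fields into table rows, header row first."""
    rows = [list(COLUMNS)]
    for segment in segments:
        # Segments without custom fields (e.g. an intro) produce empty cells.
        fields = segment.get("custom_fields", {})
        row = [str(segment.get("name", ""))]
        for column in COLUMNS[1:]:
            value = fields.get(column, "")
            if column == "rating" and value != "":
                value = f"{value}/5"  # ratings are given out of five
            row.append(str(value))
        rows.append(row)
    return rows
```

The resulting rows could then be written into a document table by whatever client the target tool provides; that part is left out here because it depends on the document API in use.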

Comprehend to Tag the Transcripts

We use AWS Comprehend to tag the transcripts. The main idea is that Comprehend can process natural language and attach semantic structure, such as the labeling of quantities and entity names and types. To demonstrate this, we use Comprehend to scan for named AWS services. This allows us to filter for episodes where we have mentioned certain services.

This is what it looks like from within ClickUp.

Rahul did a demo of getting started with AWS Comprehend in Episode 14.

Check out our repository for some sample AWS Comprehend code: https://github.com/AWSMadeEasy/LetsCode/tree/ep14
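For readers who want a feel for the tagging step without opening the repository, here is a minimal sketch. It is not the repository's code: the function names are hypothetical, and the choice to keep only entities of type TITLE or COMMERCIAL_ITEM follows what is described in the transcript below. The `detect_entities` call and the `Entities` / `Text` / `Type` / `Score` response keys are part of the boto3 Comprehend client. Comprehend limits the size of each request, so a full transcript would likely need to be split into chunks first.

```python
from collections import Counter
from typing import Dict

# Entity types the transcript mentions filtering on.
WANTED_TYPES = {"TITLE", "COMMERCIAL_ITEM"}


def count_service_mentions(response: Dict, min_score: float = 0.0) -> Counter:
    """Count entity texts of the wanted types in a detect_entities response."""
    counts: Counter = Counter()
    for entity in response.get("Entities", []):
        if entity["Type"] in WANTED_TYPES and entity.get("Score", 1.0) >= min_score:
            counts[entity["Text"]] += 1
    return counts


def detect_entities(text: str) -> Dict:
    """Call Comprehend on one chunk of text; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the counting logic runs without AWS

    client = boto3.client("comprehend")
    return client.detect_entities(Text=text, LanguageCode="en")
```

Keeping the counting logic separate from the API call makes it easy to try the filter on a saved response before spending any API calls.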

QuickSight Dashboard

We use QuickSight to understand metrics regarding our LiveStream. In this case we used the word-cloud feature, but QuickSight would also be very good for aggregating things like view count of the various Tweets, YouTube views, and LinkedIn posts associated with this.

Transcript

[music]
Stephen

All right. Hello and welcome to “AWS Made Easy: Ask Us Anything” live stream. And this is episode number 19. With your hosts Stephen Barr, that’s me, and Rahul Subramaniam. How you doing, Rahul?
Rahul

Doing very well. How’s your weekend?
Stephen

Oh, really good. We drove out to the Eastern part of the state and we did some stargazing, because Jupiter was close by. And we had some friends with the telescope and we saw Jupiter, we saw Saturn. The next day, they had a 99.9% light filter, so we can stare at the sun and see sun spots. It was really neat.
Rahul

Oh, that’s awesome.
Stephen

How about you, how was your weekend?
Rahul

I have a telescope at home, but I don’t think I’ve taken it out in over a year now. That’s a good prompt to try that out this weekend.

My weekend? Let’s see. I did a bunch of traveling. So, I’ve come into Bangalore. As you can see, my background looks different. And so just short trip over here, headed back to Chennai tomorrow. But different weather, definitely nicer. So, yeah, fun trip.
Stephen

Oh, Bangalore has nicer weather?
Rahul

Yes, Bangalore is at an elevation of 4,000 feet.
Stephen

Oh, okay. So, you get the cool…a little bit of a cool breeze?
Rahul

They get a nice, cool breeze. Yeah, pretty much. And evenings it actually gets chilly, you don’t need air conditioning or even fans for that matter. So, it gets pretty nice.
Stephen

Oh, cool. I’m really glad to hear that. That’s another thing… Oh, here’s another thing I’m doing this…well, actually, it will be after the stream today. This weekend, I found at Costco, for sale, a DJI Mini 2.
Rahul

That’s awesome.
Stephen

I can’t believe how tiny these things are. But after the live stream, I’m going to take the kids and we’re going to try and fly this around somewhere.
Rahul

Oh, phenomenal.
Stephen

So, excited to try that out.
Rahul

Yeah, looking forward to seeing some of the videos out of that setup.
Stephen

Yeah, get some good B-roll footage for the live stream, some good intro footage.
Rahul

Yeah. And what does this do, this has 4K?
Stephen

Yeah, this is 4K at 60 FPS. And I’m amazed that you have this, plus the controller that takes your iPhone, came with a memory card, for like $430.
Rahul

Wow.
Stephen

So, I think I’m amazed at how much it’s gone down in price.
Rahul

Yeah.
Stephen

So, looking forward to giving this a try.
Rahul

Awesome. Okay. We have a very batch review today. There’s tons of very, very exciting stuff.
Stephen

All right.
Rahul

You want to take it away?
Stephen

Absolutely, let’s jump into it. I will cue our transition and we’ll jump into it. Here it goes.

Okay. So, like I said, the first one we’re going to do is introduction and itinerary. And I wanted to give this a try, where I actually put the name of the segment. Oh, I guess it blocks our names. Okay. So, introduction and itinerary. So, I wanted to show you what we’re going to talk about today. And also, since it’s behind the scenes, I’ll just share my screen.

Now, this is normally off to the side when I’m looking. But, actually, so, we see we’ve got episode 19, which is a stream. I’ll see… Can you… Do you need me to zoom in a bit?
Rahul

Yeah, let’s zoom in a tad bit. Yeah. I think that’s good.
Stephen

There we go. All right. So, I’ve got episode 19, which is a live stream. And then I had some tasks that were ahead of time. I had “create the banner.” And I wanted to give a shout-out to our really, really good graphic designer, she, you know, created the banner for the show. So, that was in the advertising. We see the… I asked her to do a little bit of a “Back to the Future” style banner. And then so that happened beforehand. We planned out a few segments. Some of those segments have these sub-tasks that had to get done beforehand.

And then we remember from last time, because we’ve wired up…we have those transitions, those transitions get generated and those URLs get populated here. So, before the show, once I’ve got all the segments planned out in the right order, I click here. That will give me a download of that little…the segment transition videos that you just saw.

So, this is our plan. And so, yeah, I said usually I have this off to the side, but since it’s behind the scenes. We wanted to talk about generating show notes. Getting a transcript, and we’re going to run AWS Comprehend on it. We’re going to show a QuickSight dashboard based on some of the results of Comprehend. And running Kendra on the transcript. And maybe other things that we can do in the future, have another brainstorm.

So, without further banter from me, let’s switch over to the show note generator, step two. Ready?
Rahul

Let’s go.
Stephen

All right. So, show notes. I want to talk about what the show notes are, and also give a shout-out to our really talented Web developer Alex. So, I’ll share this screen. Here we go. Oh, I see. Let me stop this screen and I’ll share. I wish that StreamYard could let you queue up multiple different screen-sharing windows. But anyway, here it goes.

So, at the end of the show, we put together some post-processing notes. And remember, in our “What’s New Review” segment, we’ve been giving…we can go over an article, like high-memory instances, SageMaker pipelines local testing. And then I kind of summarize what Rahul and I talk about. So, again, this is episode 16, we talked about these huge… What were the name of these instances, the 112X large with the 12 terabytes of RAM?
Rahul

I think these are the X2e’s, or the new X2 instances.
Stephen

Yeah, exactly. Yeah. Yeah, that’s right. So, this is what episode 16 looks like. We have our article, our summary of the discussion, our number of clouds out of five, and our verdicts. Remember, we talk about…
Rahul

Simplified…whether the announcement actually simplifies the message or it just adds complexity, or the “complexity alert” that you see on the SageMaker pipelines local testing. I mean, that one still gets me. I still don’t understand why they have local testing.
Stephen

I mean, and I really like the… You know, we could have added a lot of different variables to this, but I really like just the rating and also is it…does it simplify your life or does it complicate your life.

So, how do we get to this point from the show? So, I’m going to share my…go back to my ClickUp view for a second and I’m going to show you what a completed show looks like. All right. Give me a second here to swap screens. Share. I’ll go back.
Rahul

Stephen, you need the UltraWide. You really need the LG UltraWide.
Stephen

Possibly.
Rahul

We were just talking about this before the show, about screen real estate needed.
Stephen

Yeah. No. I mean, well, normally, I’m not zoomed in this far. But all right.

Okay. So, here is… Let’s go back to episode 16, which we just saw. So, here’s episode 16. And we talk about our segment view. Here’s the ratings that we give it. Here’s the articles, out of five. Here’s the URL. And I like this view. But, of course, not everyone uses the same tool, and I have to be aware of that. Our designer, Alex, he wants a Google Doc that he can then turn into this blog post.

So, the way we do that is we have an automation. So, I’ll show how this gets hooked up. In our stream automation here, we have a live…one of these… Ah, here we go. When status changes from “Recorded” to “Processed,” then let’s call the “createShowNotes” lambda. So, I don’t know if you can see that, but one of…
Rahul

Yeah.
Stephen

It’s basically… Yeah. Okay. When “Recorded” to “Processed” and call the “createShowNotes” lambda. And then what that does is it will… Okay. So, let’s…
Rahul

And this is a lambda URL, right?
Stephen

Yes, this is a deployed…
Rahul

This is a lambda function URL.
Stephen

Yeah. So, let me go to stop my share of this and share the browser. Okay, here we go. Okay. So, here’s our lambda function. And all it does is it gets that parent stream task, looks up all of the segments associated with it, looks up a certain field called the “description,” it looks up the discrete fields of the rating and the number of stars. And then it will populate a Google Doc, which looks like this, with this table.

So, I can show how that works. I know Alex has already picked this one up, so I don’t have to…I’m not breaking his workflow. Now, if I… I’m back in ClickUp now and I’m going to switch this workflow from…I’ll switch it to “Recorded,” and then I’ll switch it to…I think it’s from “Ready” to “Recorded.” Here we go. And that should trigger the automation and this should start populating in a second. Let me double-check, I think it was. Yeah, “Ready” to… It’s… Oh, it’s “Recorded” to “Processed.” So, I’ve…it’s in “Recorded” state and I need to advance it to post-processed. There we go. And this should start populating in a second.
Rahul

I was going to say that we haven’t made offerings to the demo gods.
Stephen

Oh, gosh. Let’s see if it does it. Oh, there we go. There it is. It’s kind of cool seeing a Google document just kind of fill in. It’s giving our URL, our ratings. There it is. So, that was just based on that lambda function trigger.

And so I guess the takeaway from this automation is that one of the cool things, it’s really nice to have one tool where everyone can visit. But you also have to work…you don’t want to break other people’s processes. And so by using automation, we can then say, “Okay, we need a Google”… I have this structured way. Our Web guy, he wants a Google Doc. All right, here we go. We can make it for him really easily. And it has this nice table. It’s all powered by this one really simple lambda function. Which, again, all it does is get the parent task, get all the segments, and then it runs this “populate_google_doc” on the parent task and the list of segments and the Google Doc ID, which is also built into the ClickUp task.

So, it looks like our demo gods, that we have won favor with them. And so that’s how we do our show notes. And so it’s really nice. Because, again, earlier in the flow, we would have to…I forget to do this step and it would be…it’s a lot of duplicate information. And so, again, with that programming principle “don’t repeat yourself.”
Rahul

“Don’t repeat yourself.” Yeah.
Stephen

Right? So, I thought I’d rather not…yeah, I’d rather not have to do this twice when I’ve already got all the data nice and structured. So, that is how we create our show notes. All right.
Rahul

Excellent.
Stephen

Cool. Well, let’s see. Should we go on to… Anything else to say about this one?
Rahul

No. I think these days… I think there’s one other aspect that I want to actually talk about. Which is, as you can see, the more of these services or API that we use, it actually makes our lives so much easier. There are a lot of folks who still, you know, try to do stuff on prem. And there’s just real benefit to having platforms like AWS and also like, you know, Google’s entire…you know, we use Google Docs and stuff all the time. So, it’s really beneficial to have all of that, they have core API that are available to use and automate a whole lot of stuff that can make your life easy. So, yeah, that’s the big picture we really want to make.
Stephen

Absolutely. It’s really nice just having this big…to not… I mean, even though, from a programmer’s perspective, it used to be fun to write the whole system yourself, but now thinking I really don’t want to have to…I want to use other people’s workflows, other people’s code, and be able to just integrate it together in a really kind of cohesive way and focus on what my task is. Which is, you know, delivering this live stream, keeping it organized, making sure we can disseminate the information as efficiently as we can.
Rahul

Yeah.
Stephen

All right.
Rahul

Completely agree.
Stephen

Well, let’s switch over to the next segment, which is going to be using Comprehend. All right, we’ll take a quick break.
Rahul

This is probably going to be one of my favorite parts of this show today.
Stephen

All right. Well, here, we’ll take a 20-second break, and then we’ll be back for Comprehend. Here it comes.
Woman

Public cloud costs going up and your AWS bill growing without the right cost controls? CloudFix saves you 10% to 20% on your AWS bill by focusing on AWS-recommended fixes that are 100% safe, with zero downtime and zero degradation in performance. The best part, with your approval, CloudFix finds and implements AWS fixes to help you run more efficiently. Visit cloudfix.com for a free savings assessment.
Stephen

All right. We are back and we’re talking about Comprehend. So, Rahul, what is Comprehend? What’s the use of it?
Rahul

So, think of Comprehend as the machine-learning-based service that is based for…or that’s all about natural language processing. So, it’s AWS’s services…service that takes a bunch of text and it gives you a whole lot of different API calls that do different things. So, if you want, for example, sentiment analysis, you give it a passage and you say, “Tell me what the sentiment is,” for either the passage or a sentence or whatever. Or you could ask it to find all the topics or entities that are being discussed in that particular passage. And that’s really neat.

So, Comprehend has a whole bunch of API that makes it really easy to comprehend programmatically what you’re looking at from a text standpoint.
Stephen

Yeah, that’s exactly it. We’re… Comprehend is really, really neat, to be able to extract this structured data out of natural text. And actually, you did a really good demo on our “AWSMadeEasy/LetsCode” repository. I’m putting that in the chat right now. So, this is…
Rahul

Yeah. I contrast that with what we used to do maybe 20 years ago, where NLP wasn’t really a thing. We used to write a whole lot of regexes for…to figure out, you know, a certain kind of structure in the sentence or in the word. And English being what it is, or any other language…I mean, most other languages, as well, they aren’t meant for structured programming, right? I mean, that’s why we create very structured programming languages to talk to computers. Natural spoken languages aren’t very precise.
Stephen

Yeah. It’s not very structured.
Rahul

Correct.
Stephen

Well, I’ll show you what we’re using Comprehend for. Okay. So, I put the Comprehend in the chat. So, what we’re using Comprehend… And I think this is really, really neat. I’ll share my ClickUp view first, and then I’ll share the…how it works. Let’s see. Here we go. Window and ClickUp.

Okay. So, in ClickUp, we’re looking at all of our episodes. And we want…we always talk about AWS. And one thing we want to know is what services did we talk about in a particular episode. And so we built this component using Comprehend. So, for example, in episode 11, we talked about QuickSight, Timestream, Redshift, Dynamo, Neptune, which is a bit rare. Athena, Aurora, Kendra. So, we talked about all these different services.

So, I wanted to show you how this list gets populated. So, I’ll switch gears to… Again, this is all driven by automation. I’ll show you this automation first before we switch over. Here, we’re saying when we get a transcript, then we’re going to call a lambda webhook that runs Comprehend. So, let’s show you that lambda webhook.

So, I’m going to open up a dev space and share that screen with you. Right. Here we go. So, this is in my livestream-automation-v2 repository, which I’ll make public. It’s a… The cool thing about it, we mentioned this last week, this is…if you see here, it says, “Generated from AWSMadeEasy/sam-lambda-py-graviton.” So, we made a template repository that makes it really easy to just deploy a lambda function to a URL.

Now, in this case, I’ve…I have my, here, I’ll show you, my “app.py.” And it’s running Comprehend on…it’s a really simple body. Okay, when we get the event coming in, the task, the ClickUp task in JSON, is the payload. And so we want to run “comprehendTask” on that task. Now, what does “comprehendTask” do? Well, it gets the transcript, it gets the episode number, it runs “detect_entities,” and then it looks for entities of type “TITLE” or “COMMERCIAL_ITEM.” And then it will populate this field in ClickUp with AWS services.
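As an illustration of the handler shape described here, a minimal sketch might look like the following. It is not the `app.py` shown on screen: `parse_task` is a hypothetical helper, the `id` field of the task payload is an assumption, and the call into `comprehendTask` is only indicated by a comment. The `body` and `isBase64Encoded` keys are part of the event that Lambda function URLs deliver.

```python
import base64
import json
from typing import Any, Dict


def parse_task(event: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the JSON task payload from a Lambda function URL event."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point: decode the task, then hand it to the processing step."""
    task = parse_task(event)
    # The real function would run its Comprehend step on `task` here.
    # "id" is an assumed field name for the task identifier.
    return {"statusCode": 200, "body": json.dumps({"task_id": task.get("id")})}
```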

So, I can show you what it looks like when you’re interacting with Comprehend. Give me a moment to get this set up. Let’s see here. “app.py.” Load in all the function definitions. And then we should have a task. Okay, great. So, we’ve got a task. And now, let’s run…we’ll step into “comprehendTask” over here.

All right. So…
Rahul

I think you forgot the function definition.
Stephen

Oh.
Rahul

The copy.
Stephen

Yeah, I should have gotten… Oh, no, I see what I need. I need to populate a variable. Give me one second to… This is one variable I don’t want on the screen. This is my API key. Give me one second. There it is. Okay. One second here. Okay. Clear, and back on.

Okay. So, just stepping through this code, it’s getting the task, and then it’s running. There we go. Let’s look at the entities list. Entities list. Okay, here’s…we found an organization called Lambda, a commercial item called S3 SageMaker, a title called EC2, an organization called Amazon, and so on.

And so this here would… I’m just…this actually is probably not needed, we’re just aggregating it to some…I want to get a count in here. So, we mentioned EC2 once, SageMaker once. And now, what I do is, to map it to ClickUp, I get a list of the custom fields. I already did that. One second.
Rahul

Yes. And by the way, I almost forgot about the fact that you actually get categories for the stuff that you put in Comprehend. Which is virtually impossible. So, even if you did regexes, you know, to identify words or…it’s almost like the tokenizers that you build for a compiler. You know, if you build a tokenizer, it will give you all the words. That’s the exercise of going through and finding the category of a particular word, whether it’s a name, whether it’s a person, whether it’s an entity, whether it’s a title. It’s just impossible to do it. And if you didn’t have ML with NLP, I mean, it would just be an absolute nightmare.
Stephen

Completely agree. It makes this so much easier.

All right. So, what I’ve done is I’ve queried ClickUp for my AWS services field, and these are all the UUIDs of each of the choices that I have. And now I just map what I found to the field values there. And I create a… Okay. So, in this case, I found EC2. And that other one, S3 SageMaker. I look at the value list, S3 SageMaker. That’s still…you know, it’s still natural language processing, sometimes it doesn’t separate it perfectly. So, it will give me this one value that I can update.
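The mapping from detected entity text to dropdown choices can be sketched as below. This is an assumption-laden illustration, not the code from the stream: the `options` dictionary (lower-cased service name to option ID) and the function name are hypothetical, and splitting merged entities such as "S3 SageMaker" on whitespace is just one possible way to cope with imperfect entity boundaries.

```python
from typing import Dict, Iterable, List


def map_to_option_ids(found: Iterable[str], options: Dict[str, str]) -> List[str]:
    """Return option IDs for every known service named in the entity texts."""
    ids: List[str] = []
    for text in found:
        # Split merged entities like "S3 SageMaker" into separate tokens.
        for token in text.split():
            option_id = options.get(token.lower())
            if option_id and option_id not in ids:
                ids.append(option_id)
    return ids
```

Note that splitting on whitespace would break multi-word service names, so a real mapping would need a lookup that also tries the full entity text.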

So, I wanted to show you how this all ties together. So, at the end of the episode… For example, here’s a speech, Rahul, that you gave recently at the AWS Summit Anaheim. We’ve got the transcript. Now, I’m going to switch over to my ClickUp view. I’m going to ask the StreamYard people if there’s a more efficient way to share multiple windows at a time. But here’s my ClickUp view. Now, here is your CloudFix talk that you gave. And I’m going to go over here and I’m going to paste in the transcript. So, we have automation that would populate this based on a webhook, but we’ll do this manually. Here is the transcript, we will close that. And that’s going to fire our automation. And we’ll wait a second or two for that. Let’s refresh ClickUp and see which AWS services you mentioned in your talk. Okay, there we go.
Rahul

I’m curious, as well.
Stephen

You talked about CloudFront, SageMaker, Redshift, Dynamo, Athena, Aurora, EC2, EBS, EFS, and S3.
Rahul

That makes sense, yeah. Those are all the things that I spoke about.
Stephen

Awesome. And then so now, if…
Rahul

It’s so neat.
Stephen

Well, it’s really cool just to think about what are we actually talking about, we can do other analytics on this. If we wanted to then filter and say, “Well, AWS services, what are all of the times that we talked about,” I don’t know, “Redshift?” Okay. And then it goes back here. It says we talked about Redshift in episode 11 and episode 12 and in Rahul’s CloudFix Anaheim talk.
Rahul

Yeah.
Stephen

So.
Rahul

Exactly. I mean, something like that, looking through video conversations in an organized manner like this, would have been virtually impossible just a couple of years ago. It’s amazing that we’re able to do that now.
Stephen

And we can look at… So, the other thing that we want to look at, thinking about the future, is we can also make a custom entity recognition model. So, for example, when we talk about, you know, certain products that we like that aren’t S3. For example, we talk about DevFlows, about DevSpaces. We could make a custom entity recognition model that’s looking for those. And then we could look at when do we mention those, when do we talk about other companies, other people.
Rahul

Yeah.
Stephen

Because people gets tagged as its own entity. So, we can run custom entity recognition models using the same template, and then having them mapped to a fixed set of tags. It makes it really easy to stay organized. I think that’s really the name of the game. When we were five or six episodes in, it’s really easy to remember what we did for each episode. But now we’re at, wow, this is episode 19. It is easy to forget when we talked about a particular topic. And so by being able to stay organized in this way, it’s been really helpful.
Rahul

Correct. And between this, a bunch of other podcasts, and all the other content stuff that we’ve been working on, it absolutely…I mean, it gets so confusing about where we spoke about a particular topic or, you know, where to point when somebody comes back to us with a question and says, “Hey, do you have an opinion on X, Y, or Z?” It is really hard to [inaudible 00:32:39]…or it’s very, very hard to point them to that particular video or that particular episode. I remember, you know, the first one or two episodes are great, it was real easy. We covered a bunch of topics. And now [inaudible 00:32:53] figure out, “Hey, we talked about that announcement about custom”… In fact, why don’t we do this? We…there was an announcement about SageMaker entities, SageMaker custom entities. We talked about that in one of the episodes.
Stephen

So, let’s see.
Rahul

You might have to share your screen again.
Stephen

Yeah. I’ll put this on here. Oops. Add the stream. There we go. So, I’ll go to “AWS_SERVICES” and look at “SageMaker.”
Rahul

Yeah.
Stephen

Here’s the times that we’ve talked about SageMaker. Let’s see. We’ve talked about it in episode 2, episode 7, the “What’s New Review.” Let’s… And we talked about it in 10 and 16.
Rahul

I think it’s in the “What’s New Review.” I think it was episode 16, the “What’s New Review,” where we talked about custom entities that you could build into SageMaker models.
Stephen

Well, let’s have a look at 16 and see which segments we covered. Okay. So, it’s… Here we go. We talked… No, that was the SageMaker pipelines.
Rahul

No, that is not it. So, then it was the other one. It was episode… What is it again?
Stephen

Let me add this filter.
Rahul

But we can imagine doing this without having all these tags and these entities already filled in. And I think I want to talk about an easier way to figure this out, right?
Stephen

Yes. I think this is actually a really good… What do you call this? This is a really good preview. Because even this is good for tagging, and eventually I want to tag the individual segments. Okay, here we go. Segment “SageMaker Autopilot experiments are now 10x faster.”
Rahul

No, it’s the next one. Yeah, “Announcing Heterogeneous Clusters for model training.”
Stephen

Yeah.
Rahul

Yeah. I think I’ve spoken about custom entities during that type of stuff.
Stephen

And then we can look at the… We’ve got the article URL. This was with Badri. So, pretty neat. We have all that mapped. Now, some of this automation we’ve developed as we’ve gone along. So, I’m still kind of backfilling all of the processes.

Oh, and also, there’s one thing I also wanted to share, I guess as part of this little sub-segment. The data model that we’ve come up with to kind of manage all this. So, we’re talking about segments and highlight as if they’re, you know, very distinct things. Let me share this, share my screen. Here we go.

So, this is…I think this makes sense in terms of how I think about it. So, when we’re planning, we’ve got segments that we want to talk about. Right? We’ll say, “Okay. First, we’re going to talk about our show notes generator. Then, we’re going to talk about Comprehend. And then, we’re going to talk about QuickSight. And then, we’re going to talk about Kendra.” So, we have this linear progression of segments that we’ve thought through. And we don’t know how long we’re going to talk. Sometimes you can talk a long time or a short time.

So, then, when we actually record it, we run those transitions that, again, we talked about last time, that we generate, that put a few of these frames in here that we can key on. And then, as we talked about in the last episode, the AWS Rekognition segment detection API is going to tell us exactly when that happens.

And then, so, we run that automation against our video, we get these numbers, and that helps us generate what we call highlights. So, highlights, we have one highlight per segment, but we can also have more. For example, if we just have a short little quote that we wanted to talk about. So, a highlight, it’s at least a one-to-one with the segments, but sometimes there’s more.

And then when we post the highlights, depending on where they’re posted… So, these get posted to LinkedIn, YouTube, and our Twitch channel. And then when we make a social media post, we’ll post the highlight, plus some commentary, in the social media post. But the tricky thing is on LinkedIn and YouTube and Twitch, you’re not really limited. But on Twitter, you’re limited to a 90-second clip. So, we take the highlight and generate what we call a preview highlight, which is, at most, 90 seconds. And then it says…we have this little three-second video that we tack on to the end that says, “For the full video, see the link in the tweet.” And that way we can use Twitter live, but stay under that 90-second threshold.
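The arithmetic behind the preview highlight can be sketched as follows. This is one interpretation, not the show's actual tooling: it assumes the three-second outro has to fit inside the 90-second limit quoted above, and the function and constant names are hypothetical.

```python
from typing import Tuple

TWITTER_LIMIT_S = 90.0  # clip limit as quoted in the episode
OUTRO_S = 3.0           # length of the "see the link in the tweet" outro


def preview_window(start_s: float, end_s: float) -> Tuple[float, float]:
    """Return (start, end) of the highlight portion that fits before the outro."""
    if end_s <= start_s:
        raise ValueError("highlight must have positive duration")
    budget = TWITTER_LIMIT_S - OUTRO_S  # seconds left for the highlight itself
    return start_s, min(end_s, start_s + budget)
```

A highlight shorter than the budget is returned unchanged; a longer one is cut so that highlight plus outro stays within the limit.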

And so that’s our flow. And so this whole process is managing this from our free-form recording here, plus the segments, all the way into this very structured view. And then I think in the…in a future episode, we can show how we actually gather this all back in terms of a big-picture view of how the social is doing.
Rahul

Right.
Stephen

Cool. So, that’s our data model. And so when we’re talking about segments to highlights to social posts, that’s what we’re doing.
Rahul

Excellent. So, we leave that problem that we just spoke about, where we wanted to search for some context or some content in an episode that we did in the past, and how we use a different AWS service to achieve that. So, okay. So, which is the next service that we’re talking about?
Stephen

Well, let’s jump out of order here and let’s go to Kendra. Here it goes.

Okay. So, Kendra is super cool. So, here’s the flow we want to talk about with Kendra, let me put this up on the screen. So, you saw here that we have our transcript as a field in ClickUp.
Rahul

Stephen.
Stephen

We have…
Rahul

Yeah, there we go. Yeah.
Stephen

Oh, here we go. So, we have our transcript as a field in ClickUp. Now, when we also set that transcript, we also have a lambda function that’s going to take that transcript and it’s going to write it out as a text file and put it into an S3 bucket. Oh, not a video bucket. Again, this is behind the scenes. Just an S3 bucket. Now, this bucket is configured to talk to Kendra, and Kendra is watching it, and then we can query it.
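The step that writes the transcript out to S3 could look roughly like this. It is a sketch under stated assumptions: the key layout (`transcripts/episode-NN.txt`) and the function names are invented for the example, and the client is passed in so the logic can be exercised without AWS. `put_object` with `Bucket`, `Key`, `Body`, and `ContentType` is a standard boto3 S3 client call.

```python
from typing import Any


def transcript_key(episode_number: int) -> str:
    """Build the object key for an episode transcript (assumed layout)."""
    return f"transcripts/episode-{episode_number:02d}.txt"


def upload_transcript(s3_client: Any, bucket: str, episode_number: int, transcript: str) -> str:
    """Write the transcript as a UTF-8 text object and return its key."""
    key = transcript_key(episode_number)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=transcript.encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
    return key
```

In a Lambda function the client would typically come from `boto3.client("s3")`, created once outside the handler.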

So, let’s look at it from the AWS perspective. Here is our Kendra setup. I probably should zoom in here. So, we have… I’ll have to regenerate some of this later. Anyway, we have an index, we have a data source. And now, let’s have a look at how we can interact with it.

So, I’m starting up…
Rahul

Just some background while this thing is coming up. A quick background. So, before Kendra existed… A simple way to think about Kendra is, like, Kendra is like…is Google Search for your data. Though we’re talking about Amazon here, so it’s a little confusing that way. But think of what you experience when you type some text in the search input field, you just expect to magically get answers.

If you had to do that behind the scenes yourself, what you would typically do is put together something like Elasticsearch, and then you’ll figure out some…you know, you’ll stitch together a bunch of NLP tools that would give you all of these entities that you’ll index on. And then, you’d basically take all of that data, put together a pipeline that ingests all this data into Elasticsearch. And then, you’d query Elasticsearch, and then present it out. Right? And then, you’ll have to do a bunch of tweaking to figure out what relevance means and what is appropriate in certain contexts. And then you’d build layers of, I guess, rule-based systems that, you know, handles all of this stuff. And then, hopefully, get good search results at the end of it. In our experience, dealing with a lot of CRM solutions and Internet wiki solutions and stuff within our portfolio, that self-built stuff was an unmitigated disaster.

So, when Kendra came along, we got so excited for what it does. So, Stephen, you want to show everyone?
Stephen

Yeah, absolutely. All right. Here it goes. And again, I remember at one point I worked at an e-mail hosting start-up and I remember building my first search engine literally as a wrapper around grep. So, you had a “cgi-bin” script which was taking some input, calling grep on the file system under the hood, parsing the results back, and then, you know, using, you know, old CGI, if you remember that, and posting it back. And it worked kind of well. I remember at some point thinking…like, I understood the Perl regular expression engine well enough to think, “I know how I can crash this.” And I searched for a sequence of characters, and sure enough. And that’s the thing that you’ll always be dealing with that.
Rahul

Correct.
Stephen

Even now, 25 years later, you’ll still be dealing with random issues that come up if you try and roll your own.

Okay. So, here is Kendra. So, again, just showing the…how it works. I think I showed that. Yeah. So, we have our episodes that go into an S3 bucket, Kendra indexes it, and then we can query it. So, here’s a dev space that’s located…that has a Kendra script in it. And I’ll show you how we can interact with it. So, let’s say “python3 kendra.py.”

Now, a few episodes ago I told a story about how my Australian wife went into a store and she ordered hot chips. And they microwaved a bag of potato chips and handed it to her. And I don’t know if they were just joking with her or what they were doing, but it was a funny anecdote. And I don’t remember exactly what episode it was. So, say if we look for “French fries hot chips” into Kendra.

Okay. Now, it gives us the answer. Oh, our first one is an answer. It says in the episode 12 episode, and then here’s the little sentence that we use. I guess I was talking at this point telling…in our opening banter, telling that funny story. Oh, there’s episode 12. So, that’s our answer.

Now, let’s see. What other things do we have? Oh, well, here, document three. So, then, the score starts dropping. We’re talking about “hot,” it being hot. So, it’s not matching the “chips” part, but it’s matching “hot.” And then later, we talked about Rahul’s plants in episode 10 being hot, but we used the word “44 centigrade” and “hot.” So, then, you see the search kind of goes down.
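The `kendra.py` script itself is not shown in the episode, so the following is only a guess at its general shape. It uses the boto3 Kendra `query` call and flattens each result into its type (for example `ANSWER` or `DOCUMENT`), title, excerpt, and confidence bucket. The index ID and the function names are placeholders.

```python
"""Sketch: query a Kendra index and summarize the results."""


def summarize_results(response):
    """Flatten a Kendra query response into simple dictionaries."""
    rows = []
    for item in response.get("ResultItems", []):
        rows.append({
            "type": item.get("Type", ""),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", "").strip(),
            "confidence": item.get("ScoreAttributes", {}).get(
                "ScoreConfidence", "NOT_AVAILABLE"
            ),
        })
    return rows


def search_transcripts(kendra_client, index_id, question, limit=5):
    """Run one query against the index and return the summarized rows."""
    response = kendra_client.query(
        IndexId=index_id,      # placeholder: the ID of your own index
        QueryText=question,
        PageSize=limit,
    )
    return summarize_results(response)
```

With credentials configured, something like `search_transcripts(boto3.client("kendra"), "your-index-id", "French fries hot chips")` would return rows along the lines of what is shown on screen, with lower-confidence matches further down the list.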

What about a time we talked about stored procedures? Let’s do “stored procedures.”
Rahul

Was this the episode we talked about bad practices?
Stephen

Well, we have a couple of times, but they’re always… Let’s see. So, the most one here. Here we go, episode 15, “1,400 stored procedures.” Just a mountain of stored procedures.
Rahul

Oh, yeah, I wrote that example.
Stephen

Yeah. Here’s episode 11, “…we look at sediment buildup under these rocks…that’s obviously stored procedures calling stored procedures.” Here we go, episode 12, “SQL, where it’s both a strength and a weakness that you can do something…complicated.” So, again, episode seven, we talked about stored procedures. I don’t think we ever talk about them in a great way.
Rahul

So, the big insight that comes from Kendra is that I have a pet peeve about SQL stored procedures.
Stephen

Let’s see when we talk about SageMaker. This is cool. Okay, SageMaker. Episode seven, “SageMaker Data Wrangler,” “SageMaker Canvas.” We talked about it in episode 16, SageMaker cloud, “Why would they do this?” Still, again, we found that one.

So, it’s really neat. We can now search our documents intelligently. I think when we have a few more episodes under our belt, we’ll be able to use the SageMaker querying, which will be kind of fun.
Rahul

Yeah. And just imagine if you had any kind of document store and you want an NLP-based search. It’s really simple to put this together with Kendra and make it available as, you know, a searchable index and you can ask natural-language-based questions of it.

So, it’s sort of actually preventing [inaudible 00:46:14], and I don’t know if this does it, with the various script [inaudible 00:46:19]
Stephen

Okay.
Rahul

But why don’t we search for, “When did I ask for hot chips?,” or, “When did we talk about hot chips?” What is the first one that came up?
Stephen

Let’s… Episode 12, document.
Rahul

Yeah.
Stephen

Interesting.
Rahul

So, you can actually ask natural-language-based questions of Kendra and it understands it. You don’t have to do the Elasticsearch [inaudible 00:46:56] only words or tags. You can ask natural-language-based questions of Kendra.
Stephen

Look at this one, “graviton.” 14, 13, 13, 12, 11, 18, 4, 15. We like the Graviton. And we can…you know, we should run sentiment analysis on all of our different topics.
Rahul

Yeah.
Stephen

On the text near. And then, we can… That would be a funny table to generate. I’ll work on that for next time. So, I’ll run all of the different AWS services, run a Kendra search on all of them, and then report our sentiment. Because I think that would be really…

Okay. And, okay. So, we got a comment from a viewer, really happy about this one. “Whoa, I just joined with just a raw idea of AWS from end user perspective…curious about all this…to see the magic happening…” All right, thank you. I hope you do check out the full episode and let us know if you have any questions or ideas for a future one. And I think this summarizes the recognition of Kendra, is just, yeah, that’s how we feel about it, too. Kendra is super cool. And this is stuff that, you know, it’s only a few lines of Python and mostly, you know, moving data around. The magic is really just pointing it to an S3 bucket full of text and asking it questions.
Rahul

Correct. Okay.
Stephen

All right. Well, anything else we…
Rahul

So, there’s Kendra. No, try it out. Try it out. And anything that has…wherever you have tons of text, this is the holy grail of searching through loads and loads of text.

Oh, one other thing that…you know, just a general pattern that we’ve seen that I’ve found very useful is to take any kind of, you know, content you have, it could be audio, video, text. The first…the standard pipeline that we apply almost everywhere is, first, just as a general practice, translate that to English. Because that could be a common way in which you can process all the data and index it and search it and do all of that stuff.

So, what we typically do is do the translation step. And then once we do translation, then we… Sorry, for video and audio we transcribe, and then do the translation. And then, once you have the translated text all in English, then you can run through the regular pipeline of Comprehend for tagging entity recognition and stuff like that. You can then run through all of the…just index it into Kendra and make it fully searchable.
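The translate-then-tag part of that pipeline might be sketched as follows. This is an illustration rather than the pipeline Rahul's team runs: the function names and the 0.8 score threshold are assumptions, and a real transcript would probably need to be split into chunks to stay within the services' request size limits. `translate_text` and `detect_entities` are the real boto3 calls.

```python
"""Sketch: translate text to English, then tag entities with Comprehend."""


def to_english(translate_client, text):
    """Translate text to English, letting Amazon Translate detect the source language."""
    result = translate_client.translate_text(
        Text=text,
        SourceLanguageCode="auto",
        TargetLanguageCode="en",
    )
    return result["TranslatedText"]


def tag_entities(comprehend_client, english_text, min_score=0.8):
    """Return (type, text) pairs for entities at or above an assumed score cutoff."""
    result = comprehend_client.detect_entities(Text=english_text, LanguageCode="en")
    return [
        (entity["Type"], entity["Text"])
        for entity in result["Entities"]
        if entity["Score"] >= min_score
    ]


def process(translate_client, comprehend_client, text):
    """Run both steps and return the English text together with its entity tags."""
    english = to_english(translate_client, text)
    return {"english": english, "entities": tag_entities(comprehend_client, english)}
```

The English text returned by `process` is what would then be written to the S3 bucket that Kendra indexes.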

One of the big projects that’s currently underway within our org is we are taking all of our meetings from forever that we’ve had in the likes of Zoom and Chime, and we’re just indexing it all so that we can search through it and actually figure out… There’s just so much tribal knowledge that’s stuck in these videos and meeting recordings that if only we could unleash that value and go refer to it and say, “Hey, you know, I was talking to Stephen in some meeting from I don’t know when. And we were talking about SageMaker. What meeting was that?” To be able to ask that question and get it answered is incredibly valuable as an organization.
Stephen

What I really like, in addition to everything you just said. Just thinking back to the promises of the semantic Web projects, right? Where they wanted to add all this structure. And that was a real… If you dig back through some of that stuff and all the stuff that got invented, it was really interesting to see, “Okay, how much”…if you have to do this manually, when are you talking about a person, when are you talking about a place, when are you talking about a quantity. And in order to, like, actually write semantic-compliant documents, how cumbersome it is, and you have to learn about RDF triples and all this stuff to really be able to add the structure to your document.

But now, you can just say…get it into a text file. If it’s an audio, that’s no problem, you can get audio or video to text pretty quickly. And then, now that it’s just in plain old English, or whatever language, you can add structure to it automatically and be able to say, “Here’s the persons, here’s the places, here’s the things, here’s custom entities that we have trained our model on to look for,” and add all of that structure in after the fact.
Rahul

Correct.
Stephen

Which is…it’s really, really exciting. And like you said, because you’ve stored all those meetings, you can then run it retrospectively and add all this richness. It’s not just, you know, going forward. You can, you know, through one big batch job, push all your meetings through. It’s really cool.
Rahul

Exactly. I’m just going to add two more services just for the audience to keep in mind to look into. The first one is Rekognition. So, whenever… So, just like Comprehend is to NLP, or the natural language processing stuff, or to text in general, Rekognition does the same stuff but for video. So, if you wanted to search for videos where there are…you know, there might be a banner that says something, some text in it, or you might want… I mean, we use Rekognition for identifying segments. And we just discussed that in the last episode. There are a bunch of different things that Rekognition can actually search for and find for you. And so take a look at that.

And the last recommendation that I have in this suite of AWS services is Textract. So, let’s say you have a PDF document or a bill or things like that, and you actually want structure. Now, it’s easy for our programs to look at JSON and deal with it. But if you give it a PDF document, you know, which has, you know, a bill where it has a title, it has a table in there with a bunch of data and stuff like that, it’s really hard for machines to parse it. I mean, you can hope that it’s all going to be structured, but things never are. So, there is a lot of very neat technology that goes into Textract to give you all of that data in JSON form. And JSON is very easily readable and very easy to process.

So, when you have documents in that format and that structure, just like forms are a perfect example. You know, when you have online forms… Or, actually, not even online forms. If you have physical forms that people have to write into and submit, then using something like Textract to actually get a digital version of that with a full JSON…you know, JSON evaluation of your form is incredibly valuable.
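To make the "form to JSON" idea concrete, here is a hedged sketch of pulling key-value pairs out of a Textract response. It assumes the response came from `analyze_document` with the `FORMS` feature, where key and value blocks are `KEY_VALUE_SET` blocks linked through `VALUE` and `CHILD` relationships. The helper names are made up, and checkbox-style selection elements are ignored.

```python
"""Sketch: extract form fields (key-value pairs) from a Textract response."""


def analyze_form(textract_client, bucket, name):
    """Ask Textract to analyze a document stored in S3 for form fields."""
    return textract_client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": name}},
        FeatureTypes=["FORMS"],
    )


def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for relationship in block.get("Relationships") or []:
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def form_fields(response):
    """Return a {key text: value text} dictionary for every KEY block found."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for relationship in block.get("Relationships") or []:
            if relationship["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[value_id], blocks_by_id)
                    for value_id in relationship["Ids"]
                )
        fields[_text_of(block, blocks_by_id)] = value_text
    return fields
```

For an invoice like the one Stephen describes next, the resulting dictionary (item number, order date, and so on) is the kind of structure that could then be pushed into an accounting system.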
Stephen

Actually, I used Textract at a consulting project a few years ago where these people were working with a supplier that was very old-fashioned. And there was no ERP or any kind of accounting software, they just sent a PDF with, you know, item number, order date, whatever. And we set up a system where the guy who was receiving these orders, whenever they sent him the invoice, he could just forward it to an e-mail address that was dedicated to Textract and Textract would take it from there. And we didn’t have to… Because before that, he was literally opening up this PDF, opening up his Xero accounting software, and just moving the data around. And with Textract, it made it so much easier to get all this structure out of this data. Because it was really neat data, but they would only give us PDFs, the vendor. So, Textract is a great way to go.
Rahul

Here’s another very, very interesting use case, and it’s in our setup. So, whenever we acquire companies, invariably you also get all the other vendor contracts and all of your other, you know, relationships that are legal contracts, or even customer contracts. You get them all as big PDFs. Now, to take that and put that into your CRM system or to put it into your financial system or to understand what the risks or liabilities are, the due diligence that you need to perform on those, can pick up so much easier with Textract. Because you’re literally looking for a section called “liabilities,” or you’re looking for a section for SLAs, or you’re looking for a particular section in the text and you know what you’re looking for. And whether it’s a table or whether it’s a…you know, it’s a… In the appendix, you basically have these tables of a bunch of very important data that you want to capture. Textract does that for you and gives it to you in JSON form. And then you can process it any way you want.
Stephen

Oh, I wonder how Kendra would do with legal questions.
Rahul

Apparently, pretty decently. There is a separate set of entities you may have to train it on, but legal is one of the big…legal and insurance, I heard from the Kendra team, are some very interesting and exciting use cases for them.
Stephen

Well, you know, let’s…I’ll reach out to someone on the Kendra team. Because I think we have enough questions that it would be good to have a full interview. So, I’ll reach out to someone on the Kendra team, let’s make that happen in the next couple of weeks. That would be really fun.
Rahul

Sounds great.
Stephen

Well, should we do our last segment?
Rahul

Yeah, let’s do it.
Stephen

All right. We’re going to show you QuickSight.

All right. So, I’m going to share my Web…my browser again. Here we go. There we go. Okay. So, QuickSight. Now, looking at the QuickSight dashboard, here, I’ll just show you an example. So, we’re taking our transcripts, and then running Comprehend on them, getting the tags. Another thing I did was just dump the text files into an S3 bucket. And now, here is what our QuickSight dashboard looks like. I did a word count because the numerical quantities, you know, that’s a bit more proprietary. But the word count here, let’s see if we can do this.

So, let’s look at a word count by episode. Let’s go to episode…an older one, let’s go to episode…well, actually, let’s go to 12. Let’s see what we talked about episode 12. Interesting. Wickr Pro. We still have to follow up on Wickr and see how that’s going. Episode 13. So, these are…what I’ve done here is I’ve made a size-weighted dashboard on the different entities that we found in the transcripts, and then parameterized it by the episode number. So, if you’re a Tableau user, this would feel very familiar, QuickSight.
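A per-episode word count like the one feeding this dashboard can be produced with a few lines of standard-library Python. This is only a sketch: the stop-word list, the `top_n` cutoff, and the CSV column names are assumptions, not what the show actually uses.

```python
"""Sketch: turn transcripts into an (episode, word, count) CSV for a dashboard."""
import csv
import io
import re
from collections import Counter

# A deliberately tiny, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "so", "we", "it", "is", "that"}


def word_counts(text):
    """Count lowercase words in the text, skipping stop words."""
    words = re.findall(r"[a-z][a-z']*", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


def counts_to_csv(transcripts, top_n=50):
    """Render {episode number: transcript text} as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["episode", "word", "count"])
    for episode in sorted(transcripts):
        for word, count in word_counts(transcripts[episode]).most_common(top_n):
            writer.writerow([episode, word, count])
    return buffer.getvalue()
```

The resulting CSV could be dropped into an S3 bucket and used as a QuickSight data set, with the episode column as the dashboard parameter.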

Now, Rahul, what are some of the uses of QuickSight that we use?
Rahul

So, we have basically replaced most of our analytics platform with a pattern that I call storage, query, and dashboard. So, QuickSight does the dashboarding part of it. So, let me start with storage. There are two kinds…we pick one of two. So, depending on what the latency requirements on analytics are, we either pick S3, which is our default. Or, if there’s a requirement to have, you know, responses back in a sub-500-to-600-millisecond time frame, then we switch all that stuff over to Redshift. So, that’s your data storage. So, choose between either S3 or Redshift.

Then the second part of it is to be able to write queries or views of this data. Right? And just like you would have with any other SQL data source. And for that, we use Athena. And Athena gives us a bunch of data sets that are processed. So, sometimes you have to do a bunch of joins, you have to do a bunch of processing on this data and get into a view, in SQL terms. And that’s basically what we get with Athena.

And once we have that view, we load it into QuickSight, that then allows us to slice and dice the data any way we want. And that’s become the standard pattern for all of our BI and analytics.
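As an illustration of the "query" layer in that storage, query, and dashboard pattern, the sketch below defines a view in Athena through boto3. The view, table, and column names, the database, and the output location are all hypothetical. `start_query_execution` is the real Athena call.

```python
"""Sketch: define an Athena view that a QuickSight data set could read."""


def create_view_sql(view_name, table_name):
    """Build a CREATE VIEW statement that aggregates word occurrences per episode."""
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT episode, word, SUM(occurrences) AS total "
        f"FROM {table_name} GROUP BY episode, word"
    )


def submit_query(athena_client, sql, database, output_location):
    """Start the query and return its execution ID; Athena runs it asynchronously."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Because the call is asynchronous, a caller would normally poll `get_query_execution` with the returned ID before pointing QuickSight at the new view.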
Stephen

It’s really powerful. And having been around and seeing people in the Tableau world, right? This can get you so… I mean, this is very, very much in the same realm. It’s really easy. I mean, I’m not a…I hadn’t used it very much before, but it only took me a few minutes of experimenting to… Literally, all you do, you make a data set here. You can point it at an S3 bucket or you can… Let’s see.
Rahul

You can point it to an Athena query, as well.
Stephen

Oh, here we go, new data set.
Rahul

Correct.
Stephen

Let’s make this a bit bigger here. Yeah. So, you have lots of choices about where you point this, where you want to analyze. I want to try the Twitter analysis at some point.
Rahul

Yeah.
Stephen

You point it to an S3 bucket, Athena, MySQL. We do all of that, and then, literally, just make a dashboard. You can, you know, drag and drop different dimensions, look at different types of visualizations. And I think it’s a really fun way of just, again, quantifying and understanding what we’re talking about. I’m going to make a dashboard we can use to understand, you know, once I’m pulling together all the different social data, to understand which of our highlights are more impactful to understand how all of that works. But it’s really exciting that we have this to visualize and, you know, we don’t have to think about building this out ourselves. Right? It’s having to…not having to deal with plotting engines and all of the… You know, behind the scenes of all of this is some really nitty-gritty code about joining and dealing with date formatting and all of this stuff. We can just focus on the data that we actually want to display.
Rahul

Correct. And the best part about QuickSight, or the part that I love the most about QuickSight, is that you can literally embed an entire QuickSight dashboard into any application. So, for most of our applications, we literally just do all of our work in QuickSight, get all the dashboards, if we want, ready, export that as a view, and that becomes…that can literally be embedded into any of our enterprise products or applications in a neat, you know, layout. Where it might have a menu on the side or on the top, and then the dashboard just shows up in the middle. And we can talk to it, we can pass it, we can theme it, we can customize it. There’s so much you can do with it when it’s embedded in an application, it’s pretty neat.

So, we’ve kind of…there was so much work that was being done in building custom dashboards in the past with all of our enterprise products. We just stopped it all. We’re like, “Everything has to be replaced with QuickSight dashboards.” Because you can then build a dashboard separately, embed it, and it just works.
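For the embedding Rahul describes, one possible server-side step is requesting an embed URL for a registered QuickSight user. The sketch below uses the boto3 `generate_embed_url_for_registered_user` call. The account ID, user ARN, dashboard ID, and session length are placeholders, and this may not be how their products do it.

```python
"""Sketch: request an embeddable URL for a QuickSight dashboard."""


def dashboard_embed_url(quicksight_client, account_id, user_arn, dashboard_id,
                        session_minutes=60):
    """Return a short-lived URL that a web page can load in an iframe."""
    response = quicksight_client.generate_embed_url_for_registered_user(
        AwsAccountId=account_id,
        UserArn=user_arn,
        SessionLifetimeInMinutes=session_minutes,
        ExperienceConfiguration={
            "Dashboard": {"InitialDashboardId": dashboard_id}
        },
    )
    return response["EmbedUrl"]
```

The host application would place the returned URL in the middle of its own layout, with its menus around it, which matches the arrangement described above.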
Stephen

Oh, it’s so much better than trying to… I still remember, again, the custom dashboards that I wrote where I’m trying to script R to process a CSV file, dump a PNG somewhere, and then pull it back in. And it just…it becomes so fragile. Having, yeah, this very rock-solid dashboarding solution that’s integrated with all of your other data sources, really neat.

Well, we have about a minute or two left. I wanted to see if we wanted to brainstorm for future automations or talk about any of those things. Let’s just do a quick transition.

So, what do you think? What’s the future hold for automation, for us and in general?
Rahul

I think one of the things that I’ve always wanted to do is, on awsmadeeasy.com, I want to embed the QuickSight…sorry, the Kendra search bar so that folks can literally query for anything, any content that we put out, without, you know, having to worry about tags and without having to worry about, you know, specific episodes. They should just say, “Tell me everything you know about SageMaker,” or, “Tell me, you know, which…you know, bring up all the videos talking about instance-type X2e with 128 terabytes of RAM,” you know, or how much ever that was.
Stephen

112.
Rahul

112.
Stephen

For the big SAP HANA instances. Or, actually… So, and I think I like that idea we came up with earlier, is take every service, run it through Kendra, take the blob of text that comes out, run that through, I guess it would be, Comprehend to get the sentiment.
Rahul

Yeah.
Stephen

And then just have a little table.
Rahul

Yeah, it’s interesting. So, are you talking about the blog post announcements and…
Stephen

No, I want to be able to search the transcripts for each topic.
Rahul

Oh, yeah, the transcripts, absolutely. Absolutely.
Stephen

And then once we have the little snippet of the service that we’re talking about, to get the sentiment of the text surrounding that snippet. It’s going to say, “Okay, every time we talk about stored procedures, sentiment negative. Every time we talk about”… Let’s see, what’s a good one? “If we talk about lambda function URLs, that’s positive.”
Rahul

Yeah.
Stephen

And then maybe there will be some that are mixed. And I would love to see the distributions of each of those.
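The sentiment table Stephen is proposing could be sketched like this, assuming the Kendra excerpts for each topic have already been collected into lists. It calls Comprehend's `detect_sentiment` on each snippet and tallies the labels per topic. The function name and input shape are assumptions.

```python
"""Sketch: tally Comprehend sentiment labels for snippets grouped by topic."""
from collections import Counter, defaultdict


def sentiment_table(comprehend_client, snippets_by_topic):
    """Return {topic: {sentiment label: count}} for the given snippets."""
    table = defaultdict(Counter)
    for topic, snippets in snippets_by_topic.items():
        for snippet in snippets:
            result = comprehend_client.detect_sentiment(
                Text=snippet, LanguageCode="en"
            )
            # The label is one of POSITIVE, NEGATIVE, NEUTRAL, or MIXED.
            table[topic][result["Sentiment"]] += 1
    return {topic: dict(counts) for topic, counts in table.items()}
```

Counting labels rather than averaging scores gives the per-topic distribution directly, including how often a topic comes back as mixed.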
Rahul

Yeah. One piece of warning on that. Comprehend does not do well with sarcasm.
Stephen

Oh, we might be in trouble.
Rahul

We’ve not been able to make Comprehend understand sarcasm yet, but it does a pretty decent job, you know, in all other cases.
Stephen

All right. We’ll have to see. You know, sarcasm is hard enough for some people, so can’t blame Comprehend.

All right. Well, this has been a really fun show and hopefully it was useful to the audience, of automating your own live streams. There will be a blog post out by the end of the week that will have some template repositories that you can go play with, whether it’s Kendra, whether it’s Comprehend. And again, I hope this is useful to you, and please let us know if you have any questions. And we’ll see you next… Let’s see. Not next week, but the week after.
Rahul

Yeah. Thanks, everyone. And thanks, Stephen, for walking us through all of your automations.
Stephen

All right. My pleasure. All right.
Rahul

This was fun.
Stephen

Have a good one.
Rahul

Goodbye.