14 Jul 2024 9 min read Adventures in AI

Not coding, summarising

A cartoon image of a Kaka, a large New Zealand parrot, sitting in a tree looking down at the camera. cartoon, high res, 4k, HDR

One thing that AI tools are very good for is making summaries of large amounts of information.

Part 3 was here

That could be going from multiple documents in Google Docs and cutting them down to the major points, or even turning a transcript of a Google Meet into meeting notes.

Google Gemini integrated into a Google Doc

Google Gemini integrated into Google Meet

These functions are useful to a lot of people and, given the volume of content and information out there, they are possibly the first "killer app" of AI.

I've used it in a few ways recently, turning either very long-form or very short-form material into something which is more human readable.

Interviews

Leonie was working on a series of interviews for a client, where she asked a select set of their customers about how they use the product, what they want to see in it, all the usual product manager-market fit-research stuff.

At the end of it, she had 6 transcripts, around 5 hours of conversation, and had to do a summary of them, synthesising out the common topics and themes.

Normally this would be a manual process, but as I'd just got access to Google Gemini we thought we might try that out.

She loaded each transcript into a Google doc (just copy-paste it in), then used Gemini to find common themes and ask other questions of it.

Because this was new, neither of us trusted the AI to be correct, so there was a fair bit of rework to make sure that the conclusions were right. In general, some of the themes were accurate, but some of them were not even close.

However, it saved her a huge amount of synthesis time. It was also really useful for finding quotes, but it couldn't tell us which interview a quote came from.
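Tracing a quote back to its interview doesn't need an AI at all. Here's a minimal sketch, assuming each transcript is already loaded as plain text and keyed by a name of your choosing (the `find_quote` helper and the interview names are made up for illustration):

```python
import re


def normalise(text: str) -> str:
    """Lower-case the text and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_quote(quote: str, transcripts: dict[str, str]) -> list[str]:
    """Return the names of the transcripts that contain the quote."""
    needle = normalise(quote)
    return [name for name, text in transcripts.items() if needle in normalise(text)]


# Hypothetical transcripts, just to show the shape of the data.
transcripts = {
    "interview_1": "Speaker 1: We mostly use the export feature, every week.",
    "interview_2": "Speaker 1: I never use exports.",
}
print(find_quote("use the export feature every week", transcripts))
```

It only catches quotes the AI reproduced more or less word for word. A paraphrased quote would need fuzzier matching, and still a human to check it.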

I've not been overly impressed with Gemini's model, and I think Claude or ChatGPT 4 might have given better results, but Gemini was nicely integrated into Google Docs, and it worked quite well.

She had the transcriptions from Microsoft Teams, as text files. They needed a tiny bit of clean up, mostly just changing "speaker 1" to actual names. Google Meet can do this, as well.
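That clean up can be scripted. This is a rough sketch, assuming each turn in the transcript starts with a label like "Speaker 1:" at the beginning of a line (the real Teams export may look different, and `rename_speakers` and the names below are made up for illustration):

```python
import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace 'Speaker N:' labels at the start of a line with real names."""

    def swap(match: re.Match) -> str:
        label = match.group(1)
        key = " ".join(label.lower().split())  # e.g. "Speaker  1" -> "speaker 1"
        # Labels we have no name for are left as they were.
        return names.get(key, label) + ":"

    pattern = r"^(speaker\s+\d+)\s*:"
    return re.sub(pattern, swap, transcript, flags=re.IGNORECASE | re.MULTILINE)


raw = "Speaker 1: How do you use it?\nspeaker 2: Mostly for reports."
print(rename_speakers(raw, {"speaker 1": "Leonie", "speaker 2": "Customer A"}))
```

Doing this before handing the text to the AI means the summary talks about people rather than "speaker 2".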

In terms of principles:

The input and output were both Leonie's work
The AI was generic (not specific to product management interviews)
The output was verified against the input by a skilled practitioner (Leonie) who used it as a tool, not a replacement for a Product Manager.
None of the output was assumed to be correct; it was checked by the interviewer (Leonie).

A journalist on hiatus

One of my favourite journalists - and lovely human being and neighbour - is Bernard Hickey from The Kaka.

He's recently taken 3 (very deserved) weeks off producing the newsletter, podcast and (insert all the other stuff here), so I thought I might try a few AI bits to see what I can do with the written output he's still been doing, as I've missed his daily podcasts.

I started out with the paid version of the daily newsletter (it has more content than the free one, and I've been a subscriber since day one).

The first tool I tried was Gemini. I put the content (copy-paste) into a Google doc and asked Gemini to do a summary of the 6 main points. The result wasn't great - it worked, but it wasn't good.

I moved to using Claude (free), which gave a much better summary, and could expand on the various topics as needed.

Hi Claude. Can you give me a summary of this newsletter, with 2 paragraphs about each major topic, and a summary of any images, charts and cartoons. If there are images, can you describe what's in them. Thanks

(yes, I usually say thanks to an AI, cos... when they take over the world...)

The result was excellent (abridged as I don't own the input)

I then thought "I wonder if I can get this into podcast format?". I changed the prompt a little to get something in the right format:

Thats great, but can you start with "Hi, this is not Bernard from the Kaka, and this is my morning chorus", and close off with "Thanks for listening to the dawn chorus. Nga Mihi"

... which it did. I then headed over to Eleven Labs and found a voice to speak it. The result was excellent - but never as good as Bernard doing it himself!

The final test, if I was willing to pay for it, would be to use existing podcast recordings to make a "fake Bernard" voice with Eleven Labs, and have that read it.

A friend, who told me about Eleven Labs, has built a fake Donald Trump voice which is remarkably good, so I presume that this would work fine. It says it needs about 3 mins of audio to get a workable voice, and via The Kaka podcast, I have access to many hours of Bernard talking.

However I wasn't willing to spend the money for this experiment, plus it might be a bit creepy....

The result was an MP3, read in a nice Australian voice, which I could put here or in my podcast app, but I'm not going to: see principles below, part 4.

One podcast which does do this is Arnold's Pump Club, which has a 3-5 min daily message from Arnold Schwarzenegger. Well, it's from an AI voice derived from Arnold. They are up front about this being a synthesised voice, and it's not someone else claiming to be Arnold - it's his podcast (and app).

It's remarkably good - both the voice and the content.

In terms of principles:

I didn't own the input, tho I do pay for access to it. No AI is being trained on the input; it's just data passing thru a model, unchanged. I'm ok with doing this for my own use.
I was not sharing the output, it was for personal use and experiment only
I did verify the output, but there was no impact if it was wrong
I drew the line at making a fake AI voice of someone I know. I'm confident it would work, based on other ones, but the cost and creepiness factor was a bit high.

Meeting bullet points into notes

We have been throwing around the options for taking an internal feature / product, and making it something we could offer to others.

There are lots of questions around this, from what form might it take, how the IP would be used and shared, the support load, SLA, legal and medical liability - there are lots of things to think about before we write a single line of code.

So we sat down as a team with a rough guide, and ended up with about 30 bullet points - some with overlap, some unique.

Brainstorming with a bunch of very smart people, basically.

After the meeting, I cleaned them up a little into slightly better sentences, as they were written quickly during the meeting (by me), and posed them to Claude with a basic prompt (edited here due to IP / NDA)

We are looking to build a new API service, called [NAME].

[NAME] is an LLM-based product, which would allow [user and input],
and the service would generate [description of the output].

We have an in-house product which does this already, so this new service is taking that IP, and allowing other people to use it in a safe, secure
manner.

We have two options:

* [description of option 1]
* [description of option 2]

We came up with the following notes and possible considerations:

What might [NAME] look like?

[30-odd single line bullet points we came up with in the meeting]


Can you write a proposal for both options, showing the advantages and disadvantages of each one, with around half a page on each one.
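That prompt has a shape you can reuse: context, the options, the raw notes, then the ask. Here's a minimal sketch of putting it together from a list of meeting bullet points. The `build_proposal_prompt` helper and its parameters are made up for illustration, and it only builds the text - pasting it into Claude (or any other tool) is still up to you:

```python
def build_proposal_prompt(name: str, summary: str, options: list[str], notes: list[str]) -> str:
    """Assemble the context, options, notes and request into one prompt string."""
    option_lines = "\n".join(f"* {option}" for option in options)
    # Skip empty notes and trim stray whitespace from quickly typed bullet points.
    note_lines = "\n".join(f"* {note.strip()}" for note in notes if note.strip())
    return (
        f"We are looking to build a new API service, called {name}.\n\n"
        f"{summary}\n\n"
        f"We have {len(options)} options:\n\n{option_lines}\n\n"
        "We came up with the following notes and possible considerations:\n\n"
        f"{note_lines}\n\n"
        "Can you write a proposal for each option, showing the advantages and "
        "disadvantages of each one, with around half a page on each one."
    )


# Placeholder values only; the real ones are under NDA.
print(build_proposal_prompt(
    "Example",
    "Example is an LLM-based product.",
    ["Option one", "Option two"],
    ["support load", "SLA", "legal liability"],
))
```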

The result was pretty complete. It came up with a summary of the task, a section for each option, and a set of bullet points for advantages and disadvantages for each one (again, edited for NDA/privacy).

I'd be happy to provide a proposal for both options, outlining the advantages and disadvantages of each. Let's start with [brief description of option 1] option, followed by the option of [brief description of option 2].

Option 1: [brief description]

[Paragraph of description of the option]

Advantages:

[five bullet points of advantages]

Disadvantages:

[five bullet points of disadvantages]

Option 2: [brief description]

[Paragraph of description of the option]

Advantages:

[five bullet points of advantages]

Disadvantages:

[five bullet points of disadvantages]

Both options have their merits and drawbacks. The choice between them would depend on various factors, including our long-term strategy, resource availability, desired level of control, and potential for market growth. I'd be happy to discuss these options further or explore any specific aspects in more detail.

I then asked it to expand on the advantages and disadvantages:

Thanks. Can you expand on each bullet point item?

Which it did. Nice and clear, nothing I'd call a hallucination, and with a bit of editing, something I could use in a report I'd give to someone who needed a summary of the key issues.

Certainly better than trying to digest a series of rough bullet points!

As a joke, I also asked it, while talking about LLMs and AI with a friend:

Thanks. Can you explain it to a journalist called Juha - convert this into an article which could be published in an IT trade magazine

That output was SPOOKY. It wrote something which I could envision reading in the IT press, and bylined it with my friend's full name without me even providing it (to be fair, there are not likely to be a lot of IT journalists called Juha).

It also appears to have hallucinated some pretty legit sounding quotes. The quotes themselves made sense, tho I doubt the person who was quoted actually said them - if they even exist. Hell, one was from our CTO, and we don't have a CTO.

This is not a replacement for a real journalist - which is good for Juha! But I can see how attractive it might be to an overworked and under-resourced newsroom.

It may be a useful tool for someone who has to do 10 articles for a syndicated feed, based on minimal press releases (there appears to be quite a bit of this around, for better or worse). It's not something you want to use at scale tho (yes, I'm looking at you, people who took over TUAW).

One of my other workmates used Gemini to record, transcribe and summarise a lecture on menopause and perimenopause, targeted at our GPs, nurses and other clinical staff.

The result was public (within the company) and I found that I could read and understand it easily as a lay-person. Not sure if this is just a side effect of being around medical folks so much, or Gemini not using technical medical terms. It was very approachable.

In terms of principles:

I (or my workplace) owns the input.
We also own the output, and are in a position to check it.
I did verify the output, and the impact is on me if it's wrong. No decisions are made from the output without human intervention.

Conclusion

AI is very good at summarising long form content, and pulling out information. It'll be exceptionally useful for people who need to digest and process large quantities of information and decide based on it.

As always, it still needs someone with some basic skills to differentiate between a hallucination and fact tho.

Luc van Donkersgoed's AWS News is a great example of AI summaries - it takes the output of the AWS News blog feed, summarises it, serves that to customers, and links back to the original news item. Given the (insane) volume of AWS news, this is a very useful service.

Luc also had a good, and entirely justified, rant at AWS about their GenAI focus at the recent summit. He's 100% not wrong: