Analysing News Article Content with Google Cloud Natural Language API

In my previous blog post I showed how to use AI Platform Training to fine-tune a custom NLP model using PyTorch and the transformers library. In this post we take advantage of Google’s pre-trained AI models for NLP and use Cloud Natural Language API to analyse text.

Google’s pre-trained machine learning APIs are great for building working AI prototypes and proofs of concept in a matter of hours. Google’s Cloud Natural Language API allows you to do named entity recognition, sentiment analysis, content classification and syntax analysis using a simple REST API. The API has client libraries for Python, Go, Java, Node.js, Ruby, PHP and C#. In this post we’ll be using the Python client library.

Photo by AbsolutVision on Unsplash

Before we jump in, let’s define our use case. To highlight the simplicity and power of the API, I’m going to use it to analyse the contents of news articles. In particular, I want to find out if the latest articles published in The Guardian’s world news section contain mentions of famous people and if those mentions have a positive or a negative sentiment. I also want to find out the overall sentiment of the news articles. To do this, we will go through a number of steps.

  1. We will use The Guardian’s RSS feed to extract links to the latest news articles in the world news section.
  2. We will download the HTML content of the articles published in the past 24 hours and extract the article text in plain text.
  3. We will analyse the overall sentiment of the text using Cloud Natural Language.
  4. We will extract named entities from the text using Cloud Natural Language.
  5. We will go through all named entities of type PERSON and see if they have a Wikipedia entry (for the purposes of this post, this will be our measure of the person being “famous”).
  6. Once we’ve identified all the mentions of “famous people”, we analyse the sentiment of the sentences mentioning them.
  7. Finally, we will print the names, Wikipedia links and the sentiments of the mentions of all the “famous people” in each article, together with the article title, url and the overall sentiment of the article.

We will do all this using GCP AI Platform Notebooks.

To launch a new notebook, make sure you are logged in to the Google Cloud Console and have an active project selected. Navigate to AI Platform Notebooks and select New Instance. For this demo you don’t need a very powerful notebook instance, so we will change some of the defaults to save cost. First, select Python 3 (without CUDA) from the list and give your notebook a name. Next, click the edit icon next to Instance properties. From Instance properties, select n1-standard-1 as the Machine type. You will see that the estimated cost of running this instance is only $0.041 per hour.

Select Machine type

Once you have created the instance and it is running, click the Open JupyterLab link of your notebook instance. Once you’re in JupyterLab, create a new Python 3 notebook.

Steps 1–2: Extract the Latest News Articles

We start by installing some required Python libraries. The following command uses pip to install lxml, Beautiful Soup and feedparser. We use lxml and Beautiful Soup for processing and parsing the HTML content. feedparser will be used to parse the RSS feed to identify the latest news articles and to get the links to the full text of those articles.

!pip install lxml bs4 feedparser

Once we have installed the required libraries, we need to import them together with the other libraries we need for extracting the news article content. Next, we define the URL of the RSS feed as well as the time period we want to limit our search to. We then define two functions we will use to extract the main article text from the HTML document. The text_from_html function will parse the HTML file, extract the text and use the tag_visible function to filter out all but the main article text.

Once we have defined these functions we will parse the RSS feed, identify the articles published in the past 24 hours and extract the required attributes for those articles. We will need the article title, link, publishing time and, using the functions defined above, the plain text version of the article text.

Steps 3–7: Analyse the Content Using Cloud Natural Language API

To use the Natural Language API we will import the required libraries.

from google.cloud import language_v1
from google.cloud.language_v1 import enums

Next, we define the main function for the demo, print_sentiments(document). In just 21 lines of code, this function takes document as its input, performs all the text analysis we need and prints the results. We will look at the contents of the document input later.

To use the API we need to initialise a LanguageServiceClient. We then define the encoding type, which we need to pass to the API together with the document.

The first API call analyze_entities(document, encoding_type=encoding_type) takes the input document and the encoding type and returns a response of the following form:

{
  "entities": [
    {
      object (Entity)
    }
  ],
  "language": string
}

We will then call the API to analyse the sentiment of the document as well as to get the sentiments of each sentence in the document. The response has the following form:

{
  "documentSentiment": {
    object (Sentiment)
  },
  "language": string,
  "sentences": [
    {
      object (Sentence)
    }
  ]
}

The overall document sentiment is stored in annotations.document_sentiment.score. We assign the document an overall sentiment POSITIVE if the score is above 0, NEGATIVE if it is less than 0 and NEUTRAL if it is 0.

We then go through all the entities identified by the API and create a list of those that have the type PERSON. Once we have this list, we loop through it and check which entries have a wikipedia_url key in their metadata. As mentioned, we use this as our measure of a person being “famous”. When we identify a “famous person”, we print the person’s name and the link to their Wikipedia entry.

We then check the sentiment annotated sentences for occurrence of the identified “famous person” and use the same values as above to determine the sentiment category of those sentences. Finally, we print all the sentiments of all the sentences mentioning the person.
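The logic of steps 3–7 can be sketched with plain Python. This is a simplified sketch that works on dict-shaped responses mirroring the REST formats shown above; the real function operates on the response objects returned by the client, and the helper names here are ours.

```python
def sentiment_label(score):
    """Map a sentiment score to a category: >0 POSITIVE, <0 NEGATIVE, 0 NEUTRAL."""
    if score > 0:
        return 'POSITIVE'
    if score < 0:
        return 'NEGATIVE'
    return 'NEUTRAL'


def famous_people(entities):
    """Entities of type PERSON whose metadata contains a wikipedia_url."""
    return [e for e in entities
            if e['type'] == 'PERSON' and 'wikipedia_url' in e.get('metadata', {})]


def person_mentions(person, sentences):
    """Sentiment labels of the sentences that mention the person by name."""
    return [sentiment_label(s['sentiment']['score'])
            for s in sentences
            if person['name'] in s['text']['content']]
```

For example, an entity `{'type': 'PERSON', 'name': 'Jen Psaki', 'metadata': {'wikipedia_url': '...'}}` passes the famous_people filter, and each sentence containing “Jen Psaki” contributes one sentiment label to the printed output.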

Now that we have extracted the text from the news site and defined the function to analyse the contents of each article, all we need to do is go through the articles and call the function. The input for the function is a dictionary containing the plain-text contents of the article, the type of the document (which in our case is PLAIN_TEXT) and the language of the document (which for us is English). We also print the title of each article and the link to the article.

For demo purposes we limit our analysis to the first 3 articles. The code for the above steps is displayed below together with the output of running that code.
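A minimal sketch of that loop, assuming the articles list from the extraction step and the print_sentiments function described above (the make_document helper is ours; the dict shape mirrors the API’s Document type):

```python
def make_document(text, language='en'):
    """Build the input dict for the API: plain-text content in English."""
    return {'content': text, 'type': 'PLAIN_TEXT', 'language': language}


# Analyse the first three articles (assumes `articles` and `print_sentiments`
# from the earlier steps):
# for article in articles[:3]:
#     print(article['title'])
#     print(article['link'])
#     print_sentiments(make_document(article['text']))
```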

##################################################

‘We have to win’: Myanmar protesters persevere as forces ramp up violence
https://www.theguardian.com/world/2021/feb/28/we-have-to-win-myanmar-protesters-persevere-as-forces-ramp-up-violence
Overall sentiment: NEGATIVE

Person: Min Aung Hlaing
- Wikipedia: https://en.wikipedia.org/wiki/Min_Aung_Hlaing
- Sentence: 1 mentioning Min Aung Hlaing is: NEUTRAL

Person: Aung San Suu Kyi
- Wikipedia: https://en.wikipedia.org/wiki/Aung_San_Suu_Kyi
- Sentence: 1 mentioning Aung San Suu Kyi is: POSITIVE

##################################################

White House defends move not to sanction Saudi crown prince
https://www.theguardian.com/world/2021/feb/28/white-house-defends-not-sanction-saudi-crown-prince-khashoggi-killing
Overall sentiment: NEGATIVE

Person: Joe Biden
- Wikipedia: https://en.wikipedia.org/wiki/Joe_Biden
- Sentence: 1 mentioning Joe Biden is: NEGATIVE

Person: Mark Warner
- Wikipedia: https://en.wikipedia.org/wiki/Mark_Warner
- Sentence: 1 mentioning Mark Warner is: NEGATIVE

Person: Khashoggi
- Wikipedia: https://en.wikipedia.org/wiki/Jamal_Khashoggi
- Sentence: 1 mentioning Khashoggi is: NEGATIVE
- Sentence: 2 mentioning Khashoggi is: NEGATIVE
- Sentence: 3 mentioning Khashoggi is: NEGATIVE

Person: Jen Psaki
- Wikipedia: https://en.wikipedia.org/wiki/Jen_Psaki
- Sentence: 1 mentioning Jen Psaki is: NEGATIVE

Person: Democrats
- Wikipedia: https://en.wikipedia.org/wiki/Democratic_Party_(United_States)
- Sentence: 1 mentioning Democrats is: NEGATIVE

Person: Gregory Meeks
- Wikipedia: https://en.wikipedia.org/wiki/Gregory_Meeks
- Sentence: 1 mentioning Gregory Meeks is: POSITIVE

Person: Prince Mohammed
- Wikipedia: https://en.wikipedia.org/wiki/Mohammed_bin_Salman
- Sentence: 1 mentioning Prince Mohammed is: NEGATIVE

##################################################

Coronavirus live news: South Africa lowers alert level; Jordan ministers sacked for breaches
https://www.theguardian.com/world/live/2021/feb/28/coronavirus-live-news-us-approves-johnson-johnson-vaccine-auckland-starts-second-lockdown-in-a-month
Overall sentiment: NEGATIVE

Person: Germany
- Wikipedia: https://en.wikipedia.org/wiki/Germany
- Sentence: 1 mentioning Germany is: NEGATIVE
- Sentence: 2 mentioning Germany is: NEUTRAL

Person: Nick Thomas-Symonds
- Wikipedia: https://en.wikipedia.org/wiki/Nick_Thomas-Symonds
- Sentence: 1 mentioning Nick Thomas-Symonds is: NEGATIVE

Person: Cyril Ramaphosa
- Wikipedia: https://en.wikipedia.org/wiki/Cyril_Ramaphosa
- Sentence: 1 mentioning Cyril Ramaphosa is: NEGATIVE

Person: Raymond Johansen
- Wikipedia: https://en.wikipedia.org/wiki/Raymond_Johansen
- Sentence: 1 mentioning Raymond Johansen is: NEGATIVE

Person: Archie Bland
- Wikipedia: https://en.wikipedia.org/wiki/Archie_Bland
- Sentence: 1 mentioning Archie Bland is: NEUTRAL

##################################################

As you can see, all three articles we analysed have an overall negative sentiment. We also found quite a few mentions of people with Wikipedia entries, along with the sentiments of the sentences mentioning them.

Conclusion

As we saw, the Cloud Natural Language API is a super simple and powerful tool that allows us to analyse text with just a few lines of code. This is great when you are working on a new use case and need to quickly test the feasibility of an AI-based solution. It is also the go-to resource when you don’t have data to train your own machine learning model for the task. However, if you need to create a more customised model for your use case, I recommend using AutoML Natural Language or training your own model using AI Platform Training.

Hope you enjoyed this demo. Feel free to contact me if you have any questions.


    Training PyTorch Transformers on Google Cloud AI Platform

    Google Cloud is widely known for its great AI and machine learning capabilities and products. In fact, there is a ton of material available on how to train and deploy TensorFlow models on Google Cloud. However, Google Cloud is not just for TensorFlow users; it has good support for other frameworks as well.

    In this post I will show how to use another highly popular ML framework, PyTorch, on AI Platform Training. I will show how to fine-tune a state-of-the-art sequence classification model using PyTorch and the transformers library. We will use a pre-trained RoBERTa as the transformer model for this task, which we will fine-tune to perform sequence classification.

    RoBERTa falls under the family of transformer-based massive language models that have become very popular in natural language processing since the release of BERT, developed by Google. RoBERTa was developed by researchers at the University of Washington and Facebook AI. It is fundamentally a BERT model pre-trained with an improved pre-training approach. See the details about RoBERTa here.

    This post covers the following topics:

    • How to structure your ML project for AI Platform Training
    • Code for the model, the training routine and evaluation of the model
    • How to launch and monitor your training job

    You can find all the code on Github.

    ML Project Structure

    Let’s start with the contents of our ML project.

    ├── trainer/
    │   ├── __init__.py
    │   ├── experiment.py
    │   ├── inputs.py
    │   ├── model.py
    │   └── task.py
    ├── scripts/
    │   └── train-gcp.sh
    ├── config.yaml
    └── setup.py
    

    The trainer directory contains all the Python files required to train the model. The contents of this directory will be packaged and submitted to AI Platform. You can find more details and best practices on how to package your training application here. We will look at the contents of the individual files later in this post.

    The scripts directory contains our training scripts that will configure the required environment variables and submit the job to AI Platform Training.

    config.yaml contains configuration of the compute instance used for training the model. Finally, setup.py contains details about our python package and the required dependencies. AI Platform Training will use the details in this file to install any missing dependencies before starting the training job.

    PyTorch Code for Training the Model

    Let’s look at the contents of our Python package. The first file, __init__.py, is just an empty file. It needs to be present in each subdirectory: Python Setuptools uses the init files to identify directories with code to package. It is OK to leave the file empty.

    The rest of the files contain different parts of our PyTorch software. task.py is our main file and will be called by AI Platform Training. It retrieves the command line arguments for our training task and passes those to the run function in experiment.py.

    def get_args():
        """Define the task arguments with the default values.
    
        Returns:
            experiment parameters
        """
        parser = ArgumentParser(description='NLI with Transformers')
    
        parser.add_argument('--batch_size',
                            type=int,
                            default=16)
        parser.add_argument('--epochs',
                            type=int,
                            default=2)
        parser.add_argument('--log_every',
                            type=int,
                            default=50)
        parser.add_argument('--learning_rate',
                            type=float,
                            default=0.00005)
        parser.add_argument('--fraction_of_train_data',
                            type=float,
                            default=1
                            )
        parser.add_argument('--seed',
                            type=int,
                            default=1234)
        parser.add_argument('--weight-decay',
                            default=0,
                            type=float)
        parser.add_argument('--job-dir',
                            help='GCS location to export models')
        parser.add_argument('--model-name',
                            help='The name of your saved model',
                            default='model.pth')
    
        return parser.parse_args()
    
    
    def main():
        """Setup / Start the experiment
        """
        args = get_args()
        experiment.run(args)
    
    
    if __name__ == '__main__':
        main()
    
    

    Before we look at the main training and evaluation routines, let’s look at inputs.py and model.py, which define the datasets for the task and the transformer model respectively. First, we use the datasets library to retrieve our data for the experiment. We use the MultiNLI sequence classification dataset for this experiment. The inputs.py file contains code to retrieve, split and pre-process the data. The NLIDataset provides the PyTorch Dataset object for the training, development and test data for our task.

    class NLIDataset(torch.utils.data.Dataset):
        def __init__(self, encodings, labels):
            self.encodings = encodings
            self.labels = labels
    
        def __getitem__(self, idx):
            item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
            item['labels'] = torch.tensor(self.labels[idx])
            return item
    
        def __len__(self):
            #return len(self.labels)
            return len(self.encodings.input_ids)
    

    The load_data function retrieves the data using the datasets library, splits the data into training, development and test sets, and then tokenises the input using RobertaTokenizer and creates PyTorch DataLoader objects for the different sets.

    def load_data(args):
        
        tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
        nli_data = datasets.load_dataset('multi_nli')
    
        # For testing purposes get a smaller slice of the training data
        all_examples = len(nli_data['train']['label'])
        num_examples = int(round(all_examples * args.fraction_of_train_data))
    
        print("Training with {}/{} examples.".format(num_examples, all_examples))
        
        train_dataset = nli_data['train'][:num_examples]
    
        dev_dataset = nli_data['validation_matched']
        test_dataset = nli_data['validation_matched']
    
        train_labels = train_dataset['label']
    
        val_labels = dev_dataset['label']
        test_labels = test_dataset['label']
    
        train_encodings = tokenizer(train_dataset['premise'], train_dataset['hypothesis'], truncation=True, padding=True)
        val_encodings = tokenizer(dev_dataset['premise'], dev_dataset['hypothesis'], truncation=True, padding=True)
        test_encodings = tokenizer(test_dataset['premise'], test_dataset['hypothesis'], truncation=True, padding=True)
    
        train_dataset = NLIDataset(train_encodings, train_labels)
        val_dataset = NLIDataset(val_encodings, val_labels)
        test_dataset = NLIDataset(test_encodings, test_labels)
    
        train_loader = DataLoader(train_dataset, batch_size=args.batch_size, shuffle=True)
        dev_loader = DataLoader(val_dataset, batch_size=args.batch_size, shuffle=True)
        test_loader = DataLoader(test_dataset, batch_size=args.batch_size, shuffle=True)
    
        return train_loader, dev_loader, test_loader
    

    The save_model function saves the trained model and uploads it to Google Cloud Storage.

    def save_model(args):
        """Saves the model to Google Cloud Storage
    
        Args:
          args: contains name for saved model.
        """
        scheme = 'gs://'
        bucket_name = args.job_dir[len(scheme):].split('/')[0]
    
        prefix = '{}{}/'.format(scheme, bucket_name)
        bucket_path = args.job_dir[len(prefix):].rstrip('/')
    
        datetime_ = datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S')
    
        if bucket_path:
            model_path = '{}/{}/{}'.format(bucket_path, datetime_, args.model_name)
        else:
            model_path = '{}/{}'.format(datetime_, args.model_name)
    
        bucket = storage.Client().bucket(bucket_name)
        blob = bucket.blob(model_path)
        blob.upload_from_filename(args.model_name)
    

    The model.py file contains code for the transformer model RoBERTa. The __init__ function initialises the module and defines the transformer model to use. The forward function will be called by PyTorch during execution of the code using the input batch of tokenised sentences together with the associated labels. The create function is a wrapper that is used to initialise the model and the optimiser during execution.

    # Specify the Transformer model
    class RoBERTaModel(nn.Module):
        def __init__(self):
            """Defines the transformer model to be used.
            """
            super(RoBERTaModel, self).__init__()
    
            self.model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=3)
    
        def forward(self, x, attention_mask, labels):
            return self.model(x, attention_mask=attention_mask, labels=labels)
    
    
    def create(args, device):
        """
        Create the model
    
        Args:
          args: experiment parameters.
          device: device.
        """
        model = RoBERTaModel().to(device)
        optimizer = optim.Adam(model.parameters(),
                               lr=args.learning_rate,
                               weight_decay=args.weight_decay)
    
        return model, optimizer
    

    The experiment.py file contains the main training and evaluation routines for our task. It contains the functions train, evaluate and run. The train function takes our training dataloader as an input and trains the model for one epoch in batches of the size defined in the command line arguments.

    def train(args, model, dataloader, optimizer, device):
        """Create the training loop for one epoch.
    
        Args:
          model: The transformer model that you are training, based on
          nn.Module
          dataloader: The training dataset
          optimizer: The selected optmizer to update parameters and gradients
          device: device
        """
        model.train()
        for i, batch in enumerate(dataloader):
                optimizer.zero_grad()
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)
                outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
                loss = outputs[0]
                loss.backward()
                optimizer.step()
                if i == 0 or i % args.log_every == 0 or i+1 == len(dataloader):
                    print("Progress: {:3.0f}% - Batch: {:>4.0f}/{:<4.0f} - Loss: {:<.4f}".format(
                        100. * (1+i) / len(dataloader), # Progress
                        i+1, len(dataloader), # Batch
                        loss.item())) # Loss
    

    The evaluate function takes the development or test dataloader as an input and evaluates the prediction accuracy of our model. This will be called after each training epoch using the development dataloader and after the training has finished using the test dataloader.

    def evaluate(model, dataloader, device):
          """Create the evaluation loop.
    
        Args:
          model: The transformer model that you are training, based on
          nn.Module
          dataloader: The development or testing dataset
          device: device
        """
        print("\nStarting evaluation...")
        model.eval()
        with torch.no_grad():
            eval_preds = []
            eval_labels = []
    
            for _, batch in enumerate(dataloader):
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)
                preds = model(input_ids, attention_mask=attention_mask, labels=labels)
                preds = preds[1].argmax(dim=-1)
                eval_preds.append(preds.cpu().numpy())
                eval_labels.append(batch['labels'].cpu().numpy())
    
        print("Done evaluation")
        return np.concatenate(eval_labels), np.concatenate(eval_preds)
    

    Finally, the run function calls the train and evaluate functions and saves the fine-tuned model to Google Cloud Storage once training has completed.

    def run(args):
        """Load the data, train, evaluate, and export the model for serving and
         evaluating.
    
        Args:
          args: experiment parameters.
        """
        cuda_availability = torch.cuda.is_available()
        if cuda_availability:
          device = torch.device('cuda:{}'.format(torch.cuda.current_device()))
        else:
          device = 'cpu'
        print('\n*************************')
        print('`cuda` available: {}'.format(cuda_availability))
        print('Current Device: {}'.format(device))
        print('*************************\n')
    
        torch.manual_seed(args.seed)
    
        # Open our dataset
        train_loader, eval_loader, test_loader = inputs.load_data(args)
    
        # Create the model, loss function, and optimizer
        bert_model, optimizer = model.create(args, device)
    
        # Train / Test the model
        for epoch in range(1, args.epochs + 1):
            train(args, bert_model, train_loader, optimizer, device)
            dev_labels, dev_preds = evaluate(bert_model, eval_loader, device)
            # Print validation accuracy
            dev_accuracy = (dev_labels == dev_preds).mean()
            print("\nDev accuracy after epoch {}: {}".format(epoch, dev_accuracy))
    
        # Evaluate the model
        print("Evaluate the model using the testing dataset")
        test_labels, test_preds = evaluate(bert_model, test_loader, device)
        # Print validation accuracy
        test_accuracy = (test_labels == test_preds).mean()
        print("\nTest accuracy after epoch {}: {}".format(args.epochs, test_accuracy))
    
        # Export the trained model
        torch.save(bert_model.state_dict(), args.model_name)
    
        # Save the model to GCS
        if args.job_dir:
            inputs.save_model(args)
    

    Launching and monitoring the training job

    Once we have the Python code for our training job, we need to prepare it for AI Platform Training. Three files are important here. First, setup.py contains information about the dependencies of our Python package as well as metadata like the name and version of the package.

    from setuptools import find_packages
    from setuptools import setup
    
    REQUIRED_PACKAGES = [
        'google-cloud-storage>=1.14.0',
        'transformers',
        'datasets',
        'numpy==1.18.5',
        'argparse',
        'tqdm==4.49.0'
    ]
    
    setup(
        name='trainer',
        version='0.1',
        install_requires=REQUIRED_PACKAGES,
        packages=find_packages(),
        include_package_data=True,
        description='Sequence Classification with Transformers on Google Cloud AI Platform'
    )
    

    The config.yaml file contains information about the compute instance used for training the model. For this job we will use an NVIDIA V100 GPU, as it provides faster training and more GPU memory than the cheaper K80 GPUs. See this great blog post by Google on selecting a GPU.

    trainingInput:
      scaleTier: CUSTOM
      masterType: n1-standard-8
      masterConfig:
        acceleratorConfig:
          count: 1
          type: NVIDIA_TESLA_V100
    

    Finally, the scripts directory contains the train-gcp.sh script, which sets the required environment variables and runs the gcloud command to submit the AI Platform Training job.

    # BUCKET_NAME: unique bucket name
    BUCKET_NAME=name-of-your-gs-bucket
    
    # The PyTorch image provided by AI Platform Training.
    IMAGE_URI=gcr.io/cloud-ml-public/training/pytorch-gpu.1-4
    
    # JOB_NAME: the name of your job running on AI Platform.
    JOB_NAME=transformers_job_$(date +%Y%m%d_%H%M%S)
    
    echo "Submitting AI Platform Training job: ${JOB_NAME}"
    
    PACKAGE_PATH=./trainer # this can be a GCS location to a zipped and uploaded package
    
    REGION=us-central1
    
    # JOB_DIR: Where to store prepared package and upload output model.
    JOB_DIR=gs://${BUCKET_NAME}/${JOB_NAME}/models
    
    gcloud ai-platform jobs submit training ${JOB_NAME} \
        --region ${REGION} \
        --master-image-uri ${IMAGE_URI} \
        --config config.yaml \
        --job-dir ${JOB_DIR} \
        --module-name trainer.task \
        --package-path ${PACKAGE_PATH} \
        -- \
        --epochs 2 \
        --batch_size 16 \
        --learning_rate 2e-5
    
    gcloud ai-platform jobs stream-logs ${JOB_NAME}
    

    The last line of this script streams the logs directly to your command line. Alternatively, you can head to the Google Cloud console, navigate to AI Platform jobs and select View logs.

    Logs

    You can also view the GPU utilisation and memory from the AI Platform job page.

    Monitoring GPU utilisation

    Conclusion

    That concludes this post. You can find all the code on Github.

    Hope you enjoyed this demo. Feel free to contact me if you have any questions.

    This is a slightly modified version of an article originally posted on Nordcloud Engineering blog.


      Introducing Google Coral Edge TPU – a New Machine Learning ASIC from Google

      The Google Coral Edge TPU is a new machine learning ASIC from Google. It performs fast TensorFlow Lite model inferencing with low power usage. We take a quick look at the Coral Dev Board, which includes the TPU chip and is available in online stores now.

      Photo by Gravitylink

      Overview

      Google Coral is a general-purpose machine learning platform for edge applications. It can execute TensorFlow Lite models that have been trained in the cloud. It’s based on Mendel Linux, Google’s own flavor of Debian.

      Object detection is a typical application for Google Coral. If you have a pre-trained machine learning model that detects objects in video streams, you can deploy your model to the Coral Edge TPU and use a local video camera as the input. The TPU will start detecting objects locally, without having to stream the video to the cloud.

      The Coral Edge TPU chip is available in several packages. You probably want to buy the standalone Dev Board which includes the System-on-Module (SoM) and is easy to use for development. Alternatively you can buy a separate TPU accelerator device which connects to a PC through a USB, PCIe or M.2 connector. A System-on-Module is also available separately for integrating into custom hardware.

      Comparing with AWS DeepLens

      Google Coral is in many ways similar to AWS DeepLens. The main difference from a developer’s perspective is that DeepLens integrates to the AWS cloud. You manage your DeepLens devices and deploy your machine learning models using the AWS Console.

      Google Coral, on the other hand, is a standalone edge device that doesn’t need a connection to the Google Cloud. In fact, setting up the development board requires performing some very low level operations like connecting a USB serial port and installing firmware.

      DeepLens devices are physically consumer-grade plastic boxes and they include fixed video cameras. DeepLens is intended to be used by developers at an office, not integrated into custom products.

      Google Coral’s System-on-Module, in contrast, packs the entire system in a 40×48 mm module. That includes all the processing units, networking features, connectors, 1GB of RAM and an 8GB eMMC where the operating system is installed. If you want to build a custom hardware solution, you can build it around the Coral SoM.

      The Coral Development Board

      To get started with Google Coral, you should buy a Dev Board for about $150. The board is similar to Raspberry Pi devices. Once you have set up the board, it only requires a power source and a WiFi connection to operate.

      Here are a couple of hints for installing the board for the first time.

      • Carefully read the instructions at https://coral.ai/docs/dev-board/get-started/. They take you through all the details of how to use the three different USB ports on the device and how to install the firmware.
      • You can use a Mac or a Linux computer but Windows won’t work. The firmware installation is based on a bash script and it also requires some special serial port drivers. They might work in Windows Subsystem for Linux, but using a Mac or a Linux PC is much easier.
      • If the USB port doesn’t seem to work, check that you aren’t using a charge-only USB cable. With a proper cable the virtual serial port device will appear on your computer.
      • The MDT tool (Mendel Development Tool) didn’t work for us. Instead, we had to use the serial port to log in to the Linux system and set up SSH manually.
      • The default username/password of Mendel Linux is mendel/mendel. You can use those credentials to log in through the serial port, but the password doesn’t work through SSH. You’ll need to add your public key to .ssh/authorized_keys.
      • You can set up a WiFi network so you won’t need an ethernet cable. The getting started guide has instructions for this.

      Once you have a working development board, you might want to take a look at Model Play (https://model.gravitylink.com/). It’s an Android application that lets you deploy machine learning models from the cloud to the Coral development board.

      Model Play has a separate server installation guide at https://model.gravitylink.com/doc/guide.html. The server must be installed on the Coral development board before you can connect your smartphone to it. You also need to know the local IP address of the development board on your network.

      Running Machine Learning Models

      Let’s assume you now have a working Coral development board. You can connect to it from your computer with SSH and from your smartphone with the Model Play application.

      The getting started guide has instructions for trying out the built-in demonstration application called edgetpu_demo. This application will work without a video camera. It uses a recorded video stream to perform real-time object recognition to detect cars in the video. You can see the output in your web browser.

      You can also try out some TensorFlow Lite models through the SSH connection. If you have your own models, check out the documentation on how to make them compatible with the Coral Edge TPU at https://coral.ai/docs/edgetpu/models-intro/.
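As a rough sketch of what running your own compiled model looks like in Python: the pattern below uses the `tflite_runtime` package with the Edge TPU delegate library, as described in the Coral docs. The model file name is hypothetical, and the `top_prediction` helper is our own illustration, not part of any Coral API.

```python
def make_edgetpu_interpreter(model_path):
    """Create a TFLite interpreter that offloads work to the Edge TPU.

    Assumes tflite_runtime and the Edge TPU runtime (libedgetpu) are
    installed on the device, and that model_path points to a model
    compiled with the Edge TPU compiler.
    """
    from tflite_runtime.interpreter import Interpreter, load_delegate
    interpreter = Interpreter(
        model_path=model_path,
        experimental_delegates=[load_delegate('libedgetpu.so.1')])
    interpreter.allocate_tensors()
    return interpreter


def top_prediction(scores, labels):
    """Return the (label, score) pair for the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]


# On the Dev Board you would then create the interpreter and feed it
# camera frames (not runnable without the TPU hardware):
# interpreter = make_edgetpu_interpreter('mobilenet_v2_edgetpu.tflite')
```

The same code without the delegate runs the model on the CPU, which is a handy way to sanity-check a model before compiling it for the Edge TPU.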

      If you just want to play around with existing models, the Model Play application makes it very easy. Pick one of the provided models and tap the Free button to download it to your device. Then tap the Run button to execute it.

      Connecting a Video Camera and Sensors

      If you buy the Coral development board, make sure to also get the Video Camera and Sensor accessories for about $50 extra. They will let you apply your machine learning models to something more interesting than static video files.

      Photo by Gravitylink

      Alternatively you can also use a USB UVC compatible camera. Check the instructions at https://coral.ai/docs/dev-board/camera/#connect-a-usb-camera for details. You can use an HDMI monitor to view the output.

      Future of the Edge

      Google has partnered with Gravitylink for Coral product distribution. They also make the Model Play application that offers the Coral demos mentioned in this article. Gravitylink is trying to make machine learning fun and easy with simple user interfaces and a directory of pre-trained models.

      Once you start developing more serious edge computing applications, you will need to think about issues like remote management and application deployment. At this point it is still unclear whether Google will integrate Coral and Mendel Linux to the Google Cloud Platform. This would involve device authentication, operating system updates and application deployments.

      If you start building on Coral right now, you’ll most likely need a custom management solution. We at Nordcloud develop cloud-based management solutions for technologies like AWS Greengrass, AWS IoT and Docker. Feel free to contact us if you need a hand.









        Webinar: Get to know the coolest of Google Cloud tools

        Ready to unlock Google Cloud?

        In this webinar, you’ll hear about:

        1. Coolest Cases: Hand-picked case studies of the best tools in Google Cloud
        2. Are you Cloud-Ready: What it takes to supercharge your organization with Google Cloud
        3. Most Common Challenges: What to look out for when moving to Google Cloud
        4. Navigating the Obstacles: Nordcloud’s end-to-end solutions to secure your cloud journey and unleash the potential of Google Cloud

        Since 2011, Nordcloud, a Google Cloud Premier Partner, has completed more than 1000 successful cloud deployments. Nordcloud has worked with Europe’s largest enterprises, e.g. most of OMXN40, to harvest the full benefits of the public cloud, such as increased security, agility, scalability and reduced costs.

        Register here.

        Date:

        August 28, 2019

        Time:

        1 PM – 2 PM CEST









          Webinar: Adopt Kubernetes with Experts

          Kubernetes makes it easy to scale, manage and deploy containerized applications. Nordcloud provides proven best practices for all aspects of cloud-based Kubernetes adoption with both managed services as well as virtual machine based deployments.

          The webinar will be held on Tuesday the 20th of August at 2 PM.

          In this webinar, you’ll learn how to

          • Leverage Cloud Run to run stateless container workloads
          • Build and deploy a microservices application onto GKE
          • Leverage the full potential of service meshes, and see how Istio can help with observability and communication policies

          The webinar is held by Dariusz Dwornikowski, Head of Engineering.

          Since 2011, Nordcloud, a Google Cloud Premier Partner, has completed more than 1000 successful cloud deployments. Nordcloud has worked with Europe’s largest enterprises, e.g. most of OMXN40, to harvest the full benefits of the public cloud, such as increased security, agility, scalability and reduced costs.

          Register here.

          Date:

          August 20

          Time:

          14:00 – 15:00 CEST









            HUS chooses Nordcloud and Google to Accelerate its Digital Competence


            Nordcloud will harness its years of expertise in public cloud to benefit HUS in migrating to Google Cloud and its services.

            Google’s europe-north1 region, located in Hamina, Finland, is a determining factor for many Finnish organisations in choosing between hyperscale providers.

            “The arrival of Google’s public cloud in Hamina was a central developmental step for the digitalisation of the public sector in Finland. Google Cloud’s Finnish region enables the use of cloud tools and added value while at the same time storing the data physically in Finland”, says Lars Oehlandt, VP Google Partnership of Nordcloud. “As the Nordic public cloud pioneer with hundreds of projects under our belt, Nordcloud has refined a method we call Cloud Journey for taking business to the cloud in a controlled and smart manner. Our Cloud Journey concept is an excellent fit for both public administration and health care organisations in order to secure the full value of public cloud.”


            In addition to the advantageous region, HUS will obtain access to Google Cloud’s numerous features: advanced analytics, machine learning and container services, high-level information security, and world-class infrastructure.

            “Our cooperation with Nordcloud has deepened fast with the interest and demand generated by the Hamina region. Together we will serve Finnish companies and public administration organisations that are developing and modernising their operations with Google Cloud”, says Carita Mäkinen, Field Sales, Google Cloud Platform.

            Google announced in May that it will invest 600 million euros to expand its data center in Hamina. The new construction will add to Google’s existing data center complex in Hamina on the south coast of Finland, taking the company’s total investment there to 1.4 billion euros. Google’s Hamina complex will be powered by renewable energy acquired from three new wind farms in the Nordic nation.

            Earlier this month, Nordcloud was the first Nordic company to achieve the status of Authorized Google Cloud Platform Training Partner.









              Next generation networking, food for thought?


              This year’s Google Cloud Next brought a flood of news; a few of the large announcements included Anthos and Cloud Run. It is easy to get overwhelmed by the sheer number of presentations and announcements.

              This year there were two presentations that I felt may have flown under the radar, but that would be a shame to miss.


              Istio Service Mesh for VMs

              Service meshes and overlay networking have been around for a while. Tools like Istio enable engineers to create overlay networks between containers. These networks allow for software-based networking between services, along with higher-level features like circuit breaking, latency-aware load balancing and service discovery.

              One of the drawbacks of these tools was that most of them relied on sidecar containers. As a result, setting them up for non-container workloads like VMs was pretty difficult. In this talk, Chris Crall and Jianfei Hu show an easy way of integrating Istio with VMs. This means we can now integrate almost anything into our service mesh: databases, legacy workloads or anything else that runs on a VM.

              Even though it might seem like a minor feature, this is a game-changer. Imagine migrating a large application landscape of critical legacy workloads into containers: Istio can do weight-based routing, which means we can set up many endpoints for the same service, each receiving only part of the traffic. By doing this for an application we’re trying to migrate, we can compare the performance of the old version against the new containerised one.
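To make weight-based routing concrete: a toy sketch of weight-proportional endpoint selection, using the smooth weighted round-robin scheme popularised by nginx. The service names are made up; in Istio you would declare the weights in a VirtualService rather than implement selection yourself.

```python
def pick_endpoint(current, weights):
    """Choose the next endpoint so that, over time, each endpoint
    receives traffic in proportion to its weight (smooth weighted
    round-robin). `current` holds per-endpoint running scores and
    is mutated between calls.
    """
    total = sum(weights.values())
    for name, weight in weights.items():
        current[name] = current.get(name, 0) + weight
    chosen = max(current, key=current.get)  # highest running score wins
    current[chosen] -= total                # penalise the winner
    return chosen


# Send 90% of traffic to the legacy version, 10% to the new container:
weights = {'orders-legacy': 9, 'orders-v2': 1}
state = {}
picks = [pick_endpoint(state, weights) for _ in range(10)]
# Over any 10 consecutive picks, 9 go to orders-legacy and 1 to orders-v2.
```

Comparing latency and error rates between the two endpoints then tells you whether the containerised version is ready to take more of the weight.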


              Zero-trust networking and PII

              Another video that would be easy to miss, but is definitely worth a watch, is the one by Roy Bryant from Scotiabank. They’ve recently started shifting from being a financial institution to ‘a tech company that also does banking’, as shown by them starting to push open-source code to GitHub.

              Being a bank, they deal with a lot of PII (personally identifiable information), so security is one of their main concerns. In the video they mention that besides using ML to tokenise things like credit card numbers, they leverage intent-based zero-trust networking. This might sound complex, but in reality it is quite elegant.

              Traditionally, access between services or computers is enforced through firewalls and network configurations. With the emergence of software-defined networks, and layer-7 routing we can start thinking about other ways.

              In the video, they mention that instead of configuring firewalls, they started expressing intent: “I want service A to be able to read 10 records per second from service B for the next 5 minutes”.

              By versioning these intents and abstracting the logic behind them away into libraries, we are no longer maintaining complex sets of firewall rules. Access is now governed in a transparent, maintainable manner, allowing for an intuitive approach to security.
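A minimal sketch of what such a versioned intent might look like as a data structure. The field names and the `allows` check are my own illustration of the idea, not Scotiabank’s actual implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """One access intent: 'source may perform action on target at
    rate_per_s until expires_at'. Records like this can live in
    source control instead of a pile of firewall rules.
    """
    source: str
    target: str
    action: str
    rate_per_s: int
    expires_at: float  # epoch seconds

    def allows(self, source, target, action, now):
        """True if this intent permits the request at time `now`."""
        return (self.source == source
                and self.target == target
                and self.action == action
                and now < self.expires_at)


# "I want service A to read 10 records/s from service B for 5 minutes"
intent = Intent('service-a', 'service-b', 'read', 10, expires_at=300.0)
```

A policy engine would evaluate a request against the current set of intents, with expiry built in, so access automatically lapses instead of lingering as a forgotten firewall rule.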


              Conclusion

              A blogpost like this can only cover so much ground, and these are complex subjects. I recommend watching the videos mentioned here, and checking out the links in the reference below. I’d like to end this post with some food for thought:

              Currently in modern clouds, a large part of the security model relies on network security through firewalls and NACLs in addition to IAM.

              With the increasing usage of layer-7 overlay-networking I expect to see these two amalgamate into new multi-disciplinary security mechanisms.

              References









                OnBoard Kubernetes Engine | Copenhagen


                Join Nordcloud at OnBoard Kubernetes Engine – Copenhagen on May 23. This free-to-attend, one-day event will provide you with industry best practices and tips to accelerate your ability to design solutions using Kubernetes!

                OnBoard Kubernetes Engine | Sorte Diamant, Det Kongelige Bibliotek

                OnBoard Kubernetes Engine – Copenhagen is a free full-day enablement and training event that will give you an understanding of containers and Docker, an overview of Kubernetes Engine technology, and hands-on practice deploying to Kubernetes Engine and setting up continuous delivery.

                OnBoard Kubernetes Engine has been designed for IT Managers, Systems Engineers and Operations professionals, Developers, Solution Architects and modern business leaders who are exploring cloud solutions or are new to Google Cloud Platform. Leveraging the GCP Kubernetes Engine course, the event will provide you with the technical training you need to get started as well as access to tips and tricks, industry best-practice and questions and answers from the Google Cloud team.

                Check the event agenda and register here.

                Date

                May 23

                Location

                Sorte Diamant, Det Kongelige Bibliotek
                Søren Kierkegaards Plads 1
                Copenhagen, Denmark









                  OnBoard Kubernetes Engine | Stockholm


                  Join Nordcloud at OnBoard Kubernetes Engine – Stockholm on March 28. This free-to-attend, one-day event will provide you with industry best practices and tips to accelerate your ability to design solutions using Kubernetes!

                  OnBoard Kubernetes Engine | Scandic Continental

                  OnBoard Kubernetes Engine – Stockholm is a free full-day enablement and training event that will give you an understanding of containers and Docker, an overview of Kubernetes Engine technology, and hands-on practice deploying to Kubernetes Engine and setting up continuous delivery.

                  OnBoard Kubernetes Engine has been designed for IT Managers, Systems Engineers and Operations professionals, Developers, Solution Architects and modern business leaders who are exploring cloud solutions or are new to Google Cloud Platform. Leveraging the GCP Kubernetes Engine course, the event will provide you with the technical training you need to get started as well as access to tips and tricks, industry best-practice and questions and answers from the Google Cloud team.

                  Check the event agenda and register here.

                  Date

                  March 28

                  Location

                  Scandic Continental
                  Vasagatan 22
                  Stockholm, Sweden









                    Google Cloud Platform Breakfast Seminar | Copenhagen


                    GCP Breakfast Seminar at our Copenhagen Office

                    With proper governance in place, your business is able to shorten the time to market and start consuming next-generation cloud services, such as Machine Learning. A guest speaker from Google will be presenting the business value that can be achieved with GCP’s advanced Machine Learning tools.

                    Since 2011, Nordcloud, a Google Cloud Premier Partner, has completed more than 1000 successful cloud deployments. Nordcloud has worked with Europe’s largest enterprises, e.g. most of OMXN40, to harvest the full benefits of the public cloud, such as increased security, agility, scalability and reduced costs.

                    Check the event agenda and register here.

                    Date

                    March 28

                    Location

                    Nordcloud
                    Strandvejen 70, 2. sal
                    2900 Hellerup
