Creating a OneNote Markdown Converter

Matthew Casperson

18 Nov 2021

In this article in the series, we’ll use the Graph API client to read OneNote documents through a microservice that converts them into Markdown.

Here we’ll build a Spring Boot web app and microservice using the Graph API client to query for a list of OneNote notebooks, display an HTML preview of the notebook pages, and convert and download the pages as Markdown.

OneNote is a great tool for creating notes, either through a desktop application or via the online Office platform. These notes can be exported as Word or PDF documents, but many enterprises require content in other formats, like Markdown.

Let’s build on the work we did in the previous article. Along the way, we’ll see how teams can automate content conversion without first downloading files as Word documents and converting them into secondary formats.

The Sample Application

You can find the source code for this sample application on GitHub. The MSALOneNoteConverter repo contains the front-end web app, and the MSALOneNoteBackend repo contains the microservice.

Build the Back-End Microservice Application

We’ll start with the back-end microservice. This is responsible for returning the list of notebooks to the front-end and providing the ability to convert notebook pages from HTML to Markdown.

Bootstrap the Spring Project

We’ll generate the initial application template using Spring Initializr to create a Java Maven project, which generates a JAR file using the latest non-snapshot version of Spring against Java 11.

The microservice requires the following dependencies:

Spring Web, which provides our hosted web server
Spring Security, which allows us to secure access to endpoints
Azure Active Directory, used to integrate with Azure AD
OAuth2 Resource Server, used to integrate the backend API with an OAuth2 authorization server

Expose the Graph API Client

The first step is to expose the Graph API client, configured with the authentication provider we created in the previous article, as a bean. We’ll do this in the GraphClientConfiguration class in the following package:

Java

package com.matthewcasperson.onenotebackend.configuration;

We inject an instance of the AADAuthenticationProperties class. This provides access to the values in our Spring configuration file, including the client ID, client secret, and tenant ID.

Java

@Autowired
AADAuthenticationProperties azureAd;

We then create an instance of the Graph API client using the OboAuthenticationProvider created in the previous article. Note that we’re requesting a token with a scope of https://graph.microsoft.com/Notes.Read.All, granting us read access to OneNote notes.

Java

  @Bean
  public GraphServiceClient<Request> getClient() {
    return GraphServiceClient.builder()
        .authenticationProvider(new OboAuthenticationProvider(
            Set.of("https://graph.microsoft.com/Notes.Read.All"),
            azureAd.getTenantId(),
            azureAd.getClientId(),
            azureAd.getClientSecret()))
        .buildClient();
  }
}

Configure Spring Security

We configure our microservice to require a valid token for all requests through the AuthSecurityConfig class:

Java

package com.matthewcasperson.onenotebackend.configuration;
 
import com.azure.spring.aad.webapi.AADResourceServerWebSecurityConfigurerAdapter;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
 
@EnableWebSecurity
public class AuthSecurityConfig extends AADResourceServerWebSecurityConfigurerAdapter {
 
  @Override
  protected void configure(final HttpSecurity http) throws Exception {
    super.configure(http);
    // @formatter:off
    http
        .authorizeRequests()
        .anyRequest()
        .authenticated();
    // @formatter:on
  }
}

Add a Conversion Library

Our microservice uses Pandoc to convert HTML to Markdown. Pandoc is an open-source document converter that we’ll invoke through a community Java wrapper library.

We add the following dependency to the Maven pom.xml file to include the Pandoc wrapper in our project.

XML

<dependency>
  <groupId>org.bitbucket.leito</groupId>
  <artifactId>universal-document-converter</artifactId>
  <version>1.1.0</version>
</dependency>

Note that the wrapper simply calls the Pandoc executable, so Pandoc needs to be installed and available on the operating system path.
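
Because a missing Pandoc binary would otherwise only surface when the first conversion fails, it can be useful to check for it when the service starts. The following is a minimal sketch and not part of the sample repositories: the PandocCheck class and its isOnPath method are hypothetical names, and the sketch assumes the executable is called pandoc (on Windows it would typically be pandoc.exe).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of the sample repos.
final class PandocCheck {

  // Returns true if an executable file with the given name exists
  // in any directory listed in the PATH environment variable.
  static boolean isOnPath(final String executable) {
    final String path = System.getenv("PATH");
    if (path == null || path.isEmpty()) {
      return false;
    }
    return Arrays.stream(path.split(File.pathSeparator))
        .filter(dir -> !dir.isEmpty())
        .anyMatch(dir -> isExecutable(dir, executable));
  }

  private static boolean isExecutable(final String dir, final String executable) {
    try {
      final Path candidate = Path.of(dir, executable);
      return Files.isRegularFile(candidate) && Files.isExecutable(candidate);
    } catch (final InvalidPathException e) {
      // a malformed PATH entry cannot contain the executable
      return false;
    }
  }

  public static void main(final String[] args) {
    System.out.println("pandoc on PATH: " + isOnPath("pandoc"));
  }
}
```

A check like this could run once at startup and log a warning, so a missing installation is reported before any user requests a conversion.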

Add the Spring REST Controller

The bulk of our microservice is found in the REST controller handling requests from the front-end web application. This controller is found in the OneNoteController class, in the following package:

Java

package com.matthewcasperson.onenotebackend.controllers;

This class is doing a lot of work, so let’s examine it piece by piece.

We start by injecting an instance of the Graph API client.

Java

@RestController
public class OneNoteController {
 
  @Autowired
  GraphServiceClient<Request> client;

Our front-end web application needs a list of the notebooks created by the currently logged-in user. This is provided by the getNotes method.

Java

@GetMapping("/notes")
public List<String> getNotes() {
  return getNotebooks()
      .stream()
      .map(n -> n.displayName)
      .collect(Collectors.toList());
}

To keep this sample application simple, we’ll provide the ability to view and convert the first page of the first section in any selected notebook. The getNoteHtml method provides the page HTML.

Java

@GetMapping("/notes/{name}/html")
public String getNoteHtml(@PathVariable("name") final String name) {
  return getPageHTML(name);
}

In addition to the page HTML, our microservice allows us to retrieve the page Markdown. The Markdown content is returned by the getNoteMarkdown method.

Java

@GetMapping("/notes/{name}/markdown")
public String getNoteMarkdown(@PathVariable("name") final String name) {
  final String content = getPageHTML(name);
  return convertContent(content);
}

We have several private methods to support the public endpoint methods. These private methods are responsible for querying the Graph API and performing the content conversion.

The getPageHTML method returns the first page from the first section of the named notebook.

One thing to note while using the Graph API client is that many methods can return null values. Fortunately, the client methods that can return null have been annotated with @Nullable. This provides IDEs with the information required to warn us when we might be referencing possible null values.

We make liberal use of the Optional class to avoid littering our code with null checks:

Java

private String getPageHTML(final String name) {
  return getNotebooks()
      .stream()
      // find the notebook that matches the supplied name
      .filter(n -> name.equals(n.displayName))
      // we only expect one notebook to match
      .findFirst()
      // get the notebook sections
      .map(notebook -> notebook.sections)
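      // get the first section, if there is one (an empty list would make get(0) throw)
      .flatMap(sections -> sections.getCurrentPage().stream().findFirst())
      // get the first page of that section, if there is one
      .flatMap(section -> getSectionPages(section.id).stream().findFirst())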
      // get the page id
      .map(page -> page.id)
      // get the content of the page
      .flatMap(this::getPageContent)
      // if any of the operations above returned null, return an error message
      .orElse("Could not load page content");
}
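
To see how an Optional chain absorbs both null values and empty lists, here is a small self-contained sketch. The Notebook and Section classes below are hypothetical stand-ins for the SDK model classes, and firstSectionId is an illustrative name rather than a method from the sample application.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for SDK model classes whose fields may be null.
final class OptionalChainSketch {

  static final class Section {
    final String id;

    Section(final String id) {
      this.id = id;
    }
  }

  static final class Notebook {
    final String displayName;
    final List<Section> sections; // may be null, like a child resource that was not expanded

    Notebook(final String displayName, final List<Section> sections) {
      this.displayName = displayName;
      this.sections = sections;
    }
  }

  static String firstSectionId(final List<Notebook> notebooks, final String name) {
    return notebooks.stream()
        .filter(n -> name.equals(n.displayName))
        .findFirst()
        // map() yields an empty Optional when the mapper returns null
        .map(n -> n.sections)
        // an empty list is not null, so it is handled explicitly with findFirst()
        .flatMap(sections -> sections.stream().findFirst())
        .map(section -> section.id)
        .orElse("No section found");
  }

  static List<Notebook> sampleNotebooks() {
    return List.of(
        new Notebook("Work", List.of(new Section("s-1"))),
        new Notebook("Empty", List.of()),
        new Notebook("Unexpanded", null));
  }

  public static void main(final String[] args) {
    final List<Notebook> notebooks = sampleNotebooks();
    System.out.println(firstSectionId(notebooks, "Work"));       // prints "s-1"
    System.out.println(firstSectionId(notebooks, "Empty"));      // prints "No section found"
    System.out.println(firstSectionId(notebooks, "Unexpanded")); // prints "No section found"
  }
}
```

The fallback string is only reached when some step produced nothing, which keeps the happy path readable without a single explicit null check.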

The conversion of HTML to Markdown is performed in the convertContent method. We use the Pandoc wrapper exposed by the DocumentConverter class to convert the original page HTML into Markdown.

Note that DocumentConverter constructs the arguments to be passed to the external Pandoc application, but doesn’t include the Pandoc app itself. This means we need to install Pandoc alongside our microservice. It also means we pass data through external files instead of directly passing strings.

The convertContent method creates two temporary files: the first containing the input HTML, and the second for the output Markdown. It then passes those files to Pandoc, reads the content of the output file, and cleans everything up.

To convert notes to different formats, this method could be edited to specify different Pandoc arguments, or swapped out completely to replace Pandoc as a conversion tool:

Java

private String convertContent(final String html) {
  Path input = null;
  Path output = null;

  try {
    input = Files.createTempFile(null, ".html");
    output = Files.createTempFile(null, ".md");

    // writeString uses UTF-8, matching Files.readString below
    Files.writeString(input, html);

    new DocumentConverter()
        .fromFile(input.toFile(), InputFormat.HTML)
        .toFile(output.toFile(), "markdown_strict-raw_html")
        .convert();

    return Files.readString(output);
  } catch (final IOException e) {
    // silently ignore
  } finally {
    try {
      if (input != null) {
        Files.delete(input);
      }
      if (output != null) {
        Files.delete(output);
      }
    } catch (final Exception ex) {
      // silently ignore
    }
  }

  return "There was an error converting the file";
}
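
The temp-file round trip can be tried in isolation, without Pandoc installed. In the sketch below, a plain function stands in for the external tool. The names TempFileRoundTrip, convert, and upperCaseCopy are hypothetical, and the upper-casing "converter" exists only to make the flow observable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.function.BiConsumer;

// Hypothetical sketch of the write, convert, read, clean-up pattern.
final class TempFileRoundTrip {

  // Writes the input to a temp file, lets the converter fill a second temp file,
  // returns that file's content, and removes both files afterwards.
  static String convert(final String input, final BiConsumer<Path, Path> converter) {
    Path in = null;
    Path out = null;
    try {
      in = Files.createTempFile("convert-in", ".html");
      out = Files.createTempFile("convert-out", ".md");
      Files.writeString(in, input);
      converter.accept(in, out);
      return Files.readString(out);
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      deleteQuietly(in);
      deleteQuietly(out);
    }
  }

  // Deletes one file and swallows failures so the other file is still cleaned up.
  private static void deleteQuietly(final Path path) {
    if (path == null) {
      return;
    }
    try {
      Files.deleteIfExists(path);
    } catch (final IOException e) {
      // best effort: the file lives in the temp directory anyway
    }
  }

  // Stand-in for the external tool: copies the input in upper case.
  static void upperCaseCopy(final Path in, final Path out) {
    try {
      Files.writeString(out, Files.readString(in).toUpperCase(Locale.ROOT));
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(final String[] args) {
    System.out.println(convert("<p>hi</p>", TempFileRoundTrip::upperCaseCopy)); // prints "<P>HI</P>"
  }
}
```

Deleting each file in its own try block is a deliberate choice in this sketch: a failure to remove the input file does not prevent the output file from being removed.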

The next set of methods is responsible for calling the Graph API.

The getNotebooks method retrieves a list of notebooks created by the currently logged-in user.

One thing to be aware of when interacting with the Graph API is that it typically won’t return child resources when requesting a parent resource. However, it’s possible to override this behavior with the $expand query parameter. Here, we request a list of notebook resources and expand their sections:

Java

private List<Notebook> getNotebooks() {
  return Optional.ofNullable(client
          .me()
          .onenote()
          .notebooks()
          .buildRequest(new QueryOption("$expand", "sections"))
          .get())
      .map(BaseCollectionPage::getCurrentPage)
      .orElseGet(List::of);
}

Because sections don’t support the expansion of child pages, we use the getSectionPages method to make a second request to return the list of pages associated with each section.

Java

private List<OnenotePage> getSectionPages(final String id) {
  return Optional.ofNullable(client
          .me()
          .onenote()
          .sections(id)
          .pages()
          .buildRequest()
          .get())
      .map(OnenotePageCollectionPage::getCurrentPage)
      .orElseGet(List::of);
}

The OnenotePage class doesn’t include the content of the page. To access the content, we need to make one more API request:

Java

private Optional<String> getPageContent(final String id) {
    return Optional.ofNullable(client
        .me()
        .onenote()
        .pages(id)
        .content()
        .buildRequest()
        .get())
        .map(s -> toString(s, null));
}
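
It can help to see which REST requests these three fluent calls correspond to. The sketch below simply builds the Microsoft Graph v1.0 URLs for the notebook list, the pages of a section, and the content of a page. The GraphOneNoteUris class and its method names are hypothetical, the IDs in main are made up, and the sketch assumes IDs are already safe to place in a URL path.

```java
import java.net.URI;

// Hypothetical helper showing the URLs behind the fluent client calls.
final class GraphOneNoteUris {

  private static final String BASE = "https://graph.microsoft.com/v1.0/me/onenote";

  // Notebooks with their sections included through $expand.
  static URI notebooksWithSections() {
    return URI.create(BASE + "/notebooks?$expand=sections");
  }

  // Pages of one section; fetched separately from the notebook list.
  static URI sectionPages(final String sectionId) {
    return URI.create(BASE + "/sections/" + sectionId + "/pages");
  }

  // HTML content of one page.
  static URI pageContent(final String pageId) {
    return URI.create(BASE + "/pages/" + pageId + "/content");
  }

  public static void main(final String[] args) {
    System.out.println(notebooksWithSections());
    System.out.println(sectionPages("section-id")); // made-up ID
    System.out.println(pageContent("page-id"));     // made-up ID
  }
}
```

Retrieving a single page as HTML therefore costs three round trips: one for the notebooks and their sections, one for the section's pages, and one for the page content.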

The toString method converts a stream to a string and captures any exceptions, allowing us to perform this conversion in a lambda. Checked exceptions don’t play well with lambdas passed to classes like Optional.

Java

  private String toString(final InputStream stream, final String defaultValue) {
    try (stream) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    } catch (final IOException e) {
      return defaultValue;
    }
  }
}
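
The effect of wrapping the checked exception is easiest to see with a stream that fails on purpose. The following sketch is self-contained and uses hypothetical names (StreamToStringSketch, readOrDefault, read, failing). It mirrors the approach above rather than reproducing the sample application's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch: hiding a checked IOException behind a default value.
final class StreamToStringSketch {

  // Reads the whole stream as UTF-8, or returns the default value on failure.
  static String readOrDefault(final InputStream stream, final String defaultValue) {
    try (stream) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    } catch (final IOException e) {
      return defaultValue;
    }
  }

  // Because readOrDefault declares no checked exception, it can be used in map().
  // A null default turns a failed read into an empty Optional.
  static Optional<String> read(final InputStream stream) {
    return Optional.ofNullable(stream).map(s -> readOrDefault(s, null));
  }

  // A stream that always fails, to exercise the fallback path.
  static InputStream failing() {
    return new InputStream() {
      @Override
      public int read() throws IOException {
        throw new IOException("simulated failure");
      }
    };
  }

  public static void main(final String[] args) {
    final InputStream ok = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(read(ok));        // prints "Optional[hello]"
    System.out.println(read(failing())); // prints "Optional.empty"
  }
}
```

The trade-off is that the cause of the failure is lost. A production service would probably log the exception before returning the default.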

Build the Front-End Web Application

The front-end web application displays the list of notebooks created by the currently logged-in user, previews the first page of the first section of a selected notebook, and allows the page to be downloaded as a Markdown file.

The MSALOneNoteConverter repo contains the code for this section.

Bootstrap the Spring Project

Just as we did for the back-end, we’ll generate the initial application template using Spring Initializr to create a Java Maven project, which generates a JAR file using the latest non-snapshot version of Spring against Java 11.

The web application requires the following dependencies:

Thymeleaf, which provides the template language for HTML files
Spring Web, used to provide a hosted webserver
Spring Security, which allows us to secure access to endpoints
Azure Active Directory, used to integrate with Azure AD
OAuth2 Client, used to integrate the web application with an OAuth2 authorization server

Configure Spring Security

Like the microservice, our web application is configured to require authenticated access to all pages through the AuthSecurityConfig class.

Java

package com.matthewcasperson.onenote.configuration;
 
...
// imports
...

@EnableWebSecurity
@EnableGlobalMethodSecurity(prePostEnabled = true)
public class AuthSecurityConfig extends AADWebSecurityConfigurerAdapter {
 
    @Override
    protected void configure(final HttpSecurity http) throws Exception {
        super.configure(http);
        // @formatter:off
        http
            .authorizeRequests()
                .anyRequest().authenticated()
            .and()
                .csrf()
                .disable();
        // @formatter:on
    }
}

Build a WebClient

The front-end application needs a WebClient to interact with the microservice. WebClient is Spring’s non-blocking HTTP client and is the preferred option over the older RestTemplate.

To call the microservice, each request must have an associated access token. The WebClientConfig class configures an instance of WebClient to include a token sourced from an OAuth2AuthorizedClient:

Java

package com.matthewcasperson.onenote.configuration;

...
// imports
...

@Configuration
public class WebClientConfig {
  @Bean
  public OAuth2AuthorizedClientManager authorizedClientManager(
      final ClientRegistrationRepository clientRegistrationRepository,
      final OAuth2AuthorizedClientRepository authorizedClientRepository) {
 
    final OAuth2AuthorizedClientProvider authorizedClientProvider =
        OAuth2AuthorizedClientProviderBuilder.builder()
            .clientCredentials()
            .build();
 
    final DefaultOAuth2AuthorizedClientManager authorizedClientManager =
        new DefaultOAuth2AuthorizedClientManager(
            clientRegistrationRepository, authorizedClientRepository);
    authorizedClientManager.setAuthorizedClientProvider(authorizedClientProvider);
 
    return authorizedClientManager;
  }
 
  @Bean
  public static WebClient webClient(final OAuth2AuthorizedClientManager oAuth2AuthorizedClientManager) {
    final ServletOAuth2AuthorizedClientExchangeFilterFunction function =
        new ServletOAuth2AuthorizedClientExchangeFilterFunction(oAuth2AuthorizedClientManager);
    return WebClient.builder()
        .apply(function.oauth2Configuration())
        .build();
  }
}
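
Conceptually, the filter function's job is to attach the access token to each outgoing request as a bearer token in the Authorization header. The sketch below shows the resulting request shape using the JDK's own java.net.http API as a stand-in for WebClient. The BearerRequestSketch class, the withBearerToken method, and the token value are all hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of what "a request with an associated access token" looks like.
final class BearerRequestSketch {

  // Builds a GET request that carries the token in the Authorization header.
  static HttpRequest withBearerToken(final String url, final String accessToken) {
    return HttpRequest.newBuilder()
        .uri(URI.create(url))
        .header("Authorization", "Bearer " + accessToken)
        .GET()
        .build();
  }

  public static void main(final String[] args) {
    // The token below is a placeholder, not a real credential.
    final HttpRequest request =
        withBearerToken("http://localhost:8081/notes/", "example-token");
    System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
    // prints "Bearer example-token"
  }
}
```

With the filter function in place, the controller code never handles the token string directly. It only passes the authorized client along with each call.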

Build the MVC Controller

The MVC controller defined in the OneNoteController class exposes the endpoints that users access via their web browsers. The class lives in the following package:

Java

package com.matthewcasperson.onenote.controllers;

Let’s break down and examine this code.

We inject an instance of the WebClient created by the WebClientConfig class.

Java

@Controller
public class OneNoteController {
 
  @Autowired
  WebClient webClient;

The getIndex method receives an OAuth2AuthorizedClient configured to access the microservice. This client is passed to the WebClient to retrieve a list of the notebooks created by the currently logged-in user. The resulting list is saved as the model attribute notes:

Java

@GetMapping("/")
public ModelAndView getIndex(
    @RegisteredOAuth2AuthorizedClient("api") final OAuth2AuthorizedClient client) {
  final List notes = webClient
      .get()
      .uri("http://localhost:8081/notes/")
      .attributes(oauth2AuthorizedClient(client))
      .retrieve()
      .bodyToMono(List.class)
      .block();

  final ModelAndView mav = new ModelAndView("index");
  mav.addObject("notes", notes);
  return mav;
}

The getPageView method captures two paths that allow the selected notebook to be previewed in HTML form and downloaded as Markdown.

The iframesrc model attribute is a path to an endpoint that returns the notebook page as HTML. The markdownsrc model attribute is a path to an endpoint that provides the page as a downloadable Markdown file:

Java

@GetMapping("/notes/{name}")
public ModelAndView getPageView(@PathVariable("name") final String name) {
  final ModelAndView mav = new ModelAndView("pageview");
  mav.addObject("iframesrc", "/notes/" + name + "/html");
  mav.addObject("markdownsrc", "/notes/" + name + "/markdown");
  return mav;
}
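
Notebook names are free text, so they may contain spaces or characters such as & that have a special meaning in URLs. One hedged way to make paths like these robust is to percent-encode the name before concatenating it. The sketch below uses only the JDK. PathSegmentSketch, encodeSegment, and htmlPath are hypothetical names, and the sample application itself concatenates the raw name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for placing free text into a URL path segment.
final class PathSegmentSketch {

  static String encodeSegment(final String value) {
    // URLEncoder targets form encoding, where a space becomes '+'.
    // In a path segment a space must be %20, so the '+' is replaced afterwards.
    // A literal '+' in the input has already become %2B and is unaffected.
    return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
  }

  static String htmlPath(final String notebookName) {
    return "/notes/" + encodeSegment(notebookName) + "/html";
  }

  public static void main(final String[] args) {
    System.out.println(htmlPath("Meeting Notes & Ideas"));
    // prints "/notes/Meeting%20Notes%20%26%20Ideas/html"
  }
}
```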

To preview the notebook page’s HTML, the getNoteHtml method returns the raw HTML, along with the X-Frame-Options and Content-Security-Policy headers that allow this endpoint to be viewed in an HTML iframe element.

Java

@GetMapping(value = "/notes/{name}/html", produces = MediaType.TEXT_HTML_VALUE)
@ResponseBody
public String getNoteHtml(
    @RegisteredOAuth2AuthorizedClient("api") final OAuth2AuthorizedClient client,
    @PathVariable("name") final String name,
    final HttpServletResponse response) {
  response.setHeader("X-Frame-Options", "SAMEORIGIN");
  response.setHeader("Content-Security-Policy", "frame-ancestors 'self'");
  return webClient
      .get()
      .uri("http://localhost:8081/notes/" + name + "/html")
      .attributes(oauth2AuthorizedClient(client))
      .retrieve()
      .bodyToMono(String.class)
      .block();
}

The getNoteMarkdown method provides the page as a downloadable Markdown file. By returning a ResponseEntity object and defining the Content-Type and Content-Disposition headers, we instruct the browser to download the returned content rather than display it in the browser.

Java

  @GetMapping("/notes/{name}/markdown")
  public ResponseEntity<byte[]> getNoteMarkdown(
      @RegisteredOAuth2AuthorizedClient("api") final OAuth2AuthorizedClient client,
      @PathVariable("name") final String name) {
    final String markdown = webClient
        .get()
        .uri("http://localhost:8081/notes/" + name + "/markdown")
        .attributes(oauth2AuthorizedClient(client))
        .retrieve()
        .bodyToMono(String.class)
        .block();
 
    final HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.TEXT_MARKDOWN);
    final String filename = "page.md";
    headers.setContentDispositionFormData(filename, filename);
    // encode explicitly rather than relying on the platform default charset
    return new ResponseEntity<>(markdown.getBytes(StandardCharsets.UTF_8), headers, HttpStatus.OK);
  }
}
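
At the HTTP level, a download is signaled by two response headers. The sketch below builds them as plain strings to show what the browser receives for a typical file download. DownloadHeadersSketch and forMarkdown are hypothetical names, the sketch assumes a simple ASCII filename without quotes, and it uses the attachment disposition type, which may differ from the exact header value Spring generates in the controller above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the headers behind a file download response.
final class DownloadHeadersSketch {

  // Assumes the filename is plain ASCII and contains no quotes or backslashes.
  static Map<String, String> forMarkdown(final String filename) {
    final Map<String, String> headers = new LinkedHashMap<>();
    // tells the browser what kind of content the body holds
    headers.put("Content-Type", "text/markdown;charset=UTF-8");
    // "attachment" asks the browser to save the body; the filename is a suggestion
    headers.put("Content-Disposition", "attachment; filename=\"" + filename + "\"");
    return headers;
  }

  public static void main(final String[] args) {
    forMarkdown("page.md").forEach((name, value) -> System.out.println(name + ": " + value));
    // prints:
    // Content-Type: text/markdown;charset=UTF-8
    // Content-Disposition: attachment; filename="page.md"
  }
}
```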

Create the Thymeleaf Templates

The index.html page displays the list of notebooks, and provides a button to redirect the browser to the next page:

HTML

<html>
<head>
  <link rel="stylesheet" href="/style.css">
  <script>
    function handleClick() {
      if (note.selectedIndex !== -1) {
        location.href='/notes/' + note.options[note.selectedIndex].value;
      } else {
        alert("Please select a notebook");
      }
    }
  </script>
</head>
<body>
<div class="container">
  <div class="header">
    <div class="title"><a href="/">ONENOTE CONVERTER</a></div>
  </div>
  <div class="main">
    <form class="formContainer">
      <div class="formRow">
        <select style="display: block" size="5" id="note">
          <option th:each="note: ${notes}" th:value="${note}" th:text="${note}">
          </option>
        </select>
      </div>
      <div class="formRow">
        <input type="button" value="View Note" onclick="handleClick();">
      </div>
    </form>
  </div>
</div>
</body>
</html>

The pageview.html page displays the page’s HTML in an iframe and provides a form button to download the Markdown file.

HTML

<html>
<head>
  <link rel="stylesheet" href="/style.css">
</head>
<body>
<div class="container">
  <div class="header">
    <div class="title"><a href="/">ONENOTE CONVERTER</a></div>
  </div>
  <div class="main">
    <div class="formContainer">
      <div class="formRow">
        <iframe style="width: 100%; height: 400px" th:src="${iframesrc}"></iframe>
      </div>
      <div class="formRow">
        <form style="margin-top: 10px" th:action="${markdownsrc}">
          <input type="submit" value="Download Markdown" />
        </form>
      </div>
    </div>
  </div>
</div>
</body>
</html>

Conclusion

By taking advantage of the Graph API client, we can interact with the Microsoft Graph API using a fluent and type-safe interface. It’s far more convenient and reliable than performing raw HTTP requests and processing the returned JSON.

In this article, we used the Graph API client to retrieve OneNote notebook pages, preview the original page HTML, and provide the ability to download a Markdown version of the page. Though this was a relatively simple example, it demonstrates how Spring Boot applications can seamlessly interact with Microsoft Office documents on behalf of an end user, by using the Microsoft Graph API and Azure AD.

In the final article of this series, we’ll see how to integrate Spring with Microsoft Teams to create a simple incident management bot.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.