I want to display a diff of two strings in Jupyter Notebook (Lab)

Asked 1 year ago, Updated 1 year ago, 99 views

I analyze data in Jupyter Notebook, and the raw data I scrape is often a large HTML document or a string with some structure.

In such cases, when I have two multi-line strings, I want to take a diff of them as I would in a terminal.

Given two strings obtained in IPython in Jupyter Notebook (or Lab), I would like to diff them line by line and check the differences just as in a terminal. Is there any way to do this?

jupyter-notebook

2022-09-30 10:43

1 Answer

If the differences are not large, the standard library's difflib module can compute them.
(I have compared its functionality with filecmp.dircmp, which takes directory differences; for directory trees that are not very deep, it is practical enough.)

For simple cases, this may be all you need.
(It is less capable than the command-line diff.)

[Example]
When using unified_diff

from difflib import unified_diff

# Assuming the strings htmla and htmlb already exist
lst1 = htmla.splitlines(keepends=True)
lst2 = htmlb.splitlines(keepends=True)
print(''.join(unified_diff(lst1, lst2, fromfile='before.html', tofile='after.html')))
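The snippet above assumes htmla and htmlb already exist. As a self-contained sketch, the same call can be wrapped in a small helper; the name diff_text and the sample strings are hypothetical and only for illustration:

```python
from difflib import unified_diff

# Hypothetical sample strings standing in for htmla / htmlb
htmla = "<p>bacon</p>\n<p>eggs</p>\n<p>guido</p>\n"
htmlb = "<p>python</p>\n<p>eggs</p>\n<p>guido</p>\n"


def diff_text(a: str, b: str, fromfile: str = "before", tofile: str = "after") -> str:
    """Return a unified diff of two multi-line strings as a single string."""
    return "".join(
        unified_diff(
            a.splitlines(keepends=True),  # keep newlines so the output joins cleanly
            b.splitlines(keepends=True),
            fromfile=fromfile,
            tofile=tofile,
        )
    )


result = diff_text(htmla, htmlb)
print(result)
```

If the two strings are identical, unified_diff yields nothing and the helper returns an empty string.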

Or display it with color

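# Assuming the strings s1 and s2 already exist
lst1, lst2 = map(lambda s: s.strip().split('\n'), (s1, s2))
out = '\n'.join(unified_diff(lst1, lst2, fromfile='before.py', tofile='after.py', lineterm=''))

from IPython.display import Markdown
# Wrap the diff in a Markdown "diff" code fence so the notebook highlights it
display(Markdown(f'''```diff
{out}
```'''))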
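The Markdown fence can break if the compared text itself contains a line of three backticks. As an alternative sketch (not the approach used above), the diff lines can be escaped and colored as plain HTML. The helper name diff_to_html and the colors are assumptions chosen for illustration:

```python
from difflib import unified_diff
from html import escape

# Background colors keyed by the first character of each diff line (arbitrary choice)
COLORS = {"+": "#e6ffed", "-": "#ffeef0", "@": "#f1f8ff"}


def diff_to_html(a: str, b: str) -> str:
    """Render a unified diff of two strings as HTML with one colored block per line."""
    lines = unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    )
    rows = []
    for line in lines:
        # The '---' / '+++' header lines also start with '-' / '+' and get colored too
        color = COLORS.get(line[:1], "transparent")
        rows.append(
            f'<span style="display:block;background:{color}">{escape(line)}</span>'
        )
    return '<pre style="margin:0">' + "".join(rows) + "</pre>"


html_out = diff_to_html("bacon\neggs\n", "python\neggs\n")
# In a notebook:
#   from IPython.display import HTML, display
#   display(HTML(html_out))
```

Because every line goes through html.escape, HTML in the compared strings is shown as text instead of being rendered.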

HtmlDiff displays the two versions side by side

from difflib import HtmlDiff
from IPython.display import HTML

# make_file returns a complete HTML document containing the side-by-side table
display(HTML(HtmlDiff().make_file(lst1, lst2)))

(Example of a comparison result generated by HtmlDiff)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >

<html>

<head>
    <meta http-equiv="Content-Type"
          content="text/html; charset=utf-8"/>
    <title> </title>
    <style type="text/css">
        table.diff {font-family:Courier; border:medium;}
        .diff_header {background-color:#e0e0e0}
        td.diff_header {text-align:right}
        .diff_next {background-color:#c0c0c0}
        .diff_add {background-color:#aaffaa}
        .diff_chg {background-color:#ffff77}
        .diff_sub {background-color:#ffaaaa}
    </style>
</head>

<body>
    
    <table class="diff" id="difflib_chg_to6__top"
           cellspacing="0" cellpadding="0" rules="groups">
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        
        <tbody>
            <tr><td class="diff_next" id="difflib_chg_to6__0"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="from6_1">1</td><td nowrap="nowrap"><span class="diff_sub">bacon</span></td><td class="diff_next"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="to6_1">1</td><td nowrap="nowrap"><span class="diff_add">python</span></td></tr>
            <tr><td class="diff_next"></td><td class="diff_header" id="from6_2">2</td><td nowrap="nowrap">egg<span class="diff_chg">s</span></td><td class="diff_next"></td><td class="diff_header" id="to6_2">2</td><td nowrap="nowrap">egg<span class="diff_chg">y</span></td></tr>
            <tr><td class="diff_next"></td><td class="diff_header" id="from6_3">3</td><td nowrap="nowrap"><span class="diff_sub">ham</span></td><td class="diff_next"></td><td class="diff_header" id="to6_3">3</td><td nowrap="nowrap"><span class="diff_add">hamster</span></td></tr>
            <tr><td class="diff_next"></td><td class="diff_header" id="from6_4">4</td><td nowrap="nowrap">guido</td><td class="diff_next"></td><td class="diff_header" id="to6_4">4</td><td nowrap="nowrap">guido</td></tr>
        </tbody>
    </table>
    <table class="diff" summary="Legends">
        <tr><th colspan="2">Legends</th></tr>
        <tr><td><table border="" summary="Colors">
                      <tr><th>Colors</th></tr>
                      <tr><td class="diff_add">&nbsp;Added&nbsp;</td></tr>
                      <tr><td class="diff_chg">Changed</td></tr>
                      <tr><td class="diff_sub">Deleted</td></tr>
                  </table></td>
             <td><table border="" summary="Links">
                      <tr><th colspan="2"> Links</th></tr>
                      <tr><td>(f)irst change</td></tr>
                      <tr><td>(n)ext change</td></tr>
                      <tr><td>(t)op</td></tr>
                  </table></td></tr>
    </table>
</body>

</html>
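As the output above shows, make_file produces a whole HTML document. When only a fragment is wanted for a notebook cell, HtmlDiff.make_table returns just the table. The sketch below assumes that is preferable and adds the CSS by hand, since make_table does not include it; the rules are copied from the output above, and the sample lists and descriptions are illustrative:

```python
from difflib import HtmlDiff

a = ["bacon", "eggs", "ham", "guido"]
b = ["python", "eggy", "hamster", "guido"]

# make_table returns only the <table> element, without <html>, <head> or <style>
table_html = HtmlDiff(wrapcolumn=80).make_table(
    a, b,
    fromdesc="before", todesc="after",
    context=True, numlines=2,  # show only changed lines plus 2 lines of context
)

# Styles copied from the make_file output; without them the cells are not colored
STYLE = """<style type="text/css">
    table.diff {font-family:Courier; border:medium;}
    .diff_header {background-color:#e0e0e0}
    td.diff_header {text-align:right}
    .diff_next {background-color:#c0c0c0}
    .diff_add {background-color:#aaffaa}
    .diff_chg {background-color:#ffff77}
    .diff_sub {background-color:#ffaaaa}
</style>"""

fragment = STYLE + table_html
# In a notebook:
#   from IPython.display import HTML, display
#   display(HTML(fragment))
```

With long inputs, context=True keeps the table short by hiding unchanged regions, and wrapcolumn wraps wide lines.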

The command-line diff can also be used if you don't mind writing to a temporary file

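from pathlib import Path
html_a = Path('path to file.html')
# Move the original aside as a backup; Path.rename returns the new path (Python 3.8+)
html_b = html_a.rename(html_a.with_suffix('.html_bk'))
with html_a.open('w') as fp:
    pass  # processing (write the new content to fp)

# "!" runs a shell command in IPython/Jupyter; $name expands a Python variable
out = !diff -u --suppress-common-lines $html_a $html_b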
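The `!` syntax only works inside IPython. As a rough equivalent in plain Python, the two strings can be written to temporary files and passed to diff through subprocess. This is a sketch that assumes a diff executable is available on PATH; the helper name external_diff and the file names are hypothetical:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def external_diff(a: str, b: str) -> str:
    """Write both strings to temporary files and return the output of `diff -u`."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = Path(tmp, "before.txt")
        path_b = Path(tmp, "after.txt")
        path_a.write_text(a, encoding="utf-8")
        path_b.write_text(b, encoding="utf-8")
        # diff exits with status 1 when the files differ, so check=True is not used
        proc = subprocess.run(
            ["diff", "-u", str(path_a), str(path_b)],
            capture_output=True,
            text=True,
        )
    return proc.stdout


# Only run the example when a diff executable is actually installed
if shutil.which("diff"):
    print(external_diff("bacon\neggs\n", "python\neggs\n"))
```

The temporary directory is removed when the with block exits, so nothing is left on disk.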


2022-09-30 10:43



© 2024 OneMinuteCode. All rights reserved.