I analyze data in Jupyter Notebook, and the raw data I scrape is often a large HTML document or a long string with some structure.
In that situation, when I have two multi-line strings, I find myself wanting to run diff on them as I would in a terminal.
Given two strings obtained in IPython inside Jupyter Notebook (or JupyterLab), I would like to compare them line by line and see the differences just as the terminal diff shows them. Is there any way to achieve this?
If the differences are not too large, the standard library module difflib can compute them for you.
(For comparison, I have used filecmp.dircmp, which does the same kind of thing for directories, and found it practical enough as long as the tree is only a few levels deep.)
For simple cases it may be all you need.
(It is less capable than the command-line diff.)
It is used like this:
from difflib import unified_diff
# Assumes the strings htmla and htmlb already exist
lst1 = htmla.splitlines(keepends=True)
lst2 = htmlb.splitlines(keepends=True)
print(''.join(unified_diff(lst1, lst2)))
Or render it with color:
from IPython.display import Markdown
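One possible way to get colored output is to wrap the unified diff in a Markdown code fence tagged as diff and hand it to Markdown. The sketch below only builds that string; the helper name diff_as_markdown and the sample strings are made up for illustration, and whether the fence is actually colorized depends on the notebook front end.

```python
from difflib import unified_diff


def diff_as_markdown(a: str, b: str, fromfile: str = "a", tofile: str = "b") -> str:
    """Return a unified diff of two strings wrapped in a Markdown 'diff' code fence."""
    lines = unified_diff(
        a.splitlines(),
        b.splitlines(),
        fromfile=fromfile,
        tofile=tofile,
        lineterm="",  # lines carry no newline, so the header lines should not either
    )
    fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
    return f"{fence}diff\n" + "\n".join(lines) + f"\n{fence}"


# Hypothetical sample data standing in for htmla and htmlb
sample = diff_as_markdown("bacon\neggs\nham\n", "python\neggs\nham\n")
print(sample)
```

In a notebook cell, Markdown(sample) as the last expression (or display(Markdown(sample))) should render the block.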
To display the two versions side by side:
from difflib import HtmlDiff
from IPython.display import HTML
HTML(HtmlDiff().make_file(htmla.splitlines(), htmlb.splitlines()))
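For large scraped pages, a full side-by-side table can get long. As a rough sketch, HtmlDiff can be asked to show only the changed regions with a few lines of context, and to wrap long lines. The function name side_by_side_table, the defaults, and the sample strings below are assumptions for illustration only.

```python
from difflib import HtmlDiff


def side_by_side_table(a: str, b: str, context_lines: int = 2, wrap: int = 80) -> str:
    """Return an HTML table comparing two strings, limited to changed regions."""
    differ = HtmlDiff(wrapcolumn=wrap)  # wrap long lines such as minified HTML
    return differ.make_table(
        a.splitlines(),
        b.splitlines(),
        fromdesc="before",
        todesc="after",
        context=True,            # show only changes plus surrounding context
        numlines=context_lines,  # number of context lines around each change
    )


# Hypothetical sample data standing in for htmla and htmlb
old = "bacon\neggs\nham\nguido"
new = "python\neggy\nhamster\nguido"
table = side_by_side_table(old, new)
```

Note that make_table returns only the table, without the stylesheet that make_file embeds, so the colors may be missing when it is passed to HTML(table); make_file accepts the same context and numlines arguments if the styled page is wanted.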
Example of the comparison output generated by HtmlDiff (abridged):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
<table class="diff" id="difflib_chg_to6__top"
cellspacing="0" cellpadding="0" rules="groups">
<tr><td class="diff_next" id="difflib_chg_to6__0"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="from6_1">1</td><td nowrap="nowrap"><span class="diff_sub">bacon</span></td><td class="diff_next"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="to6_1">1</td><td nowrap="nowrap"><span class="diff_add">python</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from6_2">2</td><td nowrap="nowrap">egg<span class="diff_chg">s</span></td><td class="diff_next"></td><td class="diff_header" id="to6_2">2</td><td nowrap="nowrap">egg<span class="diff_chg">y</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from6_3">3</td><td nowrap="nowrap"><span class="diff_sub">ham</span></td><td class="diff_next"></td><td class="diff_header" id="to6_3">3</td><td nowrap="nowrap"><span class="diff_add">hamster</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from6_4">4</td><td nowrap="nowrap">guido</td><td class="diff_next"></td><td class="diff_header" id="to6_4">4</td><td nowrap="nowrap">guido</td></tr>
</table>
<table class="diff" summary="Legends">
<tr><th colspan="2">Legends</th></tr>
<tr><td><table border="" summary="Colors">
<tr><td class="diff_add">&nbsp;Added&nbsp;</td></tr>
<tr><td class="diff_chg">Changed</td></tr>
<tr><td class="diff_sub">Deleted</td></tr>
</table></td>
<td><table border="" summary="Links">
<tr><th colspan="2">Links</th></tr>
<tr><td>(f)irst change</td></tr>
<tr><td>(n)ext change</td></tr>
</table></td></tr>
</table>
The command-line diff is also available if you don't mind writing the strings to temporary files:
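from pathlib import Path
html_a = Path('path to file.html')
with html_a.open('w') as fp:
    pass  # write the first string here, e.g. fp.write(htmla)
# html_b is assumed to be a second Path prepared the same way
out = !diff -u --suppress-common-lines $html_a $html_b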
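The same idea can be sketched without the IPython-only ! syntax by using tempfile and subprocess, which also cleans up the temporary files afterwards. This assumes a diff executable is on PATH; the function name command_line_diff and the file names a.txt and b.txt are placeholders chosen for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def command_line_diff(a: str, b: str) -> str:
    """Write two strings to temporary files and return the output of `diff -u`."""
    with tempfile.TemporaryDirectory() as tmp:
        path_a = Path(tmp) / "a.txt"
        path_b = Path(tmp) / "b.txt"
        path_a.write_text(a, encoding="utf-8")
        path_b.write_text(b, encoding="utf-8")
        result = subprocess.run(
            ["diff", "-u", str(path_a), str(path_b)],
            capture_output=True,
            text=True,
        )
    # diff exits with 0 when the inputs match, 1 when they differ, >1 on error
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling print(command_line_diff(htmla, htmlb)) in a cell should then show the same text a terminal would, with the temporary paths in the header lines.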