I analyze data in Jupyter Notebook, and the raw data I scrape is often a large HTML document or a long string with some structure.
In those cases, when I have two multi-line strings, I want to run a diff on them the way I would in a terminal.
Given two strings obtained in IPython inside Jupyter Notebook (or JupyterLab), I would like to compare them line by line and see the differences just as a terminal diff would show them. Is there any way to achieve this?
jupyter-notebook
If the differences are not too large, the standard library's difflib module can compute them for you.
(I have compared it with filecmp.dircmp for diffing directories, and for trees that are not very deep it was practical enough.)
For simple cases it should be sufficient.
(It has fewer features than the command-line diff.)
[Example]
When using unified_diff:
from difflib import unified_diff
# Assuming the strings htmla and htmlb already exist
lst1 = htmla.splitlines(keepends=True)
lst2 = htmlb.splitlines(keepends=True)
print(''.join(unified_diff(lst1, lst2, fromfile='before.html', tofile='after.html')))
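To make the snippet above easier to try, here is a minimal self-contained sketch. The sample strings and the helper name `unified` are assumptions for illustration only; in practice `htmla` and `htmlb` would be the scraped strings.

```python
from difflib import unified_diff

# Hypothetical sample inputs standing in for htmla / htmlb
htmla = "bacon\neggs\nham\nguido\n"
htmlb = "python\neggy\nhamster\nguido\n"


def unified(a: str, b: str, fromfile: str = "before.html", tofile: str = "after.html") -> str:
    """Return the unified diff of two multi-line strings as a single string."""
    # keepends=True keeps the newlines, so the diff lines can be joined with ''
    lines_a = a.splitlines(keepends=True)
    lines_b = b.splitlines(keepends=True)
    return "".join(unified_diff(lines_a, lines_b, fromfile=fromfile, tofile=tofile))


result = unified(htmla, htmlb)
print(result)
```

Lines starting with `-` exist only in the first string, lines starting with `+` only in the second, and identical inputs produce an empty string.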
Or, to color it:
lst1, lst2 = map(lambda s: s.strip().split('\n'), (s1, s2))
out = '\n'.join(unified_diff(lst1, lst2, fromfile='before.py', tofile='after.py', lineterm=''))
from IPython.display import Markdown, display
display(Markdown(f'''```diff
{out}
```'''))
HtmlDiff shows the two versions side by side:
from difflib import HtmlDiff
from IPython.display import HTML, display
display(HTML(HtmlDiff().make_file(lst1, lst2)))
(Example of a comparison result generated by HtmlDiff)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=utf-8"/>
<title> </title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to6__top"
cellspacing="0" cellpadding="0" rules="groups">
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to6__0"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="from6_1">1</td><td nowrap="nowrap"><span class="diff_sub">bacon</span></td><td class="diff_next"><a href="#difflib_chg_to6__top">t</a></td><td class="diff_header" id="to6_1">1</td><td nowrap="nowrap"><span class="diff_add">python</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from6_2">2</td><td nowrap="nowrap">egg<span class="diff_chg">s</span></td><td class="diff_next"></td><td class="diff_header" id="to6_2">2</td><td nowrap="nowrap">egg<span class="diff_chg">y</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from6_3">3</td><td nowrap="nowrap"><span class="diff_sub">ham</span></td><td class="diff_next"></td><td class="diff_header" id="to6_3">3</td><td nowrap="nowrap"><span class="diff_add">hamster</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from6_4">4</td><td nowrap="nowrap">guido</td><td class="diff_next"></td><td class="diff_header" id="to6_4">4</td><td nowrap="nowrap">guido</td></tr>
</tbody>
</table>
<table class="diff" summary="Legends">
<tr><th colspan="2">Legends</th></tr>
<tr><td><table border="" summary="Colors">
<tr><th>Colors</th></tr>
<tr><td class="diff_add">&nbsp;Added&nbsp;</td></tr>
<tr><td class="diff_chg">Changed</td></tr>
<tr><td class="diff_sub">Deleted</td></tr>
</table></td>
<td><table border="" summary="Links">
<tr><th colspan="2"> Links</th></tr>
<tr><td>(f)irst change</td></tr>
<tr><td>(n)ext change</td></tr>
<tr><td>(t)op</td></tr>
</table></td></tr>
</table>
</body>
</html>
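For large scraped pages the full side-by-side table can get very long. As a rough sketch, `make_file` accepts `context=True` and `numlines` to show only the changed lines plus a little surrounding context, and `HtmlDiff(wrapcolumn=...)` wraps long lines. The sample data and the names `before`, `after` and `page` are assumptions for illustration.

```python
from difflib import HtmlDiff

# Hypothetical sample inputs; in practice these would be the scraped strings
before = "\n".join(f"line {i}" for i in range(1, 41))
after = before.replace("line 20", "line twenty")

differ = HtmlDiff(wrapcolumn=80)  # wrap long lines so the table stays readable
page = differ.make_file(
    before.splitlines(),
    after.splitlines(),
    fromdesc="before",
    todesc="after",
    context=True,  # show only changed lines plus surrounding context
    numlines=2,    # number of context lines around each change
)

# In a notebook, the result could then be rendered with:
#   from IPython.display import HTML, display
#   display(HTML(page))
print(len(page))
```

With these settings only the changed line and its two neighbours on each side end up in the table, which keeps the rendered output small even when the inputs are large.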
The command-line diff is also available if you don't mind writing the data to a temporary file:
from pathlib import Path
html_a = Path('path to file.html')
html_b = html_a.rename(html_a.with_suffix('.html_bk'))  # keep the old version as a backup
with html_a.open('w') as fp:
    pass  # processing
out = !diff -u --suppress-common-lines $html_a $html_b
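The `!diff` line above relies on IPython's shell syntax. A plain-Python variant could look like the sketch below, which writes both strings to temporary files and calls `diff` through `subprocess`. The helper name `cli_diff` and the file names are hypothetical, and the function assumes a `diff` executable on PATH (it returns None otherwise).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def cli_diff(a: str, b: str) -> Optional[str]:
    """Run `diff -u` on two strings via temporary files.

    Returns the diff text, or None if no `diff` executable is found.
    """
    if shutil.which("diff") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        path_a = Path(tmp) / "before.html"
        path_b = Path(tmp) / "after.html"
        path_a.write_text(a, encoding="utf-8")
        path_b.write_text(b, encoding="utf-8")
        # diff exits with 0 (identical), 1 (different) or 2 (trouble),
        # so check=True is deliberately not used here
        proc = subprocess.run(
            ["diff", "-u", str(path_a), str(path_b)],
            capture_output=True,
            encoding="utf-8",
        )
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr)
    return proc.stdout


out = cli_diff("bacon\neggs\n", "bacon\neggy\n")
print(out)
```

Using a temporary directory means the files are cleaned up automatically, so nothing has to be renamed or restored afterwards.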