
Extract the HTML from a page loaded in TWebBrowser

 


Question:
How can I get the HTML from a web page that I loaded in TWebBrowser? I want to clip some web contents.

Answer:

You can use the Document property - it has a lot of interesting properties:


  • Document.All
  • Document.bgColor
  • Document.Body.innerHTML
  • Document.Body.Style.overflowX
  • Document.Body.Style.overflowY
  • Document.Body.Style.zoom
  • Document.cookie
  • Document.documentElement.innerHTML
  • Document.documentElement.innerText
  • Document.FileSize
  • Document.Frames
  • Document.Images
  • Document.LastModified
  • Document.Links
  • Document.Location.Protocol
  • Document.ParentWindow
  • Document.ParentWindow.ScrollBy(iX: Integer; iY: Integer)
  • Document.Selection
  • Document.Title
  • Document.URL

Of these, Document.Body.innerHTML will serve our purpose. The only limitation of this solution is that it gives us the HTML as the web browser currently represents it, which may differ from what 'View Source' in Internet Explorer would show. If the original HTML file includes JavaScript that generates content dynamically, like this:

<script language='JavaScript'>
document.write('Hello Visitor');
</script>

then the DocumentComplete handler below will return the output 'Hello Visitor' but not the original JavaScript. To get at the original file, you need to look in the browser cache or use something other than TWebBrowser.
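As one example of "something other than TWebBrowser", the sketch below fetches the unrendered source with the URLDownloadToFile function from the UrlMon unit. The helper name FetchRawHtml and its ATempFile parameter are made up for this illustration. Note that this issues a separate request, so the result may come from the cache or differ from what the browser control received (for example on pages that depend on a login session).

```delphi
uses
  Windows, SysUtils, Classes, UrlMon;

// Hypothetical helper (not part of the original answer): downloads AUrl
// to ATempFile and returns the file contents, or '' if the download fails.
function FetchRawHtml(const AUrl, ATempFile: string): string;
var
  Source: TStringList;
begin
  Result := '';
  // URLDownloadToFile returns S_OK on success
  if URLDownloadToFile(nil, PChar(AUrl), PChar(ATempFile), 0, nil) <> S_OK then
    Exit;
  Source := TStringList.Create;
  try
    Source.LoadFromFile(ATempFile);
    // TStringList.Text normalizes line breaks; use a TFileStream
    // instead if the bytes must be preserved exactly
    Result := Source.Text;
  finally
    Source.Free;
    DeleteFile(ATempFile);
  end;
end;
```

A call could look like s := FetchRawHtml(URL, 'C:\Temp\page.htm'); where the path is only a placeholder for a writable location.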

// tested with Delphi 6, should work in Delphi 5 as well
uses
  HTTPApp, MSHTML;

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Document: IHTMLDocument2;
  s: string;
begin
  // get the HTML of the page body as the browser currently represents it
  Document := WebBrowser1.Document as IHTMLDocument2;
  // Body can be nil when the document has no body element
  if not Assigned(Document) or not Assigned(Document.Body) then
    Exit;
  s := Document.Body.innerHTML;

  // process this string to extract contents
end;
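The "process this string" step depends on the page, but for simple clipping a marker-based search is often enough. The following is a minimal sketch, not part of the original answer; the function name ExtractBetween is hypothetical. It returns the text between the first occurrence of a start marker and the next end marker. The comparison is case-sensitive, and since innerHTML may change tag case and attribute quoting compared with the original source, it is safer to take the markers from the innerHTML output itself.

```delphi
// Hypothetical helper: returns the text between the first StartTag and
// the next EndTag in Html, or '' if either marker is missing.
function ExtractBetween(const Html, StartTag, EndTag: string): string;
var
  StartPos, EndPos: Integer;
  Rest: string;
begin
  Result := '';
  StartPos := Pos(StartTag, Html);
  if StartPos = 0 then
    Exit;
  // skip past the start marker, then search the remainder for the end marker
  Rest := Copy(Html, StartPos + Length(StartTag), MaxInt);
  EndPos := Pos(EndTag, Rest);
  if EndPos = 0 then
    Exit;
  Result := Copy(Rest, 1, EndPos - 1);
end;
```

For example, ExtractBetween('<b>Total: 42</b>', '<b>', '</b>') returns 'Total: 42'.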

Comments:

2006-01-24, 10:59:02
[hidden] from United Kingdom  
thanks
Piotr Borowski
2006-04-14, 01:16:36
anonymous from Vietnam  
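// needs ActiveX (IStream, IPersistStreamInit) and Classes
// (TStringStream, TStreamAdapter) in the uses clause
function GetBrowserHtml(const webBrowser: TWebBrowser): String;
var
  strStream: TStringStream;
  adapter: IStream;
  browserStream: IPersistStreamInit;
begin
  strStream := TStringStream.Create('');
  try
    browserStream := webBrowser.Document as IPersistStreamInit;
    // soReference: the adapter does not take ownership of strStream
    adapter := TStreamAdapter.Create(strStream, soReference);
    browserStream.Save(adapter, true);
    Result := strStream.DataString;
  finally
    // free the stream even if the cast or Save raises an exception
    strStream.Free;
  end;
end;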

2006-04-14, 01:17:44
anonymous from Vietnam  
Trần quốc Trung
2007-02-08, 05:08:25
anonymous from Austria  
The body element only exists in documents that actually have one, so the first script is unreliable.
2007-02-08, 05:14:07
anonymous from Austria  
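// needs MSHTML in the uses clause; the function name is illustrative
function GetDocumentHtml(Browser: TWebBrowser): WideString;
var
  Doc: IHTMLDocument3;
begin
  // documentElement is the root element, so no body element is required
  Doc := Browser.Document as IHTMLDocument3;
  Result := Doc.documentElement.innerHTML;
end;

... and that is all, folks! No lengthy code in the style of a C++ programmer, just plain and simple Delphi.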

ZENsan

 

 
