Friday, 29 April 2011

[C++] - how to deal with very large files

Two possible ways:

1) Use only basic C I/O (FILE* with fread and fwrite) and read/write the data one block of bytes at a time instead of loading the whole file into memory.

2) Use a memory-mapped file, so the operating system pages the data in on demand.
Possible links:
+ Boost C++ memory-mapped file support: http://www.boost.org/doc/libs/1_38_0/libs/iostreams/doc/index.html
+ http://codingplayground.blogspot.com/2009/03/memory-mapped-files-in-boost-and-c.html#comment-form

3) TBA (please let me know if you have other suggestions. Thanks!)
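
A minimal sketch of option 1), assuming the task is a plain file copy. The function name copy_in_chunks, its parameters, and the 1 MiB default block size are made up for this illustration. The point is that only one block is held in memory at a time, so memory use does not grow with the file size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Copies `src` to `dst` one block at a time, so memory use stays at
// `chunk_size` bytes no matter how large the file is.
// Returns the number of bytes copied, or -1 on any I/O error.
// (copy_in_chunks and its parameters are hypothetical names for this sketch.)
std::int64_t copy_in_chunks(const char* src, const char* dst,
                            std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);

    std::FILE* in = std::fopen(src, "rb");
    if (!in) return -1;
    std::FILE* out = std::fopen(dst, "wb");
    if (!out) {
        std::fclose(in);
        return -1;
    }

    std::vector<char> buffer(chunk_size);
    std::int64_t total = 0;
    bool ok = true;

    for (;;) {
        // fread returns how many bytes it actually delivered.
        std::size_t got = std::fread(buffer.data(), 1, buffer.size(), in);
        if (got > 0) {
            if (std::fwrite(buffer.data(), 1, got, out) != got) {
                ok = false;  // short write: disk full or similar
                break;
            }
            total += static_cast<std::int64_t>(got);
        }
        if (got < buffer.size()) {
            // A short read means end of file or a read error.
            if (std::ferror(in)) ok = false;
            break;
        }
    }

    std::fclose(in);
    if (std::fclose(out) != 0) ok = false;  // flushing can fail too
    return ok ? total : -1;
}
```

A call such as copy_in_chunks("big.bin", "copy.bin") would copy the file in 1 MiB blocks (the file names are placeholders). Because the loop only reads sequentially, it never needs fseek/ftell, whose long offsets are 32-bit on Windows and therefore cannot address positions beyond 2 GB.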

--
Cheers,
Vu

Thursday, 28 April 2011

Boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages

Link: http://code.google.com/p/boilerpipe/

The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.

Wednesday, 27 April 2011

Q&A for professional and enthusiast programmers

http://stackoverflow.com/ - A social Q&A site for both expert & non-expert programmers.

Boost C++ Library

Boost C++ Library: http://www.boost.org or http://boost.teeks99.com/

"...one of the most highly regarded and expertly designed C++ library projects in the world." (Herb Sutter and Andrei Alexandrescu, C++ Coding Standards)

*** Installation Tips

- with b2

b2 address-model=32 --build-type=complete --stagedir=stage
b2 address-model=64 --build-type=complete --stagedir=stage_x64

(regex with ICU lib)
b2 -sICU_PATH=C:\icu4c-54_1-src\icu address-model=32 --with-regex --stagedir=stage
b2 -sICU_PATH=C:\icu4c-54_1-src\icu address-model=64 --with-regex --stagedir=stage_x64

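(iostreams with zlib; ZLIB_SOURCE is expected to point at the zlib source tree, and a prebuilt zlib can be given with ZLIB_INCLUDE and ZLIB_LIBPATH instead)
b2 -sZLIB_SOURCE=C:\zlib128-dll\include address-model=32 --with-iostreams --stagedir=stage
b2 -sZLIB_SOURCE=C:\zlib128-dll\include address-model=64 --with-iostreams --stagedir=stage_x64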

- with bjam

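(for different versions of Microsoft Visual C++: msvc-12.0 = Visual Studio 2013, msvc-11.0 = 2012, msvc-10.0 = 2010, msvc-9.0 = 2008, msvc-8.0 = 2005; 64-bit builds first, then builds with the default address model)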
bjam --toolset=msvc-12.0 address-model=64 --build-type=complete stage
bjam --toolset=msvc-11.0 address-model=64 --build-type=complete stage
bjam --toolset=msvc-10.0 address-model=64 --build-type=complete stage
bjam --toolset=msvc-9.0 address-model=64 --build-type=complete stage
bjam --toolset=msvc-8.0 address-model=64 --build-type=complete stage

bjam --toolset=msvc-12.0 --build-type=complete stage
bjam --toolset=msvc-11.0 --build-type=complete stage
bjam --toolset=msvc-10.0 --build-type=complete stage
bjam --toolset=msvc-9.0 --build-type=complete stage
bjam --toolset=msvc-8.0 --build-type=complete stage

--