Why does translating this code snippet from C# to C++ degrade performance? -
i more familiar c# c++ must ask advice on issue. had rewrite code pieces c++ , (surprisingly) ran performance issues.
i've narrowed problem down these snippets:
c#
public class suffixtree { public class node { public int index = -1; public dictionary<char, node> children = new dictionary<char, node>(); } public node root = new node(); public string text; public suffixtree(string s) { text = s; (var = s.length - 1; >= 0; --i) insertsuffix(s, i); } public void insertsuffix(string s, int from) { var cur = root; (int = from; < s.length; ++i) { var c = s[i]; if (!cur.children.containskey(c)) { var n = new node() { index = }; cur.children.add(c, n); return; } cur = cur.children[c]; } } public bool contains(string s) { return findnode(s) != null; } private node findnode(string s) { var cur = root; (int = 0; < s.length; ++i) { var c = s[i]; if (!cur.children.containskey(c)) { (var j = i; j < s.length; ++j) if (text[cur.index + j] != s[j]) return null; return cur; } cur = cur.children[c]; } return cur; } } }
c++
struct node { int index; std::unordered_map<char, node*> children; node() { this->index = -1; } node(int idx) { this->index = idx; } }; struct suffixtree { node* root; char* text; suffixtree(char* str) { int len = strlen(str) + 1; this->text = new char[len]; strncpy(this->text, str, len); root = new node(); (int = len - 2; >= 0; --i) insertsuffix(str, i); } void insertsuffix(char* str, int from) { node* current = root; (int = from; < strlen(str); ++i) { char key = str[i]; if (current->children.find(key) == current->children.end()) { current->children[key] = new node(from); return; } current = current->children[key]; } } bool contains(char* str) { node* current = this->root; (int = 0; < strlen(str); ++i) { char key = str[i]; if (current->children.find(key) == current->children.end()) { (int j = i; j < strlen(str); ++j) if (this->text[current->index + j] != str[j]) return false; return true; } current = current->children[key]; } } }
in both cases i'm creating suffix tree , later using in bigger function not relevant post (lets call f()). i've tested both on 2 randomly generated strings of length 100000. c# version constructed suffix tree , used in f() in total execution time of: 480 ms while code i've "translated c++" executed in 48 sec
i've drilled down further , seems in c++ code, constructor takes 47 sec while using tree in f() runs @ 48 ms 10 times faster in c#.
conclusion
it seems main problem in insertsuffix(), perhaps lack of knowledge , understanding of unordered_map structure. can shed light on this? did make rookie mistake in c++ variant causes object construction take long?
aditional info
i've compiled both c# , c++ program maximum speed /o2 (release)
in c#, system.string includes length, can length in constant time. in c++, std::string
includes size, available in constant time.
however, aren’t using c++ std::string
(which should be, translation of algorithm); you’re using c-style null-terminated char
array. char*
literally means “pointer char
”, , tells first character of string is. strlen
function looks @ each char
1 pointed forward, until finds null character '\0'
(not confused null pointer); expensive, , in each iteration of loop in insertsuffix
. accounts @ least reasonable fraction of slowdown.
when doing c++, if find working raw pointers (any type involving *
), should wonder if there’s simpler way. answer “no”, it’s “yes” (and that’s getting more common language evolves). example, consider struct node
, node* root
. both use node
pointers, in both cases should have used node
directly because there no need have indirection (in case of node
, some amount of indirection necessary don’t have each node containing node ad infinitum, that’s provided std::unordered_map
).
a couple other tips:
- in c++ don’t want work in body of constructor, instead use initialization lists.
- when don’t want copy pass parameter, should make parameter reference; instead of changing
insertsuffix
takestd::string
first parameter, make takestd::string const&
; similarly,contains
should takestd::string const&
. better yet, sinceinsertsuffix
can seetext
member, doesn’t need take first parameter @ , can usefrom
. - c++ supports foreach-like construct, should prefer standard
for
loop when iterating on string’s characters. - if you’re using newest not-technically-finalized-but-close-enough version of c++, c++17, should use
std::string_view
instead ofstd::string
whenever want @ string, , don’t need change or keep reference around. usefulcontains
, , since want make local copy intext
member, constructor; not useful intext
member itself, because object being viewed might temporary. lifetime can tricky in c++, though, , until hang of might want usestd::string
on safe side. - since
node
isn’t useful outside of concept ofsuffixtree
, should inside it, in c# version. deviation c# version, might want make typenode
, data membersroot
,text
private
instead ofpublic
members.
Comments
Post a Comment