Why does translating this code snippet from C# to C++ degrade performance? -
i more familiar c# c++ must ask advice on issue. had rewrite code pieces c++ , (surprisingly) ran performance issues.
i've narrowed problem down these snippets:
c#
public class suffixtree { public class node { public int index = -1; public dictionary<char, node> children = new dictionary<char, node>(); } public node root = new node(); public string text; public suffixtree(string s) { text = s; (var = s.length - 1; >= 0; --i) insertsuffix(s, i); } public void insertsuffix(string s, int from) { var cur = root; (int = from; < s.length; ++i) { var c = s[i]; if (!cur.children.containskey(c)) { var n = new node() { index = }; cur.children.add(c, n); return; } cur = cur.children[c]; } } public bool contains(string s) { return findnode(s) != null; } private node findnode(string s) { var cur = root; (int = 0; < s.length; ++i) { var c = s[i]; if (!cur.children.containskey(c)) { (var j = i; j < s.length; ++j) if (text[cur.index + j] != s[j]) return null; return cur; } cur = cur.children[c]; } return cur; } } } c++
struct node { int index; std::unordered_map<char, node*> children; node() { this->index = -1; } node(int idx) { this->index = idx; } }; struct suffixtree { node* root; char* text; suffixtree(char* str) { int len = strlen(str) + 1; this->text = new char[len]; strncpy(this->text, str, len); root = new node(); (int = len - 2; >= 0; --i) insertsuffix(str, i); } void insertsuffix(char* str, int from) { node* current = root; (int = from; < strlen(str); ++i) { char key = str[i]; if (current->children.find(key) == current->children.end()) { current->children[key] = new node(from); return; } current = current->children[key]; } } bool contains(char* str) { node* current = this->root; (int = 0; < strlen(str); ++i) { char key = str[i]; if (current->children.find(key) == current->children.end()) { (int j = i; j < strlen(str); ++j) if (this->text[current->index + j] != str[j]) return false; return true; } current = current->children[key]; } } } in both cases i'm creating suffix tree , later using in bigger function not relevant post (lets call f()). i've tested both on 2 randomly generated strings of length 100000. c# version constructed suffix tree , used in f() in total execution time of: 480 ms while code i've "translated c++" executed in 48 sec
i've drilled down further , seems in c++ code, constructor takes 47 sec while using tree in f() runs @ 48 ms 10 times faster in c#.
conclusion
it seems main problem in insertsuffix(), perhaps lack of knowledge , understanding of unordered_map structure. can shed light on this? did make rookie mistake in c++ variant causes object construction take long?
aditional info
i've compiled both c# , c++ program maximum speed /o2 (release)
in c#, system.string includes length, can length in constant time. in c++, std::string includes size, available in constant time.
however, aren’t using c++ std::string (which should be, translation of algorithm); you’re using c-style null-terminated char array. char* literally means “pointer char”, , tells first character of string is. strlen function looks @ each char 1 pointed forward, until finds null character '\0' (not confused null pointer); expensive, , in each iteration of loop in insertsuffix. accounts @ least reasonable fraction of slowdown.
when doing c++, if find working raw pointers (any type involving *), should wonder if there’s simpler way. answer “no”, it’s “yes” (and that’s getting more common language evolves). example, consider struct node , node* root. both use node pointers, in both cases should have used node directly because there no need have indirection (in case of node, some amount of indirection necessary don’t have each node containing node ad infinitum, that’s provided std::unordered_map).
a couple other tips:
- in c++ don’t want work in body of constructor, instead use initialization lists.
- when don’t want copy pass parameter, should make parameter reference; instead of changing
insertsuffixtakestd::stringfirst parameter, make takestd::string const&; similarly,containsshould takestd::string const&. better yet, sinceinsertsuffixcan seetextmember, doesn’t need take first parameter @ , can usefrom. - c++ supports foreach-like construct, should prefer standard
forloop when iterating on string’s characters. - if you’re using newest not-technically-finalized-but-close-enough version of c++, c++17, should use
std::string_viewinstead ofstd::stringwhenever want @ string, , don’t need change or keep reference around. usefulcontains, , since want make local copy intextmember, constructor; not useful intextmember itself, because object being viewed might temporary. lifetime can tricky in c++, though, , until hang of might want usestd::stringon safe side. - since
nodeisn’t useful outside of concept ofsuffixtree, should inside it, in c# version. deviation c# version, might want make typenode, data membersroot,textprivateinstead ofpublicmembers.
Comments
Post a Comment